user manual COSMIN Risk of Bias tool v4 JAN final · 2021. 1. 16. · 6 1. Background information...

COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error of outcome measurement instrument

User manual

Version 1.0, dated January 2021

Lidwine B Mokkink, Maarten Boers, Cees van der Vleuten, Donald L Patrick, Jordi Alonso, Lex M Bouter, Henrica CW de Vet, Caroline B Terwee

Contact: LB Mokkink, PhD, Amsterdam UMC, Vrije Universiteit Amsterdam, Department of Epidemiology and Data Science, Amsterdam Public Health research institute, De Boelelaan 1117, 1081 BT Amsterdam, The Netherlands
Website: www.cosmin.nl
Email: [email protected]


The development of the COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error was part of the VENI programme with project number 91617098, funded by ZonMw (The Netherlands Organisation for Health Research and Development).


Table of Contents

Foreword
1. Background information
1.1 COSMIN initiative and steering committee
1.2 How to cite this manual
1.3 Development of the COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error
1.4 Definitions of reliability and measurement error
1.5 Focus of the COSMIN Risk of Bias tool
1.6 The structure of the COSMIN Risk of Bias tool
1.7 The “worst-score-counts” method
1.8 Relevance of the research question
1.9 Using the COSMIN Risk of Bias tool in a systematic review
1.10 Expertise required for using the tool
1.11 Using the COSMIN Risk of Bias tool to assess studies on PROMs or ObsROMs
1.12 A Risk of Bias tool is not a study design checklist, nor a reporting guideline
2. Part A. Understanding how a study informs us about the reliability and measurement error of an outcome measurement instrument
2.1 Components of outcome measurement instruments
2.2 Extracting the elements of a comprehensive research question
2.3 Example of how to use Part A of the COSMIN Risk of Bias tool to assess the quality of a study by Skeie et al. (2015)
3. Part B. Assessing the risk of bias of a study on reliability or measurement error
3.1 Elaboration on standards for studies on reliability
3.2 Elaboration on standards for studies on measurement error
3.3 Example of how to use Part B of the COSMIN Risk of Bias tool to assess the quality of a study by Skeie et al. (2015)
4. Using the COSMIN Risk of Bias tool in a systematic review of outcome measurement instruments
4.1 The eleven-step procedure for conducting a systematic review of ClinROMs, PerFOMs, or laboratory values
Appendix 1. Data Extraction table of relevant information for each included study in a systematic review
Appendix 2. Risk of Bias ratings per standard per study
Appendix 3. Example of a Flow-chart
Appendix 4. Example of reporting table on characteristics of the included measurement instruments
Appendix 5. Example of reporting table on characteristics of the study populations
Appendix 6. Overview Table of quality and results of studies on reliability and measurement error
Appendix 7. Summary of Findings Tables for Reliability and Measurement error
References


Foreword

The COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error was developed to transparently and systematically assess the methodological quality of studies on reliability and measurement error of all types of outcome measurement instruments. It is an extended version of the COSMIN Risk of Bias checklist for the boxes reliability and measurement error for PROMs (1). It was developed for clinician-reported outcome measures (ClinROMs) (including e.g. readings based on imaging modalities and ratings based on observations), performance-based outcome measurement instruments (PerFOMs), or biomarkers, also called laboratory values (2,3). These measurement instruments are more complex than PROMs, as not only patients are involved, but also professionals, and sometimes (complex) devices. Specifically in studies on reliability and measurement error, these additional sources of variation complicate the design of these studies and may influence their quality.

As different sources of variation can play a role, different studies can be conducted to assess the reliability or measurement error of an outcome measurement instrument. To assess the quality of such a study, one should understand (1) how the results of a published study on reliability or measurement error inform us about the reliability and measurement error of the outcome measurement instrument under study, and (2) whether we can trust the result found in the study by assessing the risk of bias of the study. These two steps are reflected in the new COSMIN Risk of Bias tool to assess the quality of studies on reliability or measurement error of outcome measurement instruments (4).

The quality assessment of a study on reliability or measurement error can be conducted in the context of a systematic review of outcome measurement instruments. In such a review all measurement properties are considered, the quality of each study is assessed, the results of the studies are extracted, and per measurement property an overall conclusion is drawn about the quality of the instrument based on all available evidence for each measurement instrument. Subsequently, the quality of the evidence is graded, taking the number, quality, and (consistency of) results of the studies into account. A recommendation for the most suitable instrument is made, based on quality, feasibility and interpretability of each instrument.

As this is not an easy task to perform, we encourage the use of systematic and transparent methods when conducting such systematic reviews. We developed the COSMIN methodology for conducting systematic reviews of PROMs (5), including the COSMIN Risk of Bias checklist (1,6). When conducting a systematic review of other types of outcome measurement instruments, such as ClinROMs, PerFOMs, or laboratory values, this newly developed COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error can be incorporated into the COSMIN methodology. In this manual we will explain how this new tool should be used.


1. Background information

1.1 COSMIN initiative and steering committee

The COSMIN initiative aims to improve the selection of health measurement instruments both in research and clinical practice by developing tools for selecting the most suitable instrument for a given situation. COSMIN is an international initiative consisting of a multidisciplinary team of researchers with expertise in epidemiology, psychometrics, and qualitative research, and in the development and evaluation of outcome measurement instruments in the field of health care, as well as in performing systematic reviews of outcome measurement instruments.

This tool was developed in a Delphi study (4). The steering committee of this study consisted of: Lidwine B Mokkink, Maarten Boers, Cees van der Vleuten, Donald L Patrick, Jordi Alonso, Lex M Bouter, Henrica CW de Vet, and Caroline B Terwee.

We are very grateful to all the panelists of this study, who provided us with many helpful and critical comments and arguments (in alphabetical order): M.A. D’Agostino, Dorcas Beaton, Sophie van Belle, Sandra Beurskens, Kristie Bjornson, Jan Boehnke, Patrick Bossuyt, Don Bushnell, Stefan Cano, Saskia le Cessie, Alessandro Chiarotto, Mike Clark, Jon Deeks, Iris Eekhout, Jim Farnsworth II, Oke Gerke, Sabine Goldhahn, Robert M. Gow, Philip Griffiths, Cristian Gugiu, Jean-Benoit Hardouin, Desirée van der Heijde, I-Chan Huang, Ellen Janssen, Brian Jolly, Lars Konge, Jan Kottner, Brittany Lapin, Hanneke van der Lee, Mariska Leeflang, Nancy Mayo, Sue Mallett, Joy C. MacDermid, Geert Molenberghs, Holger Muehlan, Koen Neijenhuijs, Raymond Ostelo, Laura Quinn, Dennis Revicki, Jussi Repo, Johannes B. Reitsma, Anne W. Rutjes, Mohsen Sadatsafavi, David Streiner, Matthew Stephenson, Berend Terluin, Zyphanie Tyack, Werner Vach, Gemma Vilagut Saiz, Marc K. Walton, Matthijs Warrens, and Daniel Yee Tak Fong.


1.2 How to cite this manual

This manual accompanies the tool developed in the Delphi study. Please refer to the article when using the manual of the COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error.

LB Mokkink, M Boers, CPM van der Vleuten, LM Bouter, J Alonso, DL Patrick, HCW de Vet, CB Terwee. COSMIN Risk of Bias tool to assess the quality of studies on reliability or measurement error of outcome measurement instruments: a Delphi study. BMC Medical Research Methodology. 2020;20(293).

1.3 Development of the COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error

This COSMIN tool was developed in a Delphi study consisting of three rounds. For more information about the methods of this study, we refer to Mokkink et al. 2020. In this Delphi study we reached consensus on how to formulate a comprehensive research question for studies on reliability and measurement error, on components of outcome measurement instruments (which are the potential sources of variation relevant in studies on reliability and measurement error), and on standards to assess the quality of a study on reliability and measurement error of ClinROMs, PerFOMs, or laboratory values. Based on those results, we developed the COSMIN Risk of Bias tool, which comprises two parts: 1) seven elements that make up a comprehensive research question of the study, which informs us on how the reliability and measurement error of the outcome measurement instrument was studied, and 2) standards on design requirements and preferred statistical methods of studies on reliability and measurement error, which can be used to assess the quality of the study.

1.4 Definitions of reliability and measurement error

Reliability and measurement error are important measurement properties of outcome measurement instruments. Reliability and measurement error are determined based on the same study design and data collection, but with different statistical methods. These measurement properties are therefore related, but distinct.

Reliability is defined as the proportion of the total variance in the measurement which is due to true differences between patients (7). It refers to the extent to which an instrument is able to distinguish between patients; a reliability study investigates the extent to which different sources of variation influence the measurement. This gives direction for how to improve the measurement, for example by standardization or restriction of the source of variation. Reliability can be calculated with an Intra-class Correlation Coefficient (ICC), a Generalizability Coefficient, or with a kappa. Reliability parameters are expressed as a proportion and lie between 0 and 1.

Measurement error is defined as the systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured (7). It refers to how close the scores of repeated measurements in stable patients are; such studies investigate the absolute deviation of the scores or the amount of error of repeated measurements in stable patients. In case of categorical outcomes it is also called ‘agreement’. For continuous outcomes, measurement error is expressed in the measurement units of the measurement instrument with a Standard Error of Measurement (SEM) or Limits of Agreement (LoA). For categorical outcomes, agreement is expressed as percentage total agreement or percentage specific (e.g. positive and negative) agreement.

1.5 Focus of the COSMIN Risk of Bias tool

We focus on outcome measurement instruments, defined as instruments used to monitor the health status of (a group of) people over time, for example in a clinical trial or in clinical practice.

Several types of measurement instruments exist, such as patient-reported outcome measures (PROMs); observer-reported outcome measures (ObsROMs; i.e. proxy measures); clinician-reported outcome measurement instruments (ClinROMs) (including e.g. readings based on imaging modalities and ratings based on observations); performance-based outcome measurement instruments (PerFOMs); and biomarker outcomes, also called laboratory values (2). The COSMIN Risk of Bias tool to assess reliability and measurement error is specifically developed for ClinROMs, PerFOMs, and laboratory values (see Table 1 for examples). These outcome measurement instruments typically require involvement of one or more professionals to operate equipment or tools, to give instructions to the patient (e.g. to perform a task or action), or to come to a score through their clinical expertise (e.g. after observing a patient or an image). An outcome measurement instrument comprises the whole measurement procedure to come to a score, including issues such as materials, communication (e.g. instructions and motivating patients in case of a performance-based test), clinical judgment, and performing a task. All issues relevant for reliable and valid measurement should be described in the measurement protocol of an outcome measurement instrument.


Table 1. Examples of ClinROMs, PerFOMs, and laboratory values

Clinician-reported outcome measurement instruments (ClinROMs)
- Clinician-reported rating of the severity of a disease or condition. For example, the Hamilton Anxiety Rating Scale to assess the severity of anxiety symptoms comprises 14 items that are scored by a clinician (8).
- A Global Assessment of the severity of a condition, scored e.g. on a single-item Visual Analogue Scale by a health-care professional.
- Result of clinical examination of (patho)physiology, such as blood pressure or a count of swollen joints.
- Clinical reading of device-based results (often imaging), such as power Doppler ultrasonography to assess cardiac structure, function and hemodynamics (echocardiography) (9), or MRI used to evaluate cartilage defect size, depth, and subchondral bone in order to assess chondral and osteochondral lesions at the knee (10).

Performance-based outcome measurement instruments (PerFOMs)
- A performance-based walking test (e.g. the timed 25-foot walk test (11)), in which a professional instructs a patient to walk 25 feet at his own comfortable pace with or without a walking aid. The time needed to cover 25 feet is measured by the professional.

Laboratory value or biomarker
- Laboratory value such as HbA1c (glycated haemoglobin) measured by the turbidimetric inhibition immunoassay (TINIA) (12).

Different versions or operationalizations of outcome measurement instruments

To measure a specific construct, different versions of a measurement instrument may exist. For example, the Doloplus is a clinical tool for behavioural pain assessment in cognitively impaired patients, and is administered e.g. by the attending nurse. The original Doloplus-1 contained 15 items, while the Doloplus-2 contains 10 items (13).

A measurement instrument (i.e. the measurement protocol) can be operationalized in many different ways, and each operationalization could be considered a different version. For example, the specific equipment used to measure the range of motion (ROM) can differ, e.g., a simple universal goniometer (14) or an electromagnetic 3-dimensional tracking system (15). The location to be measured can differ, e.g., the neck (14) or the shoulder (16). The background of the professional who conducts the measurement can differ, e.g., a rheumatologist or a radiologist, and these raters may have had different levels of training (17).

In principle, we consider each version of an outcome measurement instrument or each different operationalization of the measurement protocol as a separate measurement instrument, until evidence is provided (e.g. testing of measurement invariance, or reliability) that the versions perform similarly.


1.6 The structure of the COSMIN Risk of Bias tool

The COSMIN Risk of Bias tool comprises two parts. Part A helps to understand how the results of a published study inform us about the reliability or measurement error of the outcome measurement instruments under study. Part B helps to assess whether we can trust the result obtained in the study by assessing the risk of bias of the study.

Part A

For a good understanding of how the results of a study inform us about the reliability and measurement error of the instrument, a good understanding of the design of the study and its corresponding comprehensive research question is needed. In Part A we describe the seven elements that we recommend extracting, and that together can be used to construct a comprehensive research question for each analysis. In addition, Part A of the tool contains an overview of the components of outcome measurement instruments. These components are the potential sources of variation that can either be studied (i.e. varied across the repeated measurements), or are kept or assumed to be stable (i.e. standardized).

Part B

Next, we developed two boxes with standards for studies on reliability and for studies on measurement error, respectively. As in the COSMIN Risk of Bias checklist for PROMs (1), standards refer to design requirements and preferred statistical methods of studies on measurement properties. For example, ‘reliability and measurement error should be assessed in patients that are assumed to be stable’; or ‘measurement error should be assessed with the standard error of measurement or with the limits of agreement’. The standards are stated as questions: e.g. ‘were patients stable in the interim period on the construct to be measured?’.

We refer to ‘preferred’ statistical methods. We mean by ‘preferred’ that these statistical methods are appropriate to use when evaluating reliability or measurement error of outcome measurement instruments, and are commonly used. Other methods may be appropriate to use as well (for example bi-factor models or Multi-Trait Multi-Method (MTMM) analyses, or newly developed methods). It is not our intention to comprehensively describe all possible statistical methods, but rather to describe the adequate methods that are commonly used in the literature. It is up to the user of the COSMIN tool how studies using these less commonly used methods are assessed.

1.7 The “worst-score-counts” principle

Each standard in a box is scored on a four-point scale, i.e. ‘very good’, ‘adequate’, ‘doubtful’, and ‘inadequate’ (see chapter 3 for more information). As in the COSMIN Risk of Bias checklist for PROMs (1), we use the worst-score-counts method (18) to come to a rating for the quality of the study on reliability or measurement error. That is, the lowest rating given to any standard in a box determines the overall rating for that box.
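As an illustration of the worst-score-counts principle, the following minimal Python sketch takes the ratings given to the standards of one box and returns the lowest of them. The function name `worst_score` and the example ratings are hypothetical and are not part of the COSMIN tool; the sketch only assumes the four rating labels named above.

```python
# Ratings ordered from best to worst, using the four labels of the tool.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score(ratings):
    """Return the worst (lowest) rating among the standards of one box."""
    if not ratings:
        raise ValueError("at least one rated standard is required")
    # A higher index in RATING_ORDER means a worse rating, so the
    # rating with the highest index is the one that counts.
    return max(ratings, key=RATING_ORDER.index)


# Hypothetical ratings for the standards of one box (illustrative only).
box_ratings = ["very good", "adequate", "doubtful", "very good"]
print(worst_score(box_ratings))
```

A rating label that is not one of the four options raises a `ValueError`, which keeps typing mistakes from silently affecting the overall rating.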


1.8 Relevance of the research question

While many different research questions concerning the reliability or measurement error of an outcome measurement instrument can be investigated, the relevance of a study is not under question when using this tool. The relevance of a study refers to different aspects:

- Choice of the potential source(s) of variation that have been varied over the repeated measurements.
- Choice of the target population of patients and professionals (when applicable) of the study.
- Choice of how the measurement protocol was executed, when applicable.
- Choice of evaluating the specific measurement property, either reliability or measurement error. Often only reliability is reported, while the measurement error can be calculated using the same data.

When using this COSMIN Risk of Bias tool, these aspects will be extracted from the design of the study (in Part A). However, no judgement will be given about the appropriateness of the choices made. The choices made in the research question and study design by the researchers determine the interpretation and generalizability of the results.

1.9 Using the COSMIN Risk of Bias tool in a systematic review

The COSMIN Risk of Bias tool is developed to assess the quality of a published study. One application of the COSMIN Risk of Bias tool is to assess the quality of studies when conducting a systematic review on measurement instruments. COSMIN developed a systematic methodology for conducting systematic reviews of PROMs (5). It consists of a 10-step procedure, in which the COSMIN Risk of Bias checklist (1) (containing standards for all nine measurement properties) can be applied to the studies to assess the quality of each study. To use the COSMIN methodology for conducting systematic reviews of other types of instruments (that is, other than PROMs), we advise replacing boxes 6 (Reliability) and 7 (Measurement error) with the COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error of outcome measurement instruments. More information about how to conduct a systematic review using the new COSMIN Risk of Bias tool can be found in chapter 4.


1.10 Expertise required for using the tool

Assessing the quality of a study on reliability and measurement error, for instance for use in a systematic review of outcome measurement instruments, is quite complex and time-consuming, and it requires expertise within the research team on several aspects. We recommend that at least one of the team members should have expertise on the construct to be measured, e.g. to understand what appropriate time intervals are between repeated measurements; on the measurement instruments, e.g. to understand what concomitant sources of variation could be (and these should be restricted or standardized; see element 2 in Part A); and on the patient population, e.g. to understand whether patients were stable between repeated measurements or whether subgroups of patients can be considered in one study. A clinical expert might combine these areas of expertise. A methodological expert with expertise on the theory of reliability and measurement error should be part of the team, e.g. to understand whether the design is appropriately analyzed (e.g. standard 7).

1.11 Using the COSMIN Risk of Bias tool to assess studies on PROMs or ObsROMs

This new COSMIN Risk of Bias tool is developed specifically for ClinROMs, PerFOMs, and laboratory values. However, it can also be used to assess the quality of studies on reliability or measurement error of PROMs or observer-reported outcome measures (ObsROMs; i.e. observations made, appraised, and recorded by a person other than the patient who does not require specialized professional training (2), e.g. proxy measures). For these two types of instruments, however, the tool may seem unnecessarily complex.

The first step in the tool (i.e. understanding how the results inform us on the quality of the measurement instrument under study) is often obvious, as the aim of reliability studies of PROMs and ObsROMs is most often to assess test-retest reliability or measurement error of the whole measurement instrument (as these measurement instruments can only be taken in one go, and the only potential source of variance is occasion).

The second step in the tool (assessing the quality of the study using the standards) will lead to the same rating as using the standards of the Risk of Bias checklist for PROMs. The standards on design requirements in both tools are partly the same. However, the new types of outcome measurement instruments for which we adapted the COSMIN checklist (i.e. ClinROMs, PerFOMs and laboratory values) require additional standards, which are not usually applicable for PROMs and ObsROMs. (If such a standard is applicable in a specific study, it could be rated using the ‘other flaws’ standard in the Risk of Bias checklist for PROMs.) The response options for standards on preferred statistical methods in the new tool are formulated somewhat differently, but will lead to the same rating as the PROM Risk of Bias checklist.


1.12 A Risk of Bias tool is not a study design checklist, nor a reporting guideline

This COSMIN Risk of Bias tool is developed to assess the quality (i.e. risk of bias) of a published study on reliability or measurement error. This tool is not developed as a design checklist or a reporting guideline. When designing or reporting a study on reliability or measurement error, additional items are relevant to consider or report. For example, the sample size of patient samples and the number of raters or repeated measurements are important in the design of a study, and when reporting, specific results are required, such as the variance components, 95% confidence intervals around ICCs, the marginals when reporting kappas, or additional assumptions.
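To illustrate why the marginals matter when a kappa is reported, the sketch below computes, for a 2x2 table of two ratings per patient, the marginal totals, the percentage total agreement, the specific (positive and negative) agreement, and Cohen's kappa. It is a minimal sketch using the standard textbook formulas, not code taken from the COSMIN tool; the function name `agreement_summary` and the example counts are hypothetical.

```python
def agreement_summary(table):
    """Summarise a 2x2 table [[a, b], [c, d]] of two ratings per patient.

    Rows are the first rating (positive, negative); columns the second.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_marginals = (a + b, c + d)
    col_marginals = (a + c, b + d)

    observed = (a + d) / n  # proportion total agreement
    # Agreement expected by chance, which depends only on the marginals.
    expected = (
        row_marginals[0] * col_marginals[0]
        + row_marginals[1] * col_marginals[1]
    ) / n ** 2
    # Undefined (division by zero) when chance agreement equals 1.
    kappa = (observed - expected) / (1 - expected)

    return {
        "row_marginals": row_marginals,
        "col_marginals": col_marginals,
        "total_agreement": observed,
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
        "kappa": kappa,
    }


# Hypothetical counts (illustrative only).
summary = agreement_summary([[20, 5], [10, 15]])
for name, value in summary.items():
    print(name, value)
```

Because the chance-expected agreement is built from the marginals alone, two studies with the same total agreement can yield different kappas, which is one reason to report the marginals alongside kappa.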


2. Part A. Understanding how a study informs us about the reliability and measurement error of an outcome measurement instrument

In general, the design of a study on reliability and measurement error is about repeated measurements in stable patients. Each measurement is accompanied by some error. This error is caused by sources of variation, such as the equipment used, the professionals involved, and other components of measurement instruments. For example, the score on an instrument can be influenced by how the rater motivates the patient, how the machine was set up, or by the occasion (e.g. first and second occasion, day of the week, time of the day). In chapter 2.1 we systematically describe all components of outcome measurement instruments, which are the potential sources of variation of an outcome measurement instrument.

Many different sources of variation can affect the measurement, and each of them can be studied using a different study design. Each study design answers a different research question, and each research question gives specific information about the quality of the measurement instrument. To understand how a study can inform us about the quality of an outcome measurement instrument, we describe in chapter 2.2 seven elements of a comprehensive research question.

Part A of the tool contains the overviews of the components of outcome measurement instruments (for outcome measurement instruments that do not involve biological sampling, and those that involve biological sampling, respectively), and the seven elements of a comprehensive research question. In chapter 2.3 we provide an example in which we show how to use Part A of the tool, by applying it to a paper by Skeie (19). In chapter 2.2 we will use this example, too (among other examples).
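To make the role of a single source of variation concrete, the sketch below estimates variance components from a complete table of stable patients scored by several raters, and derives a reliability parameter and a measurement error parameter from the same data. It assumes one common formulation (a two-way random-effects model for absolute agreement with one score per patient per rater), which is not necessarily the model a given study should use. The function names and the data are hypothetical.

```python
import math


def variance_components(scores):
    """Estimate variance components from a patients x raters table.

    scores[i][j] is the score of patient i given by rater j (no missing
    values). Returns (var_patients, var_raters, var_residual).
    """
    n = len(scores)        # number of patients
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Mean squares of a two-way ANOVA without replication.
    ms_patients = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_raters = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_residual = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_residual = ss_residual / ((n - 1) * (k - 1))

    # Negative estimates are truncated at zero (a common convention).
    var_patients = max(0.0, (ms_patients - ms_residual) / k)
    var_raters = max(0.0, (ms_raters - ms_residual) / n)
    return var_patients, var_raters, ms_residual


def icc_agreement(scores):
    """Patient variance as a proportion of the total variance."""
    vp, vr, ve = variance_components(scores)
    return vp / (vp + vr + ve)


def sem_agreement(scores):
    """Standard error of measurement, in the units of the instrument."""
    _, vr, ve = variance_components(scores)
    return math.sqrt(vr + ve)


# Hypothetical scores of five stable patients by two raters.
data = [[12, 14], [20, 21], [15, 18], [30, 29], [25, 27]]
print(round(icc_agreement(data), 3), round(sem_agreement(data), 3))
```

In this formulation a systematic difference between raters lowers the ICC and raises the SEM, because the rater variance is counted as error; both parameters come from the same data collection, as noted in section 1.4.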

2.1 Components of outcome measurement instruments

All measurement instruments consist of components, such as equipment and preparatory actions. We developed two taxonomies of components of outcome measurement instruments, one for outcome measurement instruments that do not involve biological sampling (i.e. ClinROMs and PerFOMs) (see Table 2), and one for those that do (i.e. the laboratory values, such as blood or urine tests, tissue biopsy) (see Table 3).


Table 2. Components of outcome measurement instruments that do not involve biological sampling

Component: Equipment
Elaboration: All equipment necessary in the preparation, the administration, and the assignment of scores of the outcome measurement instrument.
Examples: Questionnaire forms, computers, tablet, pen and paper; stair steps of a specific height; device or tools (such as stopwatch, probe, tube); ultrasound machine, ultrasound gels, MRI scanner; software.

Component: Preparatory actions preceding raw data collection by professionals, patients, and others (if applicable)
Elaboration and examples:
1. General preparatory actions, such as required expertise or training for professionals to prepare, administer, store or assign the scores.
   Examples: Training, education or experience required, certification.
2. Specific preparatory actions for each measurement, such as:
   - Preparations of equipment, environment, storage by professionals (a).
     Examples: Preparation of equipment: calibration of device/equipment, adjust settings of the machine. Preparation of the environment: light conditions, room temperature, humidity, specific length of a walking track. Preparation for storage: design database and logbook.
   - Preparations of the patient (b) by the professional.
     Examples: Provide general and preparatory instructions for the patients, such as explaining the tasks/actions that need to be performed, including time schedule, safety issues and side effects; instructions on diet (e.g. use of caffeine), clothing (e.g. comfortable shoes, no jewelry, glasses or devices), performance during tests (e.g. perform a task as usual; try to walk as fast as you can; lie as calm as possible); set some training or perform a familiarization session. Attaching electrodes to the body, injection with radioactive substance or contrast dye, positioning the patient, applying ultrasound gel.
   - Preparations undertaken by the patients.
     Examples: Listening to and understanding the instructions provided; adherence to the preparatory instructions such as fasting, resting, taking medication, bowel preparation, exercising, shaving.

Component: Collection of raw data
Elaboration: All actions undertaken by patient and professional(s) to collect the data, before any data processing.
Examples: The patient completing questions at home, or at the hospital, or performing the tasks; the rater observing or timing the performance; switching the imaging device on and off; positioning and moving the ultrasound probe.

Component: Data processing and storage
Elaboration: All actions undertaken on the raw data to store it in a usable (electronic) form for later data manipulation (such as score assignment or statistical analysis).
Examples: The digitally converted signal of a specific body MRI scan, which is temporarily stored in the K-space, is sent to an image processor where a mathematical formula (i.e. Fourier transformation) is applied, leading to an image which is displayed on a monitor and saved on a computer. Other examples: answers to question items are recorded on e.g. paper forms and stored, or Likert scale format response options are converted into a 0-4 score and directly entered in a computer database. Performance of data quality checks, e.g. double entry or validation checks on the stored/entered data.

Component: Assignment of the score(s)
Elaboration: Methods used to convert processed data into a score (c) that constitutes the outcome measurement instrument.
Examples: A calculation of a mathematical formula or the application of a scoring algorithm (e.g. a set of rules to be followed) to the processed data; a clinician selects the specific images and judges the severity and quantity of e.g. lesions on the set of images or compares it to a reference; scores adjusted for e.g. missing data or patients using devices such as mobility aids.

(a) Professionals are those who are involved in the preparation or the performance of the measurement, in the data processing, or in the assignment of the score; this may be done by one and the same person, or by different persons.
(b) In the COSMIN methodology we use the word ‘patient’. However, sometimes the target population is not patients, but e.g. healthy individuals, caregivers, clinicians, or body structures (e.g. joints, or lesions). In these cases, the word patient should be read as e.g. healthy volunteer, clinician, or the relevant body structure.
(c) The score can be further used or interpreted, by converting a score to another scale, metric or classification. For example, a continuous score is classified into an ordinal score (e.g. mild/moderate/severe), a score is dichotomized into below or above a normal value, or patients are classified as responder to the intervention (e.g. when their change is larger than the Minimal Important Change (MIC) value).


Table 3. Components of outcome measurement instruments that involve biological sampling

Component: Equipment
Elaboration: All equipment used in the preparation, the administration, and the determination of the values of the outcome measurement instrument.
Examples: Collection tools, such as venepuncture set, biopsy tool; material containers, such as for blood plasma (EDTA or heparin tube), for tissue (container for frozen specimens for immunofluorescence, jar filled with formalin), for urine collection (sterile, screw-top container), for standard microscopic tissue evaluation (fluid or tissue for culture (sterile jar)); laboratory equipment such as centrifuges, cabinets, and chromatography systems, computers, software.

Component: Preparatory actions preceding sample collection by professionals, patients, and others (if applicable)
Elaboration and examples:
1. General preparatory actions, such as required expertise or training for professionals to prepare, administer, store and determine the value.
   Examples: Training, education or experience required, certification.
2. Specific preparatory actions for each measurement, such as:
   - Preparations of equipment, environment, and storage by professionals (a).
     Examples: Preparation of equipment: calibration of device/equipment, adjust settings of the machine. Preparation of the environment: light conditions, room temperature, humidity. Preparation of storage: set up all equipment for storage.
   - Preparation of the patient (b) by the professional.
     Examples: Provide general and preparatory instructions to the patients, such as explaining the measurement procedure including safety issues and side effects; instructions on diet; insertion and withdrawal of a catheter into a blood vessel.
   - Preparatory actions undertaken by the patients.
     Examples: Listening to and understanding the instructions provided; adherence to the preparatory instructions such as fasting, resting, taking medication, exercising, shaving, washing of hands.

Component: Collection of biological sample
Elaboration: All actions undertaken to collect the biological sample, before any sample processing.
Examples: Taking a blood sample or tissue biopsy, collection of a sample of urine ‘mid-stream’ in a container.

Component: Biological sample processing and storage
Elaboration: All actions undertaken to be able to preserve, transport, and store the biological sample for determination; and, if applicable, further actions undertaken on the stored sample to be able to conduct the determination of the biological sample.
Examples: Initial reaction of material to reagent in container (e.g. anticoagulation by heparin). Blood is decomposed (by gravity) into plasma and blood cells, and stored at a specific temperature. Tissue is snap frozen by immersion in liquid nitrogen, or fixed in formalin and embedded in/processed to paraffin for long-term storage. Blood is collected in a tube containing an aqueous solution of the tetra-sodium salt of ethylene-diamine-tetra-acetic acid (EDTA) and mixed with air to lyse the erythrocytes and convert hemoglobin to oxyhemoglobin. Cut sections or prepare a smear on a slide; tissues are stained by immunofluorescent markers specific for certain surface antigens. Screw the lid of the urine container shut, put it in a sealed plastic bag and store it in the fridge at around 4 degrees Celsius, for max. 24 hours.

Component: Determination of the value of the biological sample
Elaboration: Methods used for counting or quantifying the amount of the substance or entity of interest (c).
Examples: The absorbance of oxyhemoglobin at 540 nm through spectrophotometry quantifies the hemoglobin concentration in the sample. The presence of the marker on the cell surface is detected and quantified by fluorescence signal intensity. Rater observes each slide and counts positive cells in an area. A calculation or the application of a mathematical formula to the prepared sample.


a Professionals are those who are involved in the preparation or the performance of the measurement, in the data processing, or in the assignment of the score; this may be done by one and the same person, or by different persons.
b In the COSMIN methodology we use the word ‘patient’. However, sometimes the target population is not patients, but e.g. healthy individuals, caregivers, clinicians, or body structures (e.g. joints, or lesions). In these cases, the word patient should be read as e.g. healthy volunteer, clinician, or relevant body structure.
c The value can be further processed into a clinical score, if applicable, by a linear or semi-quantitative conversion. For example, a continuous score is classified into an ordinal score (e.g. mild/moderate/severe), a score is dichotomized into below or above a normal value, or patients are classified as responders to treatment (e.g. when their change is larger than the Minimal Important Change (MIC) value). As no noise will occur from this conversion, this is not a potential source of variance, but rather an interpretation of the value. Therefore we do not include this phase in the components for outcome measurement instruments that involve biological materials.


2.2 Extracting the elements of a comprehensive research question

Before we can comprehensively assess the information in a study on the reliability or measurement error of an instrument, we need to fully understand the design of the study and reformulate the research question into what we call a ‘comprehensive research question’. Often the published research question is not specific enough to rate the adequacy of the study design. For example, if the stated aim of a study is to assess the inter-rater reliability of an instrument, it is clear that raters will be varied. However, without further information it is not clear whether the interest is in the inter-rater reliability of the whole measurement procedure (e.g. by different clinicians), or only in the reliability of a part of the measurement procedure (e.g. only the assignment of the score based on an image). To get a complete picture, we recommend extracting seven elements from the publication that together form the ‘comprehensive research question’ (see Table 4). Note that one article can contain multiple questions, each requiring an extraction of the seven elements.

Table 4. Elements of a comprehensive research question.

1 the name of the outcome measurement instrument
2 the version of the outcome measurement instrument or way of operationalization of the measurement protocol
3 the construct measured by the measurement instrument
4 a specification whether one is interested in a reliability parameter (i.e. a relative parameter such as, for continuous outcomes, an ICC or Generalizability coefficient φ, or Kappa κ) or a parameter of measurement error (i.e. an absolute parameter expressed in the unit of measurement, e.g. SEM, LoA or SDC; or for categorical outcomes expressed as agreement or misclassification, e.g. the percentage specific agreement)
5 a specification of the components of the measurement instrument that will be repeated (especially when only part of the measurement instrument is repeated, e.g. only assignment of the score based on the same images)
6 a specification of the source(s) of variation that will be varied (e.g. time or occasion, the (level of expertise of) professionals, the machines, or other components of the measurement)
7 a specification of the patient population studied

ICC = Intraclass correlation coefficient; SEM = standard error of measurement; LoA = Limits of Agreement; SDC = smallest detectable change.
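When extracting the seven elements for several studies, it can help to record them in a fixed structure so that missing elements are easy to spot. The following is a minimal sketch under that assumption; the class name, field names, and example values are illustrative and are not part of the COSMIN tool.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ComprehensiveResearchQuestion:
    """One record per research question; fields follow Table 4 (names are hypothetical)."""
    instrument_name: str                 # element 1
    version_or_operationalization: str   # element 2
    construct: str                       # element 3
    measurement_property: str            # element 4
    components_repeated: str             # element 5
    sources_of_variation: List[str]      # element 6
    patient_population: str              # element 7

    def missing_elements(self) -> List[str]:
        """Return the names of elements left empty during extraction."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical extraction in which element 2 has not been filled in yet.
question = ComprehensiveResearchQuestion(
    instrument_name="Ultrasound enthesitis score",
    version_or_operationalization="",
    construct="Enthesitis",
    measurement_property="reliability",
    components_repeated="assignment of the score",
    sources_of_variation=["raters"],
    patient_population="patients (to be specified)",
)
print(question.missing_elements())
```

Because one article can contain multiple questions, a list of such records (one per question) would be a natural container.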


Elaboration on the elements of a comprehensive research question

Element 1. The name of the outcome measurement instrument

The name of the instrument should be exactly specified. Sometimes, this is readily apparent, e.g. the 6 minute Walking test (6MWT) or the Nine Hole Peg Test (NHPT). In some cases, a measurement protocol involves multiple measurement instruments (e.g. the Multiple Sclerosis Functional Composite (MSFC) includes the Timed 25-Foot Walk test, the Nine Hole Peg Test, and the Paced Auditory Serial Addition Test (11)), while in other cases (e.g. imaging) there may not yet be a clear name. Note that the name of the machine is not the name of the outcome measurement instrument; often a machine can be used to measure a variety of parameters (e.g. Greyscale ultrasound [to measure] synovial thickening (synovial hypertrophy) or Doppler ultrasound [to measure] increased blood flow (synovial hyperemia) (19)), or a pathological entity can be measured by different types of images (for example, enthesitis measured by ultrasound (17) or by MRI (20)). We recommend including the type of measurement (e.g. ultrasound) in combination with the entity measured as the name of the score (e.g. ultrasound enthesitis score).

Element 2. The version of the outcome measurement instrument or way of operationalization of the measurement protocol

Details on the version and operationalization of the outcome measurement instrument should be extracted. Details on the specific version refer to e.g. the length of the task (e.g. the 2-, 6- or 12-minute walking test (21)), the number of items included in the version (e.g. Doloplus-1 or Doloplus-2 (13)), or the language used (the English (21) or Dutch version (22) of the 6-minute walk test). Choices in how the measurement protocol was operationalized may affect the measurement, and should thus be made explicit. Specifically, the components that are potential sources of variation need to be listed, for example, specific characteristics of the equipment used (e.g. brand and type of the machine), and characteristics of the professionals involved in the measurement (e.g. background and experience). The taxonomy of the components of measurement instruments (see chapter 2.1) can be used for this.

Element 2 refers to components known or expected to influence the score that are not the object of study. To eliminate the influence of these potential sources of variation on the scores obtained, these components should have been restricted or standardized in the study. For example, if it is expected that different types or brands of machines may interfere with the score, only one type and brand of machine is used (and reported). In the study by Skeie et al (2015) only the Medison Accuvix V10 ultrasound scanner with a 3–7 MHz curvilinear probe was used (19); in other words, the brand and type of machine and probe were standardized. Moreover, chiropractors with 4 and 8 years of experience, respectively, in diagnostic ultrasound for the musculoskeletal system, and with a postgraduate diploma in diagnostic ultrasound, were involved in the measurements (19). Thus, the background of the raters was restricted to a specific profession (i.e. chiropractors) with a specific duration of expertise (4/8 years in diagnostic ultrasound), having received specific training.

In addition, in some cases the instrument procedure requires multiple readings, and a summary statistic (usually the mean, but sometimes the median, maximum or minimum) is calculated as, or used to assign, the final score (i.e. the result of the measurement). A well-known example is blood pressure measurement in the clinic (see footnote 1). How the measurement is taken should be specified, as it is needed to assess standard 7 (see chapter 3).

For people familiar with the terminology of the Generalizability Theory, the version or the way of operationalization of the measurement instrument refers to the facets of stratification, where patients (i.e. the object of measurement) are nested in a facet (23).

Element 3. The construct measured by the measurement instrument

To identify exactly which outcome measurement instrument was studied, we recommend extracting the construct measured, unless it is clear from the given name. The construct refers to what is being measured, i.e. the ‘aspect of health’. It is also referred to as the ‘concept of interest’ or the ‘intended objective to be measured’. When the measurement instrument does not have a name, identifying the construct can help to fully characterize the outcome measurement instrument (which we also recommend mentioning in the name, i.e. element 1). Table 5 provides some examples. Note that a study on reliability or measurement error does not provide information about whether the construct is indeed being measured; for that, validity and accuracy studies are needed.

1 To measure blood pressure, the technician first palpates the radial artery, inflates the cuff until the pulse disappears, inflates an extra 20-30 mm Hg, and then slowly deflates until the pulse reappears. The pressure is noted, and the measurement begins: first, the stethoscope is placed on the brachial artery just medial and above the cubital fold. Then the cuff is reinflated. The pressure is quickly increased to 30 mm Hg above the previous reading, and then slowly deflated until the pulse sounds are detected (systolic blood pressure, measured in 2 mm increments), then further deflated until the sounds disappear (diastolic blood pressure). The cuff is fully deflated, then inflated again to repeat the measurement.


Table 5. Examples of elements 1, 2, and 3.

Element 1 (name): Nine hole peg test (24)
Element 2 (version/operationalization): A wooden or plastic board with 9 holes (10 mm diameter, 15 mm depth), placed apart by 32 mm (25)
Element 3 (construct): Finger dexterity

Element 1 (name): Ultrasound enthesitis score
Element 2 (version/operationalization): Sonography images obtained by experienced sonographers using the Esaote Technos MPX machine
Element 3 (construct): Enthesitis

Element 1 (name): HbA1c value based on immune-turbidimetry (12)
Element 2 (version/operationalization): Turbidimetric inhibition immunoassay (TINIA), including 2 reagents (i.e. anti-HbA1c antibody (R1), and buffer/polyhapten reagent (R2)); tetradecyltrimethylammonium bromide (TTAB) is the detergent; Roche/Hitachi cobas c systems.
Element 3 (construct): HbA1c (glycated haemoglobin)

Element 4. Specification of the measurement property of interest

When the measurement property of interest is reliability, the study will report relative parameters such as an ICC, Generalizability coefficient φ, or Kappa κ. When the measurement property of interest is measurement error, the study will report absolute parameters, either expressed in the unit of measurement, such as SEM, LoA or SDC, or expressed as agreement or misclassification, e.g. the percentage specific agreement.

We recommend using the COSMIN terminology to determine whether a study assessed reliability or measurement error, regardless of the terms used in the article, because confusion persists about the correct application of these terms. For example, when a particular article states that ‘reliability’ was assessed, but the standard error of measurement (SEM) or the limits of agreement are reported, the result of that study should be considered as evidence for measurement error (26). When an author states to have evaluated ‘agreement between raters’ using the kappa statistic, the result of this study refers to the reliability of the outcome measurement instrument (27).
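The rule of classifying a study by the parameter it reports, rather than by the term its authors use, can be sketched as a simple lookup. This is only an illustration: the function name and the spelling of the keys are assumptions, and the lists contain only the parameters named in this section.

```python
# Parameters named in this section, grouped by COSMIN measurement property.
RELIABILITY_PARAMETERS = {"icc", "generalizability coefficient", "kappa"}
MEASUREMENT_ERROR_PARAMETERS = {"sem", "loa", "sdc", "percentage specific agreement"}


def cosmin_property(reported_parameter: str) -> str:
    """Classify a reported parameter, ignoring the label used in the article."""
    key = reported_parameter.strip().lower()
    if key in RELIABILITY_PARAMETERS:
        return "reliability"
    if key in MEASUREMENT_ERROR_PARAMETERS:
        return "measurement error"
    return "unclassified"  # parameter not covered by this sketch


# An article may say 'reliability' but report the SEM:
print(cosmin_property("SEM"))
# An article may say 'agreement between raters' but report kappa:
print(cosmin_property("Kappa"))
```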


Element 5. Specification of the components of the measurement instrument that will be repeated (Figure 1)

It should be extracted whether the interest of the study is in the reliability or measurement error of the whole measurement procedure (see Figure 1, study A), or only in part of the measurement procedure (see Figure 1, study B). For example, based on a static image that was made once for a patient, only the assignment of the score was repeated; or the performance of a task by each patient was videotaped, and only the last component (i.e. assignment of the scores) is repeated.

Figure 1. Which part of the measurement is repeated.

Element 6. Specification of the components of the measurement instrument that will be varied

The component of the measurement instrument that is being varied across the measurements is the main focus of the study. Examples are time or occasion (test-retest, or intra-rater), the professionals (inter-rater), or the machines (inter-machine or inter-device) (28). For example, in Figure 1 raters are varied: rater A conducts the first measurement and rater B conducts the second measurement for each patient.


In the design of the study one or more sources can be considered. For example, both the machine and the rater who conducts the whole measurement are varied across the repeated measurements (see Figure 2, study A). The taxonomies of components of measurement instruments (see chapter 2.1) can be used to consider various potential sources of variation.

Figure 2. Designs in which components are varied across repeated measurements

Alternatively, the researchers can assume that a component (e.g. preparation or assignment of the score) is ‘stable’, in other words, that the rater who prepares the measurement or who assigns the score will not introduce error in this part of the measurement (indicated in grey in Figure 2, studies B and C), and investigate only the influence of the components e.g. equipment, preparation, collection of raw data, and data processing and storage.

In the designs shown in Figures 1 and 2 we assume that all patients were measured this way. This is called a crossed design (29). However, so-called nested designs are possible, too (see Figure 3). In these designs, some of the patients are measured under measurement condition A and other patients are measured under measurement condition B. In Figure 3 a nested inter-rater reliability design is shown, where some of the patients are measured first by rater A and next by rater B (i.e. measurement condition A), while other patients are measured first by rater C and next by rater D (i.e. measurement condition B), etc. These designs are appropriate to use, and the nesting can be taken into account in the calculation of the ICC, for example by calculating variance components per measurement condition and then pooling these variance components (weighted by sample size) across the measurement conditions (e.g. (30)), or by using a one-way random effects model (31).

Figure 3. Nested inter-rater reliability design.
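One of the two options mentioned for nested designs is a one-way random effects model (31). The sketch below computes the single-measurement ICC from the one-way ANOVA mean squares, ICC = (MSB - MSW) / (MSB + (k - 1) MSW), where MSB is the between-patient mean square, MSW the within-patient mean square, and k the number of repeated measurements per patient. It assumes a balanced design (the same k for every patient); the function name and the example scores are hypothetical and not taken from the manual or its references.

```python
from typing import Sequence


def icc_one_way(scores: Sequence[Sequence[float]]) -> float:
    """Single-measurement ICC from a one-way random effects model.

    scores[i] holds the k repeated measurements of patient i; which rater
    produced each measurement is ignored, as in a nested design.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 patients, each with the same k >= 2 scores")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    patient_means = [sum(row) / k for row in scores]

    # Between-patient and within-patient mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, patient_means) for x in row
    ) / (n * (k - 1))

    # Undefined (division by zero) if all scores are identical.
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores: four patients, each measured twice by some pair of raters.
example = [[10.0, 11.0], [14.0, 13.0], [20.0, 21.0], [25.0, 24.0]]
print(round(icc_one_way(example), 3))
```

Because the rater effect is not modelled separately, systematic differences between raters end up in the within-patient variance, which is why this model suits designs where raters differ between patients.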

For people familiar with the terminology of the Generalizability Theory, the components that are being varied across measurements are called the random or fixed facets of Generalizability (23).

Element 7. Patient population

The reliability depends on the homogeneity or heterogeneity of the study population. Therefore, the sample (and its subgroups) included in the study should be extracted and assessed by the user of this tool. In the study by Skeie et al (2015) the recruited sample consisted of low back pain patients and patients with other spinal complaints, but also of pain-free subjects. This latter group could have increased the variance between patients and, subsequently, influenced the results (i.e. increased the ICC) of the reliability study.
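Why a more heterogeneous sample raises the ICC can be seen from the general form of a reliability parameter: the between-patient variance divided by the sum of the between-patient variance and the error variance. The sketch below uses that simplified two-component form with made-up variances; the numbers are purely illustrative and do not come from the Skeie study.

```python
def icc_from_variances(var_patients: float, var_error: float) -> float:
    """Reliability as the share of total variance that is due to true patient differences."""
    return var_patients / (var_patients + var_error)


# Same measurement error in both samples; only the spread between patients differs.
homogeneous = icc_from_variances(var_patients=4.0, var_error=4.0)
heterogeneous = icc_from_variances(var_patients=16.0, var_error=4.0)

print(homogeneous)    # 0.5
print(heterogeneous)  # 0.8
```

The measurement error is identical in both cases, which is one reason to report a parameter of measurement error alongside a reliability parameter.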

In the COSMIN methodology we use the word patient. However, sometimes the study population of interest consists of healthy individuals, body structures (e.g. joints, kidneys), clinicians or caregivers. In these cases, the word patient should be read as e.g. healthy person or caregiver.

For people familiar with the terminology of the Generalizability Theory, the patient population refers to the object of measurement or the facets of differentiation (23).


2.3 Example of how to use Part A of the COSMIN Risk of Bias tool to assess the quality of a study by Skeie et al. (2015)

In this chapter we provide an example of how to use Part A of the COSMIN tool, using a paper by Skeie et al. (19). To get a full understanding of the study, we recommend first reading the introduction and methods section of the paper. In this paper four different studies are described. Here we use the first two substudies, and provide a summary of these two studies.

In this paper, the lumbar multifidus muscle (LMM) thickness score (study 1) and contraction score (study 2) were investigated by ultrasound. The measurement proceeds as follows: a patient is asked to lie down in a specific position, and the probe is placed on a very specific body part. This yields an on-screen image. Subsequently, a marker is placed on a specific structure (i.e. the apex of the facet joint) identified on the image. In study 1, a still image is recorded, and the first rater places the second marker on another specific structure (i.e. processus mammillaris) on this image, and measures the distance between the markers with the caliper software. The distance between the two markers corresponds to the thickness of the LMM. The first rater repeats the second marker placement and distance measurement on the still image twice, for a total of three measurements. The patient leaves. Next, based on the very same still image (with only the first marker visible) a second rater places the second marker on the screen and measures the distance a total of three times. Next, all data is transferred to a separate paper by rater 1, who calculates a mean value per patient per rater. This mean value is the LMM thickness score. The repeated placement of the second marker on the still image and application of the caliper tool to measure the distance between the two markers is part of one measurement (19). This procedure is depicted in Figure 3, study 1.

Figure 3. Study designs of Skeie et al.


In study 2, for each patient each of the raters independently generated one image of the LMM in the resting state and one image of the LMM in the contracted state. Using a split-screen of the two still images of both states, each rater measured thickness (i.e. caliper-assessed distance between the markers) of the two states three times. Next, rater 1 transferred the data to a separate paper and calculated mean values of the thickness of each state. Next, rater 1 calculated the ‘LMM contraction score’ as the exact change in thickness (contracted LMM minus resting LMM) (19). This procedure is depicted in Figure 3, study 2.
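The two scores summarized above are simple arithmetic on the three caliper readings: the thickness score is the mean of the readings, and the contraction score is the mean contracted thickness minus the mean resting thickness. A minimal sketch of that arithmetic follows; the function names and the readings are hypothetical and are not data from Skeie et al.

```python
from statistics import mean
from typing import Sequence


def lmm_thickness_score(readings: Sequence[float]) -> float:
    """Mean of the repeated caliper readings for one patient and one rater."""
    return mean(readings)


def lmm_contraction_score(resting: Sequence[float], contracted: Sequence[float]) -> float:
    """Change in thickness: mean contracted thickness minus mean resting thickness."""
    return mean(contracted) - mean(resting)


# Hypothetical readings (arbitrary units) for one patient and one rater.
resting_readings = [2.0, 2.25, 2.5]
contracted_readings = [2.5, 2.75, 3.0]

print(lmm_thickness_score(resting_readings))
print(lmm_contraction_score(resting_readings, contracted_readings))
```

Since the final score is a summary of several readings, the reliability being studied is that of the mean of three readings, not of a single reading.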

Based on the thorough elaboration of the study performed and described by Skeie and colleagues, we extract the elements of a comprehensive research question.

Table 6. Example of how to use Part A of the COSMIN Risk of Bias tool based on the study by Skeie (19).

Element 1. Name of the instrument
Instruction: Alternatively: type of instrument and parameter
Study 1: Ultrasound measurement of the lumbar multifidus muscle (LMM) thickness score
Study 2: Ultrasound measurement of the LMM contraction score

Element 2. Version or way of operationalization
Instruction: All relevant components that are known or expected to influence the score, and which are standardized or restricted (facet of stratification (23))
Study 1:
Equipment: Medison Accuvix V10 ultrasound scanner with a 3–7 MHz curvilinear probe.
Preparatory actions: two chiropractors with 4 and 8 years of experience, respectively, in diagnostic ultrasound for the musculoskeletal system, with a postgraduate diploma in diagnostic ultrasound; still on-screen images were obtained with the subjects in a prone position with a pillow placed under the abdomen to flatten the lumbar lordosis.
Preparation: An image was generated on-screen and a marker was placed on the image on the mammillary process of the level to be measured.
Unprocessed data collection: The second marker was placed on the on-screen image, and the distance was computed by the caliper software. This part was repeated three times.
Data processing and storage: Data is transferred to a separate paper by rater 1.
Assignment of the score: Rater 1 calculated a mean value per patient per rater.
Study 2:
Preparation: In resting position, an image was generated on-screen and a marker was placed on the image on the mammillary process of the level to be measured. Next, in the contracted state (LMM contraction was induced by a contralateral arm lifting task), an image was generated on-screen, too, and a marker was placed on the image.
Unprocessed data collection: Based on the split-screen of both images, the second marker was placed on each image, and the distance (per image) was calculated by the caliper software. This part was repeated three times.
Data processing and storage: Data is transferred to a separate paper by rater 1.
Assignment of the score: Rater 1 calculates a mean value per patient per rater for both states. Next, the rater calculated the ‘LMM contraction score’ as the exact change in thickness (contracted LMM minus resting LMM).

Element 3. Construct
Instruction: Description of what is being measured
Study 1: LMM thickness
Study 2: LMM contraction, which is the change in LMM thickness between the contracted and resting state (contracted LMM minus resting LMM).

Element 4. Measurement property
Study 1: Reliability and measurement error
Study 2: Reliability and measurement error

Element 5. Components that will be repeated
Instruction: Either the whole measurement (i.e. all components) or the assignment of the score (i.e. last component)
Study 1: The whole measurement will be repeated. However, the focus of interest is on the unprocessed data collection: placing of the second marker on the on-screen image (mean of three times).
Study 2: The whole measurement will be repeated. However, the focus of interest is on the preparation (i.e. preparation and generation of images in the resting and contracted states, and the placing of the first marker), and on the unprocessed data collection (placing of the second marker on the on-screen image (mean of three times)).

Element 6. Source(s) of variation varied
Instruction: Components which are varied across the measurements (i.e. focus of analysis; facet of generalizability (23))
Study 1: Raters (n=2; inter-rater reliability)
Study 2: Raters (n=2; inter-rater reliability)

Element 7. Patient population
Instruction: (i.e. facet of differentiation (23))
Study 1 and Study 2: LBP patients, patients with other spinal complaints such as mid back pain, neck pain, and/or extremity pain, and pain-free subjects (n=30 in each experiment, total n=120)

  

Based on the extracted information, a comprehensive research question can be formulated as:

Study 1: What is the inter-rater reliability of the data collection phase of the lumbar multifidus muscle (LMM) thickness score, based on the mean of three distances marked with the caliper software on a still image of the ultrasound measurement, measured using the Medison Accuvix V10 ultrasound scanner with a 3–7 MHz curvilinear probe by experienced chiropractors with postgraduate training, in LBP patients, patients with other spinal complaints such as mid back pain, neck pain, and/or extremity pain, and pain-free subjects?

Study 2: What is the inter-rater reliability of the preparation, image generation, and data collection phases of the lumbar multifidus muscle (LMM) contraction score, based on the mean of three distances marked with the caliper software on an on-screen image in the resting and in the contracted state of the ultrasound measurement, measured using the Medison Accuvix V10 ultrasound scanner with a 3–7 MHz curvilinear probe by experienced chiropractors with postgraduate training, in LBP patients, patients with other spinal complaints such as mid back pain, neck pain, and/or extremity pain, and pain-free subjects?

Please note that we do not recommend always reporting the research question as one long question like this. We do, however, consider it very useful to describe all this information clearly, e.g. in the methods section of a paper.


3. Part B. Assessing the risk of bias of a study on reliability or measurement error

Part B of the COSMIN Risk of Bias tool contains two boxes with standards that can be used to determine whether the result of a study on reliability or measurement error, respectively, can be trusted. Standards refer to the design requirements of the study or to the preferred statistical methods. Standards 1 to 5 in both boxes refer to design requirements. These standards are the same for studies on reliability and for studies on measurement error, as the same design can be used for assessing both measurement properties. Three standards refer to the preferred statistical methods for studies on reliability and two standards refer to the preferred statistical methods for studies on measurement error.

In the COSMIN Risk of Bias tool, we included standards concerning the preferred statistical methods that are appropriate to use when evaluating reliability or measurement error of outcome measurement instruments (see also section 1.6). Other methods may be appropriate to use as well (for example bi-factor models or Multi-Trait Multi-Method (MTMM) analyses, or newly developed methods). It is not our intention to comprehensively describe all possible statistical methods, but rather to describe the adequate methods that are commonly used in the literature.

Each box also contains a standard asking if there were any other important methodological flaws that are not covered by the other standards (standard 6), but that may have led to biased results or conclusions. Some flaws are rather uncommon, and therefore do not justify a separate standard. In chapter 3.1 we provide several examples of these flaws.

Each standard will be scored on a four-point rating system (i.e. ‘very good’, ‘adequate’, ‘doubtful’, or ‘inadequate’) in line with the COSMIN Risk of Bias checklist for Patient-Reported Outcome Measures (PROMs) (1). Subsequently, the lowest rating given in a box determines the final rating, i.e. the quality of the study (this is called the worst-score-counts method (18) to determine the risk of bias). Sometimes a response option is indicated in grey, meaning that the response option is not applicable for the standard, and users should choose between the other options. Finally, some standards can be rated as ‘not applicable’.

In general, a standard on a design requirement is rated as ‘very good’ when evidence or convincing arguments were provided that the standard is met; ‘adequate’ when it can be assumed, although it is not explicitly described, that the standard is met; ‘doubtful’ when it is unclear whether the standard is met; and ‘inadequate’ when there is evidence that the standard is not met (18). A standard about preferred statistical methods is in general rated as ‘very good’ when a preferred method was optimally used; ‘adequate’ when the preferred method was used, but it was not optimally used; ‘doubtful’ when it is unclear if a preferred method was used; and ‘inadequate’ when the statistical methods used are considered inadequate.

The boxes for reliability and measurement error, respectively, can be found here. Below, an elaboration of each standard is described for reliability (chapter 3.1) and measurement error (chapter 3.2). In chapter 3.3 we provide an example for rating the box on reliability in the study by Skeie, which was also used as an example in chapter 2.3.
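The worst-score-counts method described above amounts to taking the lowest rating among the applicable standards in a box. A minimal sketch, assuming ratings are recorded as the lowercase strings shown below; the function name and the example ratings are hypothetical.

```python
from typing import Sequence

# Ratings ordered from worst to best.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def box_rating(standard_ratings: Sequence[str]) -> str:
    """Return the lowest rating in a box, skipping standards rated 'not applicable'."""
    applicable = [r for r in standard_ratings if r != "not applicable"]
    if not applicable:
        raise ValueError("no applicable standards were rated")
    # list.index raises ValueError for a rating outside the four-point scale.
    return min(applicable, key=RATING_ORDER.index)


# Hypothetical ratings for the standards of one box.
ratings = ["very good", "adequate", "not applicable", "doubtful", "very good"]
print(box_rating(ratings))  # doubtful
```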


3.1 Elaboration on standards for studies on reliability

The box on reliability contains five standards about design requirements, one standard on ‘other flaws’ and three standards about preferred statistical methods. For each standard we give suggestions for how to rate the standard.

Standard 1. Stability of the patient

Were patients stable in the time between the repeated measurements on the construct to be measured?
very good: Yes (evidence provided)
adequate: Reasons to assume standard was met
doubtful: Unclear
inadequate: No (evidence provided)
NA: Not applicable

Elaboration: Patients should be stable with regard to the construct to be measured between the repeated measurements. When an intervention such as surgery or medication is given in the interim period, it is likely that (many of) the patients have changed on the construct to be measured. In other words, they are not stable, and the standard should be rated as ‘inadequate’. When the aim is to assess the reliability of the assignment of the score, e.g. using static images or videos of the performance of a task as object of interest (see Figure 1, study 2), this standard is not applicable as the images and videos were acquired only once. Furthermore, the measurement can interfere with the stability of the patient. For example, there should be enough time between repeated measurements for patients to recover from experienced pain or fatigue and return to their initial state. If not, the standard should be rated as ‘doubtful’, as it is unclear whether the patients are stable on the construct to be measured. When evidence or convincing arguments are provided that the patients were stable, the standard is scored ‘very good’.

Standard 2. Time interval

Was the time interval between the measurements appropriate?
very good: Yes
adequate: (no response option for this standard)
doubtful: Doubtful, OR time interval not stated
inadequate: No

Elaboration: The time interval between the measurements must be appropriate. The definition of “appropriate” depends on the construct to be measured and the study population. The time interval should be long enough to prevent recall bias of previous scores in case of intra-rater reliability, and short enough to ensure that patients have not changed on the construct to be measured. For example, synovitis can change in a few days, while a change in cartilage or bone status would take a few months.


Standard 3. Similar measurement conditions

Were the measurement conditions similar for the measurements, except for the condition being evaluated as a source of variation?
very good: Yes (evidence provided)
adequate: Reasons to assume standard was met, OR change was unavoidable
doubtful: Unclear
inadequate: No (evidence provided)

Elaboration: Each repeated measurement should be conducted with the same measurement protocol, except for the source of variation that was intentionally varied, i.e. element 6 of the comprehensive research question (see chapter 2.2). For example, if the aim was to understand the variation due to different raters (i.e. inter-rater reliability), only the raters should be varied. Other concomitant sources of variation (i.e. element 2 of the comprehensive research question, see chapter 2.2) should be kept similar. Was the study up to standard? Were all equipment, preparatory actions, the environmental conditions (e.g. temperature), and methods of processing the same in both measurements? For example, when the patients are very likely to show a learning effect (for example on a performance-based test), the absence of a familiarization session should yield a rating of doubtful or inadequate on this standard, as the first measurement can then be considered to be the familiarization session, and the measurement conditions are not the same. A description of the similarity of the measurement conditions of the repeated measurements can be considered as evidence.

Standard 4. Administration of measurements

In instruments that do not involve biological sampling, the administration refers to the components ‘Collection of raw data’ and ‘Data processing and storage’ (see chapter 2.1). In instruments involving biological sampling, it refers to the components ‘Collection of biological sample’ and ‘Biological sampling processing and storage’ (see chapter 2.1).

Did the professional(s) administer the measurement without knowledge of scores or values of other repeated measurement(s) in the same patients?
very good: Yes (evidence provided)
adequate: Reasons to assume standard was met
doubtful: Unclear
inadequate: No (evidence provided)

Elaboration: All measurements should be administered by the professional(s) involved without them having knowledge of the scores or values of other repeated measurements on the same outcome measurement instrument. This means that the measurements should all be administered without knowledge of the prior (e.g. in case of an intra-rater reliability study) or other (e.g. in case of an inter-rater reliability study) score(s) or value(s) on the instrument of interest.


The rating of this standard is rather subjective. For example, if in a study the raters independently administered the measurement, and none were involved in the care of the patients (making it very unlikely that the raters received additional information about the patients, including knowledge of the score(s) of other repeated measurements), this can be considered as ‘evidence provided’, and the rating is ‘very good’. When the other score is known to the professional while administering the repeated measurement, it may influence the way the measurement is administered. For example, with a severe score obtained with an imaging technique, the repeated measurement can be administered more carefully, and more time can be used to look at the patient. If it is known that this was the case, the rating is ‘inadequate’. When there is no explicit description, but it seems very unlikely that the raters knew the scores or values of other repeated measurements, it can be rated as ‘adequate’ or ‘doubtful’.

In some situations this standard is not applicable, i.e. when the administration (i.e. collection of the raw material or biological sample, data or sampling processing and storage) is not repeated in the study, but only the assignment of the score or the determination of the value (see for example chapter 2.2, element 5 of the comprehensive research question, or Figure 1, study 2).

Standard 5. Assignment of the score or determination of the biological value

Did the professional(s) assign scores or determine values without knowledge of the scores or values of other repeated measurement(s) in the same patients?
  very good: Yes (evidence provided)
  adequate: Reasons to assume standard was met
  doubtful: Unclear
  inadequate: No (evidence provided)

Elaboration: The scores on all measurements should be assigned, or values should be determined, by the professional(s) involved without them having knowledge of the scores or values of other repeated measurements. This means that assigning a score to a measurement or determining the value of a biological sample should be done without knowledge of the prior (e.g. in case of an intra-rater reliability study) or other (e.g. in case of an inter-rater reliability study) score(s) or value(s) on the instrument of interest. Although part of the determination of the value of a biological sample can be an automatic step, human action may be required to do this determination. An example is a urine pH level test to measure the acidity or alkalinity of urine, where the color of the strip is interpreted by the professional. The rating is similar to that explained for standard 4.


Standard 6. Other important flaws

Were there any other important flaws in the design or statistical methods of the study?
  very good: No
  doubtful: Minor other methodological flaws
  inadequate: Yes

Elaboration: This standard is included because there might be uncommon design flaws that are not covered by other standards but that may cause additional risk of bias. Below, some examples are provided.

When various professionals are involved in the measurement instrument, and one of the professionals is the attending physician of the patient, this physician has (much) more information about the patient than the other professionals. In some situations, depending on the aim of the study and the specific construct to be measured, this could be considered a flaw because of the influence on the scores obtained.

In the previous chapter we saw in the example of Skeie that part of the sample comprised healthy persons, whereas the authors were ultimately interested in these measurements in low back pain patients (19). This will increase the variance between patients, and it will thereby increase the results of the study (i.e. the ICC or G coefficient). Depending on where this study sits in the development of the instrument, this could be deemed proper (when the full range of the scores is not yet known) or an important flaw (when the purpose is to determine the reliability of measurement in the clinical setting of low back pain).

A final example refers to the use of the ICC model for average scores. Although discussed under standard 7 for reliability, it may be that the ICC for the mean score of the measurements is reported, whereas in clinical practice the single score is used. Depending on the purpose of the study this can be proper (when the mean score is going to be used in future research) or an important flaw (when the study is aimed at demonstrating reliability in clinical practice, where the single score is used).

It is up to the user of the COSMIN Risk of Bias tool whether a flaw is considered minor (and is rated as ‘doubtful’) or important (and is rated as ‘inadequate’). The scores of the other flaws are included in the overall score/rating based on the worst score counts principle.
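The two ICC-related examples under standard 6 can be illustrated with a small numerical sketch. It assumes the simplest decomposition, in which the ICC is the between-patient variance divided by the sum of between-patient and error variance, and in which averaging k measurements divides the error variance by k. The function name and all variance values are invented for illustration; they are not taken from Skeie et al.

```python
def icc(var_patients, var_error, k=1):
    """ICC for the mean of k repeated measurements (k=1 gives the single-score ICC).

    Assumes a simple model: ICC = var_patients / (var_patients + var_error / k).
    """
    return var_patients / (var_patients + var_error / k)


VAR_ERROR = 4.0  # hypothetical error variance, identical in all scenarios

# Heterogeneity: adding pain-free persons widens the spread between patients,
# which raises the ICC although the measurement error itself is unchanged.
icc_clinical_sample = icc(var_patients=6.0, var_error=VAR_ERROR)
icc_mixed_sample = icc(var_patients=12.0, var_error=VAR_ERROR)

# Averaging: the ICC for the mean of three measurements is higher than the
# ICC for a single measurement taken in the same sample.
icc_single = icc(var_patients=6.0, var_error=VAR_ERROR, k=1)
icc_average = icc(var_patients=6.0, var_error=VAR_ERROR, k=3)

print(f"clinical sample: {icc_clinical_sample:.2f}, mixed sample: {icc_mixed_sample:.2f}")
print(f"single score: {icc_single:.2f}, average of 3: {icc_average:.2f}")
```

In both comparisons the error variance is the same, so the higher coefficient says nothing about a better instrument; this is why the manual treats such cases as possible flaws depending on the aim of the study.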


Standard 7: Preferred statistical methods for continuous scores

For continuous scores: was an intraclass correlation coefficient (ICC) calculated?
  very good: ICC calculated; the model or formula was described, and matches the study design and the data
  adequate: ICC calculated but model or formula was not described or does not optimally match the study design OR Pearson or Spearman correlation coefficient calculated WITH evidence provided that no systematic difference between measurements has occurred
  doubtful: Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic difference between measurements has occurred OR WITH evidence provided that systematic difference between measurements has occurred

Elaboration: For continuous scores the intraclass correlation coefficient (ICC) is preferred to evaluate reliability. ICCs are a family of statistical parameters, including Generalizability (G) coefficients and Decision (D) coefficients. To get a ‘very good’ rating, the ICC model used in the reliability study should match the study design (and the aim) of the study that is being assessed. Therefore, the model or formula of the ICC or G coefficient used should be described. It should be clear, e.g., whether a crossed or nested design was used (see also page 25/26), or whether a one-way random effects model, or a two- or three-way random or mixed effects model was used. Next, it should be compared to the study design using the extracted information from Part A, and determined whether the ICC or G coefficient used indeed matches the study design.

The ICC based on the two-way mixed effects model of consistency (31) (also referred to as ICC model 3.1 (32)) and the Pearson or Spearman correlation coefficient do not take a systematic difference between the repeated measurements into account, and are therefore considered less appropriate, as they can lead to overestimating the reliability. Therefore, based on information about a systematic difference between the source of variation considered (e.g. raters), either ‘adequate’ (when no or very little systematic difference occurred) or ‘doubtful’ (when there was a systematic difference between e.g. the raters) can be rated. When the study was designed to investigate a specific source of variation (e.g. inter-rater), and the systematic differences between this source of variation in the repeated measurements were taken into account in the formula (for example, by using the ICC random effects model for agreement (31), also referred to as Model 2.1 (32), or the φ coefficient (see e.g. (23))), the study can be rated as ‘very good’. When a study is designed without any specific source of variation being considered, the appropriate ICC model is a one-way random effects model (31). In this situation the use of a one-way random effects model can be rated as ‘very good’, while the use of other models can be rated as ‘adequate’.

Next, the ICC can be calculated for a single measurement or an average measurement (31). If a single measurement is normally used in clinical practice or trials (and not the average score of multiple measurements, as is done for a blood pressure measurement), the ICC for single measures should have been calculated. The ICC average refers to the reliability of the averaged score of the measurements, and thus to the use of the averaged score on repeated measurements. When the ICC for average measures is reported in a situation where usually a single measurement is taken, we recommend rating this standard as ‘adequate’, as the model does not optimally match the design of the study. However, we also recommend in this situation to rate standard 6 (i.e. other flaws) as ‘doubtful’ or even ‘inadequate’ (see also the example at standard 6).

Moreover, to get a ‘very good’ rating, the described ICC or G coefficient model or formula should match the data. If there is a (known) problem with the normal distribution of the data (normality) which is not properly taken into account, the study could be rated as ‘adequate’ instead of ‘very good’. It is impossible to describe all other flaws here. Therefore, it is up to the user of the COSMIN Risk of Bias tool to decide how the identified flaw should be scored. A relevant question in this regard is how certain and how large the influence on the study result is.

Standard 8: Preferred statistical methods for ordinal scores

For ordinal scores: was a (weighted) kappa calculated?
  very good: Kappa calculated; the weighting scheme was described, and matches the study design and the data
  adequate: Kappa calculated, but weighting scheme not described or does not optimally match the study design

Elaboration: To assess reliability for ordinal scores, Cohen’s kappa (33-35) is considered the preferred statistical parameter. No better alternative is known (4, 36). Information on the specific kappa used should be described in terms of whether a weighting scheme was used and which scheme was used. Unweighted kappa considers any misclassification equally inappropriate. However, a misclassification between two adjacent categories may be less erroneous than a misclassification between categories that are further apart from each other. A weighted kappa takes this into account (e.g. using linear or quadratic weights (37)). If the goal of the study was to consider any misclassification as equally important, and it was stated that the unweighted kappa was used, this standard can be rated as ‘very good’. However, in other situations (e.g. when misclassification between categories further apart from each other is a bigger problem than misclassification between adjacent categories) a specific weighting scheme is preferred. If unweighted kappa was calculated in that case, the standard could be rated as ‘adequate’.

Standard 9: Preferred statistical methods for dichotomous or nominal scores

For dichotomous/nominal scores: was kappa calculated for each category against the other categories combined?
  very good: Kappa calculated for each category against the other categories combined

Elaboration: A study on reliability of an outcome measurement instrument with dichotomous or nominal scores gets a ‘very good’ rating when an unweighted kappa was calculated for each category against the other categories combined (33).
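The calculation asked for in standard 9 can be sketched as follows. This is a minimal illustration using the usual definition of unweighted Cohen’s kappa, (observed agreement − chance agreement) / (1 − chance agreement); the function names and the ratings of the two raters are hypothetical and do not come from any study discussed in this manual.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same patients.

    Note: undefined (division by zero) when chance agreement equals 1,
    i.e. when both raters use a single category only.
    """
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    return (observed - expected) / (1 - expected)


def kappa_per_category(ratings_a, ratings_b):
    """Kappa for each category against all other categories combined."""
    categories = sorted(set(ratings_a) | set(ratings_b))
    result = {}
    for category in categories:
        # Collapse the nominal score into 'this category' versus 'any other'.
        a = [r == category for r in ratings_a]
        b = [r == category for r in ratings_b]
        result[category] = cohen_kappa(a, b)
    return result


# Hypothetical nominal scores from two raters for eight patients.
rater_1 = ["A", "A", "B", "C", "C", "A", "B", "C"]
rater_2 = ["A", "B", "B", "C", "C", "A", "A", "C"]

kappas = kappa_per_category(rater_1, rater_2)
for category, value in kappas.items():
    print(f"category {category}: kappa = {value:.2f}")
```

Reporting one kappa per category shows which categories the raters confuse; in this invented data set the raters agree perfectly on category C but not on A and B.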


3.2 Elaboration on standards for studies on measurement error

Standards 1 to 6 of the box for standards for studies on measurement error are the same as for studies on reliability. For an elaboration on each of the standards, please see above.

Standard 7: Preferred statistical methods for continuous scores

For continuous scores: was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC), Limits of Agreement (LoA) or Coefficient of Variation (CV) calculated?
  very good: SEM, SDC, LoA or CV calculated; the model or formula for the SEM/SDC is described; it matches the reviewer constructed research question and the data
  adequate: SEM, SDC, LoA or CV calculated, but the model or formula is not described or does not optimally match the reviewer constructed research question, and evidence provided that no systematic difference has occurred
  doubtful: SEM consistency, SDC consistency, LoA or CV calculated, without knowledge about systematic difference or with evidence provided that systematic difference has occurred
  inadequate: SEM calculated based on Cronbach’s alpha, OR using SD from another population

Elaboration: For continuous scores, preferred measures for the measurement error of a single score are the SEM, LoA or the Coefficient of Variation (CV); the SDC is preferred as a measure for change scores. Different formulas can be used to calculate these various measures. Therefore, we will first describe their formulas. Subsequently, we will explain the standard for studies using SEM and SDC derived from variance components analyses. Next, we will discuss LoA, SEM and SDC using the SD difference. We will explain when ignoring the influence of the source of variation is appropriate. And last, we will discuss some other methods used, including the CV.

Measures that take all error into account, including the systematic difference between repeated measurements, based on a one-way or two-way effects model, are:

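\[ SEM = \sqrt{\sigma^2_{error}} \tag{1} \]

\[ SEM_{agreement} = \sqrt{\sigma^2_{r} + \sigma^2_{residual}} \tag{2} \]

\[ SDC_{agreement} = 1.96 \cdot \sqrt{2} \cdot SEM_{agreement} = 1.96 \cdot \sqrt{2} \cdot \sqrt{\sigma^2_{r} + \sigma^2_{residual}} \tag{3} \]

Here \sigma^2_{error} is the error variance from a one-way model, \sigma^2_{r} the variance due to systematic differences between the source of variation that is varied (e.g. raters), and \sigma^2_{residual} the residual variance from a two-way model.

Measures that do not take the systematic difference between repeated measurements into account:

\[ SEM_{consistency} = \sqrt{\sigma^2_{residual}} \tag{4} \]

\[ SDC_{consistency} = 1.96 \cdot \sqrt{2} \cdot SEM_{consistency} = 1.96 \cdot \sqrt{2} \cdot \sqrt{\sigma^2_{residual}} \tag{5} \]

\[ SEM_{consistency} = \frac{SD_{difference}}{\sqrt{2}} \tag{6} \]

\[ SDC_{consistency} = 1.96 \cdot \sqrt{2} \cdot SEM_{consistency} = 1.96 \cdot \sqrt{2} \cdot \frac{SD_{difference}}{\sqrt{2}} \tag{7} \]

\[ LoA = \bar{d} \pm 1.96 \cdot SD_{difference} \tag{8} \]

\[ SDC_{consistency} = 1.96 \cdot SD_{difference} \tag{9} \]

Here SD_{difference} is the standard deviation of the differences between the repeated measurements, and \bar{d} is the mean of these differences.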

To get a ‘very good’ rating, the formula used should match the study design (and the aim) of the study that is being assessed. Therefore, it should be clear what the aim is, and which measure or which formula was used in the study being assessed.

Measurement error derived from variance components analyses (formulas 1-5)

The specific model used should be clearly described, e.g. whether a one-way random effects model, or a two- or three-way random or mixed effects model was used, and whether all error (except for the variance due to variation between patients) was included in the calculation of the measurement error, or whether the systematic error between the source of variation that is being varied in the design is ignored (as occurs when calculating SEM consistency for single scores (formula 4) and SDC consistency for change scores (formula 5)). Next, it should be compared to the study design using the extracted information about the comprehensive research question (see Part A of the tool), and determined whether the method used indeed matches the study design. In other words, when the aim of the study was to assess the measurement error of a single score of any measurement taken in clinical practice or trials, the aim is to generalize the results beyond (e.g.) the specific raters involved in the study. In this case, the systematic error between raters should be taken into account; the raters (in this example) should be considered random; and all error should be taken into account (i.e. formulas 1-3) to match the design of the study (and this is rated ‘very good’). If in this case (with the aim to generalize beyond the specific raters) the SEM consistency (formula 4) or SDC consistency (formula 5) was calculated (i.e. ignoring a systematic difference between raters), evidence should be provided that no (or only a very small) systematic difference has occurred between the raters. In case of no or very small differences the standard can be rated as ‘adequate’, as the SEM agreement (formula 2) and SEM consistency (formula 4), or SDC agreement (formula 3) and SDC consistency (formula 5), will be the same or very close. If it is unclear whether systematic differences occurred (because it was not reported), the standard is rated as ‘doubtful’.

Measurement error derived from the SD difference (formulas 6-9)

The measurement error of a single score or a change score can also be calculated using the SD difference. This refers to the standard deviation of the difference of the scores on the repeated measurements (38, 39). In a Bland and Altman plot two repeated measurements per patient are plotted: on the x-axis the mean score of the two measurements, and on the y-axis the difference between the two measurements (39). Although the plot is designed in such a way that systematic differences can easily be seen (i.e. the line of the mean difference in scores, and the limits of agreement located asymmetrically around zero), the systematic difference is disregarded when the SDC is calculated from these limits (resulting in the SDC consistency). Therefore, if a (large) systematic error between the repeated measurements occurred, while the aim of the study is to generalize beyond the specific source of variation (e.g. raters), the standard should be rated as ‘doubtful’, as the result of the study underestimates the measurement error.

When is a measure of consistency (formulas 4-9) appropriate?

Sometimes, the source of variation that is being varied across the measurements is considered to be fixed in a study. This means that the aim of the study is not to generalize beyond the specific study objects included in the study. For example, in a study only two raters are considered (e.g. the raters Myrthe and Brechtje), and the aim of the study is to determine whether these two raters will come to equal scores (e.g. because they will be the only two raters involved in the measurements for a specific trial). If a systematic error occurs between Myrthe and Brechtje (e.g. Myrthe systematically scores 5 points higher compared to Brechtje), the scores obtained in the trial can easily be adjusted by subtracting 5 points from each measurement obtained by Myrthe. In this study, the source of variation ‘rater’ is deemed irrelevant (31), as the systematic difference will be adjusted later on when using the instrument by either Myrthe or Brechtje. In this specific situation, the SEM consistency, SDC consistency or the limits of agreement match the aim and design of the study, so it can be rated as ‘very good’. However, these results cannot be generalized to other raters, as ‘rater’ was considered fixed. Therefore, the study is less relevant in other situations, especially when there is a systematic difference between the raters.
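The difference between agreement and consistency measures in the rater example can be sketched numerically. The code below is a minimal illustration, assuming a fully crossed design (every patient scored once by every rater) and the usual two-way ANOVA estimates of the variance components; the function name and all scores are invented. One rater is made to score about 5 points higher than the other, as in the example of Myrthe and Brechtje.

```python
import math
from statistics import mean, stdev


def variance_components(scores):
    """Return (rater variance, residual variance) for a crossed design.

    scores: one list per patient, holding one score per rater.
    Assumption: two-way ANOVA without replication; a negative estimate of
    the rater variance is truncated to zero.
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    patient_means = [mean(row) for row in scores]
    rater_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_patients = k * sum((m - grand) ** 2 for m in patient_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ms_raters = ss_raters / (k - 1)
    ms_residual = (ss_total - ss_patients - ss_raters) / ((n - 1) * (k - 1))
    var_rater = max((ms_raters - ms_residual) / n, 0.0)
    return var_rater, ms_residual


# Hypothetical scores: [Brechtje, Myrthe] per patient.
scores = [[20, 26], [35, 39], [28, 33], [42, 49],
          [31, 34], [25, 31], [38, 43], [30, 34]]

var_rater, var_residual = variance_components(scores)
sem_agreement = math.sqrt(var_rater + var_residual)   # all error included
sem_consistency = math.sqrt(var_residual)             # rater difference ignored
sdc_agreement = 1.96 * math.sqrt(2) * sem_agreement
sdc_consistency = 1.96 * math.sqrt(2) * sem_consistency

# Route via the SD of the differences (valid for two repeated measurements).
differences = [myrthe - brechtje for brechtje, myrthe in scores]
sem_from_sd_difference = stdev(differences) / math.sqrt(2)
loa = (mean(differences) - 1.96 * stdev(differences),
       mean(differences) + 1.96 * stdev(differences))

# Fixed raters: subtract the known 5-point difference from Myrthe's scores.
adjusted = [[brechtje, myrthe - 5] for brechtje, myrthe in scores]
adj_rater, adj_residual = variance_components(adjusted)
sem_agreement_adjusted = math.sqrt(adj_rater + adj_residual)

print(f"SEM agreement {sem_agreement:.2f}, SEM consistency {sem_consistency:.2f}")
print(f"SDC agreement {sdc_agreement:.2f}, SDC consistency {sdc_consistency:.2f}")
print(f"LoA {loa[0]:.2f} to {loa[1]:.2f}")
```

With these invented data the SEM agreement is several times larger than the SEM consistency, because only the former includes the systematic difference between the raters. After the 5-point adjustment the two coincide, which is the situation in which a consistency measure matches the aim of the study.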


Measurement error calculated using the formula SEM = SD * √(1 - ICC)

There is another formula which is sometimes used to calculate the SEM from the ICC: SEM = SD * √(1 - ICC) (40). The standard deviation refers to the pooled SD of the sample, that is, of SD test and SD retest. Using this formula is only justified if the data for ICC and SD are derived from the same study. When the SD is based on another population, this is considered inadequate, as the SD of this other population may be smaller, and subsequently the measurement error is smaller.

Moreover, sometimes Cronbach’s alpha is inserted in the formula instead of the ICC. This is considered inadequate, as this measure is based on one full-scale measurement where items are considered as the repeated measurements, instead of at least two full-scale measurements using the total score in the calculation of the SEM. Often Cronbach’s alpha is higher than ICCs based on repeated measurements, thus leading to smaller SEM values. Rating this as inadequate does not exclude the result of the study: it can still be considered, but it is regarded as less trustworthy. Moreover, Cronbach’s alpha is sometimes used inappropriately, because it is calculated for a scale that is not unidimensional, or that is based on a formative model. In such cases Cronbach’s alpha cannot be interpreted.

Other parameters that are based on single measurements, such as the person separation index (or other IRT-based measurement error measures) or the Omega, are not covered by measurement error according to the COSMIN taxonomy, but by internal consistency.

The Coefficient of variation

The Coefficient of variation (CV) is also a parameter of measurement error. It is often used in physics and to present the measurement error of a device. When developing a new device the measurement error is assessed by measuring a fixed sample many (e.g. 50) times. The SD of these measurements is the standard error of measurement. Often the measurement error increases with higher values. For these situations the CV is a suitable measure, as the CV expresses the SD as a percentage of the mean value: in formula, CV = SD / mean. Usually, it is expressed as a percentage; for example, the measurement error is 2% of the measured value. The assumption underlying the CV is that it gives a constant value over all values of the mean, so that the SD is e.g. 2% of the mean value, regardless of a mean value of 10 or 100 or 1000. In a Bland and Altman plot, we had a contrary assumption, i.e. that the SD of the difference is constant over the mean values on the x-axis. If the differences are smaller with small values and larger with large values, the horizontal lines of the limits of agreement give a wrong value: too large for the small values and too small for the large mean values. In that case one should transform the data. Often a natural logarithm or base-10 logarithm transformation is used. This has the advantage that the limits of agreement can be directly expressed as a coefficient of variation (41).
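The two formulas in this part, SEM = SD * √(1 - ICC) and CV = SD / mean, can be sketched as below. This is an illustration only: the function names, the SDs, the coefficients and the device readings are all invented, and the pooled SD is computed as the root of the mean of the two variances, which assumes the same patients at test and retest.

```python
import math
from statistics import mean, stdev


def sem_from_icc(sd_test, sd_retest, coefficient):
    """SEM = pooled SD * sqrt(1 - ICC).

    Assumption: equal sample size at test and retest, so the pooled SD is
    the square root of the average of the two variances.
    """
    sd_pooled = math.sqrt((sd_test**2 + sd_retest**2) / 2)
    return sd_pooled * math.sqrt(1 - coefficient)


def coefficient_of_variation(values):
    """CV = SD / mean, for repeated measurements of one fixed sample."""
    return stdev(values) / mean(values)


# Hypothetical study: SD of 10 at both occasions, ICC of 0.84.
sem_icc = sem_from_icc(sd_test=10.0, sd_retest=10.0, coefficient=0.84)

# Inserting a (typically higher) Cronbach's alpha yields a smaller SEM,
# which is why the manual rates this practice as inadequate.
sem_alpha = sem_from_icc(sd_test=10.0, sd_retest=10.0, coefficient=0.92)

# Hypothetical repeated readings of one fixed sample on a device.
readings = [98.0, 100.0, 102.0, 100.0, 100.0]
cv = coefficient_of_variation(readings)

print(f"SEM from ICC: {sem_icc:.2f}, SEM from alpha: {sem_alpha:.2f}")
print(f"CV: {100 * cv:.2f}% of the measured value")
```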


Standard 8: Preferred statistical methods for dichotomous, nominal, or ordinal scores

For dichotomous/nominal/ordinal scores: was the percentage specific (e.g. positive and negative) agreement calculated?
  very good: % specific agreement calculated
  adequate: % agreement calculated

Elaboration: Kappa is often considered a measure of agreement; however, kappa is a measure of reliability (42). An appropriate parameter of measurement error (also called agreement) of dichotomous/nominal/ordinal scores is the proportion of specific agreement (42-44). It is a measure that expresses the agreement separately for each category of the score, that is, positive agreement and negative agreement when the score is dichotomous.
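For a dichotomous score, specific agreement can be sketched as below. The code assumes the commonly used definitions for two raters, positive agreement = 2a / (2a + b + c) and negative agreement = 2d / (2d + b + c), where a and d are the numbers of patients both raters scored positive and negative, and b and c are the discordant cells. The function name and the counts are hypothetical.

```python
def specific_agreement(a, b, c, d):
    """Overall and specific agreement from a 2x2 table of two raters.

    a: both raters positive, d: both raters negative,
    b and c: raters disagree (one positive, one negative).
    """
    n = a + b + c + d
    return {
        "overall": (a + d) / n,
        "positive": 2 * a / (2 * a + b + c),
        "negative": 2 * d / (2 * d + b + c),
    }


# Hypothetical table: a rare positive finding in 100 patients.
agreement = specific_agreement(a=10, b=5, c=5, d=80)

for name, value in agreement.items():
    print(f"{name} agreement: {100 * value:.0f}%")
```

In this invented table the overall agreement is 90%, while the positive agreement is only about 67%. The single overall percentage hides that the raters disagree on a third of the positive scores, which illustrates why specific agreement is rated higher than overall agreement.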


3.3 Example of how to use Part B of the COSMIN Risk of Bias tool to assess the quality of a study by Skeie et al. (2015)

In this chapter we provide an example of how to use Part B of the COSMIN tool, again using the paper by Skeie et al. (19). To fully understand the explanation in Table 7, we recommend first reading the introduction and method section of the paper, and the summary provided at page 27/28. In this paper four different studies are described. Here we use the first two substudies.

Table 7. Example of how to use Part B of the COSMIN Risk of Bias tool based on the study by Skeie (19).

Standards on design requirements for Reliability and Measurement error

1. Were patients stable in the time between the repeated measurements on the construct to be measured?
   Study 1: NA (measurements were based on a still image).
   Study 2: Very good. Measurements were conducted in succession.

2. Was the time interval between the repeated measurements appropriate?
   Study 1: NA.
   Study 2: Very good. The time interval (i.e. the second rater started immediately after the first had completed the procedure) has probably not influenced the scores.

3. Were the measurement conditions similar for the repeated measurements, except for the condition being evaluated as a source of variation?
   Study 1: Very good.
   Study 2: Very good.

4. Did the professional(s) administer the measurement without knowledge of scores or values of other repeated measurement(s) in the same patients?
   Study 1: Very good. None of the previous scores were available.
   Study 2: Very good. None of the previous scores were available.

5. Did the professional(s) assign the scores or determine the values without knowledge of the scores or values of other repeated measurement(s) in the same patients?
   Study 1: Very good. None of the previous scores were available.
   Study 2: Very good. None of the previous scores were available.

6. Were there any other important flaws in the design or statistical methods of the study?
   Study 1, for reliability: Doubtful. 5 of 30 persons (see Table 1 of the paper) were pain-free subjects, which could have substantially increased the variation between the patients, and subsequently the ICC.
   Study 2, for reliability: Very good (in this study no pain-free persons were included, see Table 1 of the paper).
   For measurement error: Very good. Heterogeneity of the sample is considered less of a problem, as the variation between patients is not included in the parameter.


Standards on preferred statistical methods for Reliability

7. For continuous scores: was an Intraclass Correlation Coefficient (ICC) calculated?
   Adequate. ICC two-way mixed single measures (3.1) and two-way mixed average measures (3.2) were calculated. This is the ICC consistency, which does not take the systematic error between raters into account. The study aims to generalize beyond the raters involved; therefore, the raters should not be considered fixed, and the ICC model does not optimally match the research aim and design. Based on the mean of the measurements provided in Table 2, we can conclude that no systematic difference between the raters occurred. The ICC two-way mixed average measures (3.2) refers to the practice in which two raters would measure each patient (with triple placement of the second marker), and both final scores were averaged. As this will not be common practice, we will ignore this ICC. The repetition of part of the measurement is already part of one measurement.

8. For ordinal scores: was a (weighted) kappa calculated?
   Study 1: Not applicable.
   Study 2: Not applicable.

9. For dichotomous/nominal scores: was kappa calculated for each category against the other categories combined?
   Study 1: Not applicable.
   Study 2: Not applicable.

Final Risk of Bias rating, Reliability studies
   Study 1: Doubtful.
   Study 2: Adequate.

Standards on preferred statistical methods for Measurement error

7. For continuous scores: was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC), Limits of Agreement (LoA) or Coefficient of Variation (CV) calculated?
   Adequate, as the limits of agreement were calculated, while the aim was to generalize beyond the raters included in this study, and probably there was no systematic difference between the raters.

8. For dichotomous/nominal/ordinal scores: was the percentage specific (e.g. positive and negative) agreement calculated?
   Study 1: Not applicable.
   Study 2: Not applicable.

Final Risk of Bias rating, study on Measurement error
   Study 1: Adequate.
   Study 2: Adequate.
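The final ratings in Table 7 follow from the worst score counts principle: the lowest rating on any applicable standard determines the overall rating. The sketch below applies this to the ratings listed in Table 7. The function and variable names are illustrative and are not part of the COSMIN tool.

```python
# Ratings ordered from worst to best.
ORDER = ["inadequate", "doubtful", "adequate", "very good"]
NOT_RATED = {"na", "not applicable"}


def overall_rating(ratings):
    """Return the worst rating among the applicable standards.

    Raises ValueError when no standard is applicable.
    """
    applicable = [r.lower() for r in ratings if r.lower() not in NOT_RATED]
    return min(applicable, key=ORDER.index)


# Reliability, standards 1 to 9 as rated in Table 7.
reliability_study_1 = ["NA", "NA", "very good", "very good", "very good",
                       "doubtful", "adequate", "not applicable", "not applicable"]
reliability_study_2 = ["very good", "very good", "very good", "very good",
                       "very good", "very good", "adequate",
                       "not applicable", "not applicable"]

# Measurement error, standards 1 to 8 for study 1 (standard 6 is 'very good').
measurement_error_study_1 = ["NA", "NA", "very good", "very good", "very good",
                             "very good", "adequate", "not applicable"]

print(overall_rating(reliability_study_1))        # doubtful
print(overall_rating(reliability_study_2))        # adequate
print(overall_rating(measurement_error_study_1))  # adequate
```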


4. Using the COSMIN Risk of Bias tool in a systematic review of outcome measurement instruments

Researchers and clinicians who are deciding on the most suitable outcome measurement instrument for use in their study can often choose from multiple different instruments. The selection should be based on the evidence of the quality of the outcome measurement instruments (i.e. reliability, validity, and responsiveness), as well as on aspects of feasibility and interpretability. A high-quality systematic review on outcome measurement instruments gives a clear overview of all important aspects to make your choice. Understanding the quality of the studies and the quality of the measurement instrument under study is a challenging task, specifically for researchers and clinicians who are less familiar with the methodology to evaluate all measurement properties.

Therefore, in 2018, we (COSMIN initiative) published a thorough methodology to conduct a systematic review of PROMs (5). It consisted of a ten-step procedure to summarize the available evidence per measurement property per included PROM, to draw conclusions on each measurement property per PROM, and subsequently to give recommendations on the most suitable PROM for a given purpose, including also feasibility and interpretability aspects. This methodology also includes the COSMIN Risk of Bias checklist to assess the quality of studies on measurement properties of PROMs (1), including standards for design requirements and preferred statistical methods organized in boxes per measurement property.

To perform a systematic review on the quality of ClinROMs, PerFOMs and laboratory values, the same methodology can be used. However, we recommend some adaptations. Two aspects of the COSMIN methodology for systematic reviews of PROMs are different for ClinROMs, PerFOMs or laboratory values: the recommendation to use different boxes for reliability and measurement error, and the addition of a new step.

The new boxes

In systematic reviews of ClinROMs, PerFOMs or laboratory values the COSMIN Risk of Bias checklist for PROMs (1) can be used, although the boxes for reliability and measurement error should be replaced with the COSMIN Risk of Bias tool to assess the quality of a study on reliability or measurement error (4). Standards for most of the remaining measurement properties (i.e. content validity, internal consistency, construct validity, criterion validity and responsiveness) developed for PROMs can be used for other types of measurement instruments as well. Some measurement properties are only relevant for multi-item instruments based on a reflective model (i.e. structural validity and internal consistency). For some other measurement properties only the final score or value of a measurement instrument is considered (i.e. hypotheses testing for construct validity, criterion validity and responsiveness). The quality of studies on these measurement properties is similarly assessed for all types of outcome measurement instruments, and the existing boxes from the COSMIN Risk of Bias checklist for PROMs can be used.

An additional step

In a reliability study or a study on measurement error of a PROM the focus of interest is usually on the quality of the PROM as it is being used in clinical practice (analyzed using a one-way random effects model), or on the test-retest reliability (using a two-way random effects model of agreement). However, the focus of interest in a reliability study of other types of measurement instruments is much more diverse. As explained in chapter 2, there are many potential sources of variation (i.e. many different ways to operationalize the components of outcome measurement instruments) that could be the focus of interest in a study on reliability. Each result of all those studies tells you something about the quality of the instrument (and gives suggestions for improvement of the measurement by standardizing or restricting the source of variation which showed the largest error). Based on an overview of all these studies, a best-evidence measurement protocol can be recommended.

In a COSMIN review of ClinROMs, PerFOMs or laboratory values, an additional step is needed in the ten-step procedure (see Figure 3), specifically in the assessment of reliability and measurement error. To interpret the results of studies included in a systematic review well, you need to decide how the results of the study you want to assess inform you about the quality of the measurement instrument. Therefore, we separated the assessment of reliability and measurement error from the other measurement properties.

Change in the methodology

Based on our experience using the methodology, we decided to remove step 8 (which was ‘Evaluate interpretability and feasibility’) from the methodology. Aspects of interpretability and feasibility are only extracted (and summarized) rather than evaluated. Therefore, this step is irrelevant in the methodology. However, we consider it very useful to have a separate step on data extraction. Once you have included all the studies in a review, we recommend that you first extract all necessary information from an article, before assessing the risk of bias and the quality of the instrument. Relevant information to be extracted refers to characteristics of the included measurement instruments, information on feasibility and interpretability, characteristics of the studies, and the results of the study.

Consequently, the step numbers deviate from the step numbers presented in the original 10-step procedure of the COSMIN methodology to conduct a systematic review of PROMs (5).


Figure 3. Eleven-step procedure for conducting a systematic review on any type of outcome measurement instrument


4.1Theeleven‐stepprocedureforconductingasystematicreviewofClinROMs,PerFOMs,orlaboratoryvalues

Below,asummaryisgivenfortheeleven‐stepprocedure.IntheusermanualoftheCOSMINmethodologyforsystematicreviewsofPROMs(45)athoroughexplanationofeachstepisprovided.OnlythestepsthataredifferentforareviewofoutcomemeasurementinstrumentsotherthanPROMsaredescribedhereindetail.Pleasenotethatthenumberofthesteparechanged.

Themethodologyofasystematicreviewofoutcomemeasurementinstrumentsissubdividedintothreeparts(A,B,andC)(5).

Step1‐4:Performtheliteraturesearch

Thesteps1‐4arestandardprocedureswhenperformingsystematicreviews,andareinagreementwithexistingguidelinesforreviews(46,47):formulatingthespecificaimofthereview,andtheeligibilitycriteria,performingtheliteraturesearch,andselectingrelevantpublications.

Intheresearchquestion,andeligibilitycriteriafourkeyelementsshouldbeincluded:1)theconstruct;2)thepopulation;3)thetype(s)ofinstruments;and4)themeasurementpropertiesofinterest.

Inthesearchstrategywerecommendtoalsousethesekeyelements,exceptfromthetypeofinstruments,aswearenotawareofhighlysensitivesearchblocksfordifferenttypesofmeasurementinstruments.Searchfiltersfordifferentconstructsmaybefoundathttps://blocks.bmi‐online.nl/.Whenusingthesearchfilterforfindingstudiesonmeasurementproperties(48)ofCLinROMs,PerFOMsandlaboratoryvalues,werecommendtouseadditionalsearchtermsforfindingstudiesusingGeneralizabilitytheory.Thisstring,developedwiththehelpofaclinicallibrarian,canbeaddedwiththebolean“OR”tothesearchfilter.

PubMed search string for finding studies using Generalizability theory:

G-theory[tiab] OR "G theory"[tiab] OR "generalizability theory"[tiab] OR "generalisability theory"[tiab]

EMBASE search string for finding studies using Generalizability theory:

'g-theory':ti,ab OR 'g theory':ti,ab OR 'generalizability theory':ti,ab OR 'generalisability theory':ti,ab
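
The combination described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the measurement-properties filter (48) is passed in as a plain string because its text is not reproduced in this manual. The sketch assumes the Generalizability theory terms are joined to the filter with OR, and that the construct and population blocks are joined with AND.

```python
# PubMed string for Generalizability theory studies, as given in the manual.
G_THEORY_PUBMED = (
    'G-theory[tiab] OR "G theory"[tiab] OR '
    '"generalizability theory"[tiab] OR "generalisability theory"[tiab]'
)


def add_g_theory_terms(measurement_filter, g_theory=G_THEORY_PUBMED):
    """Join an existing measurement-properties filter and the G-theory string with OR."""
    return f"(({measurement_filter.strip()}) OR ({g_theory}))"


def build_query(construct, population, measurement_filter):
    """AND together the construct, the population, and the extended filter.

    The type of instrument is deliberately left out of the search,
    following the recommendation in the text.
    """
    extended = add_g_theory_terms(measurement_filter)
    return f"({construct}) AND ({population}) AND {extended}"


if __name__ == "__main__":
    # All three arguments are placeholders, not validated search blocks.
    print(build_query("muscle thickness[tiab]", "low back pain[tiab]", "reliab*[tiab]"))
```

Keeping the Generalizability theory terms inside the same parentheses as the filter ensures that the OR applies only to the measurement-properties block and not to the construct or population blocks.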


Step 5: Data extraction

Once you have included all relevant articles, check per article which measurement properties were evaluated (and subsequently decide which COSMIN boxes are relevant to complete for the specific article). When reading through the article at this point, we recommend extracting all information about the characteristics of the included measurement instruments (for suggestions of characteristics see Appendix 4), including aspects of feasibility and interpretability (see below). Interpretability is defined as the degree to which one can assign qualitative meaning (that is, clinical or commonly understood connotations) to a quantitative score or change in scores of an outcome measurement instrument (7). Both the interpretability of single scores and the interpretability of change scores are informative to report in a systematic review. The interpretation of single scores can be outlined by providing information on the distribution of scores in the study population or other relevant subgroups, as it may reveal clustering of scores, and it can indicate floor and ceiling effects. The interpretability of change scores can be enhanced by reporting M(C)IC values. However, there is an ongoing debate about how these values should be assessed.

Feasibility is defined as the ease of application of the measurement instrument in its intended context of use, given constraints such as time or money (49). Aspects of feasibility are, for example, completion time, cost of an instrument, length of the instrument, and type and ease of administration. Feasibility applies to both the patients and the professionals who are involved in the measurement. The concept 'feasibility' is related to the concept 'clinical utility': feasibility refers to a measurement instrument, whereas clinical utility refers to an intervention (50).

Interpretability and feasibility are not measurement properties because they do not refer to the quality of an outcome measurement instrument. However, they are considered important aspects for a well-considered selection of an outcome measurement instrument.


Steps 6-9: Evaluate the measurement properties

Steps 6-9 concern the evaluation of the nine measurement properties of the included outcome measurement instruments. In these steps, per measurement property, data are extracted on the characteristics of the studies and the result of each study; the risk of bias of the included studies is rated using the COSMIN Risk of Bias standards; and the results of the studies are rated by applying the criteria for good measurement properties. Subsequently, all evidence is summarized, and the quality of all available evidence per measurement property per measurement instrument is graded using a modified GRADE approach.

Characteristics of the studies refer to the characteristics of the included patient populations and of the population of included professionals (for suggestions of characteristics see Appendix 5). For specific recommendations for extracting information on the results of studies on reliability and measurement error, see 'Extracting information' under step 8.

In step 6 the content validity is assessed. In step 7 the internal structure (structural validity, internal consistency and cross-cultural validity/measurement invariance) is assessed. As the assessment of reliability and measurement error requires an additional step (i.e. understanding how the results of a study inform you about the reliability or measurement error of an outcome measurement instrument), these two measurement properties are now assessed in a separate step, i.e. step 8, apart from the assessment of the measurement properties criterion validity, hypotheses testing for construct validity, and responsiveness (i.e. step 9).

Step 6. Evaluate content validity

In step 6 content validity is evaluated. In the current standards and criteria for assessing content validity of PROMs (6), emphasis is put on the relevance, comprehensiveness, and comprehensibility of the PROM for the construct, target population, and intended context of use. This assessment also considers the development of the PROM, specifically the item elicitation phase and the results from the pilot-testing phase. The assessment of content validity of other types of instruments may be different, and more research is needed to develop standards and criteria for other types of measurement instruments.

Assessing the content validity of measurement instruments that include multiple items, based on either a reflective or a formative model, can lean heavily on the standards and criteria for PROMs. However, because professionals are involved in the measurement, the three aspects of content validity (i.e. relevance, comprehensiveness, and comprehensibility) should be asked of the professionals. Depending on the construct of interest, these aspects could be asked of patients, too, for example for PerFOMs, or for ClinROMs about symptoms or severity of conditions.


For the assessment of content validity of measurement instruments that consist of a single parameter (e.g. imaging-based parameters, or laboratory values), other aspects are likely more relevant. For example, you should judge whether it makes sense that the measurement instrument indeed measures the construct it purports to measure, based on theory and medical knowledge, and based on the claims by the manufacturer. In addition, the unit of measurement should match the construct to be measured. For example, a 6-minute walk test, expressed as the distance covered over a time of 6 minutes, measures walking capacity rather than physical functioning (51). As currently no standards and criteria for content validity exist for these instruments, face validity (which is a rather subjective judgment about whether the content of the instrument indeed looks like an adequate reflection of the construct to be measured) could be assessed by the reviewer.

Step 7. Evaluate the internal structure

In step 7 the internal structure (structural validity, internal consistency and cross-cultural validity/measurement invariance) is assessed. This step is only relevant when the measurement instrument is a multi-item instrument based on a reflective model. The standards (1) and criteria (5) provided for systematic reviews of PROMs can be used.

Step 8. Evaluate reliability and measurement error

Next, in step 8 reliability and measurement error are assessed. In chapters 2 and 3 we have explained how to assess the quality of each study on reliability and measurement error.

In a systematic review, you should first extract, per study, information about the elements of a comprehensive research question (see chapter 2), the specific ICC model or formula, and the results of each study. Next, you should assess the study quality using the standards (see chapter 3), and assess the results of each study by comparing them against the criteria for good measurement properties (5). Subsequently, you should summarize all evidence for reliability and for measurement error, respectively, and grade the quality of the evidence using the modified GRADE approach (5). Based on this overview, you can recommend the best-evidence measurement protocol for a specific measurement instrument.

Extracting information

In Appendix 1 we provide an example of a data extraction table. First, we recommend extracting the seven elements of a comprehensive research question, and the research question as stated by the authors in the article. Based on the elements, you can subsequently formulate a comprehensive research question. Next, we recommend extracting the information about the key elements of the review, i.e. the construct, population, type of measurement instrument, and measurement properties of interest. The construct to be measured (element 3 of a comprehensive research question) and the specific measurement properties (element 4 of a comprehensive research question) have already been extracted, so only the target population and the type of measurement instrument remain to be extracted. The target population refers to the target population of the specific study. In the example of Skeie et al. (19), the target population was patients with low-back pain. This can differ from the study population (i.e. the sample used) as extracted in item 7, or differ (slightly) from the target population of the review (e.g. a broader population). In the study of Skeie, not only patients with low-back pain were included, but also patients with other spinal complaints such as midback pain, neck pain, and/or extremity pain, or even pain-free subjects. The type of measurement instrument refers to whether the instrument under study is a ClinROM, PerFOM, laboratory value, PROM or ObsROM.

Last, we recommend extracting information about the statistics: the model or formula used, the result, and, if applicable, its 95% confidence interval. If available, we recommend extracting the variance components, or the SDsample or SDdifference (see also chapter 3.2 for more explanation). For ordinal or dichotomous data we recommend extracting the raw numbers in the cells plus marginal totals.

Risk of Bias assessment

The next step in the review is to assess the quality of each study, using Part B of the Risk of Bias tool to assess reliability and measurement error (as described in chapter 3). We recommend using the worst-score-counts method to arrive at an overall rating per study. In Appendix 2 we provide an example of a table to organize these ratings. We recommend that each study is assessed by two independent reviewers, who then reach consensus.
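
The worst-score-counts method can be illustrated with a minimal Python sketch: the overall rating of a study is the lowest rating given to any of its standards. The function name is hypothetical, and skipping standards marked "NA" (not applicable) is an assumption of this sketch rather than a rule stated in this section.

```python
# Rating levels ordered from best to worst.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(ratings):
    """Return the lowest (worst) rating among the applicable standards.

    Standards rated "NA" are skipped (an assumption of this sketch).
    Unknown rating labels raise ValueError.
    """
    applicable = [r.lower() for r in ratings if r.upper() != "NA"]
    if not applicable:
        raise ValueError("no applicable ratings to combine")
    # The worst rating is the one furthest down RATING_ORDER.
    return max(applicable, key=RATING_ORDER.index)


if __name__ == "__main__":
    consensus = ["very good", "adequate", "doubtful", "very good", "NA"]
    print(worst_score_counts(consensus))
```

In practice the input would be the consensus column of a table such as the one in Appendix 2.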

Comparison against the criteria for good measurement properties

Each result of each single study on reliability or measurement error is now compared against the criteria for good measurement properties (5). As no criteria for the unweighted Kappa and the CV were provided in the guidelines for systematic reviews of PROMs, we added these missing criteria (see Table 8). Criteria for % specific agreement are difficult to set, because they are, just like sensitivity and specificity, highly dependent on the situation. As a rule of thumb 80% might be used.


Table 8. Extended criteria for good reliability and measurement error (adapted from Prinsen et al. (5))

Measurement property | Rating | Criteria

Reliability | + | ICC or (weighted) Kappa ≥ 0.70
Reliability | ? | ICC or (weighted) Kappa not reported
Reliability | - | ICC or (weighted) Kappa < 0.70

Measurement error | + | SDC or LoA or CV*√2*1.96 < M(C)IC (1); % specific agreement > 80% (2)
Measurement error | ? | MIC not defined
Measurement error | - | SDC or LoA or CV*√2*1.96 > M(C)IC (1); % specific agreement < 80% (2)

(1) The M(C)IC value may come from another study.
(2) Sometimes a higher percentage is more appropriate; when substantiated, this could be appropriate, too.
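
As an illustration, the criteria in Table 8 can be written as small rating functions. This is a sketch, not part of the COSMIN tool: the function names are hypothetical, the measurement error parameter and the M(C)IC are assumed to be expressed in the same unit, and returning "?" when a value exactly equals the M(C)IC or the 80% threshold is an assumption, because Table 8 does not cover that case.

```python
import math


def rate_reliability(value=None, threshold=0.70):
    """Rate an ICC or (weighted) Kappa: '+', '-', or '?' when not reported."""
    if value is None:
        return "?"
    return "+" if value >= threshold else "-"


def cv_to_comparable(cv):
    """Return CV * sqrt(2) * 1.96, the quantity Table 8 compares with the M(C)IC."""
    return cv * math.sqrt(2) * 1.96


def rate_measurement_error(error_value, mic=None):
    """Rate an SDC, LoA, or CV*sqrt(2)*1.96 value against the M(C)IC."""
    if mic is None:
        return "?"  # MIC not defined
    if error_value < mic:
        return "+"
    if error_value > mic:
        return "-"
    return "?"  # equality is not covered by Table 8 (assumption)


def rate_specific_agreement(percentage, threshold=80.0):
    """Rate % specific agreement; a higher threshold can be passed when substantiated."""
    if percentage > threshold:
        return "+"
    if percentage < threshold:
        return "-"
    return "?"  # equality is not covered by Table 8 (assumption)


if __name__ == "__main__":
    print(rate_reliability(0.94))        # ICC reported by one study
    print(rate_measurement_error(1.08))  # no MIC available
```

Passing the threshold as a parameter keeps the 80% rule of thumb adjustable, in line with footnote 2.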

Summarizing the evidence

To come to an overall conclusion about the reliability or the measurement error of an outcome measurement instrument, you should first decide whether the results from multiple studies can be combined. Take two aspects into account in this decision. 1) Do the results refer to the same information (i.e. to the same underlying comprehensive research question)? Results from different designs (i.e. different components were varied across the repeated measurements) give you different information about the reliability of an instrument, and therefore cannot simply be summarized together. 2) Are the results consistent, that is, are all results either sufficient (+) or insufficient (-)? In case of inconsistency in results, we recommend searching for reasons for this inconsistency, e.g. different designs or statistical models, different populations, or different backgrounds of raters. Subsequently, subgroups of studies can be summarized.

To summarize the evidence, you can either qualitatively summarize the results (e.g. describe the range of the results) or quantitatively pool the results. In reliability studies, only the point estimate of an ICC or Cohen's kappa is used to conclude whether the specific measurement instrument has sufficient reliability (e.g. in the criteria that we propose above). Therefore, it is not necessary to pool the data to obtain a more precise point estimate.

The measurement error refers to the absolute deviation of the score from the 'true' score, or the amount of error in the score. The point estimate of the measurement error parameter refers to this deviation or error, and therefore it is used to know how precisely the measurement instrument is able to measure a patient. To come to a more precise point estimate of the measurement error, the parameters obtained in studies with the same design (i.e. that have the same underlying comprehensive research question) can be pooled when the confidence intervals are also reported (these can be obtained using the sample size (39) or bootstrapping methods (52)).

Note that you should only summarize or pool parameters of measurement error that were derived from the same study design and the same model or formula. For example, the SEMconsistency (either formula 4 or 8, chapter 3.2) and SEMagreement (formula 2, chapter 3.2) should not be combined. However, SEMconsistency using either formula 4 or 6 (chapter 3.2) can be combined as they will lead to the same result, and the SDCconsistency using either formula 5, 7, or 9 (chapter 3.2) can be combined. The same results are found when using either the SEM one-way random effects model (formula 1, chapter 3.2) or SEMagreement (formula 2, chapter 3.2). This is because all sources of variance (apart from the variance between patients) are taken into account in both formulas. Therefore, these parameters can be combined.

Handling inconsistent results

If the results of studies with the same underlying research question are inconsistent (e.g. both sufficient and insufficient results are found), explanations for the inconsistency should first be explored. For example, slightly different study populations or methods may have been used. If an explanation is found, subgroups of studies (e.g. based on the same study population, or in which the same source of variation is varied) can be summarized. The overall conclusion for (e.g.) reliability can subsequently be drawn per subgroup. When the explanation is found in the quality of the studies (i.e. very good and adequate studies lead to another overall rating than doubtful and inadequate studies), the doubtful and inadequate quality studies may be reported but ignored in this step, so that only very good and adequate quality studies are considered decisive in determining the overall rating when ratings are inconsistent. This should be explained in the manuscript.

If studies with the same underlying research question show inconsistent results, and no explanation can be found, one can conclude that the results are inconsistent.

We refer to the User manual of the COSMIN methodology for systematic reviews of PROMs for more information.

Rate the quality of the summarized result

If multiple studies can be qualitatively summarized (e.g. the range of results) or quantitatively pooled, the overall result can again be compared to the criteria for good measurement properties (see Table 8); you can then conclude that the outcome measurement instrument has either sufficient (+) or insufficient (-) reliability or measurement error. Alternatively, you should conclude that the results are inconsistent (±) or indeterminate (?). For more information, we refer to the User manual of the COSMIN methodology for systematic reviews of PROMs.
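
The four possible overall ratings can be illustrated with a short Python sketch that combines the per-study ratings of studies sharing the same underlying research question. The function name is hypothetical, and the sketch makes one loud assumption: indeterminate (?) study ratings are ignored whenever at least one study has a sufficient or insufficient result. The PROM user manual referred to above may prescribe a more detailed rule.

```python
def overall_rating(study_ratings):
    """Combine per-study ratings ('+', '-', '?') into one overall rating.

    '+' or '-' : all determinate results agree
    '±'        : both sufficient and insufficient results are present
    '?'        : no determinate result is available
    Ignoring '?' ratings next to determinate ones is an assumption.
    """
    determinate = {r for r in study_ratings if r in ("+", "-")}
    if not determinate:
        return "?"
    if len(determinate) == 1:
        return determinate.pop()
    return "±"


if __name__ == "__main__":
    print(overall_rating(["+", "+", "+"]))  # consistent sufficient results
    print(overall_rating(["+", "-"]))       # conflicting results
```

An inconsistent (±) outcome is the trigger for the search for explanations and the subgroup summaries described under 'Handling inconsistent results'.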

Grading the quality of the evidence using the modified GRADE approach

After summarizing or pooling all evidence per outcome measurement instrument for reliability and for measurement error, and rating the summarized or pooled results against the criteria for good measurement properties, the next step is to grade the quality of the evidence. The quality of the evidence refers to the confidence that the summarized or pooled result is trustworthy. We developed a modified GRADE (Grading of Recommendations Assessment, Development, and Evaluation) approach to grade the evidence as high, moderate, low or very low (5), based on 1) risk of bias (i.e. the methodological quality of the studies), 2) inconsistency (i.e. unexplained inconsistency of results across studies), 3) imprecision (i.e. total sample size of the available studies), and 4) indirectness (i.e. evidence from populations other than the population of interest in the review). This procedure is described in the User manual of the COSMIN methodology for systematic reviews of PROMs (5, 45).

Draw conclusion on 'best-evidence measurement protocol'

The results of reliability studies with their specific designs inform you whether a source of variation (for example the training of a rater, or the specific machine used) importantly affects the score (i.e. the measurement). If possible, this source of variation should be standardized or restricted in future measurements. By looking at all evidence for the various sources of variation, you can now draw conclusions about how to standardize and restrict the measurement, and describe this best-evidence measurement protocol.

Step 9. Evaluate criterion validity, hypotheses testing for construct validity, and responsiveness

In step 9 criterion validity, hypotheses testing for construct validity, and responsiveness are assessed. The standards (1) and criteria (5) provided for systematic reviews of PROMs can be used.


Steps 10-11: Select the outcome measurement instrument

Steps 10 and 11 concern formulating recommendations (step 10) and reporting the systematic review (step 11).

Step 10. Formulate recommendations

The goal of a systematic review on measurement instruments is to get an overview of all available evidence on the quality of outcome measurement instruments that measure a specific construct in a defined patient population. Based on this overview, and taking aspects of feasibility and interpretability into account, we recommend that you formulate your recommendations about the most suitable outcome measurement instrument. To come to an evidence-based and fully transparent recommendation, we recommend categorizing the included measurement instruments into three categories. Per type of measurement instrument you can conclude which instrument(s) are recommended (category A), promising (category B), or insufficient (category C) and should not be used anymore.

Category (A):

We recommend using different definitions of category (A), depending on the structure of the measurement instrument:

Multi-item reflective

Evidence for sufficient content validity (any level), AND sufficient internal consistency (at least low quality, meaning also sufficient structural validity)

Multi-item formative

Evidence for sufficient content validity (any level)

Single item (single parameter) (no gold standard)

Sufficient face validity (rated by e.g. the reviewer team), AND evidence for sufficient reliability (any level)

Single item (gold standard available)

Evidence for sufficient criterion validity, AND evidence for sufficient reliability (any level)

Category (B): outcome measurement instrument not categorized as 'A' or 'C'.

Category (C): outcome measurement instrument with high quality evidence for an insufficient measurement property.
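
The categorization can be sketched as a small Python function. This is an illustration under stated assumptions, not the authors' implementation: the names are hypothetical, the evidence is represented as a mapping from measurement property to a (rating, quality) pair, and category C is checked before category A, which is an assumption because the text does not say which category takes precedence when both definitions apply.

```python
# Properties that must be rated sufficient for category A, per instrument structure.
REQUIRED_FOR_A = {
    "multi-item reflective": ["content validity", "internal consistency"],
    "multi-item formative": ["content validity"],
    "single item, no gold standard": ["face validity", "reliability"],
    "single item, gold standard": ["criterion validity", "reliability"],
}


def categorize(structure, evidence):
    """Return 'A', 'B', or 'C' for one outcome measurement instrument.

    evidence maps a property name to (rating, quality), with rating in
    {'+', '-', '?'} and quality in {'high', 'moderate', 'low', 'very low', None}.
    """
    # Category C: high quality evidence for an insufficient property.
    # Checking C first is an assumption of this sketch.
    if any(r == "-" and q == "high" for r, q in evidence.values()):
        return "C"
    for prop in REQUIRED_FOR_A[structure]:
        rating, quality = evidence.get(prop, ("?", None))
        if rating != "+":
            return "B"
        # Internal consistency needs at least low quality evidence.
        if prop == "internal consistency" and quality not in ("high", "moderate", "low"):
            return "B"
    return "A"


if __name__ == "__main__":
    example = {"face validity": ("+", None), "reliability": ("+", "high")}
    print(categorize("single item, no gold standard", example))
```

Anything that is neither A nor C falls through to category B, mirroring the definition of B as the remainder.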


Step 11. Report the systematic review

In accordance with the PRISMA Statement (53, 54), we recommend reporting the following information:

(1) the search strategy (for example on a website or in the (online) supplemental materials to the article at issue), and the results of the literature search and selection of the studies and measurement instruments, displayed in the PRISMA flow diagram (including the final number of articles and the final number of measurement instruments included in the review) (Appendix 3);

(2) the characteristics of the included measurement instruments, including aspects of feasibility and interpretability (Appendix 4);

(3) the characteristics of the studies, including the characteristics of the included patient populations and of the population of included professionals (Appendix 5);

(4) the methodological quality ratings of each study per measurement property per measurement instrument (i.e. very good, adequate, doubtful, inadequate), the results of each study, and the accompanying ratings of the results based on the criteria for good measurement properties (sufficient (+) / insufficient (-) / indeterminate (?)). In the User Manual for conducting systematic reviews of PROMs (45) an example is provided. In Appendix 6 we provide examples specifically for columns on reliability and measurement error. The table could be published, for example, as an appendix or as supplemental material on the website of the journal only;

(5) a Summary of Findings (SoF) table per measurement property, including the pooled or summarized results of the measurement properties, its overall rating (i.e. sufficient (+) / insufficient (-) / inconsistent (±) / indeterminate (?)), and the grading of the quality of evidence (high, moderate, low, very low). In the User Manual for conducting systematic reviews of PROMs (45) an example is provided. In Appendix 7 we provide examples specifically for columns on reliability and measurement error.

These SoF tables (i.e. one per measurement property) will ultimately be used in providing recommendations for the selection of the most appropriate outcome measurement instrument for a given purpose or a particular context of use.


Appendix 1. Data extraction table of relevant information for each included study in a systematic review.

Extraction item | Instruction | Study 1 | Study 2

Elements of a comprehensive research question

1. Name of the instrument | Alternatively: type of instrument and parameter | |
2. Version or way of operationalization | All relevant components that are known or expected to influence the score, and which are standardized or restricted (facet of stratification (23)) | Equipment: ; Preparatory actions: ; Unprocessed data/sample collection: ; Data processing and storage: ; Assignment of the score/determination of the value: | Equipment: ; Preparatory actions: ; Unprocessed data/sample collection: ; Data processing and storage: ; Assignment of the score/determination of the value:
3. Construct | Description of what is being measured | |
4. Measurement property | Reliability and/or measurement error | |
5. Components that will be repeated | e.g. whole measurement (i.e. all components) or some of the components | |
6. Source(s) of variation varied | Components which are varied across the measurements (i.e. focus of analysis; facet of generalizability (23)) | |
7. Patient population | (i.e. facet of differentiation (23)) | |

The research question

Published research question | As formulated by the authors | |
Comprehensive research question | As formulated by the reviewer | |

Additional key elements of the research aim of the review

Target population | Description of the population to which the authors want to generalize | |
Type of measurement instrument | e.g. ClinROM, PerFOM, laboratory value, PROM or ObsROM | |

Statistical information and results

Model or formula used | Statistical model | |
Result | e.g. results (95% CI) of ICC, kappa, SEM, LoA and systematic difference | |
Variance components | All reported variance components | |
Apply criteria for good measurement property* | sufficient (+), insufficient (-), or indeterminate (?) | |

* Although this is a rating, and not data extraction, we include it here, as the information required to make the rating is extracted here.


Appendix 2. Risk of Bias ratings per standard per study

Risk of Bias rating | Study 1: rater 1 | Study 1: rater 2 | Study 1: consensus

Design requirements

1. Stability of the patients | | |
2. Time interval | | |
3. Similarity of measurement condition | | |
4. Administration without knowledge of scores or values | | |
5. Score assignment or determination of values without knowledge of the scores or values | | |
6. Other important flaws | | |

Statistical methods

7. For continuous scores: ICC | | |
8. For ordinal scores: Kappa | | |
9. For dichotomous/nominal scores: Kappa for each category against the other categories combined? | | |

Final rating | | |


Appendix 3. Example of a Flow-chart


Appendix 4. Example of reporting table on characteristics of the included measurement instruments.

Name (reference to first article) | Construct | Intended context of use | Best-evidence measurement protocol | Target population | Type of measurement instrument | Feasibility aspects | Interpretability aspects

LMM thickness (19) | Thickness of resting muscle | Evaluation | Training diagnostic ultrasound. Specific instructions for patient, and probe positions. | Patients with low back pain | Ultrasound | | Mean score in mix of pain patients was 27.9 mm (±3.2)

LMM contraction (19) | Comparison of the thickness of resting muscle with that of activated muscle | Evaluation | Training diagnostic ultrasound. Specific instructions for patient, and probe positions. | Patients with low back pain | Ultrasound | | Mean score in mix of pain patients ranges from 1.3 mm (±1.7) to 3.5 mm (±2.6)

Other characteristics which may be extracted are: conceptual model used, recommended by standardization initiatives, full copy available, fit for purpose (diagnostic, prognostic, evaluation).

Aspects of feasibility are, for example, completion time, licensing information and costs of an instrument, and type and ease of administration. Feasibility applies to both the patients and the professionals who are involved in the measurement. Consider reporting this information in a separate table.

Aspects of interpretability refer to 1) interpretability of single scores (e.g. information on the distribution of scores in the study population or other relevant subgroups, and floor and ceiling effects), and 2) interpretability of change scores (i.e. M(C)IC values).


Appendix 5. Example of reporting table on characteristics of the study populations.

Measurement instrument | Reference | Measurement property assessed | Patient population: sample size | Patient population: patient characteristics | Professional population: sample size | Professional population: characteristics of professionals | Response rate

LMM contraction | (19) Study 2 | Reliability, measurement error | 30 | 47% female, age mean (SD) 37 (±12); LBP n=20; neck/midback pain n=5; extremity pain n=1; pain free n=4 | 2 | Chiropractors experienced in diagnostic ultrasound for the musculoskeletal system, i.e. 4 and 8 years resp., with a postgraduate diploma in diagnostic ultrasound. Before the study, both developed the protocol of diagnostic ultrasound that was applied in this study. |

LMM contraction | (19) Study 3 | Reliability | 30 | 50% female, age mean (SD) 38 (±11); LBP n=23; neck/midback pain n=7 | 2 | |

LMM contraction | (19) Study 4 | Reliability, measurement error | 30 | 43% female, age mean (SD) 40 (±11); LBP n=20; neck/midback pain n=6; extremity pain n=3; pain free n=1 | 2 | |

B | 1 | | | | | |

B | 2 | | | | | |

Patient characteristics refer to, e.g. age, gender, disease characteristics (diagnosis, disease duration, disease severity), setting, and geographical location.

Rater characteristics may refer to, e.g. professional background, specific training received, or years of experience.


Appendix 6. Overview table of quality and results of studies on reliability and measurement error.

Measurement instrument (MI) (ref) | Type of MI | Reliability: n | Reliability: study quality | Reliability: result (rating) | Measurement error: n | Measurement error: study quality | Measurement error: result (rating)

LMM contraction score (study 2) (19) | Ultrasound | 30 | Adequate | 0.97 (0.92-0.98) | 30 | Adequate | LoA [-0.94; 1.22 mm]

LMM contraction score (study 3) (19) | Ultrasound | 30 | Adequate | 0.94 (0.88-0.97) | | |

LMM contraction score (study 4) (19) | Ultrasound | 30 | Adequate | 0.97 (0.94-0.99) | 30 | Adequate | LoA [-1.32; 1.25 mm]

LMM contraction score (ref) | | | | | | |

LMM contraction score (ref) | | | | | | |

Pooled or summary result (overall rating) | | 90 | | 0.94-0.97 (+) | 90 | | SDCconsistency = 1.08; 1.29 (a)

(a) Calculated from LoA
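
The footnote states that the SDC values were calculated from the limits of agreement. A minimal Python sketch of one way to do this is given below, assuming the limits of agreement are the mean difference ± 1.96 × SDdifference, so that the SDC equals half the width of the interval. This assumption reproduces the values in the table (1.08 and 1.29), but the manual's own formulas are those of chapter 3.2; the function names are hypothetical.

```python
def sdc_from_loa(lower, upper):
    """Return half the width of the limits of agreement (same unit as the LoA)."""
    if upper < lower:
        raise ValueError("upper limit must not be smaller than lower limit")
    return (upper - lower) / 2.0


def mean_difference_from_loa(lower, upper):
    """Return the midpoint of the limits of agreement (the systematic difference)."""
    return (upper + lower) / 2.0


if __name__ == "__main__":
    # LoA reported for studies 2 and 4 of Skeie et al. (19), in mm.
    for label, (low, high) in {"study 2": (-0.94, 1.22), "study 4": (-1.32, 1.25)}.items():
        print(label, sdc_from_loa(low, high), mean_difference_from_loa(low, high))
```

For study 2 this gives an SDC of about 1.08 mm, and for study 4 about 1.285 mm, which appears in the table as 1.29.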


Appendix 7. Summary of Findings tables for reliability and measurement error.

Based on the studies on reliability described by Skeie (19)

Reliability | Summary result | Overall rating | Quality of evidence

Ultrasound measurement of the LMM contraction score; best-evidence measurement protocol: rater, day and active motor tasks performed before measurement were not of influence | Range ICC: 0.94-0.97 | Sufficient | High (two studies of adequate quality)

Measurement instrument B | | |

Based on the studies on measurement error described by Skeie (19)

Measurement error | Summary result | Overall rating | Quality of evidence

Ultrasound measurement of the LMM contraction score; best-evidence measurement protocol: rater, day and active motor tasks performed before measurement were not of influence | Range SDCconsistency: 1.08-1.29; MIC = not assessed | ? |

Measurement instrument B | | |


References1. Mokkink LB, de Vet HCW, Prinsen CAC, Patrick DL, Alonso J, Bouter LM, et al. COSMIN Risk of Bias checklist for systematic reviews of Patient-Reported Outcome Measures. Qual Life Res. 2018;27(5):1171-9. 2. Walton MK, Powers JA, Hobart J, al. e. Clinical outcome assessments: A conceptual foundation – Report of the ISPOR Clinical Outcomes Assessment Emerging Good Practices Task Force. Value Health. 2015;18:741-52. 3. Powers JH, 3rd, Patrick DL, Walton MK, Marquis P, Cano S, Hobart J, et al. Clinician-Reported Outcome Assessments of Treatment Benefit: Report of the ISPOR Clinical Outcome Assessment Emerging Good Practices Task Force. Value Health. 2017;20(1):2-14. 4. Mokkink LB, Boers M, van der Vleuten CPM, Bouter LM, Alonso J, Patrick DL, et al. COSMIN Risk of Bias tool to assess the quality of studies on reliability or measurement error of outcome measurement instruments: a Delphi study. . BMC Medical Research Methodology. 2020;20(293). 5. Prinsen CAC, Mokkink LB, Bouter LM, Alonso J, Patrick DL, de Vet HCW, et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1147-57. 6. Terwee CB, Prinsen CAC, Chiarotto A, Westerman MJ, Patrick DL, Alonso J, et al. COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study. Qual Life Res. 2018;27(5):1159-70. 7. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737-45. 8. Hamilton M. The assessment of anxiety states by rating. Br J Med Psychol. 1959;32(1):50-5. 9. Douglas PS, DeCara JM, Devereux RB, Duckworth S, Gardin JM, Jaber WA, et al. 
Echocardiographic imaging in clinical trials: American Society of Echocardiography Standards for echocardiography core laboratories: endorsed by the American College of Cardiology Foundation. J Am Soc Echocardiogr. 2009;22(7):755-65. 10. Jungmann PM, Welsch GH, Brittberg M, Trattnig S, Braun S, Imhoff AB, et al. Magnetic Resonance Imaging Score and Classification System (AMADEUS) for Assessment of Preoperative Cartilage Defect Severity. Cartilage. 2017;8(3):272-82. 11. Fischer JSJ, A.J.; Kniker, J.E.; Rudick, R.A.; Cutter,G. Multiple Sclerosis Functional Composite (MSFC). Administration and scoring manual.; 2001. 12. Genc S, Omer B, Aycan-Ustyol E, Ince N, Bal F, Gurdol F. Evaluation of turbidimetric inhibition immunoassay (TINIA) and HPLC methods for glycated haemoglobin determination. J Clin Lab Anal. 2012;26(6):481-5. 13. Holen JC, Saltvedt I, Fayers PM, Hjermstad MJ, Loge JH, Kaasa S. Doloplus-2, a valid tool for behavioural pain assessment? BMC Geriatr. 2007;7:29. 14. Farooq MN, Mohseni Bandpei MA, Ali M, Khan GA. Reliability of the universal goniometer for assessing active cervical range of motion in asymptomatic healthy persons. Pak J Med Sci. 2016;32(2):457-61. 15. Jordan K, Haywood KL, Dziedzic K, Garratt AM, Jones PW, Ong BN, et al. Assessment of the 3-dimensional Fastrak measurement system in measuring range of motion in ankylosing spondylitis. J Rheumatol. 2004;31(11):2207-15.

Page 69: user manual COSMIN Risk of Bias tool v4 JAN final · 2021. 1. 16. · 6 1. Background information 1.1 COSMIN initiative and steering committee The COSMIN initiative aims to improve

69

16. Correll S, Field J, Hutchinson H, Mickevicius G, Fitzsimmons A, Smoot B. Reliability and Validity of the Halo Digital Goniometer for Shoulder Range of Motion in Healthy Subjects. Int J Sports Phys Ther. 2018;13(4):707-14.
17. D'Agostino MA, Aegerter P, Jousse-Joulin S, Chary-Valckenaere I, Lecoq B, Gaudin P, et al. How to evaluate and improve the reliability of power Doppler ultrasonography for assessing enthesitis in spondylarthritis. Arthritis Rheum. 2009;61(1):61-9.
18. Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21(4):651-7.
19. Skeie EJ, Borge JA, Leboeuf-Yde C, Bolton J, Wedderkopp N. Reliability of diagnostic ultrasound in measuring the multifidus muscle. Chiropr Man Therap. 2015;23:15.
20. Mathew AJ, Ostergaard M. Magnetic Resonance Imaging of Enthesitis in Spondyloarthritis, Including Psoriatic Arthritis-Status and Recent Advances. Front Med (Lausanne). 2020;7:296.
21. Butland RJ, Pang J, Gross ER, Woodcock AA, Geddes DM. Two-, six-, and 12-minute walking tests in respiratory disease. Br Med J (Clin Res Ed). 1982;284(6329):1607-8.
22. de Jong K, et al. Richtlijnen 6-minutes timed walking test. 2000.
23. Bloch R, Norman G. Generalizability theory for the perplexed: a practical introduction and guide: AMEE Guide No. 68. Med Teach. 2012;34(11):960-92.
24. Feys P, Lamers I, Francis G, Benedict R, Phillips G, LaRocca N, et al. The Nine-Hole Peg Test as a manual dexterity performance measure for multiple sclerosis. Mult Scler. 2017;23(5):711-20.
25. Mathiowetz V, Weber K, Kashman N, Volland G. Adult norms for the Nine Hole Peg Test of finger dexterity. Occup Particip Health. 1985;5:24-38.
26. Arvidsson Lindvall M, Anderzen-Carlsson A, Appelros P, Forsberg A. Validity and test-retest reliability of the six-spot step test in persons after stroke. Physiother Theory Pract. 2020;36(1):211-8.
27. Romani J, Giavedoni P, Roe E, Vidal D, Luelmo J, Wortsman X. Inter- and Intra-rater Agreement of Dermatologic Ultrasound for the Diagnosis of Lobular and Septal Panniculitis. J Ultrasound Med. 2020;39(1):107-12.
28. Gellhorn AC, Carlson MJ. Inter-rater, intra-rater, and inter-machine reliability of quantitative ultrasound measurements of the patellar tendon. Ultrasound Med Biol. 2013;39(5):791-6.
29. Brennan RL. Generalizability Theory. New York: Springer-Verlag; 2001.
30. Govaerts MJ, van der Vleuten CP, Schuwirth LW. Optimising the reproducibility of a performance-based assessment test in midwifery education. Adv Health Sci Educ Theory Pract. 2002;7(2):133-45.
31. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychological Methods. 1996;1:30-46.
32. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin. 1979;86:420-8.
33. Kraemer HC, Periyakoil VS, Noda A. Kappa coefficients in medical research. Tutorial in biostatistics. Statistics in Medicine. 2002;21:2109-29.
34. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin. 1968;70:213-20.
35. Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 1960;20:37-46.
36. Vach W. The dependence of Cohen's kappa on the prevalence does not matter. J Clin Epidemiol. 2005;58(7):655-61.


37. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement. 1973;33:613-9.
38. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8(2):135-60.
39. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307-10.
40. de Vet HC, Terwee CB, Mokkink LB, Knol DL. Measurement in Medicine. Cambridge: Cambridge University Press; 2011.
41. Euser AM, Dekker FW, le Cessie S. A practical approach to Bland-Altman plots and variation coefficients for log transformed variables. J Clin Epidemiol. 2008;61(10):978-82.
42. de Vet HC, Mokkink LB, Terwee CB, Hoekstra OS, Knol DL. Clinicians are right not to like Cohen's kappa. BMJ. 2013;346:f2125.
43. de Vet HC, Dikmans RE, Eekhout I. Specific agreement on dichotomous outcomes can be calculated for more than two raters. J Clin Epidemiol. 2017.
44. de Vet HCW, Mullender MG, Eekhout I. Specific agreement on ordinal and multiple nominal outcomes can be calculated for more than two raters. J Clin Epidemiol. 2018;96:47-53.
45. Mokkink LB, de Vet HC, Prinsen CA, Patrick DL, Alonso J, Bouter LM, et al. COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures (PROMs) - user manual. 2018. Available from: www.cosmin.nl.
46. Higgins JP, Green S. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration; 2011. Available from: www.handbook.cochrane.org.
47. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Reviews. 2013. Available from: http://methods.cochrane.org/sdt/handbook-dta-reviews.
48. Terwee CB, Jansma EP, Riphagen II, de Vet HC. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res. 2009;18(8):1115-23.
49. Boers M, Kirwan JR, Tugwell P, Beaton D, Bingham CO III, Conaghan PG, et al. The OMERACT handbook: OMERACT; 2015.
50. Smart A. A multi-dimensional model of clinical utility. International Journal for Quality in Health Care. 2006;18(5):377-82.
51. Stratford PW, Kennedy D, Pagura SM, Gollish JD. The relationship between self-report and performance-related measures: questioning the content validity of timed tests. Arthritis Rheum. 2003;49(4):535-40.
52. Efron B. Better bootstrap confidence intervals. Journal of the American Statistical Association. 1987;82(397):171-85.
53. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.
54. Peterson DA, Berque P, Jabusch HC, Altenmuller E, Frucht SJ. Rating scales for musician's dystonia: the state of the art. Neurology. 2013;81(6):589-98.