Methodological quality (risk of bias) assessment tools for primary and secondary medical studies: what are they and which is better? – Military Medical Research

Randomized controlled trial (individual or cluster)

The first RCT was designed by Hill BA (1897–1991), and the RCT has remained the "gold standard" of experimental study design ever since [12, 13]. Nowadays, the Cochrane risk of bias tool for randomized trials (introduced in 2008 and updated on March 20, 2011), known as "RoB", is the most commonly recommended tool for RCTs [9, 14]. On August 22, 2019, a revised version of this tool for assessing RoB in randomized trials (RoB 2.0, first drafted in 2016) was published [15]. The RoB 2.0 tool is suitable for individually-randomized, parallel-group, and cluster-randomized trials, and can be found on the dedicated website https://www.riskofbias.info/welcome/rob-2-0-tool. It consists of five bias domains and shows major changes compared with the original Cochrane RoB tool (Table S1A-B presents the major items of both versions).
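To illustrate how the five domain-level judgements roll up into an overall judgement, a minimal sketch follows. It assumes the simplified mapping described in the RoB 2.0 guidance (all domains low → low; any domain high → high; otherwise some concerns) and deliberately omits the discretionary escalation of multiple "some concerns" domains to "high", so it is an illustration rather than the official algorithm:

```python
# Hedged sketch (not the official algorithm): combining the five RoB 2.0
# domain-level judgements into an overall risk-of-bias judgement.
def rob2_overall(domains: list[str]) -> str:
    """domains: five judgements, each 'low', 'some concerns', or 'high'."""
    if any(d == "high" for d in domains):
        return "high"  # any high-risk domain makes the trial high risk overall
    if all(d == "low" for d in domains):
        return "low"   # low risk overall only when every domain is low risk
    # Note: the guidance allows several 'some concerns' domains to be judged
    # 'high' overall at the reviewers' discretion; that step is omitted here.
    return "some concerns"

print(rob2_overall(["low", "some concerns", "low", "low", "low"]))  # some concerns
```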

The Physiotherapy Evidence Database (PEDro) scale is a specialized methodological assessment tool for RCTs in physiotherapy [16, 17]; it can be found at http://www.pedro.org.au/english/downloads/pedro-scale/ and covers 11 items (Table S1C). The Effective Practice and Organisation of Care (EPOC) Group is a Cochrane Review Group that has also developed a tool (the "EPOC RoB tool") for randomized trials of complex interventions. This tool has 9 items (Table S1D) and can be found at https://epoc.cochrane.org/resources/epoc-resources-review-authors. The Critical Appraisal Skills Programme (CASP) is part of the Oxford Centre for Triple Value Healthcare Ltd. (3V) portfolio, which provides resources and learning and development opportunities to support the development of critical appraisal skills in the UK (http://www.casp-uk.net/) [18,19,20]. The CASP checklist for RCTs consists of three sections comprising 11 items (Table S1E). The National Institutes of Health (NIH) has also developed a quality assessment tool for controlled intervention studies (Table S1F) to assess the methodological quality of RCTs (https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools).

The Joanna Briggs Institute (JBI) is an independent, international, not-for-profit research and development organization based in the Faculty of Health and Medical Sciences at the University of Adelaide, South Australia (https://joannabriggs.org/). It has developed many critical appraisal checklists covering the feasibility, appropriateness, meaningfulness and effectiveness of healthcare interventions. Table S1G presents the JBI critical appraisal checklist for RCTs, which includes 13 items.

The Scottish Intercollegiate Guidelines Network (SIGN) was established in 1993 (https://www.sign.ac.uk/). Its objective is to improve the quality of health care for patients in Scotland by reducing variation in practice and outcomes, through the development and dissemination of national clinical guidelines containing recommendations for effective practice based on current evidence. It has also developed critical appraisal checklists for assessing the methodological quality of different study types, including RCTs (Table S1H).

In addition, the Jadad Scale [21], Modified Jadad Scale [22, 23], Delphi List [24], Chalmers Scale [25], National Institute for Clinical Excellence (NICE) methodology checklist [11], Downs & Black checklist [26], and other tools summarized by West et al. in 2002 [27] are not commonly used or recommended nowadays.

Animal study

Before starting clinical trials, the safety and effectiveness of new drugs are usually tested in animal models [28], so animal studies are considered preclinical research of major importance [29, 30]. Accordingly, the methodological quality of animal studies also needs to be assessed [30]. In 1999, the Stroke Therapy Academic Industry Roundtable (STAIR) recommended criteria for assessing the quality of stroke animal studies [31]; this tool is also called "STAIR". In 2009, the STAIR Group updated their criteria and developed the "Recommendations for Ensuring Good Scientific Inquiry" [32]. In addition, in 2004 Macleod et al. [33] proposed a 10-point tool based on STAIR for assessing the methodological quality of animal studies, known as CAMARADES (The Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies); the "S" stood for "Stroke" at that time and now stands for "Studies" (http://www.camarades.info/). In the CAMARADES tool, each item is worth a maximum of one point, giving a maximum total score of 10 points (Table S1J).
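The one-point-per-item arithmetic of the CAMARADES checklist can be sketched as follows; the item labels below are illustrative paraphrases rather than the exact wording (which is given in Table S1J), so treat them as assumptions:

```python
# Hedged sketch of CAMARADES-style scoring: each of the 10 items is worth
# one point when satisfied, so the total ranges from 0 to 10.
# Item labels are illustrative paraphrases, not the exact Table S1J wording.
CAMARADES_ITEMS = [
    "peer_reviewed_publication",
    "temperature_control",
    "randomization",
    "blinded_induction_of_model",
    "blinded_outcome_assessment",
    "anesthetic_without_marked_neuroprotection",
    "appropriate_animal_model",
    "sample_size_calculation",
    "compliance_with_welfare_regulations",
    "conflict_of_interest_statement",
]

def camarades_score(satisfied: set[str]) -> int:
    """Return the total score: one point per satisfied checklist item."""
    return sum(1 for item in CAMARADES_ITEMS if item in satisfied)

score = camarades_score({"randomization", "temperature_control",
                         "blinded_outcome_assessment"})
print(score)  # 3 of a possible 10
```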

In 2008, the Systematic Review Center for Laboratory animal Experimentation (SYRCLE) was established in the Netherlands. In 2014, this team developed and released an RoB tool for animal intervention studies, SYRCLE's RoB tool, based on the original Cochrane RoB tool [34]. This tool contains 10 items and has become the most recommended tool for assessing the methodological quality of animal intervention studies (Table S1I).

Non-randomised studies

In clinical research, an RCT is not always feasible [35]; therefore, non-randomized designs remain important. In non-randomized studies (also called quasi-experimental studies), which include follow-up studies, investigators control the allocation of participants to groups but do not use randomization [36]. According to whether a comparison group is present, non-randomized clinical intervention studies can be divided into comparative and non-comparative sub-types. The Risk Of Bias In Non-randomised Studies – of Interventions (ROBINS-I) tool [37] is the preferentially recommended tool. It was developed to evaluate the risk of bias in estimating the comparative effectiveness (harm or benefit) of interventions in studies that did not use randomization to allocate units (individuals or clusters of individuals) to comparison groups. Besides, the JBI critical appraisal checklist for quasi-experimental studies (non-randomized experimental studies), which includes 9 items, is also suitable. Moreover, the methodological index for non-randomized studies (MINORS) [38] can also be used. It contains 12 items: the first 8 apply to both non-comparative and comparative studies, while the last 4 apply only to studies with two or more groups. Each item is scored from 0 to 2, so the ideal total score is 16 for non-comparative studies and 24 for comparative studies. Table S1K-L-M presents the major items of these three tools.
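The MINORS scoring rule can be sketched as follows; this is a minimal illustration of the arithmetic, not an official implementation, and the input-validation checks are our own addition:

```python
# Hedged sketch of MINORS scoring: 12 items scored 0-2; items 1-8 apply to
# all studies, items 9-12 only to comparative studies, so the ideal total
# is 16 for a non-comparative study and 24 for a comparative one.
def minors_score(item_scores: list[int], comparative: bool) -> tuple[int, int]:
    """Return (total score, maximum possible score).

    item_scores: 8 values for a non-comparative study, 12 for a comparative one,
    each 0 (not reported), 1 (reported but inadequate), or 2 (reported and adequate).
    """
    n_items = 12 if comparative else 8
    if len(item_scores) != n_items:
        raise ValueError(f"expected {n_items} item scores, got {len(item_scores)}")
    if any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("each item is scored 0, 1, or 2")
    return sum(item_scores), 2 * n_items

print(minors_score([2, 2, 1, 2, 0, 2, 1, 2], comparative=False))  # (12, 16)
print(minors_score([2] * 8 + [1, 2, 2, 1], comparative=True))     # (22, 24)
```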

A non-randomized study with a separate control group may also be called a clinical controlled trial or a controlled before-and-after study. For this design, the EPOC RoB tool is suitable (Table S1D). When using this tool, "random sequence generation" and "allocation concealment" should be scored as "High risk", while the other items can be graded in the same way as for a randomized trial.

A non-randomized study without a separate control group could be a before-after (pre-post) study, a case series (uncontrolled longitudinal study), or an interrupted time series study. A case series describes a series of individuals who usually receive the same intervention, and it contains no control group [9]. There are several tools for assessing the methodological quality of case series studies. The latest was developed in 2012 by Moga C et al. [39] at the Institute of Health Economics (IHE) in Canada using a modified Delphi technique; hence, it is also called the "IHE Quality Appraisal Tool" (Table S1N). Moreover, the NIH has also developed a quality assessment tool for case series studies, which includes 9 items (Table S1O). For interrupted time series studies, the "EPOC RoB tool for interrupted time series studies" is recommended (Table S1P). For before-after studies, we recommend the NIH quality assessment tool for before-after (pre-post) studies with no control group (Table S1Q).

In addition, for non-randomized intervention studies, the Reisch tool (Check List for Assessing Therapeutic Studies) [11, 40], the Downs & Black checklist [26], and other tools summarized by Deeks et al. [36] are not commonly used or recommended nowadays.