By: Chad E Cook, Damian Keter, Ken Learman
Navigating the Literature: Navigating the ever-growing healthcare literature can be challenging [1]. The sheer volume of new research, articles, and guidelines published regularly can be overwhelming, and the number of biomedical publications continues to rise: as of 2022, approximately 3.3 million scientific and technical articles were published worldwide [2]. This volume of information, combined with the time constraints of a busy clinician, can lead to information overload. This is particularly important because it can be difficult to determine which information is relevant and credible amid the vast amount of available content.
In publishing, risk of bias measures are tools and methods used to assess the likelihood that a study's results are influenced by systematic errors or biases. With the very high number of systematic reviews, which are designed to summarize overall results into a common understanding, risk of bias measures are crucial for evaluating the quality, reliability, and trustworthiness [3-5] of research findings. This, along with a broader focus on transparency in research, has led to the proliferation of risk of bias measures and their adoption into publication practice. However, these measures have limitations that may diminish their utility in reconciling the literature. The purpose of this blog is to: 1) outline the limitations of risk of bias measures and 2) discuss the best ways of interpreting the literature when risk of bias measures yield conflicting interpretations.
Limitations of Risk of Bias Measures: Risk of bias measures are useful tools that assist in guiding evidence synthesis, particularly in systematic reviews and meta-analyses. Risk of bias measures aid in selecting high-quality studies and weighting their contributions appropriately, leading to more reliable conclusions. Nonetheless, there are limitations to current risk of bias measures, which include: 1) subjectivity of raters, 2) elevating risk when reporting is actually the problem, 3) overemphasis on selected scoring areas and failure to recognize other notable contributors, and 4) interpretation issues (meaningful scaling) within and between instruments.
Subjectivity of raters: Assessments of risk of bias often involve subjective judgments, which can vary between reviewers. Best practice involves two independent reviewers and a consensus of findings, but assessment requires appropriate training to ensure that reviewers truly understand each item of the risk of bias scale. A recent study [6] examined the inter-rater reliability of several risk of bias tools for non-randomized studies and found variability in the assessments that was attributed to differences in the complexity and clarity of the criteria used in the tools. Furthermore, applying multiple tools to the same article can yield differing interpretations of the trustworthiness of a causal inference [7]. For this reason, it is common practice for systematic review guidelines to mandate that two independent reviewers complete risk of bias assessments and reach consensus on discrepancies [8].
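To make the reliability concern concrete, agreement between two reviewers is often summarized with a chance-corrected statistic such as Cohen's kappa. Below is a minimal sketch, not drawn from any of the cited tools, of how kappa could be computed over item-level judgments; the reviewer ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of items with identical judgments.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: chance overlap of each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical item-level risk of bias judgments for one study.
reviewer_1 = ["low", "low", "high", "unclear", "low", "high", "low"]
reviewer_2 = ["low", "high", "high", "unclear", "low", "low", "low"]
print(f"Cohen's kappa: {cohens_kappa(reviewer_1, reviewer_2):.2f}")
```

In this toy example the two reviewers agree on 5 of 7 items, yet kappa is only 0.50 once chance agreement is removed, which is why raw percent agreement can overstate reliability.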
Elevating risk when reporting is actually the problem: Reporting checklists in publishing are essential tools used to improve the transparency, completeness, and quality of research reporting. Common examples include CONSORT for randomized controlled trials, PRISMA for systematic reviews and meta-analyses, and STROBE for observational studies. Unfortunately, not all studies are written with a reporting checklist as a guide, which can make it impossible to discriminate whether the study design excluded the risk of bias component or whether it was simply omitted from the report. Risk of bias can only be evaluated based on what is reported; if reporting is poor or an element is omitted (despite being performed in the study), the risk of bias may be artificially inflated [9]. A counterfactual argument also exists: investigators can use a checklist and report that design elements meeting the checklist occurred when they did not, or were poorly applied. This brings investigator intent into the picture, which we can never accurately assess but which exists nonetheless.
Overemphasis on selected scoring areas: In an effort to reduce administration burden, most risk of bias scales overemphasize some areas (e.g., randomization, allocation concealment) and underemphasize others (e.g., interventional fidelity, blinding of outcomes, incomplete outcome data). Certainly, the underemphasized areas are as important as, or potentially more important than, those that are historically emphasized [9].
Interpretation issues: There are three major considerations when interpreting results of a risk of bias tool. First, most risk of bias scales provide a summary score, but it is questionable whether this score actually reflects a meaningfully elevated risk, especially if the values are not weighted. For example, on the PEDro scale, a commonly used measure in physical therapy studies, total scores of 0-3 are considered ‘poor’, 4-5 ‘fair’, 6-8 ‘good’, and 9-10 ‘excellent’; it is important to note that these classifications have not been validated [10]. Second, the actual impact of bias may vary depending on the direction of the impact: two biases may move the outcome in opposite directions, offsetting each other and producing minimal, if any, net effect on the inference. Third, best practice suggests that a sensitivity analysis or a subgroup analysis is appropriate when variations in risk of bias measures are identified in a synthesis-based review (e.g., a systematic review). Conducting sensitivity analyses helps determine how the inclusion or exclusion of studies with high risk of bias affects the overall results, while subgroup analyses explore whether studies with low, moderate, or high risk of bias yield different results [9]; a sketch of the sensitivity analysis idea follows.
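As a rough illustration of that idea, the sketch below pools hypothetical study effects with fixed-effect inverse-variance weighting, first across all studies and then after excluding those rated high risk of bias. All study names, effects, standard errors, and ratings are invented for illustration; a real synthesis would use dedicated meta-analysis software.

```python
import math

# Hypothetical studies: (name, effect estimate, standard error, risk of bias rating).
studies = [
    ("Trial A", 0.45, 0.10, "low"),
    ("Trial B", 0.30, 0.15, "low"),
    ("Trial C", 0.90, 0.12, "high"),
    ("Trial D", 0.20, 0.20, "moderate"),
]

def pooled_effect(subset):
    """Fixed-effect inverse-variance pooled estimate with a 95% confidence interval."""
    weights = [1 / se ** 2 for _, _, se, _ in subset]
    total_w = sum(weights)
    estimate = sum(w * eff for w, (_, eff, _, _) in zip(weights, subset)) / total_w
    se_pooled = math.sqrt(1 / total_w)
    return estimate, estimate - 1.96 * se_pooled, estimate + 1.96 * se_pooled

# Sensitivity analysis: pool all studies, then re-pool without the high risk study.
for label, subset in [
    ("All studies", studies),
    ("Excluding high risk of bias", [s for s in studies if s[3] != "high"]),
]:
    est, lo, hi = pooled_effect(subset)
    print(f"{label}: {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

If excluding the high risk of bias study shifts the pooled estimate materially, the overall conclusion is sensitive to that study and should be interpreted with caution; if the estimate barely moves, the inference is more robust to the flagged bias.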
Summary
Risk of bias measures provide additional data for judging study bias or quality, but these tools are not gospel and should not be taken as absolute, unquestionable truth. As with many tools used in interpreting publications, there are limitations to their use. As such, labeling a study as “good” or “bad”, or “trustworthy” or “not trustworthy”, purely on the basis of a risk of bias score is not recommended.
References
1. https://www.pharmacytimes.com/view/tips-tricks-for-staying-up-to-date-with-medical-literature-guidelines-as-a-busy-pharmacist
2. https://ncses.nsf.gov/pubs/nsb202333/publication-output-by-region-country-or-economy-and-by-scientific-field
3. Riley SP, Flowers DW, Swanson BT, Shaffer SM, Cook CE, Brismée JM. ‘Trustworthy’ systematic reviews can only result in meaningful conclusions if the quality of randomized clinical trials and the certainty of evidence improves: an update on the ‘trustworthy’ living systematic review project. J Man Manip Ther. 2024 Aug;32(4):363-367.
4. Flowers DW, Swanson BT, Shaffer SM, Clewley DJ, Riley SP. Is there ‘trustworthy’ evidence for using manual therapy to treat patients with shoulder dysfunction?: A systematic review. PLoS One. 2024 Jan 18;19(1):e0297234.
5. Riley SP, Swanson BT, Shaffer SM, Flowers DW, Cook CE, Brismée JM. Why do ‘Trustworthy’ Living Systematic Reviews Matter? J Man Manip Ther. 2023 Aug;31(4):215-219.
6. Kalaycioglu I, Rioux B, Briard JN, Nehme A, Touma L, Dansereau B, Veilleux-Carpentier A, Keezer MR. Inter-rater reliability of risk of bias tools for non-randomized studies. Syst Rev. 2023 Dec 7;12(1):227.
7. Jüni P, Witschi A, Bloch R, Egger M. The Hazards of Scoring the Quality of Clinical Trials for Meta-analysis. JAMA. 1999;282(11):1054–1060. doi:10.1001/jama.282.11.1054.
8. Checklists for systematic reviews and research synthesis. https://jbi.global/sites/default/files/2020-08/Checklist_for_Systematic_Reviews_and_Research_Syntheses.pdf
9. Higgins JPT, Altman DG, Sterne JAC (editors). Chapter 8: Assessing risk of bias in included studies. In: Cochrane Handbook for Systematic Reviews of Interventions. https://training.cochrane.org/handbook/current/chapter-08
10. Assessing risk of bias in included studies. In: Higgins JPT, Churchill R, Chandler J, Cumpston MS (editors), Cochrane Handbook for Systematic Reviews of Interventions version 5.2.0 (updated June 2017), Cochrane, 2017. Available from www.training.cochrane.org/handbook.