Disagreement in interpretation: a method for the development of benchmarks for quality assurance in imaging

https://doi.org/10.1016/j.jacr.2003.12.017

Abstract

Purpose

To calculate disagreement rates by radiologist and modality to develop a benchmark for use in the quality assessment of imaging interpretation.

Methods

Data were obtained from double readings of 2% of daily cases performed for quality assurance (QA) between 1997 and 2001 by radiologists at a group practice in Dallas, Texas. Differences across radiologists in disagreement rates, with adjustments for case mix, were examined for statistical significance using simple comparisons of means and multivariate logistic regression.
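As a rough, minimal sketch of the kind of analysis described above, the following Python code computes disagreement rates by modality and by radiologist and fits a multivariate logistic regression of disagreement on radiologist and modality indicators. This is an illustration only, not the authors' actual code; the file name, the column names (radiologist, modality, disagree), and the use of pandas and statsmodels are assumptions.

    # Hypothetical sketch of the Methods analysis; the file and column names
    # are assumptions, not the authors' actual data or code.
    import pandas as pd
    import statsmodels.formula.api as smf

    # Each row is one double-read QA case: the radiologist whose reading was
    # reviewed, the modality of the study, and whether the second reader
    # disagreed (0/1).
    cases = pd.read_csv("qa_double_reads.csv")  # hypothetical file

    # Simple comparisons of means: crude disagreement rates.
    print(cases.groupby("modality")["disagree"].mean())
    print(cases.groupby("radiologist")["disagree"].mean())

    # Multivariate logistic regression: do radiologist and modality each
    # contribute to the odds of disagreement after controlling for the other?
    model = smf.logit("disagree ~ C(radiologist) + C(modality)", data=cases)
    result = model.fit()
    print(result.summary())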

Results

In 6703 cases read by 26 radiologists, the authors found an overall disagreement rate of 3.48%, with disagreement rates of 3.03% for general radiology, 3.61% for diagnostic mammography, 5.79% for screening mammography, and 4.07% for ultrasound. Disagreement rates for the 10 radiologists with at least 20 cases ranged from 2.04% to 6.90%. Multivariate analysis found that, controlling for other factors, both differences among radiologists and differences across modalities contributed statistically significantly to differences in disagreement rates.

Conclusion

Disagreement rates varied by modality and by radiologist. Double-reading studies such as this are a useful tool for rating the quality of imaging interpretation and for establishing benchmarks for QA.

Introduction

Radiologists do not currently have an objective benchmark for an acceptable level of missed diagnoses to meet hospital accreditation and proctoring requirements [1]. Nor is there an accepted measure by which to judge the imaging interpretations of nonradiologist physicians. Government regulations, accreditation requirements, and the movement toward consumerism in the marketplace place ever-increasing demands on practices to demonstrate quality. Residency training programs in radiology expose residents to studies in optical physics regarding perceptual errors. Although residents are also familiarized with existing studies on rates of disagreement between multiple reads of the same film, there is a paucity of recent literature on the subject that is applicable to actual practice situations and that can serve as a credible benchmark. This study seeks to address that gap.

Section snippets

Sources of data

The International Radiology Group (IRG) is a radiology practice in Dallas, Texas, currently reading between 1200 and 1500 cases per day. It operates a highly automated, streamlined reading center. Since its inception, IRG has maintained a quality assurance (QA) program. Its cases come largely from outpatient settings in 26 states and include studies from nighttime emergency and off-hours coverage. Over the past several years, the center has developed its digital and film transmission

Results

Table 1 shows the overall disagreement rate, 3.5%, and disagreement rates by modality. Screening mammography had the highest disagreement rate, 5.8%, of all the modalities.

Table 2 shows disagreement rates by radiologist and by modality, disregarding whether the radiologist was the first or second reader on the case. It also shows actual versus expected overall disagreement rates for each radiologist. Comparing actual and expected disagreement rates across all modalities, radiologist I had a
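One plausible reading of the actual-versus-expected comparison above is indirect standardization: a radiologist's expected disagreement rate is the case-mix-weighted average of the overall modality rates. The sketch below illustrates that arithmetic; the overall modality rates are taken from the abstract, but the per-radiologist case counts and disagreement total are invented, and this may not match the authors' exact definition of "expected."

    # Indirect-standardization sketch of an actual-vs-expected comparison.
    # Overall modality rates come from the abstract; the case mix and
    # disagreement count below are invented for illustration.
    overall_rate = {
        "general radiology": 0.0303,
        "diagnostic mammography": 0.0361,
        "screening mammography": 0.0579,
        "ultrasound": 0.0407,
    }

    # Hypothetical radiologist: double-read cases per modality and total
    # disagreements observed in the QA sample.
    case_mix = {"general radiology": 300, "screening mammography": 50, "ultrasound": 60}
    disagreements = 20

    total = sum(case_mix.values())
    actual = disagreements / total

    # Expected rate: weight each overall modality rate by this radiologist's
    # share of cases in that modality.
    expected = sum(n * overall_rate[m] for m, n in case_mix.items()) / total

    print(f"actual {actual:.2%} vs expected {expected:.2%}")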

Conclusions

This study reports the results of a QA program, involving double reading of 2% of more than 300,000 cases, that produced statistically valid results. In this study, a disagreement rate of 5% or less was recorded. Another study of similar magnitude [3] also reported disagreement rates in a similar range. We believe that our study provides important input toward building a credible benchmark that could be used to measure accuracy of interpretation of plain film, mammography, and ultrasound.

Comparison with other studies

Acknowledgements

We would like to acknowledge the contributions of Brian Hall, vice president of operations at IRG, Dallas, Texas.

This study received assistance in statistical analysis from the ACR’s Technology Assessment Studies Assistance Program, which provided the professional services of Jonathan H. Sunshine, PhD, Mythreyi Bhargavan, PhD, and Rebecca Lewis, MPH.

References (9)


Cited by (74)

  • Neuroradiology diagnostic errors at a tertiary academic centre: effect of participation in tumour boards and physician experience

    2022, Clinical Radiology
    Citation excerpt:

    Reported error rates for diagnostic radiology range between 1.7% and 9% [1–11].

  • Peer learning in breast imaging

    2022, Clinical Imaging
  • Transitioning From Peer Review to Peer Learning: Report of the 2020 Peer Learning Summit

    2020, Journal of the American College of Radiology
    Citation excerpt:

    Score-based peer review was introduced to radiology nearly 20 years ago as a quality assurance mechanism for evaluating radiologist performance [1-3].
