Abstract
BACKGROUND AND PURPOSE: Unlike in Europe and Japan, guidelines or recommendations from specialized radiological societies on workflow management and adaptive intervention to reduce error rates are currently lacking in the United States. This study of neuroradiologic reads at a large US academic medical center, intended to contribute to this discussion, found a direct relationship between error rate and shift volume.
MATERIALS AND METHODS: CT and MR imaging reports from our institution’s Neuroradiology Quality Assurance database (years 2014–2020) were searched for attending physician errors. Data were collected on shift volume specific error rates per 1000 interpreted studies and RADPEER scores. Optimal cutoff points for 2, 3 and 4 groups of shift volumes were computed along with subgroups’ error rates.
RESULTS: A total of 643 errors were found, 91.7% of which were clinically significant (RADPEER 2b, 3b). The overall error rate (errors/1000 examinations) was 2.36. The best single shift volume cutoff point generated 2 groups: ≤ 26 studies (error rate 1.59) and > 26 studies (2.58; OR: 1.63, P < .001). The best 2 shift volume cutoff points generated 3 shift volume groups: ≤ 19 (1.34), 20–28 (1.88; OR: 1.4, P = .1) and ≥ 29 (2.6; OR: 1.94, P < .001). The best 3 shift volume cutoff points generated 4 groups: ≤ 24 (1.59), 25–66 (2.44; OR: 1.54, P < .001), 67–90 (3.03; OR: 1.91, P < .001), and ≥ 91 (2.07; OR: 1.30, P = .25). The group with shift volume ≥ 91 had a limited sample size.
CONCLUSIONS: Lower shift volumes yielded significantly lower error rates. The lowest error rates were observed with shift volumes that were limited to 19–26 studies. The error rate at shift volumes of 67–90 studies (3.03) was 2.26 times the error rate at shift volumes of ≤ 19 studies (1.34), ie, 126% higher.
ABBREVIATIONS:
- ICC: intraclass correlation coefficient
- QA: quality assurance
- VIF: variance inflation factor
Medical errors are the third leading cause of death in the United States.1 Many industries have implemented system-wide measures and technological advancements to decrease errors and fatalities. Between 1973 and 2020, the number of commercial aviation fatalities decreased by 97%.2 Anesthesia-related mortality in the United States decreased by 97% from 1952 to 2005.3 Reported rates of major errors in radiology range from 2% to 6%.4–7 Error rates in diagnostic radiology have remained relatively constant since they were first studied in 1949.8–14
Studies of both general radiology15,16 and neuroradiology17 have found higher shift volumes to be associated with diagnostic errors. Larger scale studies evaluating radiology error rates as a function of specific shift volume are lacking. The identification of specific shift volume ranges that minimize error rates could lead to corrective interventions. Our aims were to extend the knowledge gained from our prior work by using the same data set17–22 to evaluate shift volume specific error rates by neuroradiologists at a large academic medical center and to identify the shift volume ranges with the lowest error rates.
MATERIALS AND METHODS
Study Setting
Institutional review board approval with a waiver of informed consent was obtained for this retrospective study. Our department’s neuroradiology quality assurance (QA) database was searched for attending neuroradiologist errors between 2014 and 2020. The database includes cases collected during the review of all addended reports and during the review of computer-selected studies on clinical service, as previously described.20 Whenever assigned to a clinical service (all shifts, all radiologists), the neuroradiologist is presented with 3 randomly computer-selected CT or MR imaging examinations and is instructed to review each and assign a score (1, 2a, 2b, 3a, 3b) by using the American College of Radiology RADPEER system.23 All RADPEER scores of 2 and 3 are further reviewed by either 2 additional attending neuroradiologists or the entire neuroradiology division during a quarterly QA conference to reach a consensus RADPEER score. Each reviewed examination is entered into the QA database along with its associated RADPEER score. Diagnostic error rates are expressed as the shift volume specific number of errors per 1000 interpreted examinations.
All 16 current or former members of the neuroradiology division are full-time employees who have completed neuroradiology fellowships. Division members have similar yearly productivity, read similar neuroimaging case mixes and rotate with similar frequency among general neuroradiology nonsubspecialized clinical services: emergency/inpatient (9-hour day shift), spine (9-hour day shift), outpatient (9-hour day shift), evening weekday call (6-hour evening shift), and weekend call (9–12-hour day shift). Attendings read a similar mix of adult and pediatric CT and MR imaging neuroradiology studies, which are read in chronological order on either a single common outpatient study worklist or a dedicated emergency/inpatient worklist. Broadly speaking, approximately 40% of the studies were dictated by trainees.
Data Collection
Data on shift volumes on the days an error was documented, the number of errors per shift volume, the total number of studies interpreted for each shift volume, and demographic information were extracted from the radiology departmental database. Diagnostic error rates are expressed as the number of errors per 1000 interpreted examinations for each specific shift volume (for example, if there were 10 errors detected on all shifts of 40 studies, and there were 1000 shifts during which 40 studies were interpreted, then the error rate for a shift volume of 40 would be 10 / [40 × 1000] errors per examination, or 0.25 errors per 1000 examinations). The shift volume was defined as the total number of CT and MR imaging examinations that the neuroradiologist interpreted during the shift (eg, if 6 radiologists are working on a given day, this would constitute 6 total shifts for that day). We excluded shifts with 10 or fewer interpreted studies per day as outliers; many of these likely represent reports generated on administrative or off days, possibly to catch up with overflow work from the day before. For examinations comprising multiple unique accession numbers (eg, head MR imaging-MRA), each accession number was considered to be a single examination for purposes of determining shift volume.
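The worked example in the preceding paragraph can be restated as a short calculation. The following is a minimal Python sketch; the function and argument names are illustrative assumptions and are not taken from the study's own analysis code.

```python
def error_rate_per_1000(n_errors, shift_volume, n_shifts):
    """Errors per 1000 interpreted examinations at one shift volume.

    The denominator is the total number of examinations read on all
    shifts of that volume: shift_volume * n_shifts.
    """
    total_exams = shift_volume * n_shifts
    if total_exams <= 0:
        raise ValueError("no interpreted examinations at this shift volume")
    return 1000.0 * n_errors / total_exams


# Example from the text: 10 errors, 1000 shifts of 40 studies each.
rate = error_rate_per_1000(n_errors=10, shift_volume=40, n_shifts=1000)
print(rate)  # 0.25 errors per 1000 examinations
```

Note that the denominator counts examinations, not shifts, so a shift volume that occurs rarely contributes few examinations and yields an unstable rate.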
Statistical Methods
Descriptive statistics (mean and SD for age; count and percentages for sex) were obtained for the error group.
Optimal cutoff points categorizing examinations into one of several risk groups, using shift volume level as the diagnostic variable, were computed by using ordinary logistic regression.24 Given the wide range of daily shift volume productivity within our practice, we wanted to determine more than one possible cutoff point and the error rate differences for the resultant groups. Each possible combination of cutoff points categorizing examinations into 2, 3 or 4 separate risk groups was evaluated. The optimal cutoff values were defined as the ones with the most significant overall likelihood ratio test statistic, giving the most significant overall set of risk groups for a specified number of cutoff points. Individual odds ratios and P values comparing each group with the group with the lowest shift volume (group 1) within each split were also computed, but these were not used to determine optimality. The error rate per 1000 studies was computed for each group within a split.
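The cutoff search described above can be sketched in a few lines. This is a simplified Python illustration under stated assumptions, not the study's R implementation: it relies on the fact that a logistic regression whose only predictors are group indicators fits each group's observed error proportion, so the overall likelihood ratio statistic can be computed directly from grouped counts. For a fixed number of groups the degrees of freedom are constant, so the largest statistic corresponds to the smallest P value. The function names and the toy counts are hypothetical.

```python
import math
from itertools import combinations


def binom_loglik(errors, studies):
    """Maximized Bernoulli log-likelihood for one group (p = errors/studies)."""
    if studies == 0 or errors == 0 or errors == studies:
        return 0.0
    p = errors / studies
    return errors * math.log(p) + (studies - errors) * math.log(1 - p)


def lr_statistic(groups):
    """Likelihood ratio statistic: group-specific rates vs one pooled rate."""
    total_e = sum(e for e, _ in groups)
    total_n = sum(n for _, n in groups)
    ll_null = binom_loglik(total_e, total_n)
    ll_alt = sum(binom_loglik(e, n) for e, n in groups)
    return 2.0 * (ll_alt - ll_null)


def best_cutoffs(counts, n_cutoffs):
    """Exhaustive search over cutoff sets; a cutoff c means 'volume <= c'.

    counts maps shift volume -> (errors, interpreted studies).
    Returns (largest LR statistic, tuple of cutoffs).
    """
    volumes = sorted(counts)
    best = (float("-inf"), None)
    # Exclude the largest volume so the last group is never empty.
    for cuts in combinations(volumes[:-1], n_cutoffs):
        groups = [[0, 0] for _ in range(n_cutoffs + 1)]
        for v in volumes:
            g = sum(v > c for c in cuts)  # index of the group containing v
            e, n = counts[v]
            groups[g][0] += e
            groups[g][1] += n
        stat = lr_statistic(groups)
        if stat > best[0]:
            best = (stat, cuts)
    return best


# Hypothetical toy data, for illustration only (not study data).
toy = {11: (1, 1000), 12: (1, 1000), 13: (5, 1000), 14: (5, 1000)}
stat, cuts = best_cutoffs(toy, n_cutoffs=1)
print(cuts, round(stat, 2))
```

An exhaustive search grows combinatorially with the number of cutoffs, which is one plausible reason to search on a coarser grid of candidate cutoffs when 2 or 3 cutoffs are required.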
To assess the impact of potential correlation among examinations read by the same radiologist, the intraclass correlation, measured by the intraclass correlation coefficient (ICC), and the resulting variance inflation factor (VIF) were examined. In the presence of high intraclass correlation or VIFs, the use of ordinary logistic regression is inappropriate. As covariate information on the attending radiologist was unavailable for all examinations, the ICC and VIF were computed by using a subset of the data analyzed in a previous study.25 ICC values near 1 indicate high correlation, whereas values near 0 indicate low correlation. High VIFs indicate that variances of estimated parameters are higher than they would be in the absence of intraclass correlation, whereas values near 1 indicate little inflation. The ICC, its 95% confidence interval, and the VIFs were computed by using resampling techniques.26 All statistical analyses were performed in R, version 4.1.1.
RESULTS
There were 643 patients within the error group; 349 were men (54.3%), 293 were women (45.6%), and 1 (0.2%) patient’s sex was not recorded. The mean patient age was 49.8 years (SD, 25).
The estimated ICC among attending radiologists was low (0.001; 95% CI, 0–0.079), as was the estimated VIF (1.099), suggesting minimal correlation between examinations and minimal variance inflation. Six attending radiologists had small cluster sizes (having read fewer than 10 examinations). A sensitivity analysis of the ICC and VIF estimates was performed by rerunning the analysis without the small clusters. Similar estimates of the ICC (0.0005; 95% CI, 0–0.0796) and VIF (1.073) were obtained. Because of the low correlation and VIF, the method for determining optimal risk groups by using ordinary logistic regression was deemed acceptable.
A total of 643 errors were detected during the study period; 590 (91.7%) were clinically significant (RADPEER 2b, 3b). The overall error rate (per 1000 interpreted examinations) was found to be 2.36. Figure 1 presents a detailed view of the trend in error rate as a function of shift volume compared with the overall error rate. The Table presents the risk groups by shift volume for 1, 2 and 3 cutoff points. All group splits were found to be highly significant by using the likelihood ratio test (P < .001).
FIG 1. Detailed view of the trends in error rate as a function of shift volume level against the overall error rate.
Table: Optimal risk groups
For a single cutoff point (search done for each shift volume level), group 1 consists of examinations read during shift volumes of ≤26 (error rate 1.59), whereas group 2 consists of examinations read during shift volumes of ≥27 (error rate 2.58). The odds of an examination in group 2 containing an error were found to be 63% higher than those of an examination in group 1 (OR: 1.63, P < .001).
For 2 cutoff points (search done by fours), group 1 consists of examinations read during shift volumes of ≤19 (error rate 1.34), group 2 consists of examinations read during shift volumes between 20 and 28 (error rate 1.88), and group 3 consists of examinations read during shift volumes of ≥29 (error rate 2.6). The odds of an examination in group 2 containing an error were found to be 40% higher than those of an examination in group 1 (OR: 1.4, P = .1), whereas the odds for an examination in group 3 were found to be 94% higher than those of an examination in group 1 (OR: 1.94, P < .001).
For 3 cutoff points (search done by tens), group 1 consists of examinations read during shift volumes of ≤24 (error rate 1.59), group 2 consists of examinations read during shift volumes between 25 and 66 (error rate 2.44), group 3 consists of examinations read during shift volumes between 67 and 90 (error rate 3.03), and group 4 consists of examinations read during shift volumes of ≥91 (error rate 2.07). Compared with an examination in group 1, the odds of an examination containing an error were found to be 54% higher in group 2 (OR: 1.54, P < .001), 91% higher in group 3 (OR: 1.91, P < .001), and 30% higher in group 4 (OR: 1.30, P = .25).
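The relationship between the group error rates and the odds ratios reported above can be checked with a short calculation. This is a minimal Python sketch with an illustrative function name; it uses the rounded error rates from the text, so it only approximates the odds ratios that the study obtained from logistic regression on the underlying counts.

```python
def odds_ratio_from_rates(rate, ref_rate, per=1000.0):
    """Odds ratio of an error for `rate` versus `ref_rate`.

    Both rates are errors per `per` examinations, so the odds of an
    error are rate / (per - rate).
    """
    odds = rate / (per - rate)
    ref_odds = ref_rate / (per - ref_rate)
    return odds / ref_odds


# Rounded rates reported for the 2-cutoff split; group 1 (1.34) is the reference.
for label, rate in [("20-28", 1.88), (">=29", 2.6)]:
    print(label, round(odds_ratio_from_rates(rate, 1.34), 2))
```

Because errors are rare (a few per 1000 examinations), the odds ratio is nearly identical to the simple ratio of error rates.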
DISCUSSION
In this study, we found that neuroradiology error rates are significantly lower at low shift volumes than at high shift volumes. Based on our data, the best outcomes are seen when shift volumes do not exceed 19–26 studies. Our data are in line with the findings of a recent nationwide survey of 42 academic neuroradiology division chiefs that suggested a mean optimal shift volume of between 25 and 33 studies.27 Our group error rate was 2.36 per 1000 examinations; if our shift volume could be limited to 19 studies (error rate 1.34), error rates could potentially decrease by approximately 43%. The highest error rate was observed at shift volumes of 67–90 studies (3.03), which was 2.26 times (126% higher than) the lowest error rate, observed at shift volumes of ≤19 studies (1.34). Error rates at high shift volumes might be difficult to justify, given this degree of difference. Approximately 74 million CT and 39 million MR imaging studies are performed in the United States per year,28 and decreasing the mean error rates might have a significant impact from a public health perspective.1 The medicolegal consequences of errors are also substantial. An analysis of 2019 malpractice claims in diagnostic and interventional neuroradiology found a median plaintiff award of $2,877,847 and a median settlement amount of $1,950,000.29 Missing aneurysms on CT or MR imaging may lead to plaintiff awards of $4–43 million.30
For 3 cutoff points, the odds ratio for group 4 decreases, seemingly suggesting that it is safer than groups 2 and 3. However, this is most likely due to the comparatively limited sample sizes for the shift volume levels comprising that group. Figure 2 presents a plot of the sample size for each shift volume; shift volumes of 91 and greater can be seen to have markedly lower sample sizes than the lower shift volumes. Furthermore, the sample sizes at many shift volumes above this threshold are so small that no error has yet been observed at these volume levels. Considering this together with the overall trends seen in Fig 1, we do not have any reason to suspect that risk group 4 is any safer than groups 2 and 3. We conclude that the decreased odds ratio for group 4 is the result of sample size limitations.
FIG 2. Sample size for each shift volume level.
Prior literature has highlighted a broad range of potential strategies for reducing diagnostic errors within general and specialized practices, including interpretation by specialty and subspecialty radiologists,9,10,31,32 lower shift volumes,15 limits on shift lengths to less than 10 hours,15,33 reductions in noninterpretive tasks during clinical shifts,34 radiologist participation at multispecialty tumor boards,20 awareness of most common types of misses,18,19,21 and reduced interpretation speed.35 Subspecialty interpretations of neuroradiologic studies can change clinical care up to 41% of the time when compared with those of general radiologists or nonsubspecialized neuroradiologist interpretations.9,32 Radiologist workloads have been steadily increasing over recent decades.36,37 If we can develop more uniform policies and hospital business models in which shift volume limits and subspecialized interpretations are incentivized to act synergistically, as has been successfully done at many institutions across the country, the error reduction rates might approximate those of the airline industry or anesthesia departments. In 2012, the Royal College of Radiologists issued a national guideline recommending that radiologists interpret up to 2 complex CT or MR imaging examinations and up to 6 CT or MR imaging examinations, overall, per hour.38 In 2022, the Japanese College of Radiology issued a national guideline recommending that radiologists interpret up to 4 examinations per hour.39 The American College of Radiology has not issued official guidelines aimed at reducing error rates in the United States.
The results of this study can be used to guide future strategies. In our institution, the initial step has been to reduce the effect of extreme shift volumes by adjusting workflow without increasing shift frequency. Flexibility toward a labor division based on seniority level and experience with certain studies is encouraged as part of the process. Promoting educational and research efforts, multidisciplinary consultations, and peer-review discussions allows us to maintain our focus on quality and deepening our expertise. Mid- and long-term goals of limiting shift volume within the 19–26 range should then facilitate adjustments to the traditional business models and should thereby result in increased job attractiveness and more effective support to cohesive and motivated teams while maintaining profitability.
The limitations of our study are as follows. First, it has a single-center retrospective design. Second, though our study contains a large number of detected errors compared with other publications, only a fraction of potential errors was detected, given our methods of data collection. Third, all radiologists were fellowship-trained neuroradiologists at an academic medical center. Thus, our findings may not apply to general radiology practice, to neuroradiologists at community practices, or to academic neuroradiologists whose workflows differ significantly from ours. Ideally, each institution should conduct an analysis of its own data. Fourth, we did not include confounders in our analysis. Based on prior work, neuroradiologists’ years of experience, trainee participation, and emergency interpretation setting did not have an impact on diagnostic errors, and shift volume was shown to be the dominant variable that influences outcomes.25 Shift length was not evaluated as a risk factor for diagnostic error, and the appropriate volume threshold for a 6-hour shift should differ from that for a 9-hour or 12-hour shift; our shifts were generally shorter than 9 hours, whereas prior literature suggests a significant increase in errors beyond a threshold of 10 hours.15,33 Fifth, the analysis did not account for examination complexity (number of findings or diagnoses per study), which would be virtually impossible to evaluate, given the >250,000 studies performed during the study period. It might be reasonable to assume that higher volume shifts could have a higher proportion of negative or simple studies (eg, negative head CT) and a lower proportion of complex MR imaging studies. Sixth, subanalyses were not performed with respect to specific RADPEER scores (2a, 2b, 3a, or 3b).
Seventh, the lack of full covariate information on attending radiologists limited the analytic approaches that could be employed to handle potential intraclass correlation. Future studies should give thought to handling potential correlation through direct modeling. Further prospective research is needed to determine shift volume safety cutoff points at the group and individual levels, using shift volume as the only variable and holding all possible confounders constant.
CONCLUSIONS
Limiting the number of studies per shift to a low range, as opposed to high, unregulated volumes, should allow for the prevention of a significant number of diagnostic neuroradiology errors. Our study of shift volumes at a busy multispecialty, tertiary, large referral academic medical center suggests that neuroradiologic shift volumes should be limited to a 19–26 range per shift.
Footnotes
Disclosure forms provided by the authors are available with the full text and PDF of this article at www.ajnr.org.
- Received October 6, 2023.
- Accepted after revision December 18, 2023.
- © 2024 by American Journal of Neuroradiology