Quality Control in Neuroradiology: Impact of Trainees on Discrepancy Rates

BACKGROUND AND PURPOSE: Prior studies have found a 2%–8% clinically significant error rate in radiology practice. We compared discrepancy rates of studies interpreted by subspecialty-trained neuroradiologists working with and without trainees.

MATERIALS AND METHODS: Subspecialty-trained neuroradiologists reviewed 2162 studies during 41 months. Discrepancies between the original and “second opinion” reports were scored: 1, no change; 2, clinically insignificant detection discrepancy; 3, clinically insignificant interpretation discrepancy; 4, clinically significant detection discrepancy; and 5, clinically significant interpretation discrepancy. Discrepancy rates for faculty reading alone versus faculty reading with a trainee were calculated.

RESULTS: In 87.6% (1894/2162), there were no discrepancies with the original report. The neuroradiology division had a 1.8% (39/2162; 95% CI, 1.3%–2.5%) rate of clinically significant discrepancies. In cases reviewed solely by faculty neuroradiologists (350/2162, 16.2% of the total), the rate of discrepancy was 1.7% (6/350). With fellows (1232/2162, 57.0% of the total) and residents (580/2162, 26.8% of the total), the rates of discrepancy were 1.6% (20/1232) and 2.2% (13/580), respectively. The odds of a discrepant result were 26% greater (OR = 1.26; 95% CI, 0.38–4.20) when reading with a resident and 8% less (OR = 0.92; 95% CI, 0.35–2.44) when reading with a fellow than when reading alone.

CONCLUSIONS: There was a 1.8% rate of clinically significant detection or interpretation discrepancy among academic neuroradiologists. The difference in the discrepancy rates between faculty only (1.7%), fellows and faculty (1.6%), and residents and faculty (2.2%) was not statistically significant but showed a trend toward increased odds of a discrepant result when reading with a resident.

To improve the level of patient care that is delivered, one must investigate the variables that pervade the patient care process for their impact on the care received. Level of care has been made quantifiable by QA programs, including peer review. Previous publications have found error rates for diagnostic imaging to be between 2.0% and 7.7%. 1-9 More than 60 years ago, L.H. Garland found that in an environment in which disease prevalence reached 100%, the radiologic error rate was approximately 30%. He then hypothesized that the error rate in a typical radiology practice, one in which 9 of 10 films had negative findings, would be approximately 5%. 5 His hypothesis has since been borne out repeatedly; for example, Filippi et al 3 examined discrepancy rates between radiology residents and attending neuroradiologists in interpreting emergent neuroradiology MR imaging studies and found a major discrepancy rate of 4.2%. More recently, Zan et al 9 sampled 4534 neuroradiology cases with an outside report for comparison and found that 347 (7.7%) had clinically significant discrepancies between the outside interpretation and that of subspecialty-trained neuroradiologists. Using a similar grading system, Babiarz and Yousem 1 internally reviewed 1000 studies and found a clinically significant discrepancy rate of 2.0% among subspecialty-trained neuroradiologists at a major university hospital.
The purpose of this study was to investigate 1 of the variables affecting the quality of patient care provided by a neuroradiology service: trainee participation in the initial study review process. The authors hypothesized that having 2 readers review a case would produce more accurate results, even if the second reviewer was a trainee. The authors predicted that when a sample of neuroradiology studies read by faculty alone, faculty with fellows, and faculty with residents was reviewed for accuracy by a second neuroradiology faculty member, the cases initially interpreted solely by faculty members would have the highest discrepancy rates. This article, therefore, compares the discrepancy rates of studies interpreted by subspecialty-trained neuroradiologists working with and without trainees.

Materials and Methods

Data Collection
In accordance with the Health Insurance Portability and Accountability Act, our institutional review board reviewed and approved the protocol for this retrospective study and waived the requirement for informed consent.
The study was conducted as described by Babiarz and Yousem. 1 During 41 months (January 1, 2009, to July 7, 2011), as part of a QA initiative in our department, staff neuroradiologists reviewed previously read neuroradiology studies. For the first 2 current studies of the day that had prior examinations, each neuroradiologist was instructed to review and grade the most recent comparison study and its report. The second reviewer graded the original report for discrepancies, on the basis of his or her own opinion of the findings and interpretation, according to a previously validated 5-point rating scale. 1,9 The 5-point scale allowed the following categories: 1) no change in the reading; 2) a clinically insignificant detection discrepancy (eg, a missed case of mild chronic sinusitis); 3) a clinically insignificant interpretation discrepancy (eg, interpretation of an oligodendroglioma as an astrocytoma); 4) a clinically significant detection discrepancy (eg, a missed tumor); and 5) a clinically significant interpretation discrepancy (eg, interpreting a tumor as a stroke) (Table 1). In addition to the discrepancy score, staff also noted the type of imaging study of each examined case. In the event of a discrepancy (grades 2-5), staff recorded the source and nature of the discrepancy.
From the pool of the reviewed cases, 200 cases were randomly selected, and their reports were analyzed for predicting practice disease prevalence.
The data from each QA case sheet were input into a spreadsheet, and we recorded the following: patient medical record number, date of the original study, date of the second review, type of examination, technique and original contributor, reviewer, score, and comments. The electronic patient record was queried to determine whether the original faculty contributor was the sole contributor on the original report or if a trainee's name also appeared as a contributor. Diagnostic radiology resident and neuroradiology fellow trainees were separately designated.
Nineteen subspecialty-certified or subspecialty-certification-eligible neuroradiologists assigned to read current cases reviewed cases and reports originally read by 23 subspecialty-certified or subspecialty-certification-eligible neuroradiologists. Four of the 23 were former faculty who had read cases but were no longer within the division. A neuroradiologist is subspecialty-certified or subspecialty-certification-eligible following completion of a 1-year Accreditation Council for Graduate Medical Education-accredited diagnostic neuroradiology fellowship. Neuroradiologists must practice neuroradiology for 1 year after their fellowship year and then pass a 4-hour cognitive test of neuroradiology knowledge, proctored by members of the American Board of Radiology, to become subspecialty-certified.
Each neuroradiologist was responsible for covering the clinical service an average of 3-4 days a week. On some services, there may not have been 2 cases to review that had comparison studies because of low volumes (eg, myelography service, teleradiology service, and so forth). The 19 neuroradiologist reviewers were not allowed to review their own previous radiology reports and were instructed to skip such cases. The readers ranged in experience from 1 to 29 years' postneuroradiology fellowship training.
For validating the scoring system, all studies with clinically significant discrepancies (scores of 4 and 5) were assessed by a third independent expert reviewer with 23 years of experience in neuroradiology (D.M.Y.), who had previously published data using the 5-point scoring system. 1,8 Because D.M.Y. was 1 of the 23 neuroradiologists whose reports were reviewed in this study, no adjudicating was done on the single discrepant case that he had originally read.

Statistical Analysis
Total counts of all scores and their relative percentages were tabulated for each staff member as well as for the whole neuroradiology division. The discrepancy rate was defined as the proportion of studies with clinically significant discrepancies. For the purpose of calculating the overall and individual discrepancy rates, scores were dichotomized as either above 3 (clinically significant) or not greater than 3 (clinically insignificant). Basic descriptive statistics, including the range, mean, and SD, were computed for observed counts. Discrepancy rates were calculated for each imaging technique (CT versus MR imaging) and body region imaged (brain/head and neck versus spine versus bony structures). Discrepancy rates were also calculated by type of contributors, including faculty only, faculty and neuroradiology fellow, and faculty and resident. ORs for discrepant results comparing studies interpreted by faculty alone with those also involving residents or fellows were computed. CIs for discrepancy rates and ORs were constructed by inverting asymptotically normal test statistics obtained from appropriate logistic regression models. These models were fitted by using generalized estimating equations to account for the potential correlation between studies conducted by the same contributor. All tests of hypotheses were 2-sided and conducted at a significance level of .05.
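The dichotomization and interval estimates above can be sketched in a few lines of Python. This is a minimal illustration only: it uses crude (unadjusted) odds ratios and a Wilson score interval, not the authors' GEE-based logistic regression, which accounts for correlation between studies read by the same contributor and therefore yields slightly different ORs (1.26 and 0.92) than the crude values computed here.

```python
import math

def wilson_ci(k, n, z=1.96):
    """95% Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def crude_or(k1, n1, k0, n0):
    """Unadjusted odds ratio comparing group 1 with reference group 0."""
    return (k1 / (n1 - k1)) / (k0 / (n0 - k0))

# Clinically significant discrepancies (scores 4 or 5) per reading group
faculty_only  = (6, 350)
with_fellow   = (20, 1232)
with_resident = (13, 580)

# Division-wide rate: 39/2162 with a Wilson 95% CI
lo, hi = wilson_ci(39, 2162)
print(f"overall rate: {39/2162:.1%}, 95% CI {lo:.1%}-{hi:.1%}")

# Crude ORs vs faculty reading alone (the paper's GEE-adjusted
# ORs are 1.26 and 0.92; crude values differ slightly)
print(f"resident OR: {crude_or(*with_resident, *faculty_only):.2f}")
print(f"fellow OR:   {crude_or(*with_fellow, *faculty_only):.2f}")
```

The Wilson interval reproduces the reported 1.3%–2.5% CI for the overall 1.8% rate; the crude ORs (about 1.31 and 0.95) bracket the published GEE-adjusted estimates, illustrating the modest effect of adjusting for within-reader clustering.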

Results
The 2162 neuroradiology studies originally read by 23 neuroradiologists (present and past) were reread by the 19 neuroradiologists on staff at the time of their review. On average, 94.0 studies were reviewed for each of the 23 original readers (range, 1-258; SD, 83.8).
Of the cases read by faculty only, which made up 16.2% of the total cases (350 of 2162), 6 had a score of 4 or 5 (1.7%). Of the cases read by faculty with fellows, which made up 57.0% of the total cases (1232 of 2162), 20 had a score of 4 or 5 (1.6%). A total of 54 fellows contributed to reports. The average fellow read 22.5 cases (range, 1-85; SD, 24.7). The personal discrepancy rates of fellows ranged from 0.0% to 10.0%. As for cases read by faculty and residents, which made up the remaining 26.8% (580 of 2162), 13 had a score of 4 or 5 (2.2%). A total of 79 residents contributed to reports. The average resident read 7.5 cases (range, 1-35; SD, 7.5). The personal discrepancy rates of residents ranged from 0.0% to 100.0%. The differences between the discrepancy rates for faculty alone, with fellows, and with residents were not statistically significant. For a faculty member, the odds of a discrepant result were 26% greater (OR = 1.26; 95% CI, 0.38-4.20) when reading with a resident and 8% less (OR = 0.92; 95% CI, 0.35-2.44) when reading with a fellow than when reading alone.
Of the 39 cases with scores of 4 or 5, 19 were CT studies (13 of the brain, 3 of the neck, 1 of the maxillofacial bones, 1 of the spine, and 1 CT angiogram) and 20 were MR imaging studies (15 of the brain, 2 of the neck, 2 of the spine, and 1 MR venogram). Of the 39 "misses," 12 cases were classified as vascular; 18, as neoplasms; 2, as congenital; 2, as artifacts; 2, as degenerative; 2, as trauma; and 1, as infection (Table 3).
Overall, the neuroradiology service had a 1.8% (39/2162; 95% CI, 1.3%-2.5%) rate of clinically significant detection or interpretation discrepancies (scores of 4 or 5) when images were reread (Table 2). Although 23 neuroradiologists had dictated studies that were reviewed, only 16 of the 23 had ≥10 studies as part of their personal evaluation. For these 16 neuroradiologists who dictated ≥10 original reports, the discrepancy rate ranged between 0.4% and 5.7% (mean = 1.8%; SD = 1.7%). Table 5 provides individual discrepancy rates for all 23 neuroradiologists.

Discussion
Research on the effect of factors impacting the image interpretation process is essential for the creation of an environment that is conducive to the highest possible level of patient care. In the past, many variables have been analyzed for their effect on the quality of service, such as distractions, availability of clinical history, availability of previous reports, duration of search time, reading room environment, and trainee participation. [10][11][12][13] Although our study focuses primarily on the effect of trainees on discrepancy rates, previous studies have analyzed similar variables, particularly how accurate residents are in addition to how their residency year affects their accuracy. 7,14 For example, Seltzer et al 15 found that the need for correction of reports made by residents decreased significantly between the first and second/third years of residency. When Filippi et al 3 looked specifically at emergency neuroradiology MR imaging studies and year of residency, the percentages ranged from 10.9% with first-year residents to 4.7% with second-year residents. Cooper et al 14 had similar results when they found that fourth-year residents had a statistically significantly lower rate of error, compared with doctors earlier in their residency. Although our study does not compare year of residency, it does compare a similar variable: level of expertise (ie, between residents, fellows, and faculty). It is the goal of residency and fellowship programs to train individuals with the hope of producing better health care professionals; thus, it is important to periodically ensure that such training is effective and improves health care as a whole.
The participation of trainees in the image-interpretation process, most commonly through review of studies before faculty interpretation, has many benefits. It is a critical component in the development of trainee proficiency and the achievement of eventual autonomy. In addition, previous studies have shown that multiple readers may be beneficial for interpretation accuracy and, therefore, patient care. Sickles et al 16 conducted a study analyzing mammography reports and found significantly higher performance by radiologists using a multiple rather than a single reading system of interpretation. Some studies in mammography have followed the effect of a quadruple reading system, while others have varied as to whether the second reader was blinded to the first reader's review. 17,18 In this study, we found a 1.8% (39/2162; 95% CI, 1.3%-2.5%) rate of clinically significant detection or interpretation discrepancy between the original report and the second-opinion review of 2162 neuroradiology examinations. The odds of a discrepant result when a faculty member was reading with a resident were 26% greater (OR = 1.26; 95% CI, 0.38-4.20) than when he or she was working alone, and approximately 37% greater than when reading with a fellow. The odds of a discrepant result when a faculty member was reading with a fellow were 8% less (OR = 0.92; 95% CI, 0.35-2.44) than when reading alone. These differences did not achieve statistical significance, though they do reflect a range of discrepancy rates that calls into question the impact of trainee participation in the reading process. Our discrepancy rate of 1.8% is lower than those of the studies cited above in which residents' readings were checked by faculty. 3,14,15 The homogeneity of readings and lower rates are likely due to the advanced state of training of all of the neuroradiology faculty and the similarities in case mix represented.
The focus of our study was the impact of resident and fellow participation in the reading of neuroradiology scans on the subsequent discrepancy rates. The implication of our study is a trend toward a higher rate of discrepancies when residents are involved. We cannot determine the accuracy rates of the residents in this study, which would be useful in drawing inferences from the data. However, 1 explanation may be that the faculty members had unwarranted confidence in the residents' interpretations and did not apply the same level of scrutiny to the findings as they would when reading the case themselves. Was there a sense of complacency in approving residents' reports? If so, that confidence may not have been as well-placed as when one reviews the work of a fellow who has already completed a radiology residency. In our department, residents and fellows dictate cases on their own, and their completed reports are then revised and/or approved by the faculty. Also, most studies read at our institution (1812 of 2162, or 83.8% of cases included in this study) are read by faculty with residents or fellows.
There are a number of study limitations. First, for a study that spanned >41 months with the intention of 2 studies reviewed per day per neuroradiologist on clinical service, the total number of studies was much lower than expected; this resulted from a lower-than-desired compliance rate. Next, the scoring system, which has been validated by prior studies, 1,9 was somewhat arbitrary, and its use may have led to misclassification or a failure to capture some of the differences between the original reports and the reviews of these reports. In addition, the design is susceptible to confirmation bias: because reviewers knew how the disease progressed on the follow-up imaging, they may have had an easier time detecting the more subtle cues present in the original images.
Another bias that limited this study was disease-selection bias. Because we included only cases for which follow-up studies were being done, we likely altered the distribution of diseases seen and types of examinations reviewed. There is a wide range in the number of cases reviewed per faculty member, largely based on the percentage of time on the clinical schedule; this means that some "funded" faculty members are under-represented in the sample. We report discrepancy rates, which are not equivalent to accuracy rates. Unfortunately, because of inadequate/incomplete follow-up for determination of a definitive diagnosis in most cases, accuracy rates cannot be computed. Finally, another important limitation is the potential bias from having 1 expert reviewer in charge of final adjudication of all (except his own) clinically significant discrepancies.
There are a number of ways in which the QA program can be improved to minimize these limitations. For example, to improve care for included patients, cases could be reviewed by a second neuroradiologist immediately following their reading, before releasing the report to the referring physician. In addition, the design of future studies with an assessment of accuracy instead of just discrepancy would also be beneficial.

Conclusions
At a university hospital, we found a 1.7% rate of clinically significant detection or interpretation discrepancy between the original dictations and the second-opinion reviews of neuroradiology cases among faculty neuroradiologists working alone, a 1.6% rate of discrepancy among faculty neuroradiologists working with fellows, and a 2.2% rate of discrepancy among faculty neuroradiologists working with residents. The overall rate of discrepancy was 1.8%. The results suggest that greater scrutiny may be warranted in reviewing residents' interpretations of neuroradiology studies. Our results may have limited generalizability because we only reviewed cases for which follow-up studies were being done and thus potentially introduced disease-selection bias. Also, our study design may have introduced confirmation bias by allowing the reviewers to see how the cases evolved with time. Our study is an example of 1 step in the practice quality-improvement process that we hope will serve as a baseline and outcome metric as we intervene to reduce variability in interpretation quality. Accuracy rates must also be addressed for the best patient care.