Factors Influencing Confidence in Diagnostic Ratings and Retreatment Recommendations in Coiled Aneurysms

BACKGROUND AND PURPOSE: Angiographic occlusion and retreatment of coiled aneurysms are commonly used as surrogate end points in clinical trials. We aimed to evaluate the influence of aneurysm, patient, and rater characteristics on the confidence of visual evaluation of aneurysm coiling and retreatment decisions. MATERIALS AND METHODS: Twenty-six participants of the Advanced Course in Endovascular Interventional Neuroradiology of the European Society of Neuroradiology were asked to evaluate digital subtraction angiography examinations of patients who had undergone endovascular coiling, by determining the grade of aneurysm occlusion, the change between immediate postprocedural and follow-up angiograms, their level of confidence, the technical difficulty of retreatment, and the best therapeutic approach. The experience, knowledge, and skills of each participant were assessed. The influence of rater and case characteristics on indicated confidence in diagnostic ratings and retreatment recommendations was analyzed. RESULTS: Interrater reliability was moderate regarding the assessment of aneurysm occlusion grade (intraclass correlation coefficient = 0.581) and substantial regarding change (intraclass correlation coefficient = 0.776). Overall confidence in the diagnostic rating was high (median, “very certain”). Confidence was statistically significantly higher in cases that were generally rated as “worse.” The odds of recommending retreatment were significantly higher in cases that were generally rated with higher mean confidence. CONCLUSIONS: Although overall confidence in the diagnostic rating was high, our study confirms the suboptimal interrater reliability of visual assessment of aneurysm occlusion as well as retreatment recommendations, rendering both questionable as primary outcome measures. Besides recurrence status, recommendation of retreatment is significantly influenced by patient age, aneurysm neck width, and characteristics of the therapist.

An important drawback of aneurysm coiling is the possibility of recurrence with a rerupture risk. Previous studies found a low interobserver variability regarding the visual assessment of aneurysm occlusion. 1,2 In a recent meta-analysis, the interrater reliability of the visual rating of aneurysm occlusion was found to vary significantly as a function of imaging methods, grading scales, occlusion rates, and their interaction. 3 Little is known about the confidence of the observers in their ratings, which will probably also influence the retreatment decision in the individual case. However, aneurysm retreatment rate has also been used as a study end point. 4 Recent studies reported a low interrater reliability of retreatment recommendations in coiled aneurysms 5,6; and to this date, there are no guidelines about when and how to retreat a coiled aneurysm. Medical decision-making is a central aspect of neurovascular interventions, and studies analyzing this complex and multifactorial process are limited. A variety of factors such as cognitive biases, personal experiences, prior training, and medicolegal considerations affect decision-making. 7 We aimed to assess the interrater reliability and the confidence of diagnostic ratings as well as the interrater reliability of retreatment decisions. In a second step, we aimed to evaluate the influence of aneurysm, patient, and rater characteristics on, first, the certainty of visual evaluation of aneurysm coiling and, second, retreatment decisions.

Setting
The study was performed with 26 participants of the Advanced Course in Endovascular Interventional Neuroradiology of the European Society of Neuroradiology, held in Hamburg from January 26 to January 29, 2015, who agreed to participate. The study was approved by the local ethics committee (Ethik-Kommission Ärztekammer Hamburg, WF-030/14), and the requirement for written informed consent was waived. Participants' records and information were anonymized and de-identified before analysis.

Assessment of Experience, Knowledge, and Skills of Participants
Before the course, participants were invited to complete an on-line survey to rate their experience in interventional neuroradiology (On-line Appendix). The on-line survey was a multiple-choice questionnaire with the opportunity to provide commentary for each question. It consisted of 10 questions concerning their qualifications as well as the number of procedures they assisted or performed as a primary operator in aneurysm embolization, mechanical thrombectomy, and endovascular treatment of arteriovenous malformations or dural arteriovenous fistulas. As described in detail elsewhere, 8 work experience and aneurysm treatment experience were calculated from corresponding items and expressed as standardized scores (z scores with a mean of zero and an SD of 1).
To assess knowledge, the participants had to complete an examination consisting of 3 parts at the end of the course. The first part consisted of 20 multiple-choice questions concerning neuroanatomy and neuroembryology, pathophysiology, materials, and techniques as well as studies in interventional neuroradiology. In the second part, participants had to answer 12 questions regarding treatment of a complex incidental aneurysm of the posterior communicating artery region of the internal carotid artery, applying movies derived from live fluoroscopy of the real procedure to simulate a live case scenario. The third part included a semistructured standardized oral examination with questions covering the field of knowledge mentioned above, supplemented by situational perceptivity as well as assessment and management of complications applying standardized case materials. A total knowledge score was calculated from the subdomain scores and expressed as a standardized score.
Practical skills were assessed in aneurysm coiling, thrombectomy, and Woven EndoBridge device (WEB; Sequent Medical, Aliso Viejo, California) treatment as described in detail elsewhere. 8 For each participant, total skill scores were calculated from these subdomains and expressed as standardized scores.
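The standardization applied to the experience, knowledge, and skill scores can be illustrated with a minimal sketch. It assumes that the z scores were computed with the sample SD (n - 1 denominator); the function name `standardize` and the raw scores are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def standardize(raw_scores):
    """Return z scores (mean 0, SD 1) for a list of raw scores."""
    m = mean(raw_scores)
    s = stdev(raw_scores)  # sample SD (n - 1 denominator); an assumption
    if s == 0:
        raise ValueError("all scores are identical; z scores are undefined")
    return [(x - m) / s for x in raw_scores]


# Hypothetical raw scores for five raters (not data from the study)
example_raw = [12, 15, 9, 20, 14]
example_z = standardize(example_raw)
```

After this transformation, a rater with a z score of 1 lies 1 SD above the group mean on that scale, which makes effects of experience, knowledge, and skills comparable across scales.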

Cases and Measures
DSAs of patients who had undergone endovascular coiling of either ruptured or unruptured aneurysms at our institution between 2010 and 2015 were evaluated. Only cases with at least 2 comparable angiographic series, one immediately following treatment and another 6 months later, were eligible. Case characteristics are shown in the Figure. For each case, the angiographic series immediately following treatment and after 6 months were presented as movies to the participants. Participants were informed about the patient's age and sex as well as whether the aneurysm was ruptured or unruptured and its location, size, and neck width. For each case, the participants answered 6 questions. First, participants were asked to determine the grade of aneurysm occlusion by selecting 1 of 3 options (complete occlusion, neck remnant, residual aneurysm). Second, they were asked to determine the change between the immediate postprocedural angiogram and the follow-up angiogram using a 3-step scale (better, same, worse). Third, they were asked to indicate how confident they were in their diagnostic rating on a 6-step scale (complete guess, very uncertain, somewhat uncertain, somewhat certain, very certain, or completely certain). Then they rated the technical difficulty of retreatment (standard, difficult, unbearable risk) and were asked to recommend the best therapeutic approach for the patient (coiling only, stent-assisted coiling, clipping, other endovascular treatment such as a flow diverter or the WEB device, or no retreatment). Finally, if the presented patient was younger than 50 years of age, the participants were asked to recommend the best therapeutic approach if the patient were 70 years of age; in cases in which the presented patient was older than 50 years of age, they were asked to recommend the best therapeutic approach if the patient were 30 years of age. These cases were treated as "subcases."

Statistical Analyses
For the analysis of confidence ratings, we used a linear mixed model with independent random effects for cases (if applicable, also subcases) and raters. For the analysis of all other outcomes, we used generalized linear mixed models with a binomial (binary outcomes) or multinomial (categoric outcomes) distribution, a logit link, and the random effects described above. As a measure of interrater reliability, intraclass correlation coefficients (ICCs) were calculated from models with intercept only (baseline model) as the ratio between the variance between cases (if applicable, also subcases) and the total variance. 9 We determined ICCs for the assessment of aneurysm occlusion grade (complete/neck remnant versus residual aneurysm) and the change between the immediate postprocedure angiogram and the follow-up angiogram (same/better versus worse). We used the following categories for interpreting ICCs: poor to fair (below 0.4), moderate (0.41-0.60), substantial (0.61-0.80), and almost perfect (0.81-1). 10 The influence of various rater characteristics (work experience, aneurysm treatment experience, knowledge, skills), case characteristics (location, bleeding, aneurysm size, aneurysm neck width, patient age), and casewise averaged rating across raters (proportion rating residual grade, proportion rating worsening, proportion rating difficult retreatment, and mean confidence, if applicable) on the indicated confidence in diagnostic rating; any retreatment recommendation; and specific retreatment recommendations was analyzed by adding fixed effects to the baseline model.
Associations with P < .05 were considered statistically significant. All analyses were performed with SPSS 21 (IBM, Armonk, New York).
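The ICC definition and the interpretation categories described above can be sketched as follows. This is a sketch under stated assumptions: the total variance is taken here as the sum of the case, rater, and residual variance components, and a value of exactly 0.40 is assigned to "poor to fair"; the function names are hypothetical and do not reproduce the SPSS procedure.

```python
def icc_from_variances(var_cases, var_raters, var_residual):
    """ICC as the share of total variance attributable to cases.

    Assumption: total variance = case + rater + residual components.
    """
    total = var_cases + var_raters + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_cases / total


def interpret_icc(icc):
    """Map an ICC to the categories used in this article."""
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie between 0 and 1")
    if icc <= 0.40:  # boundary handling at 0.40 is an assumption
        return "poor to fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "substantial"
    return "almost perfect"
```

For example, `interpret_icc(0.581)` returns "moderate" and `interpret_icc(0.776)` returns "substantial", matching the labels used for occlusion grade and change in this study.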

Participant Characteristics and Experience in Interventional Neuroradiology
All 26 participants answered all questions in the on-line survey, resulting in a response rate of 100%. Twenty participants were younger than 40 years of age (76.9%), and 7 of the 26 participants were women (26.9%). One participant (3.8%) was a neurosurgeon, 1 participant (3.8%) was in his first year of radiology training, 1 (3.8%) was in his fifth year, and 23 participants (88.5%) had completed their radiology residency. Each participant had at least 1 year of experience working in interventional neuroradiology, with 12 (46.2%) reporting at least 4 years. Ten of the 26 participants (38.5%) were certified neuroradiologists in their country. The high number of certified radiologists is because in most European countries, radiologic certification is a prerequisite for specialization in interventional neuroradiology.
One participant did not attend the final knowledge test. Only 5 participants answered at least 80% of the 20 multiple-choice questions correctly (median, 13 correct answers [65%]). In the second "live case" section of the examination, only 2 participants answered less than 80% of the 12 questions correctly. In the final oral examination, 12 participants answered at least 80% of the 21 questions correctly and 6 participants answered <60%. Median procedural time for coiling of a posterior communicating artery aneurysm was 23 minutes and 29 seconds (14 minutes and 2 seconds to 49 minutes and 53 seconds), and median fluoroscopy time was 13 minutes and 33 seconds (5 minutes and 51 seconds to 31 minutes and 13 seconds). Correct first coil selection was achieved 15 times, and a median of 8 coils was used (minimum 5, maximum 11). In 12 cases, complications occurred.

Interrater Reliability of Occlusion and Change
Interrater reliability was moderate regarding the assessment of aneurysm occlusion grade (ICC = 0.581) and substantial regarding change (ICC = 0.776).

Confidence in Diagnostic Rating
Overall confidence in the diagnostic rating was high (median, "very certain"; mean 5.20). As indicated by substantial variance, some raters were generally more confident in their diagnostic rating regarding the determination of the change between the initial postprocedural angiogram and the follow-up angiogram than others (irrespective of cases). With substantial variance across cases, some cases were rated generally with higher confidence than others (irrespective of raters). As indicated by the poor interrater reliability (ICC = 0.267), each rater reacted individually to each case (On-line Table). Confidence was statistically significantly higher in cases that were generally rated as "worse" (versus "same/better"). See Table 1 for detailed results.

Retreatment Recommendation
The distribution and frequencies of retreatment recommendation per case are shown in the Figure. Interrater reliability of the recommendations of any type of retreatment versus no retreatment was substantial (ICC = 0.619). The odds of recommending retreatment were statistically significantly higher in cases that were generally rated as being "residual aneurysm" (versus "complete/neck remnant aneurysm") and that were generally rated with higher mean confidence. The odds of recommending any retreatment were significantly lower in elderly patients (Table 1).
Interrater reliability was moderate to substantial for coiling (ICC = 0.596), stent and coiling (ICC = 0.561), and clipping (ICC = 0.633), but poor for other endovascular treatments (ICC = 0.342). The odds of recommending coiling were statistically significantly lower in cases that were generally rated as being difficult to retreat and in aneurysms with smaller neck widths. Moreover, a smaller neck width was associated with lower odds of recommending clipping. The recommendation of stent-assisted coiling was more likely in cases with wider necks and less likely in cases with smaller aneurysm size.
Raters with more theoretic knowledge were more likely to recommend coiling only. Raters with more work experience were less likely to recommend clipping (Table 2). For the elderly, all specific retreatment recommendations were less likely than for younger patients.
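Because the retreatment models use a logit link, fixed effects are read as odds ratios: values above 1 indicate a higher probability of a retreatment recommendation with an increase in the predictor, and values below 1 a lower probability. The following minimal sketch shows this relationship; the function names and the numbers in the usage note are hypothetical and are not estimates from this study.

```python
import math


def odds_ratio(logit_coefficient):
    """Convert a coefficient on the logit scale to an odds ratio."""
    return math.exp(logit_coefficient)


def shifted_probability(baseline_probability, ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    if not 0 < baseline_probability < 1:
        raise ValueError("baseline probability must lie strictly between 0 and 1")
    if ratio <= 0:
        raise ValueError("odds ratio must be positive")
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * ratio
    return new_odds / (1 + new_odds)
```

As an illustration with made-up values, a baseline probability of 0.5 combined with an odds ratio of 2 gives a probability of about 0.67, whereas an odds ratio of 0.5 gives about 0.33.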

DISCUSSION
While the overall confidence in their own individual diagnostic rating was high, we found the same low interobserver reliability regarding the visual assessment of aneurysm occlusion as in previous studies. 1,2 In a recent systematic review and meta-analysis, variability was found to be lower if raters had to comment on change. 3 This is in accordance with our findings, because interrater reliability was moderate with regard to the assessment of aneurysm occlusion grade and substantial with regard to change. Moreover, Ernst et al 3 found that in studies using DSA, interobserver agreement was significantly better in samples with a higher proportion of completely occluded aneurysms because complete occlusion does not require any further differentiation on the degree of residual flow. Thus, interrater reliability might have been higher in our study if the proportion of completely occluded aneurysms had been higher. Our study also confirms the limited interrater reliability of the recommendation of retreatment of coiled aneurysms in general as well as in the type of retreatment. 5,6 This finding renders retreatment questionable as a primary outcome measure in studies comparing different types of aneurysm treatment. 11 Although the necessity of retreatment is a meaningful outcome parameter for patients, it adds much variability if the recommendation for retreatment varies widely among therapists. The estimation of the need for retreatment is a subjective end point, the basic prerequisite of which is the presence of recanalization as an indicator of higher rupture or rerupture risk. However, the visual assessment of aneurysm occlusion status itself is subjective.
To analyze the underlying reasons behind the wide variation among raters, we evaluated numerous patient and rater variables. Our study shows that the recommendation of retreatment is influenced by not only recurrence status but also case characteristics such as age and neck width as well as rater characteristics such as work experience, theoretic knowledge, and level of confidence in the diagnostic rating.
[Table notes: ... (1) to completely certain (6). b Binary compared with no treatment; odds ratios of >1 indicate a higher probability of treatment, and odds ratios of <1 indicate a lower probability of treatment with an increase in the predictors. c Posterior (1) versus anterior (0). d Ruptured (1) versus asymptomatic (0). e Elderly, older than 50 years, (1) versus younger, 50 years or younger (0). f Proportion of ratings of residual versus complete/neck remnant aneurysm from 0 to 1. g Proportion of ratings of worse-versus-the same/better condition from 0 to 1. h Proportion of ratings of difficult/unbearably risky versus standard retreatment from 0 to 1. i P < .050. j P < .00.]
As a strength of this study, all angiograms were presented as movies in 2 projections to recreate a more realistic clinical scenario, while previous studies created a case vignette with just 1 single projection per case. Moreover, we focused on coiled aneurysms, while previous studies analyzed a heterogeneous group, including both treated and untreated aneurysms. In contrast to previous studies, 6 we provided aneurysm-specific information to the raters, including location, size, neck width, and bleeding and found that the recommendation of stent-assisted coiling was more likely in cases with a wider neck and less likely in cases with a smaller aneurysm size. Although in a real-world setting, further factors influence the final retreatment decision such as patient's anxiety and preferences, our experimental setting allowed us to control for these factors and concentrate on the influence of specific rater and aneurysm characteristics.
In contrast to previous studies, we found that rater characteristics have a significant effect on retreatment recommendations. 12 This might be because we involved a larger group of raters and assessed rater characteristics such as knowledge and work experience in more detail compared with previous studies. Thus, we observed that raters with better theoretic knowledge in neurovascular interventions were more likely to recommend coiling and raters with more work experience in endovascular interventions were less likely to recommend clipping.
Similar to previous studies, the recommendation for retreatment was less likely in elderly patients. 6 The preference of observation in older individuals might be explained by the overall higher treatment risks due to significant comorbidities and shorter life expectancies, though this might not be the appropriate approach in otherwise healthy patients. Clipping was the treatment recommendation of most participants in only 1 case. This might be because most participants were neuroradiologists and recommendations might have been different with more neurosurgical participants. Thus, a recent study found that neurosurgeons were significantly more likely to retreat and recommended different types of treatments compared with neuroradiologists. 6 Most interesting, cases that were generally rated as being difficult to retreat were not significantly more likely to be recommended for clipping. This finding contradicts the common practice of "negative defensive medicine" (ie, high-risk cases are avoided and sent to other disciplines). 13 In contrast, the practice of positive defensive medicine might explain the observation that most participants rarely recommended "no retreatment." The therapist might fear being blamed for undertreatment in case a recurrent aneurysm ruptures. These findings underline the issue of guidelines based on expert opinions, the lowest level of acceptable evidence. Although in the absence of research evidence it might be the only guidance available, it might not be the best.
In recent years, many patients have preferred to seek a second opinion on their disease and available treatments from another physician. Second opinion is a common treatment ratification tool that may critically influence diagnosis and treatment. Because medical images can now be exchanged and shared, there is no longer a need for repeat investigations that would harm the patient. Previous studies have shown that second-opinion interpretations of neuroimaging studies added value by reducing error and optimizing the care of patients. 14 Health care organizations are trying to control costs by urging and even demanding a second opinion before interventions. However, second opinions can impose unnecessary costs on the medical budget of the community, and dissenting recommendations might confuse and alienate the patient.
In our study, supposing the retreatment recommendation of the majority is the best approach and therefore best for the patient's welfare, the probability of this approach being recommended in a second consultation was only 30%. The probability that the patient gets 2 different treatment recommendations was 61%, and the probability of 3 different treatment recommendations was 23%. Although second opinion as a treatment ratification tool in general might be useful, it appears to be a waste of resources as long as there is no current standard concerning when and how to retreat a coiled aneurysm. Instead, effort should be focused on conducting randomized controlled trials so that clinicians can properly counsel the patients regarding the benefits and relative risks of different management options.
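How such second-opinion probabilities could be estimated from the recommendations for a single case is sketched below. The article does not state the exact computation, so this is only one plausible approach: it assumes that the first and second opinions are drawn at random, without replacement, from the raters of that case. The function name and the example recommendations are hypothetical.

```python
from collections import Counter


def second_opinion_probabilities(recommendations):
    """Estimate agreement probabilities from one case's recommendations.

    Assumption: consultations correspond to distinct raters drawn at random.
    """
    counts = Counter(recommendations)
    n = len(recommendations)
    if n < 2:
        raise ValueError("at least two recommendations are required")
    # Share of raters recommending the most frequent option
    majority_share = max(counts.values()) / n
    # Probability that two distinct raters give the same recommendation
    agree = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return {"majority_share": majority_share, "two_differ": 1 - agree}


# Hypothetical recommendations for one case (not data from the study)
example = second_opinion_probabilities(
    ["coiling", "coiling", "coiling", "clipping"]
)
```

In this made-up case, 3 of 4 raters recommend the majority option, and two randomly chosen raters disagree with a probability of 0.5; averaging such values across cases would give study-level estimates.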
As a recently published objective clinical study end point, aneurysm recurrence volumetry using registered 3D-MRA follow-up datasets was found to be highly sensitive in the detection of aneurysm recurrences and to represent an objective, rater-independent, and highly reliable method and thus a promising approach for future studies. 15,16 As a limitation, the number of cases was limited to 20. With 20 cases, the participants voluntarily answered 120 questions. Because concentration diminishes with time and we aimed to guarantee that the participants gave reliable and authentic responses, we had to limit the number to 20 cases. Future studies with more cases are desirable. Only 10 of the 26 participants were certified neuroradiologists; therefore, the study does not necessarily represent the interventional neuroradiology community. However, with the detailed assessment of working experience, theoretic knowledge, and practical skills, the group is well-described.
[Table notes: ... (1) to completely certain (6). b Binary compared with no treatment; odds ratios of >1 indicate a higher probability of treatment, and odds ratios of <1 indicate a lower probability of treatment with an increase in the predictors. c In SD units. d P < .050.]

CONCLUSIONS
Although the overall confidence in diagnostic rating was high, our study confirms the suboptimal interrater reliability of visual assessment of aneurysm occlusion as well as retreatment recommendations, rendering both questionable as primary outcome measures. Besides recurrence status, recommendation of retreatment is significantly influenced by patient age, aneurysm neck width, and characteristics of the therapist.