Noninvasive Angiographic Results of Clipped or Coiled Intracranial Aneurysms: An Inter- and Intraobserver Reliability Study

BACKGROUND AND PURPOSE: Noninvasive angiography is commonly used to assess the outcome of surgical or endovascular treatment of intracranial aneurysms in clinical series or randomized trials. We sought to assess whether a standardized 3-grade classification system could be reliably used to compare the CTA and MRA results of both treatments.
MATERIALS AND METHODS: An electronic portfolio composed of CTAs of 30 clipped and MRAs of 30 coiled aneurysms was independently evaluated by 24 raters of diverse experience and training backgrounds. Twenty raters performed a second evaluation 1 month later. Raters were asked which angiographic grade and management decision (retreatment; close or long-term follow-up) would be most appropriate for each case. Agreement was analyzed using the Krippendorff α (αK) statistic, and the relationship between angiographic grade and clinical management choice was analyzed using the Fisher exact and Cramer V tests.
RESULTS: Interrater agreement was substantial (αK = 0.63; 95% CI, 0.55-0.70); results were slightly better for MRA results of coiling (αK = 0.69; 95% CI, 0.56-0.76) than for CTA results of clipping (αK = 0.58; 95% CI, 0.44-0.69). Intrarater agreement was substantial to almost perfect. Interrater agreement regarding clinical management was moderate for both clipped (αK = 0.49; 95% CI, 0.32-0.61) and coiled subgroups (αK = 0.47; 95% CI, 0.34-0.54). The choice of clinical management was strongly associated with the size of the residuum (mean Cramer V = 0.77 [SD, 0.14]), but complete occlusions (grade 1) were followed more closely after coiling than after clipping (P = .01).
CONCLUSIONS: A standardized 3-grade scale was found to be a reliable and clinically meaningful tool to compare the results of clipping and coiling of aneurysms using CTA or MRA.


ABBREVIATION: αK = Krippendorff α
The main goal of intracranial aneurysm treatments is to prevent ruptures or rebleeding. However, because such events may be devastating, many clinicians verify angiographic results to determine the success of therapy in each patient. 1-3 Ruptures or rebleeding are relatively infrequent. Thus, angiographic results are often selected as outcome measures in clinical trials comparing aneurysm treatments. 4-8 However, the repeatability of angiographic outcome measures must be verified before widespread use.
Conventional angiography, the criterion standard to diagnose aneurysms and assess the results of treatment, has been increasingly replaced by noninvasive CTA and MRA in the past decades. 9,10 One problem is that noninvasive angiographic modalities are used differently depending on treatments: Surgically managed patients are often followed by CTA, while patients treated with coils are more often followed by MRA. 9,10 When one judges the comparative success of therapy in clinical reports, it would seem that comparing CTA results of clipping with MRA results of coiling would be problematic, given the different diagnostic accuracies of the 2 modalities. 11,12 This problem is particularly relevant for clinical trials: We cannot require a catheter angiogram solely for the purposes of the study when safer, noninvasive tests clinically suffice for most patients. 13 The problem is compounded by the proliferation of grading scales, many of which are tailored to various devices and treatment modalities. 3,14,15 A standardized method of reporting angiographic results that would facilitate comparisons between treatments and imaging modalities is needed.
A simple, 3-grade classification for the adjudication of results of clipped and coiled aneurysms has previously been shown to be reproducible when applied to conventional angiography. 15 The questions that remained unanswered after a systematic review 15 and that we sought to address in this work were the following: 1) Can the same angiographic classification system be used to evaluate CTA results of clipping and MRA results of coiling? 2) Are the results repeatable when judged by various raters? 3) Does the grade of occlusion obtained by clipping and assessed by CTA or by coiling and assessed by MRA have the same meaning in terms of clinical management?

MATERIALS AND METHODS
This article was written in accordance with the Guidelines for Reporting Reliability and Agreement Studies. 16

Patient Selection
An electronic portfolio of 30 clipped and 30 coiled aneurysms was constructed. For each treatment technique, we aimed to include a wide spectrum of patients, with a balanced ratio (1:1:1) of completely occluded, residual, and gray-zone aneurysm cases to minimize the paradoxes of κ statistics. 17,18 The number of patients per treatment group was estimated to be sufficient (>24) according to recommendations. 19,20 For each patient, a high-definition video of the axial MRA or CTA sequences of the coiled or clipped aneurysm was provided. Patients with ruptured and unruptured aneurysms were selected from the clinical series of 1 tertiary care center (Center Hospitalier de l'Université de Montréal). Patient and aneurysm characteristics are summarized in Table 1.

Grading Scale
The grading scale is a variant of the Raymond-Roy classification. 21 Categories of the standardized 3-grade classification system included the following: 1, complete occlusion; 2, residual neck (defined as <2 mm using visual estimation); and 3, residual aneurysm (Fig 1). 15 Raters were not trained in the use of this classification system before the assessment.

Raters
Thirty-two clinicians were invited to participate: 24 (75%) raters (11 interventional neuroradiologists, 7 neurosurgeons, 4 interventional neurologists, and 2 diagnostic neuroradiologists) from 4 different countries accepted. Twenty raters agreed to perform a second evaluation of the cases in a permutated order at least 1 month later. There were 10 senior raters with >10 years of experience. Two of the interventional neuroradiologists had experience as core lab reviewers. Rater characteristics are shown in the Online Supplemental Data.

Agreement Study
An electronic survey was created and sent to the raters using the REDCap online database manager (https://www.project-redcap.org/) hosted at the Center Hospitalier de l'Université de Montréal. 22,23 For each of the 60 cases, raters were asked to assess the grade of occlusion and to choose the most appropriate clinical management, assuming all angiographic results concerned a ruptured aneurysm in a 65-year-old patient with a good outcome and no other medical problems. Possible options were the following: follow-up imaging in 3-5 years (or none at all); close follow-up (6-18 months); immediate retreatment by endovascular means; or immediate retreatment by surgical means. The last 2 choices were then merged as immediate retreatment (either by surgical or endovascular means). Clinically meaningful differences were also assessed for all cases and were defined in accordance with McDonald et al 24 as cases for which at least one rater recommended follow-up (close or delayed) and another rater recommended retreatment (surgical or endovascular).
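The merging of the two retreatment options and the definition of a clinically meaningful difference can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the authors' code: the option labels (`delayed_follow_up`, `close_follow_up`, `retreat_endovascular`, `retreat_surgical`) and the function names are hypothetical.

```python
from typing import Sequence

# Hypothetical labels for the four survey options described in the text.
FOLLOW_UP_OPTIONS = {"delayed_follow_up", "close_follow_up"}
RETREAT_OPTIONS = {"retreat_endovascular", "retreat_surgical"}


def merge_choice(choice: str) -> str:
    """Collapse the two retreatment options into a single 'retreatment' category."""
    if choice in RETREAT_OPTIONS:
        return "retreatment"
    if choice in FOLLOW_UP_OPTIONS:
        return choice
    raise ValueError(f"unknown management choice: {choice!r}")


def has_clinically_meaningful_difference(choices: Sequence[str]) -> bool:
    """True when, for one case, at least one rater chose follow-up
    (close or delayed) and at least one rater chose retreatment."""
    merged = {merge_choice(c) for c in choices}
    return "retreatment" in merged and bool(merged & FOLLOW_UP_OPTIONS)
```

Applied to the list of all raters' choices for a single case, the second function flags the case only when follow-up and retreatment recommendations coexist; disagreement between close and delayed follow-up alone does not count.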

Statistical Analysis
All calculations were performed using R 3.5.3 statistical and computing software (http://www.r-project.org/). Inter- and intrarater agreement for the grading scale and for the clinical management choices was estimated using the Krippendorff α (αK) statistic, and the 95% confidence intervals were determined using 1000 bootstrap iterations. Interpretation of αK values was given in accordance with Landis and Koch. 25 Comparisons of proportions of ratings between prespecified aneurysm and rater subgroups as well as the strength of the association between the raters' angiographic verdict and the management of the patient were evaluated using the Fisher exact test followed by a Cramer V test, with a significance threshold of .05.
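For readers unfamiliar with the statistic, the following Python sketch shows one way the Krippendorff α and a bootstrap confidence interval could be computed. It is not the authors' R code, and it rests on several assumptions not stated in the text: the grades are treated as nominal categories, the bootstrap resamples cases (not raters) with replacement, and the interval is a simple percentile interval. The function names are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple

# One row per case, one entry per rater; None marks a missing rating.
Ratings = Sequence[Sequence[Optional[Hashable]]]


def krippendorff_alpha_nominal(units: Ratings) -> float:
    """Krippendorff alpha for nominal data (assumed metric), via the coincidence matrix."""
    observed = 0.0      # off-diagonal mass of the coincidence matrix
    totals = Counter()  # n_c: occurrences of each category in pairable units
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # units with fewer than 2 ratings cannot be paired
        counts = Counter(values)
        totals.update(counts)
        # ordered pairs of unequal values, each weighted by 1 / (m - 1)
        observed += (m * m - sum(c * c for c in counts.values())) / (m - 1)
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one case with two ratings")
    expected = (n * n - sum(c * c for c in totals.values())) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when a single category is used")
    return 1.0 - observed / expected


def bootstrap_ci(units: Ratings, n_boot: int = 1000, level: float = 0.95,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval, resampling cases with replacement (assumption)."""
    rng = random.Random(seed)
    cases = list(units)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(cases) for _ in cases]
        try:
            stats.append(krippendorff_alpha_nominal(sample))
        except ValueError:
            continue  # skip degenerate resamples
    if not stats:
        raise ValueError("no valid bootstrap resample")
    stats.sort()
    lo = stats[int((1 - level) / 2 * len(stats))]
    hi = stats[max(int((1 + level) / 2 * len(stats)) - 1, 0)]
    return lo, hi
```

With a 60 x 24 matrix of grades (cases by raters), `krippendorff_alpha_nominal` would return the point estimate and `bootstrap_ci` an approximate 95% interval. If the grades were instead treated as ordinal, a different distance function would be needed and the values would differ.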

RESULTS
Grading Scale
The number of aneurysms judged to be completely occluded (grade 1) by various raters varied between 10 (17%) and 36 (60%). Similarly, residual aneurysms (grade 3) were judged to be present in 11 (18%) to 36 (60%) patients. Perfect agreement among all 24 raters was found in 7/60 (12%) patients or in 22/60 (37%) after dichotomization of the scale into absence or presence of a residual aneurysm (grades 1 + 2 versus 3). The distribution of angiographic verdicts differed between clipped and coiled aneurysms (P = .01): Clipped aneurysms were more often judged to be completely occluded, and coiled aneurysms were more often judged to have residual necks, while residual aneurysms were similarly allocated (Online Supplemental Data).
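The counts of perfect agreement before and after dichotomization follow from a simple rule, sketched here in Python. The function names and the toy ratings are hypothetical and serve only to illustrate the rule; they are not study data.

```python
from typing import Callable, Sequence


def dichotomize(grade: int) -> int:
    """Map the 3-grade scale to absence (0: grades 1 and 2) or presence (1: grade 3)
    of a residual aneurysm."""
    if grade not in (1, 2, 3):
        raise ValueError(f"grade must be 1, 2, or 3, got {grade!r}")
    return 1 if grade == 3 else 0


def perfect_agreement_count(cases: Sequence[Sequence[int]],
                            transform: Callable[[int], int] = lambda g: g) -> int:
    """Number of cases for which all raters gave the same (transformed) grade."""
    return sum(1 for ratings in cases if len({transform(g) for g in ratings}) == 1)


# Toy example (hypothetical): 4 cases graded by 3 raters.
toy_cases = [[1, 1, 1], [1, 2, 2], [3, 3, 3], [2, 3, 3]]
raw = perfect_agreement_count(toy_cases)
binary = perfect_agreement_count(toy_cases, dichotomize)
```

In the toy example, the second case reaches perfect agreement only after dichotomization because grades 1 and 2 collapse into the same category, which is the mechanism behind the increase from 7/60 to 22/60 reported above.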
Overall interrater agreement of the grading scale for all raters and all patients was substantial (αK = 0.63; 95% CI, 0.55-0.70). When treatment and imaging modalities were considered separately, agreement was substantial for coiled aneurysms followed by MRA (αK = 0.69; 95% CI, 0.56-0.76) and moderate for clipped aneurysms followed by CTA (αK = 0.58; 95% CI, 0.44-0.69), yet with overlapping confidence intervals. Better agreement for coiled cases assessed by MRA than clipped aneurysms assessed by CTA was also a trend for all rater subgroups. Senior raters performed no better than juniors, and training background had no effect (Table 2).
Individual intrarater agreement was at least substantial for all raters and varied between 0.66 and 0.89. There were no significant differences between the mean intrarater agreement of the subgroups defined according to experience or training background (Online Supplemental Data).
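The verbal labels used above ("moderate," "substantial," "almost perfect") correspond to the Landis and Koch bands. A minimal Python lookup, with a hypothetical function name, makes the mapping explicit; note that these bands were originally proposed for κ and are applied here to αK, as in the text.

```python
def landis_koch_label(value: float) -> str:
    """Return the Landis and Koch descriptor for an agreement coefficient."""
    if value < 0.0:
        return "poor"
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"


# Point estimates reported in the text.
labels = {a: landis_koch_label(a) for a in (0.63, 0.69, 0.58, 0.66, 0.89)}
```

The labels apply to point estimates only. For example, the interval reported for clipped aneurysms (0.44-0.69) spans both the moderate and substantial bands.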

Relationship between Angiographic Results and Clinical Management
Raters generally selected delayed follow-up imaging for aneurysms they graded as completely occluded (63% of all grade 1 choices; n = 371/593), close follow-up for those graded as residual necks (92% of all grade 2 choices; n = 370/404) and retreatment for those graded as residual aneurysms (81% of all grade 3 choices; n = 358/443) (P < .001) (Fig 2). Clinical management differed significantly between treatment groups: Retreatments were similar, but coiled aneurysms were selected for closer follow-up than clipped aneurysms (P < .001) (Online Supplemental Data). Finally, for each rater, a strong association (P < .01) was found between angiographic results and clinical management, with a mean Cramer V of 0.77 (SD, 0.14) (Online Supplemental Data).
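The Cramer V reported for each rater measures the strength of association in a grade-by-management contingency table. The Python sketch below uses the standard definition based on the Pearson chi-square statistic; it is not the authors' R code and does not implement the Fisher exact test. The function name is hypothetical, and the example table is invented for illustration (one imaginary rater, 60 cases), not taken from the study.

```python
import math
from typing import Sequence


def cramers_v(table: Sequence[Sequence[int]]) -> float:
    """Cramer V for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k < 1 or 0 in row_totals or 0 in col_totals:
        raise ValueError("table needs at least 2 non-empty rows and columns")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Hypothetical table for a single rater: rows are grades 1-3; columns are
# delayed follow-up, close follow-up, and retreatment.
hypothetical_table = [
    [14, 7, 1],
    [1, 17, 2],
    [0, 3, 15],
]
v = cramers_v(hypothetical_table)
```

V ranges from 0 (no association) to 1 (each grade maps to exactly one management option), so a mean of 0.77 across raters indicates that the chosen grade largely determined the chosen management.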

DISCUSSION
Many scales have been proposed to grade angiographic results of various aneurysm treatments, but a previous review has shown that few have proved to be reliable and none have previously attempted to evaluate multiple treatment or imaging modalities at the same time. 15 Yet, a common language is needed to share experiences and to meaningfully compare outcomes of various treatments assessed by different noninvasive angiographic modalities. Verifying the reliability of treatment outcome measures is important if we are to learn and progress, whether from clinical experience or from randomized trials.
In the present work, we demonstrated variability within and between raters in adjudicating angiographic results of clipping or coiling using CTA or MRA. Perhaps unsurprisingly, agreement was, for many rater and aneurysm subgroups, lower than when results were assessed using conventional angiography. 15,26-28 Even if interrater agreement was suboptimal, the substantial level of agreement achieved overall among raters, regardless of treatment or imaging technique and experience or background, is somewhat reassuring. Furthermore, intrarater agreement was at least substantial for all raters.
MRA follow-up of coiled aneurysms has previously been shown to be sensitive and specific to detect aneurysm remnants and recurrences compared with DSA. 11,29 CTA of clipped aneurysms has generally not been as accurate compared with conventional angiography, especially for small aneurysms, when multiple clips were used, on small parent vessels, or when the aneurysm was in the vicinity of bony structures. 12,30-32 These same technical limitations may explain the trend toward lower agreement for clipped aneurysm subgroups assessed by CTA compared with coiled aneurysms assessed by MRA.
The exact same angiographic result, obtained by one or the other treatment technique, may not have the same significance in terms of clinical management. This is why we attempted to verify the clinical pertinence of the angiographic verdict of each rater for each case, assuming that the angiographic result concerned the same patient. Interrater agreement regarding the clinical management of the hypothetical case with various angiographic results was only moderate overall. This finding is not so surprising when one considers that this measure combines both disagreement in the assessment of angiographic results and disagreement in the clinical management of the same adjudicated residua. When intrarater agreement is compared, more variability is introduced at the level of clinical decisions than at the time of the angiographic verdict (Online Supplemental Data).
Divergence in clinical management may be explained by the diversity of raters from different specialties, hospitals, and countries, who may apply different follow-up protocols. This diversity may also explain the relatively high proportion of cases with clinically meaningful differences in the management decisions (65%), which is comparable with that in previous studies. 24 Another important observation is the apparently different clinical meaning of a complete occlusion documented by CTA postclipping compared with a complete occlusion demonstrated by MRA postcoiling: Raters seemed more confident in opting for delayed follow-up when aneurysms were completely clipped, while often being inclined to follow completely coiled aneurysms more closely (Online Supplemental Data). This choice has also been previously observed with conventional angiography results. 15 Although the clinical meaning of a grade 1 angiographic result differs depending on whether it is obtained by CTA on clipped aneurysms or by MRA on coiled aneurysms, the allocation of a residual aneurysm had a more reliable clinical meaning, at least in terms of retreatment, no matter the imaging or treatment technique (Online Supplemental Data).
Given the inherent pitfalls of using a surrogate angiographic outcome measure, the impossibility of blinding assessors to the treatment received, and the variability in the clinical significance of complete occlusion (grade 1) and residual neck (grade 2) shown in the present study, we believe that the residual aneurysm category (grade 3), as judged by core lab experts, would be a more reliable angiographic outcome measure to compare aneurysm results in clinical trials comparing various treatments. 3,7,8 Most important, the strong correlation that was shown between the adjudication of an angiographic occlusion grade and the preferred management option for all raters, regardless of treatment or imaging technique and rater experience or background, speaks in favor of the clinical pertinence of the proposed classification.
Our study had several limitations. First, the portfolio provided only axial CTA or MRA sequences at predetermined speed and window levels that could be repeatedly reviewed, but in practice, clinicians have access to multiple sequences and can adjust windowing at will. This difference may have minimized the variability of interpretation. Second, our study did not include endovascular treatments other than coiling, such as intra-arterial or intrasaccular flow diverters. Thus, results cannot be generalized to patients treated by newer devices. The arbitrary 2-mm cutoff between the residual neck and the residual aneurysm categories was previously discussed: It was chosen as a compromise that took into account the technical limitations of noninvasive angiographic modalities. 15 It was not meant to be measured with precision, and it is expected that the notion of residual neck would be differently interpreted, taking into account the initial aneurysm size.
The management question concerned a single theoretical clinical scenario, applied to all cases. Other clinical scenarios would have been more realistic. Moreover, raters were not provided with the time elapsed between initial treatment and the imaging presented for each case. Various timeframes could have led to an increase in the clinical management variability. Cases were artificially selected, as commonly done in interrater reliability studies, to cover a wide spectrum of patients despite the small size of the sample and to minimize paradoxes of κ statistics. Results may have been different had another series of patients been studied. Finally, the diagnostic accuracy of various noninvasive imaging modalities using this classification system compared with the criterion standard conventional angiography was not studied.

CONCLUSIONS
Noninvasive angiographic results of clipping or coiling of aneurysms can be reliably reported by raters of various experience and backgrounds using a standardized classification system. The proposed classification was shown to be clinically meaningful, with each grade being strongly correlated to a different management option. This classification could be used to standardize results of published randomized trials, registries, or case series.