Inter- and Intrarater Agreement on the Outcome of Endovascular Treatment of Aneurysms Using MRA

BACKGROUND AND PURPOSE: Patients treated with coiling are often followed by MR angiography. Our objective was to assess the inter- and intraobserver agreement in diagnosing aneurysm remnants and recurrences by using multimodality imaging, including TOF MRA. MATERIALS AND METHODS: A portfolio composed of 120 selected images from 56 patients was sent to 15 neuroradiologists from 10 institutions. For each case, raters were asked to classify angiographic results (3 classes) of 2 studies (32 MRA-MRA and 24 DSA-MRA pairs) and to provide a final judgment regarding the presence of a recurrence (no, minor, major). Six raters were asked to independently review the portfolio twice. A second study, restricted to 4 raters having full access to all images, was designed to validate the results of the electronic survey. RESULTS: The proportion of cases judged to have a major recurrence varied between 16.1% and 71.4% (mean, 35.0% ± 12.7%). There was moderate agreement overall (κ = 0.474 ± 0.009), increasing to nearly substantial (κ = 0.581 ± 0.014) when the judgment was dichotomized (presence or absence of a major recurrence). Agreement on cases followed up by MRA-MRA was similarly substantial (κ = 0.601 ± 0.018). The intrarater agreement varied between fair (κ = 0.257 ± 0.093) and substantial (κ = 0.699 ± 0.084), improving with a dichotomized judgment concerning MRA-MRA comparisons. Agreement was no better when raters had access to all images. CONCLUSIONS: There is an important variability in the assessment of angiographic outcomes of endovascular treatments. Agreement on the presence of a major recurrence when comparing 2 MRA studies or the MRA with the last catheter angiographic study can be substantial.

Time-of-flight MR angiography is a noninvasive, radiation-free follow-up technique that has been shown to be sensitive (>85%) and specific (>85%) in identifying incomplete aneurysm occlusion after endovascular treatment. 1 MRA is replacing conventional angiography in following patients after coiling in many centers. 2,3 Interobserver agreement in the diagnosis of a residual or recurrent aneurysm has been reviewed recently. 4 Mean interrater agreement of the pooled studies was substantial (κ = 0.65; 95% confidence interval, 0.60-0.69). However, authors concluded that because all 4 studies primarily focusing on reliability concerned conventional angiography, the reliability of MRA must still be considered unclear. 4 Multiple MR angiographic scales have been proposed, but few have been tested for reliability. 5 Interobserver agreement by using a catheter angiographic scale directly applied to MR angiographic results has been reported as "substantial" and "similar" to angiography, but the number of readings was limited to duplicates. 3,6 A more rigorous evaluation of the agreement in the MR angiographic diagnoses of residual or recurrent aneurysms in patients treated by endovascular coiling by a larger number of observers with various expertise is needed to establish the reliability of MRA diagnoses. 7 In clinical practice, the evolution of treated aneurysms is often followed by comparing the follow-up MRA with the final catheter angiogram of the embolization procedure. The reliability of the diagnoses of a stable occlusion or a recurrent lesion when the verdict depends on a comparison between 2 different imaging modalities (the MRA and the catheter angiographic results) has so far not been studied, to our knowledge.
Our objective was to assess inter- and intraobserver agreement in diagnosing aneurysm remnants and recurrences by using multimodality imaging, including DSA-MRA and MRA-MRA comparisons in close-to-clinical conditions.

MATERIALS AND METHODS
The present report was written in compliance with the Guidelines for Reporting Reliability and Agreement studies. 8 The evaluation of the intra- and interobserver variability in adjudicating outcomes of endovascular treatment was primarily done by electronic survey by using a portfolio of selected images (DSA-MRA and MRA-MRA) to ease the participation of multiple readers from various backgrounds, institutions, and countries. A second study restricted to expert readers having full access to the set of angiographic and MRA data on the same patients on the hospital PACS system of a single institution was designed to validate the results of the electronic survey and resemble clinical working conditions.

Cases
On the basis of Donner and Rotondi 9 (where for an expected κ0 of 0.600 with a prevalence of 0.3 and 5 raters, 24 subjects are sufficient for the lower limit of a 95% 1-sided confidence limit to be no less than 0.400), we estimated that 24 cases per group (MRA-MRA and DSA-MRA comparisons for a total of 48 cases) would suffice to provide meaningful results. The number of cases was increased to 56 to account for potential missing responses and to include a spectrum of patients followed by using various follow-up methods. Images were retrieved from 52 patients with 56 coiled aneurysms (all platinum coils) followed between May 2012 and June 2013 in 1 center. Cases were selected to include at least 2 comparable images from 1.5T or 3T MRI or angiographic series either immediately following treatment or later. Two authors (J.-C.G. and S.J.) selected the cases, aiming to include approximately 50% of easily replicable verdicts (25% of large recurrences, 25% of stable occlusions) and 50% of less clear cases. Cases and proportions were chosen to mimic a typical endovascular case series and to aim for a prevalence of approximately 30%-40% major recurrences, to minimize paradoxes of κ statistics. 10,11 The characteristics of patients are summarized in Table 1. Detailed characteristics for each patient can be found in On-line Table 1.
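The reason for targeting a prevalence of 30%-40% can be illustrated with a small worked example. The sketch below is not part of the study's analysis; it computes Cohen's κ for two hypothetical raters and a yes/no judgment on 56 cases, and every count in it is invented for illustration. Both tables have the same observed agreement (50 of 56 cases), yet κ drops sharply when major recurrences are rare, which is the kind of paradox the case selection was meant to avoid.

```python
def cohen_kappa_2x2(both_yes, only_rater1_yes, only_rater2_yes, both_no):
    """Cohen's kappa for two raters and a binary (yes/no) judgment."""
    n = both_yes + only_rater1_yes + only_rater2_yes + both_no
    observed = (both_yes + both_no) / n
    # Marginal share of "yes" verdicts for each rater.
    yes1 = (both_yes + only_rater1_yes) / n
    yes2 = (both_yes + only_rater2_yes) / n
    # Agreement expected by chance from the marginals alone.
    expected = yes1 * yes2 + (1 - yes1) * (1 - yes2)
    return (observed - expected) / (1 - expected)


# Hypothetical counts (not from the study): 56 cases, 50 concordant verdicts in both tables.
balanced = cohen_kappa_2x2(17, 3, 3, 33)  # each rater calls 20/56 (about 36%) "major"
skewed = cohen_kappa_2x2(1, 3, 3, 49)     # each rater calls 4/56 (about 7%) "major"

print(f"balanced prevalence: kappa = {balanced:.3f}")  # 0.767
print(f"skewed prevalence:   kappa = {skewed:.3f}")    # 0.192
```

With identical raw agreement, the balanced table falls in the "substantial" range and the skewed table in the "slight" range, so a case mix with very few major recurrences would have made κ values hard to interpret.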

Portfolio of Images
A portfolio composed of 120 images from 56 cases (typically 1 pair of images per case) was assembled. On each page of the electronic survey, 1 postembolization and 1 follow-up image (at least) or 2 follow-up MRA images were displayed side by side. Most MR images (n = 92) were time-of-flight source images, but 8 were 3D reconstructions.
No clinical information was provided. Observers had to independently grade each image according to a 3-value scale (complete occlusion, residual neck, residual aneurysm), graphically displayed on each page. 5,12 They were also asked to make a final judgment regarding the presence of a recurrence, according to a 3-value scale (no recurrence, minor recurrence, major recurrence) by comparing the 2 images. The definition of a major recurrence was "a saccular recurrence of a size sufficient to allow retreatment." Any other increase in the residuum was to be labeled a minor recurrence. 5,13 The portfolio was provided electronically (On-line Appendix).

Raters
The portfolio was sent to 15 participants, selected because they had served as a core lab for endovascular trials (n = 5), participated in ongoing trials (n = 5), or were on lists of potential participants (n = 5). There were 14 interventionists (13 neuroradiologists) working in 10 different centers from 3 different countries (United States, France, and Canada). There were 8 senior (4 with >10 and 4 with >20 years; maximum, 40 years of experience) and 7 junior observers (<10 years of experience; minimum, 3 years).
The portfolio was sent twice electronically at least 3 months apart to 6 raters, blinded to their previous responses, who agreed to participate in the intraobserver agreement study.

PACS Study
Because agreement on judgments based on pairs of selected images differs from the normal clinical context, the same cases were independently reviewed, in a random order provided by another investigator, by 4 observers (all interventional neuroradiologists) having access to all images. Two senior observers also independently assessed the same cases twice on the PACS system >3 months apart in a different random order.

Statistics
The interrater agreement regarding the angiographic results (in 3 categories at 2 points in time [A and B]) and the final judgment regarding the presence of a recurrence (no, minor, or major recurrence) for the 15 raters by using the portfolio were estimated by using generalized κ statistics. The 95% confidence intervals are reported. Stratified analyses according to follow-up imaging methods (DSA-MRA or MRA-MRA) or experience (all, seniors, juniors) were performed. For the 6 judges with a replicated judgment on the portfolio, intrarater agreement was also estimated by using κ statistics. The observed minimum, median, and maximum values of κ statistics are reported, including the 95% confidence interval. Because the primary angiographic end point of many trials has been the occurrence of a major recurrence, agreement was also analyzed in 2 categories (major recurrence, yes or no). For the PACS study, the inter- and intrarater agreement was estimated similarly by using κ statistics. All analyses were performed by using SAS, Version 9.2 (SAS Institute, Cary, North Carolina). All categories, such as "fair," "moderate," or "substantial" agreement, were qualified according to Landis and Koch. 14
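To make the analysis steps concrete, the following is a minimal sketch of a multirater κ, the dichotomization into "major recurrence, yes or no," and the Landis and Koch labels. It is not the SAS code used for this study. It assumes that Fleiss' formulation is an acceptable stand-in for the generalized κ, it does not compute standard errors or confidence intervals, and the function names and the four example cases are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list with one label per rater."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number of ratings")
    counts = [Counter(case) for case in ratings]
    categories = set().union(*counts)
    # Per-case agreement: share of rater pairs that gave the same label.
    per_case = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_case) / n_cases
    # Chance agreement from the overall share of each label.
    shares = [sum(c[k] for c in counts) / (n_cases * n_raters) for k in categories]
    expected = sum(s * s for s in shares)
    return (observed - expected) / (1 - expected)


def dichotomize(ratings, target="major"):
    """Collapse every label other than `target` into a single 'not <target>' label."""
    return [[lab if lab == target else "not " + target for lab in case] for case in ratings]


def landis_koch(kappa):
    """Qualitative label for a kappa value, following the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical final judgments: 4 cases, 3 raters each (not study data).
ratings = [
    ["no", "no", "no"],
    ["major", "major", "major"],
    ["no", "minor", "minor"],
    ["minor", "major", "major"],
]
three_class = fleiss_kappa(ratings)
two_class = fleiss_kappa(dichotomize(ratings))
print(f"{three_class:.3f} {landis_koch(three_class)}")  # 0.489 moderate
print(f"{two_class:.3f} {landis_koch(two_class)}")      # 0.657 substantial
```

In this toy example, disagreement between "no" and "minor" disappears once the two labels are merged, so κ rises after dichotomization; the only remaining disagreement is the single rater who did not call the fourth case a major recurrence.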

RESULTS
Detailed statistical results can be found in On-line Tables 2-6.

Angiographic Results and Final Judgment on the Electronic Survey
There was a wide variability in angiographic results, with 2-23 cases (3.6%-41.1%) being judged as having a residual aneurysm on the first posttreatment evaluation. On the second set of images, between 12 and 39 cases (21.4%-69.6%) were judged as presenting a residual aneurysm. The proportion of portfolio cases judged to have a major recurrence at follow-up MRA varied between 16.1% and 71.4% (mean, 35.0% ± 12.7%) (On-line Table 2). There was moderate agreement overall (κ = 0.474 ± 0.009); the category showing a lesser degree of agreement was "minor recurrence." Agreement improved to nearly substantial (κ = 0.581 ± 0.014) when the judgment was dichotomized as the presence or absence of a major recurrence (Table 2). Overall agreement regarding dichotomized verdicts on cases followed up by MRA-MRA comparisons was similarly substantial (κ = 0.601 ± 0.018).

PACS Reading
Raters having access to all images on the PACS system found major recurrences in 13.2%-57.4% of cases (mean, 32.4% ± 15.3%). Agreement was no better when raters had full access to all images than among raters having access to only the selected images of the portfolio (Table 3). Interobserver agreement regarding the presence or absence of a major recurrence varied from slight to substantial, with a median κ of 0.455. Examples are illustrated in Fig 1.

MRA-MRA Comparisons
Inter- and intrarater agreement regarding the presence of a recurrence in the subgroup of 32 cases that were studied twice with MRA was very similar to the overall results (Table 2). Power was insufficient to study whether agreement differed with MRAs of different field strengths.

DISCUSSION
There was wide variability in the adjudication of outcomes of endovascular coiling on selected images from catheter angiographic or TOF MRA studies. Agreement was moderate at best and did not improve with rater experience. Disagreement among observers is not explained by divergences in the interpretations of the meaning of the various categories of the scale because agreement between 2 observations from the same raters was also fair to moderate in most cases. Reassuringly, agreement nearly reached a "substantial" level (κ ≥ 0.600) when the final verdict regarding the presence of a major recurrence was dichotomized (present-absent). Agreement when comparing the last angiographic result with the first MRA follow-up study (DSA-MRA), as commonly performed in clinical practice, was similar to MRA-MRA agreement. Agreement was not improved when raters had full access to all images.

[Figure caption, partially recovered: ... (score C). The final angiographic result was "no recurrence" for 2/15 readers, "minor recurrence" for 6/15 readers, and "major recurrence" for 7/15 readers.]

Use of MRA for Aneurysm Follow-Up
Previous studies on the reliability of postcoiling follow-up imaging studies have been systematically reviewed. 4 The overall diagnostic reliability was substantial (κ = 0.65), but there was significant heterogeneity. Furthermore, the reliability of MRA follow-up examinations, now frequently replacing catheter angiographic studies, was not severely tested, for the primary aim of most studies was to assess diagnostic accuracy; interobserver agreement and reliability were assessed with few raters and reported as secondary quality measures. 4 By contrast, our primary aim was to test the reproducibility of angiographic outcomes as they are assessed in clinical practice with a variety of modalities (DSA, 1.5T and 3T TOF MRA) that are actually used in realistic cases selected to mimic a clinical series. Although Pierot et al 6 have proposed that contrast-enhanced MRA performed better than TOF-MRA, most studies did not find a significant difference in accuracy between modalities or field strengths (3T or 1.5T), and for pragmatic reasons, patients cannot always be followed on the same equipment. The strengths of the present study also include a large number of observers with various experience and from various institutions and the inclusion of DSA-MRA comparisons that are often used in clinical practice.

Purpose of Follow-Up after Coiling
Endovascular treatment of ruptured intracranial aneurysms was shown to improve patient outcomes compared with surgery. 15 It is also commonly used to treat unruptured aneurysms, even though it has not been shown superior to clipping or to conservative management. 16 Angiography is a pragmatic way to assess the results of treatment of unruptured aneurysms because trials powered to show a decreased incidence of clinical ruptures would necessitate a large number of patients followed for a long time. 16 The main drawback of coiling, compared with surgery, is the risk of angiographic recurrence, reported to occur in 10%-20% of patients. 17-19 Second-generation coils 20-25 and, in some centers, stents and flow diverters 26 have been proposed to improve the stability of treatment. Angiographic results are still the main surrogate end point of ongoing trials on modified coils 27,28 or of other studies performed for the approval of endovascular devices. 29 However, the clinical significance of angiographic recurrences remains controversial. 30 Aneurysm ruptures have been rare, and whether treated patients should even be followed has recently been questioned. 30 Some authors believe that most recurrences occur early and that follow-up imaging of aneurysms shown to be stable at 6 months is not necessary. 2,31 A randomized trial, following versus not following patients with imaging, may be indicated to settle the issue. What to do when a recurrence is identified is even more controversial. There is little agreement regarding indications for retreatment, 32 though substantial agreement between decisions based on MRA and conventional angiography has been shown in 1 study. 33

Various Scales
At least 21 different grading scales have been reported, 4 but most can be translated into the 3-value scale we have previously proposed. 11,34 It has been observed that agreement increases as the number of categories is decreased. 35 Other ways to improve the reliability of angiographic judgments have been proposed, by using volumetric measurements of residual lesions, 4 or by increasing the precision of nominal definitions, 24 but the success of such strategies remains to be demonstrated. The present work supports the idea that agreement among observers can reach an acceptable level when the scale is translated into a simple dichotomous verdict (presence or absence of a major recurrence).

Limitations
Our study has several limitations. First, we were careful to include a wide variety of cases that would test agreement in circumstances that were close to clinical conditions with multimodality imaging, but the selection of cases, images, and image parameters was artificial; a different set of cases could have led to different results. Second, we did not include patients treated with stents or stent-assisted coiling; this choice may affect MRA interpretations. Third, aneurysms were, on average, larger and posterior circulation aneurysms were more frequent than those in typical endovascular series, perhaps because we included a sufficient number of major recurrences to minimize the paradoxes of κ statistics. 10,11 The variability we observed in judging the extent of angiographic occlusion of treated aneurysms and the presence of a recurrence at follow-up was probably underestimated by the portfolio method we used to multiply the number of raters. Many potential sources of discrepancies (selection of images, series, or sequences; diverse techniques; and equipment from various centers) were absent. Nevertheless, the substudy on a smaller number of readers having access to all images from the PACS yielded similar results. Finally, how seriously observers worked to come to verdicts can always be questioned, and the context of assessment certainly differed from the normal clinical context.

CONCLUSIONS
There is an important variability in the assessment of angiographic outcomes of endovascular treatments. Agreement on the presence of a major recurrence when comparing 2 MRA studies or the MRA with the last catheter angiographic study can be substantial.