Interobserver Agreement after Pipeline Embolization Device Implantation

BACKGROUND AND PURPOSE: Although flow diversion devices are popular in the treatment of aneurysms, angiographic assessment with these devices has rarely been subjected to interobserver variability studies. The purpose of this study was to determine the interobserver agreement of a 3-point grading system for assessing the angiographic outcome after flow diversion therapy of intracranial saccular aneurysms and to determine factors affecting such agreement.
MATERIALS AND METHODS: After approval by the institutional review board, 5 independent readers assessed pretreatment and follow-up digital subtraction angiograms from 96 patients treated with the Pipeline embolization device (PED) by using a 3-point grading system (complete, near-complete, and incomplete occlusion). “Minor discrepancy” was defined as a difference between any 2 readers of 1 grade, that is, complete vs near-complete or near-complete vs incomplete. “Major discrepancy” was defined as a difference between any 2 readers in which 1 reader noted complete occlusion and the other reader noted incomplete occlusion. We performed statistical analysis of the interobserver agreement by using the intraclass correlation coefficient (ICC). Subgroup analyses of discrepancy rate and ICC were performed for previously coiled aneurysms.
RESULTS: The interobserver agreement was excellent (ICC, 0.76; 95% CI, 0.69–0.92). Among 96 cases, there was absolute agreement in 74 (77%), of which 67 had unanimous consensus of “complete” occlusion, 2 of “near-complete” occlusion, and 5 of “incomplete” occlusion. Discordance between any 2 readers was noted in 22 cases (23%), of which 7 (7.3%) revealed a major discrepancy. Subgroup analysis showed that minor discrepancies were more common among patients previously treated with coils vs those not previously treated with coils (37.5% vs 11.2%; P < .05).
CONCLUSIONS: The observer agreement regarding occlusion after PED therapy is excellent. Only a minority of cases demonstrated discrepancy considered as major in this study.

Ongoing angiographic assessment of aneurysms treated by endovascular means represents a standard aspect of patient care, primarily to identify patients considered at risk for future rupture or re-rupture. In an attempt to offer quantitative or semiquantitative data for angiographic follow-up, various ordinal scales have been proposed. These scales are either descriptive or use estimates of percent volumetric occlusion. Ordinal scales usually use descriptors such as "complete," "near-complete," or "incomplete" occlusion, and some scales use terms such as "neck remnant," "dog ear," "residual neck," and "residual" aneurysm. [1][2][3][4][5][6][7][8][9] Irrespective of the type of scale used, all are subject to interobserver variability given the relatively subjective nature of the interpretations. Previous studies [10][11][12] have assessed the degree of interobserver variability for aneurysm occlusion scales, typically demonstrating substantial variability of agreement among readers.
Flow diversion devices, including the Pipeline embolization device (PED; ev3, Irvine, California), represent an important advance in aneurysm therapy, but previous studies [13][14][15][16] simply adopted as the follow-up marker the same 3-point scale used for aneurysm embolization with coils. Moreover, angiographic outcomes with these devices have rarely been subjected to formal interobserver variability studies.
Therefore, the purpose of our current study was to measure interobserver agreement of a 3-point grading system for assessing follow-up angiographic results after PED therapy of intracranial aneurysms.

MATERIALS AND METHODS
Image Acquisition
After approval by our institutional review board, 96 cases were chosen either by random selection from the database of the Pipeline for Uncoilable or Failed Aneurysms (PUFS) registry (n = 56) or from an institutional database at the Mayo Clinic in Rochester, Minnesota (n = 38), in which all patients were treated at the Mayo Clinic from June 2009 to July 2011. Some of these cases have been published previously in clinical case series, 17,18 but in no previous publication was the issue of interobserver variability specifically assessed. All patients were treated with the PED; 16 patients had a history of aneurysm recurrence after coiling. The locations of the aneurysms were the paraclinoid ICA in 90 cases, the distal ICA in 4, and the basilar artery in 2.
For acquisition of the images, 3D digital subtraction angiography (DSA) was performed before the procedure, and the working projection was chosen to best display the parent artery and the aneurysm. Subsequently, 2D angiographic images of the primary working projection were taken immediately before and 6 months after PED implantation. In all cases, 2D angiographic images were obtained at the preoperative and the follow-up visit, but 3D images were available in only 91 cases because of the limited image database of the PUFS registry. In the 5 cases without 3D images, all of which came from the PUFS registry, MR angiography was available in 1, CT angiography in 1, and neither CTA nor MRA in 3. Therefore, a total of 546 images from these 96 cases were selected from the databases of the PUFS registry and the Mayo Clinic.
For each case, a digital file was made that consisted of 1 or 2 3D DSA images and both preoperative and follow-up 2D DSA images of the working projection or the conventional angles; the file was distributed online for review.

Readers and Image Interpretation
Each angiogram was evaluated by 5 experienced readers, who worked in 3 different centers in different countries (United States and Korea). There were 4 senior readers (>10 years of experience in endovascular aneurysm therapy) and 1 junior reader (S.H.S., with 7 years of experience). Four of 5 reviewers were interventional neuroradiologists, and 1 reviewer was a dual-trained endovascular neurosurgeon. Two readers (D.F.K., G.L.) had participated in the PUFS registry.
Independent of the others' assessment, each reader made his own assessment by using a 3-point grading scale, which included "complete," "near-complete," and "incomplete" occlusion. For our study, we did not provide any specific training to the readers.
Results were segregated into 3 subgroups: 1) unanimity, in which all readers noted the same occlusion status; 2) minor discrepancy, in which the greatest discrepancy between any 2 readers was 1 grade, that is, complete vs near-complete or near-complete vs incomplete; and 3) major discrepancy, in which at least 2 readers differed by 2 grades, that is, complete vs incomplete occlusion. For any case with any type of disparity, we defined as "dominant" any reading that was shared by 3 or more readers.
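The grouping rule above can be sketched in a few lines of Python. This is only an illustration: the numeric coding of the grades (0, 1, 2) and the function names are assumptions introduced here, not part of the study protocol.

```python
from collections import Counter

# Hypothetical numeric coding of the 3-point scale (an assumption for this sketch).
COMPLETE, NEAR_COMPLETE, INCOMPLETE = 0, 1, 2


def classify_case(grades):
    """Return 'unanimity', 'minor', or 'major' for the grades given to one case."""
    spread = max(grades) - min(grades)  # largest difference between any 2 readers
    if spread == 0:
        return "unanimity"
    return "minor" if spread == 1 else "major"


def dominant_reading(grades, threshold=3):
    """Return the grade shared by at least `threshold` readers, or None if there is none."""
    grade, count = Counter(grades).most_common(1)[0]
    return grade if count >= threshold else None


# Example: 3 readers note complete, 1 near-complete, and 1 incomplete occlusion.
example = [COMPLETE, COMPLETE, COMPLETE, NEAR_COMPLETE, INCOMPLETE]
print(classify_case(example), dominant_reading(example))
```

With 5 readers, at most one grade can reach the threshold of 3, so the dominant reading is unique whenever it exists.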

Statistical Analysis
To determine interobserver agreement, we calculated the intraclass correlation coefficient (ICC) with the 2-way random-effects model by using SPSS v. 19.0 (SPSS, Chicago, Illinois). The ICC ranges between −1 and +1, with scores closer to +1 showing better agreement. Interpretation of ICC was as follows: poor, <0.40; fair to good, 0.40–0.75; and excellent, >0.75. 19,20 Subgroup analysis of ICC was performed for cases with and without prior coiling. The proportions of cases with discrepancy were compared between cases with vs those without prior coiling.
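For readers who want to reproduce this type of analysis outside SPSS, the following is a minimal Python sketch of a 2-way random-effects ICC. It assumes the single-measure, absolute-agreement form, often written ICC(2,1); the text above does not state which form was used, so that choice, the function names, and the toy ratings are assumptions.

```python
def icc_2_1(ratings):
    """Two-way random-effects, single-measure, absolute-agreement ICC.

    `ratings` is a list of cases, each a list with one numeric grade per reader.
    """
    n, k = len(ratings), len(ratings[0])  # n cases, k readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between cases
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between readers
    ss_err = ss_total - ss_rows - ss_cols                   # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def interpret_icc(value):
    """Apply the cutoffs quoted in the text: <0.40, 0.40-0.75, >0.75."""
    if value < 0.40:
        return "poor"
    return "excellent" if value > 0.75 else "fair to good"


# Toy data (hypothetical): 4 cases graded by 5 readers, 0 = complete ... 2 = incomplete.
toy = [[0, 0, 0, 0, 0], [0, 0, 1, 0, 0], [2, 2, 2, 1, 2], [1, 1, 0, 1, 1]]
print(round(icc_2_1(toy), 3), interpret_icc(icc_2_1(toy)))
```

The single-measure form describes the reliability of one reader's grade; an average-measure ICC would be higher for the same data, so the two should not be compared directly.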

DISCUSSION
In our study, we demonstrated excellent agreement among 5 reviewers by using a 3-point graded response to evaluate the angiographic outcome after flow diversion therapy of intracranial aneurysms. Clinically relevant discrepancies, which we defined as "major" discrepancy, were uncommon in our study. Although the subgroup of previously coiled aneurysms in our series was small, we noted significantly higher rates of discrepancy when coils were present vs when they were absent. Thus, we suggest that observer variability in the angiographic assessment of flow diversion devices will be low, except in cases of prior coil embolization. Because clinical trials of flow-diverting devices are still ongoing, a reliable and valid scoring system is mandatory in designing them, and this finding may affect the design and implementation of such trials.
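The comparison of discrepancy rates between previously coiled and uncoiled aneurysms can be illustrated with a small Python sketch. Two assumptions are made here: the statistical test is not named in the text, so Fisher's exact test is used purely as an example, and the counts (6 of 16 vs 9 of 80 cases with minor discrepancy) are back-calculated from the reported 37.5% and 11.2% rather than taken from a published table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Inferred counts (assumption): minor discrepancy in 6/16 coiled vs 9/80 uncoiled cases.
p_value = fisher_exact_two_sided(6, 10, 9, 71)
print(f"two-sided p = {p_value:.4f}")
```

An exact test is a natural choice for this sketch because the coiled subgroup is small; a different test may have been used in the actual analysis.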
Cloft et al 10 reported observer variability of several aneurysm assessment scales by 2 experienced observers and in follow-up angiogram assessments by using a 3-point response scale. In that study, the concordant rate and the value for interobserver agreement were 72% and 0.67, respectively. Tollard et al 11 described validation of a 3-point grading scale in scoring angiographic results of coiled aneurysms by 10 readers and showed a 21%-60% concordant rate and a generalized value of approximately 0.4. Daugherty et al 12 reported a median agreement value of 0.27 among 5 observers over 27 cases for a 5-point scale regarding the need for and type of treatment of recurrent coiled aneurysms. Compared with these prior studies, our study showed a higher concordant rate and better, more consistent agreement among readers. It is likely that the lack of coils to obscure the aneurysm-parent artery interface yields improved concordance among readers. Furthermore, our patient population was largely limited to large ICA aneurysms, which may have improved concordance.
Our study had some limitations. As noted, this study was mostly restricted to aneurysms of the paraclinoid ICA, which may have limited the generalizability of our findings. There was clustering of outcomes, with most of the cases demonstrating complete occlusion, which may have lowered the statistical value. 21,22 The definition of "near-complete" or "incomplete" occlusion should be considered specifically in PED embolization because the healing mechanism of the PED is different from that of the coil, and this terminology might create confusion among clinicians regarding decisions for subsequent treatment vs observation. Moreover, in cases of multiple aneurysms, it is necessary to clarify which is the target aneurysm intended by the physician performing the procedure. Misinterpretation of the targeted aneurysm could produce some discrepancies in such a study (Fig. 2). Owing to the limited image database in the PUFS registry, there may have been selection bias of the images in our study. Finally, although we have defined minor and major discrepancies, the clinical ramifications of such discrepancies remain unclear.

CONCLUSIONS
The observer agreement regarding occlusion after PED therapy is excellent. Only a minority of cases demonstrated a discrepancy considered major in our study, and, despite the small size of the subgroup, discrepancy was significantly more frequent among aneurysms treated with both coils and the PED.