Observer Variability of an Angiographic Grading Scale Used for the Assessment of Intracranial Aneurysms Treated with Flow-Diverting Stents

BACKGROUND AND PURPOSE: Novel angiographic grading scales for the assessment of intracranial aneurysms treated with flow-diverting stents have been recently developed because previous angiographic grading scales cannot be applied to these aneurysms. The purpose of this study was to evaluate the inter- and intraobserver variability of the novel O'Kelly Marotta grading scale, which was developed specifically for the angiographic assessment of aneurysms treated with flow-diverting stents. MATERIALS AND METHODS: Multiple raters (n = 31) from the disciplines of neuroradiology and neurosurgery were presented with pre- and posttreatment angiographic images of 14 aneurysms treated with intraluminal flow diverters. Raters were asked to classify pre- and posttreatment angiograms by using the OKM grading scale. Statistical analyses were subsequently performed with calculation of a generalized multirater κ statistic for assessment of inter- and intraobserver variability and by performing a Wilcoxon signed rank sum test for assessment of group differences. RESULTS: Variability analysis of the OKM grading scale yielded substantial (κ = 0.74) and almost perfect (κ = 0.99) inter- and intraobserver agreement, respectively, with no statistically significant differences between raters with a background of neuroradiology versus neurosurgery or attending physician versus trainee. CONCLUSIONS: The OKM grading scale for the assessment of intracranial aneurysms treated with flow-diverting stents is a reliable grading scale that can be used equally well by users of varying backgrounds and levels of training. Comparison with interobserver variability of pre-existing angiographic grading scales shows equal or better performance.

The development of flow-diverting stents has ushered in an era of new paradigms and possibilities for the endovascular treatment of intracranial aneurysms. These devices are gaining momentum for difficult intracranial aneurysms that are wide-necked, have a fusiform configuration, are located in perforator territories, or have complex geometry. Their mechanism of action relies on a reduction of filling of the aneurysmal sac with flow diversion toward the parent vessel, leading to stasis and subsequent thrombosis within the aneurysm. This process usually occurs over time, and angiographic evidence of aneurysm protection is usually not seen in immediate posttreatment angiograms. Rather, the process of stasis and thrombosis may not be complete for months after initial flow-diverting stent placement. While residual filling is a suboptimal or unacceptable posttreatment angiographic result for an aneurysm treated with endosaccular coiling, it can be optimal and acceptable for an aneurysm treated by using flow-diverting devices. Traditional grading scales, such as the 3-point grading scale of Roy and Raymond used for aneurysms treated with coiling or clipping, 1 do not apply to aneurysms treated with flow-diverting stents. For example, after treatment, filling within the aneurysm body would be classified as residual aneurysm and would not be considered a desirable result according to the Roy and Raymond classification, but in the case of flow-diverting stent placement, this would be an expected immediate posttreatment result.
A new grading scale specifically tailored to the angiographic assessment of aneurysms treated with flow-diverting stents was published by O'Kelly et al, 2 termed the O'Kelly Marotta (OKM) grading scale. This novel grading scale incorporates 2 dimensions as parameters, which reflect the mechanism by which flow-diverting devices accomplish aneurysm protection: reduction in aneurysm sac filling (filling grade), which reflects an anatomic aspect, and promotion of stasis within the aneurysm sac (stasis grade), which reflects a more dynamic or physiologic parameter (Fig 1).
With any novel grading scale, an evaluation of performance in the hands of the evaluating end user is necessary for an assessment of reliability. We present here an analysis of the inter- and intraobserver variability of the OKM grading scale based on evaluation of conventional angiographic images of aneurysms pre- and posttreatment with flow-diverting stents by 31 evaluators with a background in neurointervention from the disciplines of neuroradiology and neurosurgery.

MATERIALS AND METHODS
Fourteen cases of aneurysms, pre- and posttreatment with flow-diverting stents (28 conventional cerebral angiograms), were presented to 31 evaluators from the backgrounds of neuroradiology and neurosurgery. Angiograms were shown in the form of a high-resolution, scalable, multimedia-enabled, interactive document in Adobe Portable Document Format (Adobe Systems, San Jose, California). Two cases (4 angiograms) were duplicates placed randomly within the set for assessment of intraobserver variability. No patient-identifying, demographic, or flow-diversion device information was provided to any of the evaluators, and none of the evaluators had previous exposure to or were involved in the treatment or follow-up of any of the cases presented. Evaluators were presented with a brief written introduction to the OKM grading scale and 2 case examples with answers. There was no personal coaching or verbal introduction given. Evaluators were asked to classify each of the angiograms of the aneurysms according to the OKM grading scale by selecting the appropriate filling grade and stasis grade from a drop-down menu.
Statistical analysis was performed by using statistical software (Statistical Package for the Social Sciences, Version 20; SPSS, Chicago, Illinois). A generalized nonweighted multirater κ statistic was calculated according to the method described by Siegel and Castellan. 3 Categorization of κ values was based on Landis and Koch 4 and reviewed in Fleiss et al. 5 Differences between groups of raters were analyzed for statistical significance by using a Wilcoxon signed rank sum test.
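To illustrate the agreement statistic, the following is a minimal Python sketch of a nonweighted, fixed-marginal multirater κ of the Fleiss type, together with the Landis and Koch agreement categories. It is not the SPSS procedure used in the study, and the formulation of the cited method may differ in detail. The function names (`multirater_kappa`, `landis_koch`) and the input layout are assumptions made for this example only.

```python
import math
from collections import Counter


def multirater_kappa(ratings):
    """Nonweighted multirater kappa (Fleiss-type, fixed marginals).

    `ratings` is a list with one entry per rated item (e.g. angiogram);
    each entry is a list of category labels, one per rater. Every item
    must be rated by the same number of raters (at least 2).
    This input layout is an assumption of the sketch.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: proportion of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the pooled category proportions.
    proportions = [
        sum(c[k] for c in counts) / (n_items * n_raters) for k in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if math.isclose(p_expected, 1.0):
        # Only one category was ever used: kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch agreement category."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

In this layout, each of the 28 angiograms would contribute one row of 31 labels (for example the combined filling and stasis grade chosen by each evaluator), and the resulting κ would then be passed to `landis_koch` for categorization.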

RESULTS
Thirty-one evaluators, predominantly neurointerventionalists, with backgrounds in neuroradiology (n = 24) and neurosurgery (n = 7) participated in the study. Evaluators were either fellows or attending physicians. Interobserver variability was assessed by using a generalized multirater κ statistic, which showed substantial agreement across all evaluators with κ = 0.74, a very low standard error, and a highly significant P value (Table 1). Intraobserver variability, as determined by random insertion of 4 duplicate angiograms, was also excellent with κ = 0.99, indicating almost perfect agreement (Table 1). To determine whether there were any differences in grading based on specialty background or level of training, we performed a Wilcoxon rank sum test. While the number of neuroradiologists participating in the study exceeded the number of neurosurgeons by greater than 3:1, no statistically significant difference existed between the responses provided by either group (Table 1). There was also no statistically significant difference in responses based on level of training (ie, fellow in training versus attending physician), again, with a greater number of attending physicians compared with fellows. We were also interested in assessing the accuracy of responses with respect to each filling grade and stasis grade category. The percentage of correct responses is tabulated in Table 2. (The correct answer is based on consensus agreement in grading for a given angiogram based on evaluation by publication authors [M.D.J. and T.R.M.].) In general, accuracy was greater than 80% in all but one category, filling grade C, which corresponds to the category of entry remnant. According to the OKM grading scale, aneurysms that are pretreatment are graded as filling grade A, total filling, by definition (ie, >95%) even if they are partially thrombosed.
This concept appears to have been well grasped by raters, given the 93% accuracy achieved overall for pre- and posttreatment angiograms and 100% accuracy for grade A when pretreatment subgroup scores were tabulated. Accuracy for filling grades corresponding to 5%-95% aneurysm filling, grade B, and 0% aneurysm filling, grade D, was 84% and 80%, respectively.
Evaluators were much less accurate in choosing filling grade C, the entry remnant category: the accuracy rate of 45% was substantially lower than that in any of the other filling grade categories.
The accuracy of stasis grade scores (ie, grading the degree of contrast stasis within the aneurysm) ranged from 80% to 90%. Much of the variability in determining stasis grade was seen in cases in which filling of an overlapping venous plexus was mistaken for aneurysm filling or vice versa or where subtle stasis and a very small amount of filling persisted into the venous phase but was not detected by the evaluator.
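The per-category accuracy reported above can be tabulated along the following lines. This is a minimal sketch, assuming each response is stored as a (rater, angiogram, grade) tuple and the consensus grade of each angiogram is stored in a dictionary. The function name `accuracy_by_category` and this data layout are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def accuracy_by_category(reference, responses):
    """Fraction of correct responses, grouped by the reference category.

    reference: dict mapping angiogram id -> consensus (reference) grade.
    responses: iterable of (rater_id, angiogram_id, chosen_grade) tuples.
    Both structures are assumptions made for this illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for _rater, angiogram, chosen in responses:
        expected = reference[angiogram]
        total[expected] += 1
        if chosen == expected:
            correct[expected] += 1
    # Accuracy is reported per reference category, as in a per-grade table.
    return {grade: correct[grade] / total[grade] for grade in total}
```

Applied once to the filling grades and once to the stasis grades, such a tabulation would yield one accuracy figure per category, which is the form of the percentages given in Table 2.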

DISCUSSION
The mechanism by which flow-diverting stents divert flow from the aneurysm into the parent vessel leading to progressive stasis, thrombosis, and obliteration of the aneurysm has necessitated the development of new grading scales to assess treatment and predict effect. Widely used and validated scales for grading of coiled aneurysms such as the Roy and Raymond simplified scale 6 do not apply to those treated with flow-diverting stents, for which residual filling on the immediate posttreatment angiograms is acceptable and is the norm. Since the development of the OKM grading scale, the first published for aneurysms treated by using flow-diverting stents, other grading scales have also been developed. The grading scale by Kamran et al 7 incorporates 2 main axes: axis I, assessing the degree of aneurysm occlusion, and axis II, assessing parent artery patency. This grading scale has favorable interobserver agreement for both axes (based on assessment by 2 expert raters) for anatomic parameters but does not address gradation of dynamic or physiologic parameters such as contrast stasis in the aneurysm lumen.
This study shows that the OKM grading scale has good inter- and intraobserver agreement across many raters (n = 31) from varied backgrounds of neuroradiology and neurosurgery with different levels of experience. Our study does have some limitations in that the number of cases that each reader evaluated was limited to the grading of 28 angiograms. However, a high number of raters were recruited to more robustly address interobserver variability. Interobserver κ values might have been higher had we selected raters who were more experienced and coached on use of the grading scale; however, in reality, end-user populations normally comprise those without personal coaching as well as those with variable levels of expertise, and the study was designed to capture this aspect. Personal coaching of raters may have also led to a higher degree of accuracy for the entry remnant category, defined as filling of <5%. Part of the lower accuracy in this category reflects the difficulty of visually differentiating such a narrow range of aneurysm filling (ie, >0% to <5%) accurately. From a practical standpoint, the best angiographic view to assess and identify the entry remnant category is one in which the stent and the residual filling segment of the aneurysm are both in maximal profile. This view accomplishes visualization of what would be the aneurysm neck, by conventional nomenclature, and is where subtle residual contrast filling of the aneurysm of <5% would be best visualized.
The entry remnant would appear more like a sessile entity compared with one that is pedunculated or shows layering of contrast appearing like a fluid-fluid level or parfait. The entry remnant also has a propensity to occur in instances in which a small vessel arises from the aneurysm, functioning as a sump, keeping this small portion of the aneurysm persistently filling. However, with increasing and longer cumulative experience with flow-diversion devices, we are seeing that the entry remnant category could also be seen as a separate entity with its own prognostic significance and not only a continuum variable for filling grade. Some of our initial experiences suggest that aneurysms treated with flow-diversion stents that show angiographic appearances consistent with the entry remnant category have a propensity to fail to achieve complete exclusion from circulation. The details of the pathomechanism of this phenomenon are currently being explored (J.P. Cruz, personal communication, June 2011).
The interobserver variability of the OKM grading scale is comparable with that of well-established angiographic grading schemes for treated aneurysms (Table 3). Cloft et al 8 evaluated various well-known aneurysm treatment grading schemes for observer variability with 2, 3, and 4 possibilities for grading based on the angiographic appearance, and reported κ values of 0.63, 0.54, and 0.50, respectively. Better agreement was observed in grading schemes with fewer assignment possibilities. In the OKM scale, there are 12 possibilities for grade assignment from 4 filling grades and 3 stasis grades, with an overall interobserver κ value of 0.74. Moreover, with respect to interobserver variability, the OKM scale is also comparable with or better than other grading scales that encompass >1 dimension of observation, such as the Spetzler-Martin grading scale for AVM assessment, in which κ values of 0.47 9 and 0.70 10 have been observed.
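The structure of the scale described in the text (4 filling grades defined by the percentage of aneurysm filling, combined with 3 stasis grades, giving 12 possible assignments) can be sketched as follows. The filling thresholds are those stated in this article. The numeric stasis labels 1 to 3, the combined notation such as "B2", and the function names are assumptions made for illustration, because the stasis grades are not defined in the text itself.

```python
from itertools import product

# Filling grades as described in the text:
# A = total filling (>95%), B = 5%-95%, C = entry remnant (>0% to <5%), D = 0%.
FILLING_GRADES = ("A", "B", "C", "D")

# Assumed labels for the 3 stasis grades; the text does not define them.
STASIS_GRADES = (1, 2, 3)


def filling_grade(percent_filled):
    """Return the filling grade for a given percentage of aneurysm filling."""
    if not 0 <= percent_filled <= 100:
        raise ValueError("percent_filled must be between 0 and 100")
    if percent_filled > 95:
        return "A"
    if percent_filled >= 5:
        return "B"
    if percent_filled > 0:
        return "C"
    return "D"


def all_grade_assignments():
    """Enumerate every filling/stasis combination (4 x 3 = 12)."""
    return [f"{filling}{stasis}"
            for filling, stasis in product(FILLING_GRADES, STASIS_GRADES)]
```

The sketch makes the narrowness of the entry remnant category explicit: grade C covers only fillings strictly between 0% and 5%, whereas grade B spans 5% to 95%, which is consistent with the difficulty raters had in assigning grade C.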

CONCLUSIONS
The OKM grading scale for assessment of intracranial aneurysms treated with flow-diverting stents demonstrates good inter- and intraobserver agreement across a spectrum of raters of different backgrounds and experience. The scale shows comparable or better interobserver agreement, as judged by κ values, compared with well-established angiographic grading scales. The OKM grading scale has not been formally validated in terms of its ability to predict treatment effect or aid in prognostic stratification; however, this will likely come to fruition as more collective long-term assessments for aneurysms treated with flow-diverting stents become available.