Original research
Inter-rater reliability of published flow diversion occlusion scales
Marcus D Mazur¹, Philipp Taussky¹, Lubdha M Shah², Blair Winegar², Min S Park¹

  1. Department of Neurosurgery, University of Utah, Salt Lake City, Utah, USA
  2. Department of Neuroradiology, Clinical Neurosciences Center, University of Utah, Salt Lake City, Utah, USA

Correspondence to Dr Min S Park, Department of Neurosurgery, University of Utah, 175 N Medical Drive East, Salt Lake City, UT 84132, USA; min.park{at}hsc.utah.edu

Abstract

Background With increasing use of flow-diverting stents for the treatment of intracranial aneurysms, standardized methods and a common language to evaluate angiographic outcomes are needed. Multiple grading scales have been developed for this purpose but none has been widely adopted.

Objective To analyze these scales to determine interobserver reliability.

Methods Four independent assessors scored the intraprocedural angiograms of patients who underwent flow-diverting stent deployment for an intracranial saccular or fusiform aneurysm at our institution between October 2012 and June 2015. Angiographic outcome immediately after flow-diverting stent deployment was scored using three grading scales (Kamran–Byrne (KB), Simple Measurement of Aneurysm Residual after Treatment (SMART), and O'Kelley, Krings, Marotta (OKM)). Statistical analysis was performed using Light's κ for multiple raters (κ), Kendall's coefficient of concordance (W), and intraclass correlation (ICC).

Results We included the angiograms of 50 consecutive patients (mean age 58 years, range 30–79) who underwent flow-diverting stent deployment for an intracranial aneurysm (40 saccular, 10 fusiform). Six aneurysms were located in the posterior circulation. The inter-rater reliability was typically poor or fair: SMART aneurysm filling (κ=0.30, W=0.36, ICC=0.12), SMART parent vessel stenosis (κ=0.07, W=0.33, ICC=0.12), KB axis I (κ=0.24, W=0.50, ICC=0.25), KB axis II (κ=0.07, W=0.30, ICC=0.06), OKM aneurysm filling (κ=0.23, W=0.45, ICC=0.13), OKM contrast stasis (κ=0.36, W=0.71, ICC=0.54).

Conclusions Existing flow-diverting stent grading scales have low inter-rater reliability for most categories.

  • Flow Diverter
  • Aneurysm
  • Standards
  • Angiography

Introduction

The availability of flow-diverting stents (FDSs) introduced a paradigm shift in the treatment of intracranial aneurysms. In contrast to surgical clipping or endovascular coiling of aneurysms where occlusion is expected immediately after treatment, for an aneurysm treated with a FDS, residual filling on the immediate post-treatment angiogram is commonplace. As flow is directed away from the aneurysm into the parent vessel, the aneurysm is obliterated over time as progressive stasis leads to thrombosis, and the parent vessel is reconstructed. Consequently, new grading systems have been developed to assess the angiographic outcome of aneurysm treatment using a FDS.1–3 These include the Simple Measurement of Aneurysm Residual after Treatment (SMART) scale,1 Kamran–Byrne (KB) scale,2 and the O'Kelley, Krings, Marotta (OKM) scale.3 Grading systems should help clinicians to communicate, standardize research, assess outcomes, and provide prognostic information, but the value of a grading system depends on its validity and reliability. The only attempts at evaluating the reliability of these grading scales have been made by the authors who developed them.2,4 Here we provide the first independent objective analysis of inter-rater reliability for three FDS grading scales.

Methods

Four assessors independently graded the intraprocedural angiograms of consecutive patients with intracranial aneurysms treated with a FDS at our institution between October 2012 and June 2015. All angiographic runs were performed during FDS placement. In each case, contrast bolus was administered by hand injection. A Pipeline embolization device (Covidien, Irvine, California, USA) was placed in all patients by fellowship-trained endovascular neurosurgeons (PT and MSP). All angiograms were graded retrospectively. The four assessors were a board-certified fellowship-trained endovascular neurosurgeon (MSP), a board-certified fellowship-trained neuroradiologist (LMS), an endovascular neurosurgery fellow (MDM), and a neuroradiology fellow (BW). Angiograms were viewed using a picture archiving and communication system (PACS) so that each assessor could select which angiographic runs to use for the assessment, mirroring real-life application. Before data collection, the institutional review board at the University of Utah approved this study.

Grading scales

Angiograms were scored using three different grading scales—namely, the KB scale,2 the SMART scale,1 and the OKM scale.3

The KB scale grades two axes: axis I, degree of aneurysm occlusion, and axis II, patency status of parent artery. Axis I is a five-point numerical scale for saccular aneurysms: 0, no change in flow into the aneurysm; 1, residual contrast filling >50% of pretreatment aneurysm volume, including reduction in contrast density or stasis; 2, residual filling <50% of aneurysm volume; 3, residual filling confined to aneurysm neck; 4, no residual filling or complete obliteration. For fusiform aneurysms, axis I: 0, no change in aneurysm in-flow; 1, residual filling >50% pretreatment aneurysm length and >50% width; 2, residual filling <50% length or <50% width; 3, residual filling <50% length and <50% width; 4, no filling. Axis II is a three-point scale: a, no change in parent artery diameter; b, narrowing of parent artery; c, parent artery occlusion. The same axis II criteria are used for both saccular and fusiform aneurysms. In total, there are 15 different grading possibilities.
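As an illustration of the fusiform axis I rule, the thresholds above can be written as a small function. This is only a sketch: the function name and the fractional inputs are hypothetical, and the scale as described does not say how a value of exactly 50% should be handled, so the code places it in grade 2 as an assumption.

```python
def kb_fusiform_axis1(length_fraction, width_fraction, inflow_unchanged=False):
    """Return the KB axis I grade (0-4) for a fusiform aneurysm.

    length_fraction and width_fraction are residual filling expressed as a
    fraction of pretreatment aneurysm length and width (hypothetical inputs).
    """
    if inflow_unchanged:
        return 0  # no change in aneurysm in-flow
    if length_fraction == 0 and width_fraction == 0:
        return 4  # no filling
    if length_fraction > 0.5 and width_fraction > 0.5:
        return 1  # residual filling >50% length and >50% width
    if length_fraction < 0.5 and width_fraction < 0.5:
        return 3  # residual filling <50% length and <50% width
    # Otherwise one dimension is below 50%; a value of exactly 50% also
    # lands here, which is an assumption not stated by the scale.
    return 2


# Five axis I grades combined with three axis II grades (a, b, c).
KB_COMBINATIONS = [(axis1, axis2) for axis1 in range(5) for axis2 in "abc"]
```

Enumerating the pairs this way reproduces the 15 grading possibilities stated above.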

The OKM scale grades two dimensions: the extent to which the aneurysm fills with contrast after FDS deployment (filling grade) and the length of time contrast persists in the aneurysm with respect to angiographic phase (stasis grade). The filling grade is assigned a letter: A, total filling (>95%); B, subtotal filling (5–95%); C, entry remnant (<5%); D, no filling (0%). The stasis grade is assigned a number that refers to the angiographic phase in which contrast stasis can be seen: 1, arterial phase; 2, capillary phase; 3, venous phase. Therefore, the OKM scale permits 10 different grading possibilities.
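Four filling grades and three stasis grades would give 12 combinations if every pairing were allowed. One way to arrive at the stated count of 10 is to assume that stasis is not graded when there is no filling (grade D). The sketch below enumerates the grades under that assumption; the dictionary names and the string format of each grade are hypothetical.

```python
# Filling and stasis grades as described in the text.
OKM_FILLING = {
    "A": "total filling (>95%)",
    "B": "subtotal filling (5-95%)",
    "C": "entry remnant (<5%)",
    "D": "no filling (0%)",
}
OKM_STASIS = {1: "arterial phase", 2: "capillary phase", 3: "venous phase"}


def okm_grades():
    """List every OKM grade, assuming grade D carries no stasis distinction."""
    grades = []
    for filling in OKM_FILLING:
        if filling == "D":
            # Assumption: with no filling there is no contrast stasis to grade.
            grades.append("D")
        else:
            grades.extend(f"{filling}{stasis}" for stasis in OKM_STASIS)
    return grades
```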

The SMART scale assesses aneurysm occlusion and in-stent stenosis. Occlusion is assessed according to the location of residual flow within the aneurysm during the venous phase. In addition, the presence of an in-flow jet during the early phase is evaluated. Occlusion is graded on a five-point scale: 0, early-phase coherent in-flow jet present; 1, completely patent aneurysm with diffuse in-flow during venous phase; 2, reduced but residual filling that reaches dome during venous phase; 3, residual neck filling during venous phase; 4, complete occlusion. Angiograms are initially evaluated for the presence of an in-flow jet in the arterial phase regardless of the extent of aneurysm occlusion (grade 0). Lacking that, the angiogram is evaluated in the venous phase to determine the occlusion grade (1–4). In-stent stenosis is also graded on a five-point scale: 0, none; 1, mild: not hemodynamically significant; 2, moderate: 50–70%; 3, severe but not completely occluded; 4, completely occluded. The SMART scale permits 25 grading possibilities.
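The two-step order of the SMART occlusion assessment (in-flow jet first, venous phase second) can be sketched as follows. The function name and the string labels for the venous-phase findings are hypothetical and chosen only for illustration.

```python
# Venous-phase findings mapped to SMART occlusion grades 1-4.
# The keys are hypothetical labels, not terms defined by the scale.
VENOUS_PHASE_GRADE = {
    "diffuse_inflow": 1,   # completely patent aneurysm with diffuse in-flow
    "reaches_dome": 2,     # reduced but residual filling that reaches dome
    "neck_only": 3,        # residual neck filling
    "no_filling": 4,       # complete occlusion
}


def smart_occlusion_grade(early_inflow_jet, venous_finding):
    """Return the SMART occlusion grade (0-4) for one angiogram."""
    if early_inflow_jet:
        # A coherent early-phase jet gives grade 0 regardless of occlusion.
        return 0
    return VENOUS_PHASE_GRADE[venous_finding]


# Five occlusion grades combined with five in-stent stenosis grades.
SMART_COMBINATIONS = [(occ, sten) for occ in range(5) for sten in range(5)]
```

Pairing the five occlusion grades with the five stenosis grades gives the 25 possibilities stated above.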

Statistical analysis

Statistical analysis was performed with data considered to be categorical and ordinal. Treating data as categorical provided an assessment of the extent to which assessors assigned the same grade for each angiogram. Inter-rater reliability was calculated using Light's κ for multiple raters without weighting (κ),5 which enabled determination of the absolute agreement among raters for categorical data. For ordinal data, Kendall's coefficient of concordance (W) was used with correction for ties.6 This provided a test to determine, for example, if there was concordance among raters in identifying which angiograms showed the most contrast stasis or aneurysm filling. The effect of ties is to reduce the value of W. By correcting for ties, we provide the maximum calculated value of W. Next, intraclass correlation was calculated to compare the variability of multiple raters grading the same angiogram with respect to the overall variation across all patients. A one-way single score intraclass correlation model was used.7 Finally, we determined whether there were differences in grading based on specialty background or level of training. Light's κ for categorical data was used to quantify the group variability between neurosurgeons and radiologists and between attending physicians and fellows. We assessed the strength of inter-rater reliability according to the criteria proposed by Cicchetti and Sparrow8: <0.40, poor; 0.40–0.59, fair; 0.60–0.74, good; ≥0.75, excellent. A value of 1.00 indicates perfect agreement, 0 indicates no better than chance, and negative values indicate worse than chance. Calculations were performed with R software (V.3.0.2, R Foundation for Statistical Computing, http://www.R-project.org).
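For readers unfamiliar with Light's κ, it is the mean of unweighted Cohen's κ over every pair of raters. The sketch below is a plain Python illustration of that definition together with the Cicchetti and Sparrow bands quoted above. It is not the R code used in this study, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters' categorical grades.

    Undefined (division by zero) if both raters use one identical category
    for every case.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal grade frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


def light_kappa(ratings):
    """Light's kappa: mean Cohen's kappa over all rater pairs.

    ratings is a list with one list of grades per rater, in the same
    case order.
    """
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def cicchetti_band(value):
    """Classify a reliability value using the bands quoted in the text."""
    if value < 0.40:
        return "poor"
    if value < 0.60:
        return "fair"
    if value < 0.75:
        return "good"
    return "excellent"
```

With four raters there are six rater pairs, so `light_kappa` averages six Cohen's κ values.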

Results

The angiograms of 50 consecutive patients undergoing placement of a FDS for intracranial aneurysm during the study period were included in this study. The mean age of the patients was 58 years (range 30–79). The aneurysm location included the ophthalmic artery (19), superior hypophyseal artery (8), cavernous carotid artery (6), vertebral artery (5), posterior communicating artery (4), middle cerebral artery (3), supraclinoid internal carotid artery (2), dorsal internal carotid artery (2), and basilar artery (1). Forty aneurysms were saccular and 10 were fusiform. Aneurysm size ranged from 3 to 22 mm in greatest dimension (mean 8 mm).

First, the absolute agreement among raters was determined for each grading scale when data were considered to be categorical. The inter-rater reliability between individual assessors was low for each grading scale (range, κ=0.07–0.36; table 1). The grading scale with the highest reliability value, OKM contrast stasis, still showed poor agreement (κ=0.36, p=0.06). The remaining grading scales had lower inter-rater reliability (table 1): SMART aneurysm filling (κ=0.30, p=0.02), SMART parent vessel stenosis (κ=0.07, p=0.99), KB axis I (κ=0.24, p=0.32), KB axis II (κ=0.07, p=0.99), and OKM aneurysm filling (κ=0.23, p=0.67).

Table 1

Individual variability of flow-diverting stent grading systems

Second, we quantified the level of agreement among raters when data were considered to be ordinal. For each grading scale, treating data as ordinal resulted in higher inter-rater reliability than when data were considered to be categorical (table 1). The highest reliability was for OKM contrast stasis (W=0.71; p<0.01). For the other grading scales, however, the extent of concordance was fair at best: SMART aneurysm filling (W=0.36; p=0.02), SMART parent vessel stenosis (W=0.33; p=0.07), KB axis I (W=0.50; p<0.01), KB axis II (W=0.30; p=0.16), OKM aneurysm filling (W=0.45; p<0.01).
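Kendall's W with correction for ties can be illustrated with the standard formula W = 12S / [m²(n³ − n) − mΣT], where m is the number of raters, n the number of angiograms, S the sum of squared deviations of the rank totals, and T the sum of (t³ − t) over each rater's groups of tied grades. The following is a minimal sketch of that formula in plain Python, not the R code used for the values above; the function names are hypothetical.

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores from 1 to n, giving tied scores the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendall_w(ratings):
    """Kendall's W with tie correction.

    ratings is a list with one list of ordinal grades per rater. The result
    is undefined if every rater gives the same grade to every case.
    """
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(r) for r in ratings]
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie term: sum of (t^3 - t) over each rater's groups of equal grades.
    tie_term = sum(
        sum(t ** 3 - t for t in Counter(r).values()) for r in ratings
    )
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * tie_term)
```

Because grading scales have only a few levels, most grades are tied, so the tie term makes a large difference to the denominator.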

We calculated the intraclass correlation for each grading scale to evaluate the variability of multiple raters grading the same angiogram with respect to the overall variation across all patients. OKM contrast stasis demonstrated fair correlation, 0.54 (95% CI 0.40 to 0.67), but the other grading scales were much less consistent: SMART aneurysm filling, 0.12 (0.00 to 0.28); SMART parent vessel stenosis, 0.12 (0.00 to 0.27); KB axis I, 0.25 (0.11 to 0.41); KB axis II, 0.06 (−0.04 to 0.21); OKM aneurysm filling, 0.13 (0.01 to 0.29).
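The one-way single-score intraclass correlation compares the between-angiogram mean square (MSB) with the within-angiogram mean square (MSW): ICC = (MSB − MSW) / (MSB + (k − 1)MSW), where k is the number of raters. The sketch below illustrates this formula in plain Python; it is not the R code used in this study, and the function name is hypothetical.

```python
def icc_oneway_single(ratings):
    """One-way, single-score intraclass correlation.

    ratings is a list with one list of numeric grades per rater, in the
    same case order. Undefined if every grade in the data set is identical.
    """
    subjects = list(zip(*ratings))  # one tuple of k grades per angiogram
    n, k = len(subjects), len(ratings)
    grand_mean = sum(sum(row) for row in subjects) / (n * k)
    means = [sum(row) / k for row in subjects]
    # Between-subject mean square: spread of per-angiogram means.
    msb = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    # Within-subject mean square: rater spread around each angiogram's mean.
    msw = sum(
        (x - m) ** 2 for row, m in zip(subjects, means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

A value near 0 means that raters disagree about a single angiogram almost as much as grades vary across different angiograms.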

Finally, to determine whether there were differences in grading based on specialty background or level of training, we compared the inter-rater reliability between neurosurgeons and radiologists and between attending physicians and fellows. Again, inter-rater reliability was low for each grading scale, ranging from a correlation less than chance (κ=−0.11) to fair agreement (κ=0.44) (table 2).

Table 2

Group variability of mean Light's κ score for flow-diverting stent grading systems

Discussion

We found that FDS angiographic grading scales have low interobserver reliability. In contrast to our results, previous papers have reported much higher interobserver agreement.2,4 Those studies, however, have several methodological limitations, which may collectively decrease the external validity of their reported findings.

External validation

Notably, none of the scales have been validated externally. Our effort is the first attempt by an independent group to evaluate the inter-rater reliability of these scales. Kamran et al2 reported excellent interobserver agreement of the KB scale for the assessment of both aneurysm occlusion (axis I, κ=0.89) and parent artery patency (axis II, κ=0.90). A critical concern, however, is that two of the developers of the KB scale were also assessors, which probably precludes them from performing a truly independent analysis. Joshi et al4 reported substantial (κ=0.74) interobserver agreement of the OKM scale. Thirty-one raters, which included both neuroradiologists and neurosurgeons, were presented with pre- and post-treatment angiographic images of 14 aneurysms treated with a FDS. This study design is limited by observer bias, considering that four of the five investigators were the authors who had developed the OKM grading scale.3 The study's generalizability was also hindered by the use of specific angiographic images selected by the authors. This would potentially limit the number and selection of images available to the observers for evaluation. One wonders whether the interobserver agreement would have been so high if the raters had used clinical software, such as PACS, to view the angiograms. In the clinical setting, each assessor would need to select the views and adjust image contrast independently to assign a grade. With this ‘real-life’ application, different raters might select different phases in which to assess for filling and stasis, thus introducing more variability in the final outcome.

Angiography and image processing techniques

Grading may heavily depend on angiographic technique. Image quality and technique greatly influence the ability to determine flow states and the presence of stasis. High-resolution angiography is more likely to capture subtle flow into the aneurysm. Conversely, the extent of aneurysm occlusion may be overestimated if image quality is poor or the frame rate of image acquisition is prolonged. Furthermore, differences in images may be observed depending on the volume and duration of contrast bolus or on whether the bolus was administered by an infusion pump or hand injected. For example, a prolonged injection might result in continued aneurysm filling even as the venous phase is approached. Also, the working angiographic projections for FDS deployment may not be ideal for visualizing the aneurysm. Unlike coil embolization, where the working angle is directed at the aneurysm neck, for FDS it is more often directed at the parent vessel to ensure safe and complete deployment of the device. In certain instances, a separate angiographic run would be needed to visualize the aneurysm with sufficient accuracy to allow for grading.

The technique for image processing and storage may also have an abnormally large influence on the reliability of the scales. As angiography progresses through the various phases of the run, the frame rates used for image acquisition often change, with more images acquired during the arterial phase and progressively fewer during the capillary and venous phases. Additionally, not all images recorded at the time of angiography are stored in the clinical database because of limitations in storage capacity. Often, fewer than half of the acquired images are transferred for storage. The selection of certain images for storage, combined with the difference in frame rates, can make it more difficult to identify discrete phases of the angiogram.

Aneurysm types

Certain grading criteria may be more appropriate for specific aneurysm types. In the SMART scale, aneurysm occlusion is graded in the venous phase, if no discrete, arterial in-flow jet is identified. Considering that contrast stasis is more apparent in large and giant aneurysms, the SMART scale would potentially be useful to determine the extent of occlusion in these cases. However, for smaller aneurysms in which contrast clears quickly before the venous phase, it is not clear how these cases should be graded. If an early in-flow jet is present, the aneurysm is a grade 0. If the aneurysm fills without an early in-flow jet and then clears completely before the venous phase, no category in the SMART scale accounts for this situation.

Scale complexity

In contrast to other well-established grading classifications, such as the Raymond–Roy scale for coiled aneurysms9 or the Spetzler–Martin scale for arteriovenous malformations,10 the proposed scales for FDSs are substantially more complex. The number of possible classifications is 25 for SMART, 15 for KB, and 10 for OKM. The Raymond–Roy scale, with only three possibilities, was found to have a modest interobserver reliability by independent investigators (κ=0.50).11 The Spetzler–Martin scale, a five-point grading system with 12 possible combinations, was also shown to have a wide range of interobserver reliability by independent investigators (κ=0.47–0.70).12–14 Although Joshi et al4 cited high interobserver reliability for the OKM scale (κ=0.74), we believe that the much lower reliability demonstrated in our independent analysis more closely reflects the complexity of these grading scales.

In this study, we performed statistical analysis with data treated as both categorical and ordinal. Our objective was to quantify the agreement among observers, rather than the relationship of the scales to a hypothetical ‘gold standard’, which has not been determined for aneurysms treated with a FDS. Thus, we believe that using an ordinal scale, such as that of Joshi et al,4 may overestimate the inter-rater reliability compared with our categorical analysis. Treating the data as categorical enabled us to determine how well observers agreed on individual angiographic criteria. This resulted in low reliability values, suggesting that the observers rarely agreed on the same classification for a given angiogram. When data were treated as ordinal, the reliability values increased modestly and were highest for OKM contrast stasis (W=0.71). One explanation for this discrepancy is that many of the existing scales are too complicated, giving observers too many choices. The observation that the other grading scales had poor or fair agreement even when data were treated as ordinal suggests that there was substantial disagreement even for the extreme ends of the spectra.

As demonstrated by two of the most widely cited grading systems in the treatment of neurovascular diseases (the Raymond–Roy9 and Spetzler–Martin10 scales), ease of use is paramount in the wide-scale adoption of any grading schema. The creators of the OKM scale used straightforward parameters for the determination of aneurysm filling/occlusion and contrast stasis, but even with only two parameters, there are still 10 possible grades. Conversely, with the KB scale, even though saccular and fusiform aneurysms are considered as two separate classes of aneurysms with different methods of grading for each, the combination with the patency grades results in 15 different categories for each aneurysm type. Furthermore, grading fusiform aneurysms requires the assessor to perform some basic calculations to accurately grade the results of flow diversion treatment. The SMART scale eliminates the need for any calculation of residual filling, but there is some confusion about the terms of the venous phase grading of the aneurysm, as we highlighted earlier. Additionally, although problems with patency of the parent vessel are encountered during the initial flow diversion procedure, the clinical importance of this narrowing has not been proved.

None of the current grading scales have been validated to provide data for the clinical or radiographic outcomes of aneurysms treated with FDS. Additionally, it is not known whether any of the specific components of each scale (eg, phase of contrast stasis, degree of in-stent stenosis, extent of aneurysm filling) provides important prognostic information, nor is it known whether a higher grade portends a better or worse outcome for any of the scale components. In addition to being reliable, successful grading scales must provide some further information about outcomes to be useful. Finally, to ensure validity, future FDS grading scales should include criteria derived from high-quality clinical studies. As the use of FDS for aneurysm treatment increases, prospectively collected outcomes data will be essential for developing a clinically important grading scale.

Limitations

Our study has some limitations. We included multiple assessors to decrease potential bias in grading the angiographic results after FDS placement, but in a few cases one or two of the assessors had performed the endovascular procedure. Therefore, although this analysis is an independent assessment of the reliability of the grading scales, it is not an independent evaluation of outcomes after flow diversion. Angiography was performed with the operator injecting each contrast bolus by hand and administering the fluoroscopic exposure manually, which introduces variation in angiography technique. Although this method might decrease the internal validity of our study, it increases the generalizability of our results as our method more closely reflects the variation in angiography technique found in other hospitals. Considering the shortcomings of the grading scales, these study limitations are unlikely to alter the salient point that the published scales have low inter-rater reliability and have limited clinical use for describing angiographic results after flow diversion.

Conclusion

In an independent assessment of current grading scales for aneurysms treated with FDS, we found low inter-rater reliability with the existing published grading scales. Future scales should be valid, reliable, and easy to use and should provide prognostic information about clinical and angiographic outcomes of aneurysm treatment with flow diversion.

Acknowledgments

The authors thank Kristin Kraus, MS, for her help in preparing this manuscript.

References

Footnotes

  • Contributors MSP and MDM conceived the study. All authors performed the data collection. MDM performed the statistical analysis. MDM and MSP interpreted the results. MDM and MSP drafted the manuscript. MSP supervised the study. MSP approved the final manuscript.

  • Competing interests PT is a consultant for Covidien and a proctor for the Pipeline embolization device.

  • Ethics approval This study was approved by the institutional review board at our institution before data collection.

  • Provenance and peer review Not commissioned; externally peer reviewed.