Interobserver Reliability of Baseline Noncontrast CT Alberta Stroke Program Early CT Score for Intra-Arterial Stroke Treatment Selection

The ASPECTS has been shown to predict outcomes of early ischemic patients after intra-arterial therapy by providing semiquantitative data regarding infarction core. In this article the authors assessed the interobserver reliability of this scale in patients with proximal occlusions. CT studies in 155 patients were retrospectively analyzed by 2 independent observers. Among patients with anterior circulation proximal artery occlusions who were eligible for intra-arterial therapy, interrater reliability for ASPECTS grading was substantial across the entire scale. When using the dichotomized ASPECTS (≤7 versus >7) for treatment selection, agreement was only moderate, limiting its utility. In the patient cohort, approximately 25% of treatment decisions would have been affected by interrater reliability.
BACKGROUND AND PURPOSE: Early ischemic changes on pretreatment NCCT quantified using ASPECTS have been demonstrated to predict outcomes after IAT. We sought to determine the interobserver reliability of ASPECTS for patients with AIS with PAO and to determine whether pretreatment ASPECTS dichotomized at 7 would demonstrate at least substantial κ agreement.
MATERIALS AND METHODS: From our prospective IAT data base, we identified consecutive patients with anterior circulation PAO who underwent IAT over a 6-year period. Only those with an evaluable pretreatment NCCT were included. ASPECTS was graded independently by 2 experienced readers. Interrater agreement was assessed for total ASPECTS, dichotomized ASPECTS (≤7 versus >7), and each ASPECTS region. Statistical analysis included determination of Cohen κ coefficients and concordance correlation coefficients. PABAK coefficients were also calculated.
RESULTS: One hundred fifty-five patients met our study criteria. Median pretreatment ASPECTS was 8 (interquartile range 7–9). Interrater agreement for total ASPECTS was substantial (concordance correlation coefficient = 0.77). The mean ASPECTS difference between readers was 0.2 (95% confidence interval, −2.8 to 2.4). For dichotomized ASPECTS, there was a 76.8% (119/155) observed rate of agreement, with a moderate κ = 0.53 (PABAK = 0.54). By region, agreement was worst in the internal capsule and the cortical areas, ranging from fair to moderate. After adjusting for prevalence and bias, agreement improved to substantial or near perfect in most regions.
CONCLUSIONS: Interobserver reliability is substantial for total ASPECTS but is only moderate for ASPECTS dichotomized at 7. This may limit the utility of dichotomized ASPECTS for IAT selection.

NCCT is the most widely used imaging technique in the evaluation of acute stroke because of its availability, speed, and accuracy for ruling out intracranial hemorrhage. 1,2 In patients with AIS, early reperfusion therapy improves outcomes in patients with small infarcts 3,4 but is harmful in those with extensive injury (eg, greater than 33% of the MCA territory). 5 Unfortunately, during the first several hours of acute ischemic stroke, signs of tissue injury on NCCT are subtle, and interobserver agreement is limited. 6 ASPECTS is a semiquantitative grading system developed to improve the assessment of early ischemic changes in the MCA territory on NCCT. 7 In recent studies, baseline (pretreatment) dichotomized ASPECTS has been demonstrated to predict good outcomes after IAT. 8-10 Yet before this approach is used clinically to select patients for IAT, it should demonstrate good interobserver agreement. The literature is conflicting in this regard, with values ranging from fair to substantial. 7,11 However, these studies included a broad range of stroke severities and may not necessarily apply to the population eligible for IAT, namely, patients with proximal cerebral artery occlusions.
We sought to evaluate the reliability of baseline ASPECTS, specifically in patients with anterior circulation PAO undergoing IAT, and how this would affect treatment decision-making for IAT. In particular, we aimed to determine whether pretreatment ASPECTS dichotomized at 7 would demonstrate at least substantial agreement. We also examined the interobserver reliability of ASPECTS by topographic region.

Materials and Methods
From our prospective observational IAT data base, we identified 169 consecutive patients with AIS with anterior circulation PAO (terminal ICA and/or MCA M1 or M2 segment occlusion) who underwent intra-arterial treatment between June 2004 and May 2010. General inclusion criteria for IAT included the following: 1) imaging assessment completed <8 hours post ictus; 2) NCCT without intracranial hemorrhage; 3) NCCT hypodense ischemic lesion or MRI DWI hyperintense ischemic lesion <1/3 MCA territory; 4) proximal artery occlusion on CT angiography; 5) NIHSS score >8 or with significant aphasia; 6) if IV rtPA was administered, no treatment response. Additional inclusion criteria for this study included an evaluable pretreatment noncontrast head CT scan. Of the 169 patients, 14 were excluded due to the presence of chronic infarcts ipsilateral to the acute stroke or lack of pretreatment NCCT imaging at our hospital. Clinical and imaging data were retrospectively analyzed. This study was approved by our local institutional review board and was Health Insurance Portability and Accountability Act compliant.

Imaging Protocol and Analysis
NCCT scans of the head were performed in helical mode (1.25-mm thickness, kV 120, mA 250) and reconstructed in the axial plane with 5-mm section thickness. This imaging was evaluated independently by 2 experienced neuroradiologists using the ASPECTS system. Contrary to the original scoring system, which utilized only 2 brain sections, 7 readers graded early ischemic change in each of the 10 ASPECTS regions (Fig 1) according to current methodology, which utilizes all of the scan images. 12 Early ischemic change was defined as tissue hypoattenuation or loss of gray-white matter differentiation, as these changes have been associated with edema and irreversible injury. Isolated cortical swelling, which was part of the original ASPECTS criteria, was not included, as recent work has demonstrated that it is associated with increased cerebral blood volume and may represent penumbral (threatened but salvageable) tissue. 13 Discrepancies between readers were resolved by consensus adjudication. Window and level settings were adjusted at the discretion of the readers to increase contrast between normal and ischemic brain. ASPECTS was calculated by subtracting the number of affected regions from a total possible score of 10. Imaging review was performed blinded to all clinical information except stroke side.
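The scoring rule described above (10 minus the number of affected regions, then a split at 7) can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study: the region abbreviations and the function names `aspects_score` and `above_threshold` are assumptions introduced here for the example.

```python
# Shorthand labels for the 10 ASPECTS regions (assumed abbreviations):
# caudate, lentiform, internal capsule, insula, and cortical regions M1-M6.
ASPECTS_REGIONS = ("C", "L", "IC", "I", "M1", "M2", "M3", "M4", "M5", "M6")


def aspects_score(affected_regions):
    """Return 10 minus the number of distinct regions with early ischemic change."""
    affected = set(affected_regions)
    unknown = affected - set(ASPECTS_REGIONS)
    if unknown:
        raise ValueError(f"unknown region(s): {sorted(unknown)}")
    return 10 - len(affected)


def above_threshold(score, threshold=7):
    """Dichotomize as in the text: True for ASPECTS > 7, False for ASPECTS <= 7."""
    return score > threshold


# Hypothetical read: caudate, lentiform, and insula involved.
score = aspects_score(["C", "L", "I"])
print(score, above_threshold(score))  # 7 False
```

A scan with no early ischemic change scores 10, and each additional affected region lowers the score by 1 regardless of which region it is.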

Statistical Analysis
Interrater agreement was measured for total ASPECTS, dichotomized ASPECTS (≤7 versus >7), and each individual topographic region. ASPECTS was dichotomized at 7, given recent evidence supporting this threshold for identifying good clinical response to intra-arterial treatment or reperfusion. 8-10 For total ASPECTS, agreement was measured by using the concordance correlation coefficient. In addition, Bland and Altman 14 analysis was performed to assess the absolute degree of interrater differences across the entire ASPECTS scale. For dichotomized ASPECTS and for each ASPECTS region, Cohen κ coefficients were calculated. In addition, because κ scores are influenced by prevalence and bias, prevalence-adjusted bias-adjusted κ (PABAK) scores were calculated and reported with their associated prevalence and bias indices. 15,16 Interpretation of the κ values followed the proposed standards of Landis and Koch: 0-0.20 (slight); 0.21-0.40 (fair); 0.41-0.60 (moderate); 0.61-0.80 (substantial); and 0.81-1.00 (almost perfect). 17 Continuous data with normal distribution were reported as mean ± standard deviation, ordinal or non-normal data were reported as median (IQR), and categoric data were reported as proportions. Normality was tested by using the Kolmogorov-Smirnov test. Statistical significance was taken at P < .05. Statistical analysis was performed by using MedCalc, version 11.2.1 (MedCalc Software, Mariakerke, Belgium).

Results
The distribution of consensus ASPECTS within the cohort was skewed toward higher ASPECTS (smaller infarcts), with 97 patients demonstrating ASPECTS >7 (Fig 2). The median baseline ASPECTS was 8 (IQR 7-9). There was substantial interrater agreement for total ASPECTS grading, with a concordance correlation coefficient of 0.77 (95% CI, 0.70-0.83). In the Bland-Altman analysis (Fig 3), there was a small difference between the mean ASPECTS (7.4 versus 7.6; P < .05). The limits of agreement (95% CI) for the ASPECTS score differences ranged from −2.8 to +2.4. ASPECTS was the same between readers in 34.2% (53/155), within 1 point in 76.8% (119/155), and within 2 points in 91.6% (142/155). There were no differences greater than 3 points.
For dichotomized baseline ASPECTS (≤7 versus >7), there was a 76.8% (119/155) observed rate of agreement. Accounting for chance, there was moderate agreement, with κ = 0.53 (PABAK = 0.54; bias index = 0.04; prevalence index = 0.11).
Fig 4 illustrates the prevalence of ischemic involvement for each ASPECTS region using the consensus reads. The basal ganglia and insula were the most frequently affected regions. The table provides the κ and PABAK values for the individual ASPECTS regions. Interrater agreement was variable, with the worst performance in the internal capsule and the cortical regions. The internal capsule and M6 region demonstrated only fair agreement. When accounting for bias and prevalence, agreement improved to at least substantial in all regions except the lentiform nucleus (moderate).
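As a worked illustration of the chance-corrected statistics for the dichotomized score, the sketch below computes Cohen κ, PABAK, and the prevalence and bias indices from a 2x2 reader-by-reader table. The article does not report the individual cell counts; the counts used here (68, 21, 15, 51) are a hypothetical table chosen only because it is consistent with the published summary values (119/155 observed agreement, bias index 0.04, prevalence index 0.11). The function names are likewise assumptions for this example.

```python
def agreement_stats(a, b, c, d):
    """Agreement statistics for a 2x2 table of two readers.

    a: both readers rate ASPECTS <= 7
    b: reader 1 rates <= 7, reader 2 rates > 7
    c: reader 1 rates > 7, reader 2 rates <= 7
    d: both readers rate ASPECTS > 7
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Agreement expected by chance, from the marginal totals of each reader.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return {
        "observed": observed,
        "kappa": (observed - expected) / (1 - expected),
        "pabak": 2 * observed - 1,
        "prevalence_index": abs(a - d) / n,
        "bias_index": abs(b - c) / n,
    }


def landis_koch(value):
    """Label a coefficient using the Landis and Koch bands quoted in the text."""
    if value < 0:
        return "below listed bands"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")):
        if value <= upper:
            return label
    raise ValueError("coefficient cannot exceed 1")


# HYPOTHETICAL cell counts (not reported in the article), chosen to match
# the published summary values.
stats = agreement_stats(68, 21, 15, 51)
print({name: round(value, 2) for name, value in stats.items()})
# {'observed': 0.77, 'kappa': 0.53, 'pabak': 0.54, 'prevalence_index': 0.11, 'bias_index': 0.04}
print(landis_koch(stats["kappa"]))  # moderate
```

For a two-category table, PABAK depends only on the observed agreement (2 x 0.768 - 1 = 0.54), which is why it sits so close to κ here: the prevalence and bias indices are both small for the dichotomized score.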

Discussion
Among highly experienced readers, there was substantial interobserver agreement for grading baseline ASPECTS on acute noncontrast head CT in the setting of anterior circulation proximal artery occlusion. Most differences in ASPECTS were within 1-2 points in this study. However, only moderate interrater agreement was achieved for assessing dichotomized ASPECTS >7 versus ≤7. In approximately 25% of our cohort, treatment decisions for IAT based on dichotomized ASPECTS would have been affected by the interrater reliability.
Our findings with respect to total ASPECTS scoring support a previous study that also demonstrated substantial agreement (weighted κ = 0.69) across the entire ASPECTS scale. 18 In that study, the authors found a mean interrater difference of zero, with a standard deviation of 1.1 points. Similarly, we found a mean difference of 0.2, with 76.8% of scores being within 1 point.
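The two measures used for total ASPECTS, the concordance correlation coefficient and the Bland-Altman limits of agreement, can be sketched as follows. This is a minimal illustration assuming Lin's standard definition of the concordance correlation coefficient and the usual mean difference ± 1.96 standard deviations for the limits; it is not the MedCalc implementation, and the paired scores are hypothetical.

```python
from statistics import mean, stdev


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired scores."""
    mx, my = mean(x), mean(y)
    n = len(x)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    var_y = sum((yi - my) ** 2 for yi in y) / n
    # The squared mean shift penalizes systematic differences between readers.
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


def bland_altman_limits(x, y):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    mean_diff = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd


# HYPOTHETICAL paired ASPECTS reads from two readers.
reader_1 = [8, 7, 9, 6, 10, 8, 7, 5]
reader_2 = [8, 8, 9, 7, 9, 8, 6, 6]
print(concordance_correlation(reader_1, reader_2))
print(bland_altman_limits(reader_1, reader_2))
```

Unlike a plain correlation coefficient, the concordance correlation coefficient drops when one reader scores consistently higher than the other, so it reflects agreement rather than mere association.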
Previous studies have shown that baseline ASPECTS >7 identifies patients who are more likely to benefit from endovascular treatment. 8-10 However, based on our findings, the utility of this approach may be adversely affected by its limited interrater reliability. The literature is conflicting with respect to this question. The original ASPECTS article by Barber et al 7 reported substantial to near-perfect agreement for ASPECTS dichotomized at 7, with κ values ranging from 0.71 to 0.89. However, another study by Mak et al 11 demonstrated a κ score of 0.34 (fair), with a strikingly low 42% rate of observed agreement. When adjusting for prevalence and bias, agreement improved to moderate (PABAK = 0.44). Our results confirm only a moderate level of reliability (κ = 0.53, PABAK = 0.54) for dichotomized ASPECTS in the setting of proximal occlusions.
It is unclear exactly why the interrater agreement for dichotomized ASPECTS was so different between the study by Barber et al 7 and ours. However, this difference is probably related to variable selection bias and consequently differing patient cohorts. Timing of imaging is 1 factor that can influence lesion detection. In their study, patients underwent baseline NCCT within 3 hours of stroke onset and were treated with intravenous alteplase, while in the present study, at least half of the patients were imaged beyond 3 hours (median 188.5 minutes, IQR 83-278 minutes) and were treated with IAT. However, the increased time to imaging in our study would be expected to yield more conspicuous lesions and hence a better agreement rather than a worse one. Another possible explanation for the varying agreement is a difference in ASPECTS distribution. Although the median baseline ASPECTS was 8 in both studies, a broader distribution (ie, larger variance) in 1 study would result in a smaller proportion of scores around the threshold of 7 and therefore fewer potential discrepancies. Unfortunately, Barber et al 7 did not report the proportions of individual scores in their study. In our cohort, almost half (48.4% [75/155]) of the consensus scores were 7 or 8 and thus prone to interrater disagreement.
This last issue underscores the problem with using a binary classification scheme. Clearly, dichotomization will penalize scores that are discrepant by only 1 point if they are around the threshold value. Therefore, despite our substantial agreement for total ASPECTS, we achieved only moderate agreement for dichotomized ASPECTS because of the clustering of scores around 7. This limitation may particularly affect favorable IAT candidates (ie, those who present with minimal tissue injury). If a small infarct is present in the setting of a proximal occlusion, it typically involves the caudate nucleus, lentiform, and/or insula, which are end-artery territories. As such, the baseline ASPECTS is often 7 or 8 in these patients. Also in such patients, the moderate level of agreement that we observed for the lentiform nucleus may further contribute to the relatively low rate of agreement for dichotomized ASPECTS.
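The penalty that dichotomization imposes on near-threshold scores can be made concrete with a short sketch. The reader pairs below are hypothetical and the helper names are assumptions for this example; the point is only that a 1-point disagreement changes the treatment category exactly when the two scores straddle 7 and 8.

```python
def dichotomize(score, threshold=7):
    """Map a total ASPECTS to the two treatment-selection categories."""
    return "<=7" if score <= threshold else ">7"


def category_flips(pairs, threshold=7):
    """Return the reader pairs whose dichotomized categories disagree."""
    return [(r1, r2) for r1, r2 in pairs
            if (r1 <= threshold) != (r2 <= threshold)]


# HYPOTHETICAL (reader 1, reader 2) scores; most differ by a single point.
pairs = [(9, 10), (7, 8), (8, 7), (5, 6), (8, 9), (6, 8)]
print(category_flips(pairs))  # [(7, 8), (8, 7), (6, 8)]
```

Pairs such as (9, 10) or (5, 6) differ by the same single point as (7, 8) yet leave the category unchanged, which is why a cohort clustered at 7 and 8 is especially exposed to this effect.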
Our findings with respect to individual ASPECTS regions are novel, as no study, to our knowledge, has reported interobserver agreement by region. Notably, there was only fair to moderate agreement in the internal capsule and the cortical regions. However, this was mostly related to the low prevalence of ischemic involvement in our study. When accounting for prevalence and bias, these areas demonstrated substantial or near-perfect agreement, suggesting that there was strong consensus between the readers as to what was considered abnormal. It is somewhat surprising that the agreement for total ASPECTS was high despite the low level of agreement in these multiple regions. This suggests that overall agreement is determined largely by agreement within the more prevalent ASPECTS regions (eg, basal ganglia, insula).
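The effect of low prevalence on κ described above can be illustrated with a hypothetical region that is rarely involved. The counts below are invented for the example and are not taken from the study's table; they show how two readers can agree on the large majority of scans while κ remains low and PABAK is high.

```python
def kappa_and_pabak(a, b, c, d):
    """Cohen kappa and PABAK for a 2x2 table.

    a: both readers call the region abnormal
    b, c: the readers disagree
    d: both readers call the region normal
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return kappa, 2 * observed - 1


# HYPOTHETICAL rarely affected region: 155 scans, abnormal on only a few.
kappa, pabak = kappa_and_pabak(3, 4, 4, 144)
print(f"{kappa:.2f} {pabak:.2f}")  # 0.40 0.90
```

Here the readers agree on 147 of 155 scans, but because nearly every scan is normal in this region, most of that agreement is expected by chance, which pulls κ down while PABAK stays high.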
This study is limited primarily by our highly selected patient population, all of whom underwent IAT. This explains why our ASPECTS distribution was skewed toward smaller infarcts, which are thought to be more likely to benefit from treatment. However, it is in this specific patient subset where interobserver agreement is most important. Moreover, for the analysis of dichotomized ASPECTS, and of the individual ASPECTS regions, we were able to mitigate this issue by adjusting for prevalence. Another limitation is that we did not assess intraobserver reliability. However, we believe that interobserver agreement is more relevant with respect to the clinical utility of ASPECTS. In order for NCCT ASPECTS to be widely adopted for selecting patients for IAT, it must have good reliability between readers at different medical centers. A further limitation is that our findings cannot be extended to general clinical practice, where stroke-imaging studies are often interpreted by physicians who are not expert at ASPECTS grading. Previous studies have shown that the accuracy of ASPECTS evaluation is dependent on the level of rater experience. 7,19,20 However, we wanted to assess the reliability of ASPECTS under the most ideal conditions (ie, using readers with significant experience with stroke imaging and ASPECTS grading). In support of this last point, a recent study by Wardlaw et al 21 found that the reliability of grading acute ischemia on NCCT (versus an expert neuroradiologist) was improved if the reader was a neuroradiologist and if the scan was acquired later in the 6-hour window. In our study, both readers were neuroradiologists and at least half of the scans were acquired after 3 hours. Future studies are needed to verify our findings in an unselected population of patients with AIS with proximal occlusions.

Conclusions
Among patients with anterior circulation proximal artery occlusions who are eligible for intra-arterial therapy, interrater reliability for ASPECTS grading is substantial across the entire ASPECTS scale. However, when using dichotomized ASPECTS (≤7 versus >7) for treatment selection, agreement is only moderate, which limits the utility of this approach. In our cohort, approximately 25% of treatment decisions would be affected by interrater reliability.
Disclosures: Joshua Hirsch-UNRELATED: Consultancy: Phillips, CareFusion, Comments: Participated in focus group of NI specialists for NI suites of the future (Phillips); participated on NextGen team for product development (CareFusion); Royalties: CareFusion, Comments: as above; Stock/Stock Options: IntraTech, Nevro, NFocus, Comments: Development stage companies that are working on NI products (IntraTech and N-Focus) and nerve imaging (Nevro). Albert Yoo-UNRELATED: Grants/Grants Pending: Penumbra.* *Money paid to institution.