Abstract
Background and purpose The Alberta Stroke Program Early CT Score (ASPECTS) is a widely used measure of ischemic change on non-contrast CT. Although predictive of long-term outcome, ASPECTS is limited by its modest interobserver agreement. One potential solution to this is the use of machine learning strategies, such as e-ASPECTS, to detect ischemia. Here, we compared e-ASPECTS with manual scoring by experienced neuroradiologists for all 10 individual ASPECTS regions.
Materials and methods We retrospectively reviewed 178 baseline non-contrast CT scans from patients with acute ischemic stroke undergoing endovascular thrombectomy. All scans were reviewed by two independent neuroradiologists with a third reader arbitrating disagreements for a consensus read. Each ASPECTS region was scored individually. All scans were then evaluated using a machine learning-based software package (e-ASPECTS, Brainomix). Interobserver agreement between readers and the software for each region was calculated with a kappa statistic.
Results The median ASPECTS was 9 for manual scoring and 8.5 for e-ASPECTS, with an overall agreement of κ=0.248. Regional agreement varied from κ=0.094 (M1) to κ=0.555 (lentiform), with better performance in subcortical regions. When corrected for the low number of infarcts in any given region, prevalence-adjusted bias-adjusted kappa ranged from 0.483 (insula) to 0.888 (M3), with greater agreement for cortical areas. Intraclass correlation coefficients were between 0.09 (M1) and 0.556 (lentiform).
Conclusion Manual scoring and e-ASPECTS had fair agreement in our dataset on a per-region basis. This warrants further investigation using follow-up scans or MRI as the gold standard measure of true ASPECTS.
- stroke
- thrombectomy
- CT
Data availability statement
Data are available upon reasonable request.
Introduction
The management of acute ischemic stroke has undergone a sea change in the past 5 years, with the advent of mechanical endovascular thrombectomy (EVT) as the new gold standard treatment.1–6 Paramount to this success was the development of patient selection guidelines; the trials that established thrombectomy relied heavily on imaging selection, in particular the Alberta Stroke Program Early CT Score (ASPECTS). ASPECTS is a semi-quantitative scoring system for early ischemic changes, derived from non-contrast computed tomography (NCCT) images.7 In ASPECTS, 10 regions in the middle cerebral artery (MCA) territory of each hemisphere (figure 1) are assigned a score of 0 in the presence of early ischemic changes—parenchymal hypoattenuation, loss of grey-white differentiation, and focal swelling—or 1 in the absence of changes (ie, healthy-appearing brain), yielding a final score of 0–10.7 ASPECTS is an extensively validated scale that is useful for prognostication of long-term functional outcome following stroke.8 9 However, manual evaluation of whether a given region exhibits these changes is difficult, particularly at very early time points following stroke onset.10 As a consequence, the application of ASPECTS in practice is complicated by its variable inter-rater agreement and it requires substantial experience for accurate assessment.11 12
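The scoring rule described above can be illustrated with a minimal sketch. The region names follow the article; the function name `total_aspects` and the dictionary layout are assumptions made for illustration, not part of any ASPECTS software.

```python
# The ten ASPECTS regions of one hemisphere, as named in the article.
REGIONS = ["caudate", "lentiform", "insula", "internal_capsule",
           "M1", "M2", "M3", "M4", "M5", "M6"]


def total_aspects(region_scores):
    """Sum per-region scores (0 = early ischemic change, 1 = normal-appearing).

    `region_scores` is a hypothetical dict mapping region name to 0 or 1.
    """
    missing = [r for r in REGIONS if r not in region_scores]
    if missing:
        raise ValueError(f"missing regions: {missing}")
    invalid = [r for r in REGIONS if region_scores[r] not in (0, 1)]
    if invalid:
        raise ValueError(f"scores must be 0 or 1: {invalid}")
    return sum(region_scores[r] for r in REGIONS)


# Example: ischemic change in the insula and lentiform only.
example = {r: 1 for r in REGIONS}
example["insula"] = 0
example["lentiform"] = 0
print(total_aspects(example))  # 8
```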
Advances in machine learning-based image analysis offer a potential solution to this problem. e-ASPECTS is a commercially available software suite developed for automated unbiased quantification of ASPECTS from NCCT images. Previous studies have shown e-ASPECTS to be non-inferior to experienced neuroradiologists13 and superior to stroke trainees,14 when compared with diffusion-weighted imaging or follow-up CT for accurate final infarct delineation. Moreover, the total e-ASPECTS score correlates with functional outcome following EVT15 16 and thrombolysis,17 and has shown early promise in determining whether patients are likely to benefit from interventional treatment,16 suggesting e-ASPECTS may be of value as an adjunct to clinical decision-making. In this study we sought to further validate the performance of e-ASPECTS in comparison to manual ASPECTS scoring from NCCT images by neuroradiologists in a large and well-characterized cohort of patients with ischemic stroke, with particular emphasis on its performance on a per-region basis.
Methods
Study subjects and imaging
Patients admitted to the Mayo Clinic (Rochester, Minnesota, USA) with a diagnosis of stroke were retrospectively evaluated for inclusion in the study. The inclusion criteria were: (1) clinical diagnosis of acute stroke in the anterior circulation; (2) age ≥18 years; (3) availability of NCCT images acquired within 24 hours of onset of symptoms; and (4) treatment by EVT.
Exclusion criteria included evidence of intracranial hemorrhage or other non-ischemic pathology on baseline imaging, severe artefacts in the NCCT images, and refusal of consent identified on checking research authorization. The study was given ethical approval by the institutional review board at the Mayo Clinic. NCCT scans were acquired with a slice thickness of 5 mm.
e-ASPECTS
e-ASPECTS scoring (Brainomix, Oxford, UK; version 8) was performed as previously described.13 Briefly, pseudonymized DICOM images were pre-processed and corrected for tilt, rotation, and other positional transformations. A machine learning algorithm then automatically segmented the ASPECTS regions and classified each as ischemic or normal-appearing. The results were exported to an Excel spreadsheet. Where the algorithm detected non-acute hypodensity, the region was counted as normal-appearing for the purposes of this comparison.
Manual ASPECTS scoring
In the same set of NCCT images, ASPECTS was manually determined by two neuroradiologists who were instructed in the appropriate use of ASPECTS.7 18 In cases of disagreement, the score was adjudicated by a third neuroradiologist. The assessors used all available slices, rather than only one ganglionic and one supraganglionic slice as in the original methodology. Early ischemic change was defined as the presence of hypodensity and/or loss of grey-white differentiation, with or without cortical swelling. Assessors were blinded to clinical details other than the presence of unilateral anterior circulation ischemic stroke and suspected laterality.
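The consensus procedure can be sketched as follows. This is an illustration only: it assumes adjudication was applied region by region, which the text does not state explicitly, and the function and variable names are hypothetical.

```python
def consensus_score(reader_a, reader_b, adjudicator):
    """Build a per-region consensus read.

    Where the two primary readers agree, their score stands; where they
    disagree, the third reader's score is used. Applying this per region
    (rather than per scan) is an assumption of this sketch.
    """
    consensus = {}
    for region, score_a in reader_a.items():
        if score_a == reader_b[region]:
            consensus[region] = score_a
        else:
            consensus[region] = adjudicator[region]
    return consensus


# Toy example with three regions (0 = ischemic, 1 = normal-appearing).
reader_a = {"insula": 0, "M1": 1, "M2": 1}
reader_b = {"insula": 0, "M1": 0, "M2": 1}
adjudicator = {"insula": 1, "M1": 0, "M2": 0}
print(consensus_score(reader_a, reader_b, adjudicator))
# {'insula': 0, 'M1': 0, 'M2': 1}
```

Note that the adjudicator's scores for "insula" and "M2" are ignored because the two primary readers already agree there.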
Statistical analysis
Baseline characteristics of the study population are shown in table 1. Comparison of performance between manual ratings and e-ASPECTS was conducted for all 10 regions separately and for composite ASPECTS. In addition, we compared superficial (M1–M6) and deep (lentiform, caudate, insula, internal capsule) regions combined: presence of ischemia in any region of a group was counted as 0 for the combined value, and the combined value was 1 only if all superficial or all deep regions were normal-appearing. Comparisons were quantified as raw agreement, Cohen’s kappa, prevalence-adjusted bias-adjusted kappa (PABAK), and intraclass correlation coefficient (ICC). All analyses were performed in R and plots created using the ggplot2 package.19 20
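The combined superficial and deep values defined above can be sketched as below. The analyses in the article were performed in R; this Python fragment is only an illustration, and the names `SUPERFICIAL`, `DEEP`, and `combined_score` are assumptions.

```python
# Region groupings as given in the text.
SUPERFICIAL = ["M1", "M2", "M3", "M4", "M5", "M6"]
DEEP = ["lentiform", "caudate", "insula", "internal_capsule"]


def combined_score(region_scores, group):
    """Return 1 if every region in `group` is normal-appearing, else 0.

    `region_scores` is a hypothetical dict of region name -> 0 (ischemic)
    or 1 (normal-appearing).
    """
    return int(all(region_scores[r] == 1 for r in group))


# Example: isolated insular ischemia.
scores = {r: 1 for r in SUPERFICIAL + DEEP}
scores["insula"] = 0
print(combined_score(scores, SUPERFICIAL))  # 1
print(combined_score(scores, DEEP))         # 0
```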
Results
An overview of the demographic details and imaging results is presented in table 1. Briefly, the mean age was 67.56±14.78 years and 48.9% of the patients were male. The most commonly infarcted region was the insula, in 95 cases (53.4%) for both e-ASPECTS and manual scoring, followed by the lentiform. The least commonly infarcted region on manual assessment was M3; e-ASPECTS did not detect internal capsule ischemia in any patient. The median total ASPECTS was 9 for manual scoring and 8.5 for e-ASPECTS (see online supplementary figure 1). Bland–Altman analysis of ASPECTS found the mean difference between manual scores and e-ASPECTS to be 0.19 points (95% CI −2.38 to 2.77; figure 2).
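For readers unfamiliar with Bland–Altman analysis, the sketch below computes the conventional quantities: the mean difference (bias) and the 95% limits of agreement, taken as bias ± 1.96 standard deviations of the paired differences. Whether the interval reported above was derived in exactly this way is an assumption, and the toy scores are invented for illustration.

```python
from statistics import mean, stdev


def bland_altman(manual, automated):
    """Return (bias, lower limit, upper limit) for paired scores.

    Bias is the mean of (manual - automated); the limits are
    bias +/- 1.96 * sample SD of the differences.
    """
    if len(manual) != len(automated):
        raise ValueError("score lists must have the same length")
    diffs = [m - a for m, a in zip(manual, automated)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Invented total ASPECTS values for four scans.
manual_totals = [10, 9, 8, 7]
auto_totals = [9, 9, 9, 6]
bias, lower, upper = bland_altman(manual_totals, auto_totals)
print(bias)  # 0.25
```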
The full data of ischemic and healthy-appearing regions from manual readings and e-ASPECTS are presented in online supplementary table 1. From these data we calculated raw agreement and Cohen’s kappa (table 2). Agreement ranged from slight (M1, 0.094; M6, 0.194) to moderate (lentiform, 0.555; caudate, 0.537; M2, 0.493; insula, 0.481; M3, 0.421). As e-ASPECTS did not detect any cases with ischemia in the internal capsule, kappa could not be calculated for that region. For all superficial areas combined, the agreement was 0.482; deep regions were comparable with a kappa of 0.490. Due to the low number of ischemic areas in our dataset, we corrected kappa for prevalence and bias. PABAK values ranged from 0.742 (M5) to 0.888 (M3) in superficial areas and 0.483 (insula) to 0.719 (caudate) in deep regions. Cohen’s kappa for total ASPECTS was 0.248. Similarly, intraclass correlation coefficients (table 2) ranged from 0.09 (M1) to 0.531 (caudate), excluding internal capsule, indicating considerable variation in regional agreement between neuroradiologists and e-ASPECTS. The coefficients for superficial and deep ischemia overall were very similar at 0.484 and 0.492, respectively, while the coefficient for total ASPECTS was 0.662.
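The effect of low prevalence on kappa can be made concrete with a small sketch. It uses the standard definitions of Cohen's kappa, (Po − Pe)/(1 − Pe), and PABAK for two categories, 2·Po − 1. The counts are invented for illustration and are not taken from the study data; the function name is hypothetical.

```python
def agreement_stats(rater_a, rater_b):
    """Raw agreement, Cohen's kappa, and PABAK for two binary raters.

    Inputs are equal-length lists of 0 (ischemic) / 1 (normal-appearing).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("rating lists must have the same length")
    n = len(rater_a)
    # Observed agreement: proportion of scans given the same score.
    p_o = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Marginal proportion of "normal-appearing" ratings for each rater.
    a_norm = sum(rater_a) / n
    b_norm = sum(rater_b) / n
    # Agreement expected by chance from the marginals.
    p_e = a_norm * b_norm + (1 - a_norm) * (1 - b_norm)
    # Kappa is undefined when chance agreement is already perfect.
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    pabak = 2 * p_o - 1
    return {"raw": p_o, "kappa": kappa, "pabak": pabak}


# Invented example: 100 scans, ischemia rare in this region.
# 90 both normal, 2 both ischemic, 4 ischemic by A only, 4 by B only.
a = [1] * 90 + [0] * 2 + [0] * 4 + [1] * 4
b = [1] * 90 + [0] * 2 + [1] * 4 + [0] * 4
stats = agreement_stats(a, b)
print(round(stats["raw"], 3), round(stats["kappa"], 3), round(stats["pabak"], 3))
# 0.92 0.291 0.84
```

In this toy case the raters agree on 92% of scans, yet kappa is only about 0.29 because chance agreement is very high when most regions are normal-appearing; PABAK, which depends only on observed agreement, is 0.84.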
Discussion
The primary aim of this paper was to compare the performance of e-ASPECTS against human scorers in a large well-characterized cohort of patients with acute stroke with baseline CT. Our results show that e-ASPECTS and manual raters had only fair agreement overall in our dataset, and that this estimate was lowered by the relatively small number of ischemic cases in some regions. On a per-region basis, following adjustment for prevalence, agreement ranged from moderate to good, with cortical regions having better agreement than deep structures. Agreement for ischemia in superficial regions combined was almost identical to agreement in deep region detection.
Performance comparisons between e-ASPECTS and manual assessment have been reported previously for total ASPECTS, including papers that have used ground truth measures such as MRI and follow-up scans. Herweh et al evaluated e-ASPECTS in 34 cases and found it to be superior to trainees and non-inferior to neuroradiologists when compared to DWI as a gold standard, based on region-specific, overall, and dichotomized (0–5 vs 6–10) scores.14 The same group confirmed this in a large cohort of 132 patients with follow-up CT as the measure of 'true' ASPECTS, and again found e-ASPECTS to be non-inferior in determining early ischemic change to neuroradiologists.13
To our knowledge, the only previous regional comparison between e-ASPECTS and manual scoring found that e-ASPECTS was more sensitive and less specific than human scorers and an alternative software solution (RAPID) in the cortex, but less sensitive and more specific in the deep regions.21 Intriguingly, e-ASPECTS also did not identify any internal capsule strokes in that dataset, despite manual assessment scoring internal capsule as ischemic in 4% of cases and RAPID scoring it in 21%. In that study, the gold standard was follow-up imaging rather than a comparison between expert consensus from baseline imaging and e-ASPECTS as reported here.
Our study has some technical limitations. First, we did not have access to follow-up scans and cannot therefore draw conclusions on the accuracy, sensitivity, and specificity of manual scoring and e-ASPECTS in our cohort. Second, although our cohort was relatively large compared with the majority of studies examining automated analysis of NCCT in stroke, the median ASPECTS value was 9 for human scorers and 8.5 for e-ASPECTS, suggesting the majority of our cases had limited ischemic changes. Third, e-ASPECTS is suggested to perform optimally at 1 mm slice thickness whereas our dataset had 5 mm slices, which may have disadvantaged the automated method.17
It is also important to note that e-ASPECTS is intended to be a decision-making aid rather than a replacement for reporting by an experienced radiologist. Its main utility is in circumstances where senior neuroradiologists might not be available and less experienced operators are determining ASPECTS, or if there is significant disagreement between clinicians. It would therefore be of interest to characterize its performance against human raters using acute estimates of ASPECTS, as retrospective scoring with multiple assessors and adjudication is likely to yield more accurate results than a single radiologist in real-world conditions. Indeed, previous comparisons of inter-rater performance have found kappa and ICC values comparable to those reported between e-ASPECTS and manual scoring in this study.22
Further investigation is also necessary to see to what extent management decisions would have been different based on discrepancies between automated and manual analysis. Our patient population was limited to those undergoing endovascular thrombectomy, and our findings therefore apply only to patients with large vessel occlusion stroke. However, it is arguably this population in whom ASPECTS is most likely to be useful as an adjunct to clinical decision-making, as extensive ischemic change (often defined as ASPECTS ≤5) is frequently used as an exclusion criterion for thrombectomy. A decision to proceed to EVT is currently based on total ASPECTS rather than regional scores; as our patients had relatively high ASPECTS on both manual and automated scoring, it is difficult to characterize this in our dataset.
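One way such an investigation could be framed is to count scans where manual and automated totals fall on opposite sides of a treatment threshold. The sketch below uses the ASPECTS ≤5 cut-off mentioned above; it was not part of the analysis in this study, the function names are hypothetical, and the scores are invented.

```python
def dichotomize(total, threshold=5):
    """Label a total ASPECTS as 'low' (<= threshold) or 'high'."""
    return "low" if total <= threshold else "high"


def discordant_cases(manual_totals, automated_totals, threshold=5):
    """Indices of scans whose two totals fall on opposite sides of the threshold."""
    if len(manual_totals) != len(automated_totals):
        raise ValueError("score lists must have the same length")
    return [
        i
        for i, (m, a) in enumerate(zip(manual_totals, automated_totals))
        if dichotomize(m, threshold) != dichotomize(a, threshold)
    ]


# Invented totals for five scans.
manual = [9, 6, 5, 10, 4]
automated = [8, 5, 6, 9, 3]
print(discordant_cases(manual, automated))  # [1, 2]
```

A one-point difference only matters here when it straddles the threshold, which is why scans 0, 3, and 4 are not flagged despite the totals differing.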
Conclusions
Manual scoring and e-ASPECTS had fair agreement in our thrombectomy dataset on a per-region basis. This warrants further investigation using follow-up scans or MRI as the gold standard measure of true ASPECTS. On a per-region basis, following adjustment for prevalence, agreement ranged from moderate to good, with cortical regions having better agreement than deep structures. Future prospective studies are needed to determine how e-ASPECTS performs compared with expert readers in predicting final infarct volume in patients undergoing successful revascularization therapies.
Footnotes
Twitter @DavidMihalMD
Correction notice Since this paper was first published, the middle initial C has been added to the author name John Benson.
Contributors AN and SMS analysed the data and drafted the manuscript. DM, JB, IM, DFK, and WB collected the data. DFK and WB supervised the project. All authors contributed to revising the manuscript, approved the final version, and agree to be accountable for all aspects of the work.
Funding AN was supported by an award from the Oxford University Clinical Academic Graduate School. This work was done during the term of an Award from the American Heart Association (19POST34381067) to SMS.
Competing interests The manuscript was reviewed by Brainomix, the developers of e-ASPECTS, but the authors retained full control over analysis, presentation, and discussion of results.
Provenance and peer review Not commissioned; externally peer reviewed.