An MRI Rating Scale for Amyloid-Related Imaging Abnormalities with Edema or Effusion

BACKGROUND AND PURPOSE: Immune therapy against amyloid-β appears to be a promising approach in Alzheimer disease. However, a dose-related risk of amyloid-related imaging abnormalities (ARIA) on FLAIR images, thought to represent parenchymal vasogenic edema or sulcal effusion (termed “ARIA-E”), has been observed in clinical trials. To assess the intensity of ARIA-E presentation, an MR imaging scale that is both reproducible and easily implemented would assist in monitoring and evaluating this adverse event. MATERIALS AND METHODS: On the basis of a review of existing cases from a phase II bapineuzumab study, a scale was constructed with a 6-point score (0–5) for each of the 6 regions on each side of the brain (total range, 0–60). Scores were obtained for parenchymal and sulcal hyperintensities and for the frequently co-occurring gyral swelling. Inter-rater reliability between 2 neuroradiologists was evaluated in 20 patients, 10 with known ARIA-E and 10 without, by using the intraclass correlation coefficient (ICC). RESULTS: The 2 raters had excellent agreement in the identification of ARIA-E cases. High inter-rater agreement was observed for scores of parenchymal hyperintensity (ICC = 0.83; 95% CI, 0.48–0.96) and sulcal hyperintensity (ICC = 0.89; 95% CI, 0.63–0.97) and for the combined scores of the 2 ARIA-E findings (ICC = 0.89; 95% CI, 0.62–0.97). Gyral swelling scores had lower inter-rater agreement (ICC = 0.54; 95% CI, −0.06–0.86). CONCLUSIONS: The proposed rating scale provides a reliable and easily implemented instrument to grade ARIA-E imaging findings. We currently do not recommend including gyral swelling in the score.

Alzheimer disease (AD) is a progressive neurodegenerative disease associated with dementia and is histopathologically characterized by cerebral neuronal loss, deposits of extracellular plaques of Aβ, and the intraneuronal accumulation of hyperphosphorylated neurofibrillary tangles. 1,2 Treatment strategies targeted against these insults are being investigated; however, to date, no curative treatment exists. Therapies targeting the Aβ plaques have the longest research history, with the first animal models of immunotherapy for AD introduced >10 years ago. 3 Several human in vivo trials have been completed or are ongoing using both active and passive immunization strategies for Aβ. [4][5][6] Immunization against Aβ is hypothesized to lead to an immune-mediated cleavage and removal of Aβ depositions in the brain. 7 Animal and human in vivo amyloid PET studies have shown that immunization therapy is effective in terms of Aβ removal, and several studies based on active immunization with the full-length Aβ42 peptide suggested clinical benefits. 3,8,9 In addition to Aβ removal, MR imaging findings have been observed that are considered likely related to the clearance mechanism. 5,6,10 Dose-related findings include vasogenic edema, sulcal effusion, superficial siderosis, and cerebral microbleeds. The latter are also naturally observed in AD, because lobar microbleeds are related to cerebral amyloid angiopathy and AD pathology. 5,10,[11][12][13][14][15] Because both findings are considered related to amyloid pathology, the term "amyloid-related imaging abnormalities" (ARIA) has been proposed. ARIA is further subdivided into ARIA-H, representing hemosiderin deposits or superficial hemosiderosis, and ARIA-E, representing parenchymal vasogenic edema or sulcal effusion. ARIA-E can present with different imaging features, such as gyral swelling and sulcal hyperintensity, along with white matter hyperintensity. 16
Scoring guidelines and rating scales for the detection of microbleeds have been established and are widely used in research studies. 15,17 Given the number of clinical trials in patients with AD targeting Aβ, a standardized assessment of the relatively new imaging finding of ARIA-E would be useful to improve our understanding of its risk factors and outcomes. The aim of our study, therefore, was to establish a reproducible, clinically applicable, visual MR imaging rating scale for ARIA-E and to examine its internal validity in terms of inter-rater reliability.

MATERIALS AND METHODS
Patient Population
All patients included in this study were part of a phase II, multicenter, randomized, double-blind, placebo-controlled, multiple ascending dose immunization study of bapineuzumab, a humanized monoclonal antibody against Aβ. 5 The study was conducted at 30 sites in the United States between April 2005 and March 2008. Two hundred thirty-four patients were randomly assigned to receive either intravenous bapineuzumab or a placebo, in a ratio of 8:7, in 1 of 4 sequential dose cohorts (0.15, 0.5, 1.0, or 2.0 mg/kg). The patients had a mean age of 69 years; slightly more than half were women (55%), most were white (96%), and most carried at least 1 copy of the ApoE ε4 allele (65%). The mean Mini-Mental State Examination score at enrollment was 21 (Table 1). Four of the 10 included patients with ARIA-E were symptomatic on the basis of the investigator's reporting of symptoms. For more information on the study design and results, see Salloway et al (2009). 5 MR imaging was performed before treatment, at week 6, and then at 13-week intervals through week 71 and included axial FLAIR sequences used to detect ARIA-E. Imaging was performed on MR imaging systems operating at 1.5T with 5-mm sections obtained in a 2D mode with 1-mm in-plane resolution. Although additional sequences such as DWI, T2-, T2*-, and T1-weighted images were originally considered for reviewing the whole MR imaging scan, they generally did not contribute to the detection of the sulcal hyperintensities, which were invisible on these sequences. Regarding the parenchymal abnormalities, high signal on the T2-weighted images could be confused with partial volume averaging effects from the adjacent CSF and was not considered helpful in addition to FLAIR.

ARIA-E Rating Scale Development
For the construction of the scale, we used pairs of MR imaging scans (baseline and follow-up) of previously identified ARIA-E cases. All cases were preidentified by a central review and comprise a subsample of the previously published ARIA-E studies by Salloway et al 5 and Sperling et al. 10 ARIA-E was defined according to the guidelines of the Alzheimer's Association Research Round Table Workgroup 16 and included the occurrence of either sulcal or parenchymal hyperintensities. Two experienced neuroradiologists (F.B. and M.P.W.) reviewed the axial FLAIR images for the number and extent of the 2 abnormalities in the 6 regions on both sides (left/right [L/R]) as used for the Age-Related White Matter Changes rating scale, 18 and they discussed their findings to define a scale based on the number and maximum in-plane cross-sectional diameter of each abnormality in each region. A third characteristic, gyral swelling, was included in the rating scale by using a similar region-based approach.
A summary score can be derived from the regional scores, and we considered 2 approaches. The first approach was to sum the regional scores on each side of the brain (L/R), with only the higher score of the 2 ARIA-E characteristics (parenchymal or sulcal hyperintensity) contributing to the score of each region. The second approach was to sum, across regions, the highest score among the 3 characteristics of each region (parenchymal hyperintensity, sulcal hyperintensity, or gyral swelling).
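The 2 summary scores can be illustrated with a minimal Python sketch. It is not the procedure used by the raters in this study; the data layout (a dictionary keyed by region and side), the function names, and the characteristic labels are hypothetical choices made for illustration. It assumes each characteristic has already been given a regional score of 0–5.

```python
from typing import Dict, Iterable, Tuple

# Region and side names follow the scale description in this article.
REGIONS = ("frontal", "parietal", "temporal", "occipital",
           "central", "infratentorial")
SIDES = ("L", "R")

# Hypothetical layout: scores[(region, side)] = {"parenchymal": 0-5, ...}
RegionScores = Dict[Tuple[str, str], Dict[str, int]]


def summary_score(scores: RegionScores, characteristics: Iterable[str]) -> int:
    """Sum, over the 12 region/side combinations, the highest score
    among the requested characteristics (missing entries count as 0)."""
    characteristics = tuple(characteristics)
    total = 0
    for region in REGIONS:
        for side in SIDES:
            regional = scores.get((region, side), {})
            values = [regional.get(c, 0) for c in characteristics]
            for v in values:
                if not 0 <= v <= 5:
                    raise ValueError(f"score {v} outside 0-5 in {region} {side}")
            total += max(values)
    return total


def aria_e_score(scores: RegionScores) -> int:
    """First approach: parenchymal and sulcal hyperintensity only."""
    return summary_score(scores, ("parenchymal", "sulcal"))


def three_feature_score(scores: RegionScores) -> int:
    """Second approach: also considers gyral swelling."""
    return summary_score(scores, ("parenchymal", "sulcal", "swelling"))
```

For example, a patient with a left frontal region scored 1 (parenchymal), 3 (sulcal), and 4 (swelling) and a right occipital parenchymal score of 4 would receive 3 + 4 = 7 with the first approach and 4 + 4 = 8 with the second.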

Inter-Rater Reliability Testing
To test inter-rater reliability, 20 pairs of MR imaging scans (baseline and follow-up), including 10 previously identified ARIA-E cases (including a range of presentations from minimal extent to severe pathology) and 10 non-ARIA cases from the phase II bapineuzumab study, were scored for ARIA-E by using both summary scores. These cases had not been used in the development of the scale. The raters were blinded to any clinical information and were unaware of whether the cases were ARIA-E or not. ICCs were derived for both total scores. Agreement was determined by using a 2-way random-effects model (absolute agreement, single measures) ICC. The ICC provides an index of absolute agreement by taking the ratio between subject variability and total variability into account. 19 The ICCs were calculated for the combination of parenchymal hyperintensity and sulcal hyperintensity together (ARIA-E), followed by parenchymal hyperintensity, sulcal hyperintensity, and swelling separately and the combination of the 3 components. ICCs ≤0.40 were designated poor-to-fair; 0.41–0.60, moderate; 0.61–0.80, good; and ≥0.81, excellent agreement. 20 Statistical analyses were conducted by using the Statistical Package for the Social Sciences, Version 17, for Windows (SPSS, Chicago, Illinois).
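For readers who want to reproduce this type of analysis outside SPSS, the following Python sketch computes a 2-way random-effects, absolute-agreement, single-measures ICC from the standard ANOVA mean squares. It is an illustrative sketch only, not the software used in this study; the function names are hypothetical, it does not compute confidence intervals, and the handling of values that fall between the published category boundaries (for example, 0.405) is an assumption.

```python
from typing import List, Sequence


def icc_absolute_single(ratings: Sequence[Sequence[float]]) -> float:
    """ICC for a 2-way random-effects model, absolute agreement,
    single measures. ratings[i][j] is the score of subject i by rater j."""
    n = len(ratings)            # number of subjects
    k = len(ratings[0])         # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >=2 subjects and >=2 raters, no missing cells")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC undefined: ratings have no variability")
    return (msr - mse) / denom


def agreement_label(icc: float) -> str:
    """Categories as used in this article; boundary gaps are an assumption."""
    if icc <= 0.40:
        return "poor-to-fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "excellent"
```

With 2 raters, `ratings` would hold one row per patient and one column per rater, for example the total ARIA-E scores assigned by each neuroradiologist.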

RESULTS
ARIA-E Scale Description
The developed rating scale for ARIA-E included both the location and magnitude of presentation of parenchymal hyperintensities, sulcal hyperintensities, and gyral swelling. If ≥1 of those 3 findings was present, the changes were scored according to the anatomic location in terms of lobe and side, resulting in scores for 6 regions bilaterally: frontal lobe, parietal lobe, temporal lobe, occipital lobe, central region (basal ganglia, thalamus, internal and external capsules, corpus callosum, insula), and infratentorial region (brain stem and cerebellum). Within each region, a score of 0–5 was given on the basis of the spatial extent and multifocality of the abnormality. When a finding covered multiple lobes, the maximum in-plane diameter of the abnormality within each involved lobe was measured and scored accordingly. Figures 1-3 provide examples of assessing the size and extent of the pathologic changes. A total score can be derived by summing the 12 regional scores (range, 0–60), with the maximum score among the characteristics defining each regional score. The scoring scheme is summarized in Table 2. The reading procedure for each patient typically took <5 minutes.
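The regional scoring rule can be written as a short Python sketch. The thresholds follow the scoring scheme of Table 2 as given in this article (0, no abnormalities; 1, monofocal lesion ≤2 cm; 2, multifocal lesions each ≤2 cm; 3, any lesion >2 but <4 cm; 4, any lesion ≥4 cm; 5, entire lobe). The function name, the list-of-diameters input, and the `entire_lobe` flag are hypothetical; in particular, how a reader decides that an entire lobe is involved is a visual judgment that the sketch simply takes as an input.

```python
from typing import Sequence


def regional_score(diameters_cm: Sequence[float], entire_lobe: bool = False) -> int:
    """Score one characteristic in one region/side (0-5).

    diameters_cm: maximum in-plane diameter of each lesion in the region.
    entire_lobe:  rater's judgment that the whole lobe is involved
                  (assumed to be supplied by the reader, not measured).
    """
    if any(d <= 0 for d in diameters_cm):
        raise ValueError("lesion diameters must be positive")
    if entire_lobe:
        return 5
    if not diameters_cm:
        return 0                      # no abnormalities
    largest = max(diameters_cm)
    if largest >= 4.0:
        return 4                      # any lesion >= 4 cm
    if largest > 2.0:
        return 3                      # any lesion > 2 cm but < 4 cm
    # all lesions are <= 2 cm: distinguish monofocal from multifocal
    return 1 if len(diameters_cm) == 1 else 2
```

Under this sketch, a single 1.5-cm lesion scores 1, two lesions of 1.0 and 1.8 cm score 2, and the presence of any 3-cm lesion scores 3 regardless of how many smaller lesions accompany it.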

Descriptive Findings
The 10 patients with AD with ARIA-E were from a phase II study of bapineuzumab with additional descriptions of the study design and results in Salloway et al (2009) 5 and the identification of the ARIA-E findings in Sperling et al (2012). 10 Seven of the ARIA-E cases were detected on routine MR imaging, 2 were detected through retrospective MR imaging, and 1 was detected on an unscheduled MR imaging due to symptoms. All patients (69.4 ± 9.5 years of age; 7 women and 3 men) who exhibited ARIA-E were diagnosed with probable AD, with a Mini-Mental State Examination score between 16 and 26 at the initial study enrollment, and were assigned to the 4 dose cohorts, with 1 patient at 0.15 mg/kg, 1 at 0.5 mg/kg, 2 at 1.0 mg/kg, and 6 at 2.0 mg/kg. The 10 patients with AD without ARIA-E were selected by using a simple random-sample process from among all patients who were not identified with ARIA-E. Those patients without ARIA-E met the same eligibility requirements as those with ARIA-E (probable AD and a Mini-Mental State Examination score between 16 and 26 at enrollment). They had a nearly identical age at enrollment of 69.2 ± 8.4 years, with 4 women and 6 men, and they were assigned to all 4 dose cohorts, with 1 patient at 0.15 mg/kg, 2 at 0.5 mg/kg, 3 at 1.0 mg/kg, and 4 at 2.0 mg/kg.
As shown in Fig 4, the cases used in this study represented a wide range of ARIA-E pathology and illustrate the dynamics of the scale. Among the 5 cases with the highest scores, the score was strongly driven by parenchymal hyperintensity in cases 1 and 2 (with some additional sulcal hyperintensity), whereas sulcal hyperintensity was the major determinant in cases 3, 5, and 10 (with barely any parenchymal hyperintensity in the latter 2). Scores for swelling followed those of sulcal hyperintensity rather than those of parenchymal hyperintensity. Raters provided identical scores for case 8, with a score of 3 for both sulcal hyperintensity and gyral swelling in the left frontal region by both raters. Case 7 had similar but not identical scores by the 2 raters, with both raters identifying lesions in the same regions and the same type of lesions within each region, but with 1 rater providing scores 1 category higher for 2 of the 7 regions with lesions. Both raters identified case 3 with the highest score, and the individual components scored were essentially identical. Case 10 had the largest absolute difference in the total score between the 2 raters, and this was due to 1 rater identifying 2 additional regions with lesions and having higher scores in the regions where both raters identified lesions (Fig 5).

FIG 1.
Parenchymal hyperintensity. Top row shows a left parietal lesion <2 cm in maximum diameter (score 1). Middle row shows a right occipital lesion >4 cm (score 4); lower row shows multifocal lesions, each <2 cm in diameter (score 2). All measurements were performed in-plane.

FIG 2.
Sulcal hyperintensity. The first column shows the FLAIR images at baseline, the second column shows the FLAIR images at follow-up, and the third column shows the FLAIR images at follow-up with a very narrow contrast window/level setting accentuating the abnormalities for descriptive purposes. Upper row shows a right frontal abnormality with a diameter <2 cm (score 1). Middle row shows a left occipital abnormality with a maximum diameter between 2 and 4 cm (score 3); and lower row shows a right parietal abnormality of >4 cm (score 4).

Inter-Rater Agreement
In this limited sample, there was excellent consensus on the presence or absence of ARIA-E between the raters. Among the 10 ARIA-E cases, the ICC scores for the component characteristics and total scores are reported in Table 3, along with 95% confidence intervals. Excellent inter-rater agreement was found for the 2 elements of ARIA-E, parenchymal hyperintensity (ICC = 0.83) and sulcal hyperintensity (ICC = 0.89), and for a total score based on these 2 findings (ICC = 0.89). Ratings of swelling exhibited only moderate concordance between the raters (ICC = 0.54), with a wider confidence interval, resulting in a slightly lower ICC for the combination of the 3 features (ICC = 0.78).

DISCUSSION
We present an easily applicable visual rating scale, which allows characterization of possible ARIA-E findings across 6 anatomic regions per hemisphere by using axial FLAIR images. This rating scale did not require extensive training, and both raters considered its application straightforward. Of note, we observed a high degree of inter-rater agreement (ICC > 0.8) for the combined ARIA-E findings of sulcal and parenchymal hyperintensities, demonstrating reasonable internal validity. In the limited sample examined, the scale seems to exhibit a good dynamic range, which would allow the distinction between findings that are extensive and findings localized to a small area in a single region.
Gyral swelling was the characteristic with the least agreement between the raters. When swelling was included in the calculation of the total score, the ICC was 0.78, slightly lower than when the characteristic was not included (ICC = 0.89). However, the variance in the estimate substantially increased when swelling was included compared with when it was not included. The lower level of agreement for swelling in isolation is not unexpected because the lack of hyperintensity makes the boundaries of swelling difficult to determine and likely enhances differences between raters. For this reason and because of the difficulty in scoring particularly subtle changes, swelling was not included in the final grading system. More fundamentally, swelling may be truly based on volume increase of the parenchyma, perhaps resulting from a low level of vasogenic edema, or swelling may be due to a low level of sulcal hyperintensity, resulting in effacement of the normal sulcal low CSF signal intensity on FLAIR. These differing etiologies both contribute to the challenge in determining the boundaries and extent of the swelling (Fig 5).
The separate assessment of different ARIA-E manifestations is complicated because all 3 characteristics (parenchymal hyperintensity, sulcal hyperintensity, and gyral swelling) may be present in a given patient, may partly be indistinguishable from each other in terms of MR imaging appearance, and may be observed in the same region; these complications make it difficult to separate the characteristics in terms of spatial extent. For example, a clear differentiation between sulcal and parenchymal hyperintensities may be a challenge. In addition, sulcal hyperintensities and swelling were frequently associated with each other, and the presence of one may increase the propensity to identify the other. Additionally, they may share the same underlying etiology, as previously discussed. In our opinion, the extent of brain tissue involved is the most important characteristic to classify; thus, a total score might be most relevant, as opposed to the individual component scores (ie, parenchymal hyperintensity, sulcal hyperintensity, or gyral swelling).
A limitation of our study is that our rating scale has only been validated in terms of inter-rater agreement by 2 specialized and experienced raters. This scale now awaits further evaluation in a more heterogeneous group of raters in terms of expertise, using a larger group of patients. The aim of this study was not to address the sensitivity and/or specificity in detecting subtle ARIA cases. However, considering that 10%-15% of all patients treated with immunization therapy against Aβ may exhibit ARIA-E at some time during the treatment, 10 our sample of 10 ARIA-E-positive cases representing a sample with varying severity of pathology (mild/moderate/severe) corresponds to a clinical trial dataset of approximately 100 treated patients. Further application of the scale is also needed to explore the association between the MR imaging findings and the clinical presentation of the patients in larger patient groups including more raters with different degrees of expertise. Additionally, only 2 methods of deriving a total score from the individual region scores and elements were considered; however, other approaches exist. Identifying the relationship of the presentation on the image to the clinical presentation, along with alternative scoring approaches, will be addressed by using a substantially larger series of patients in the ongoing phase III studies.

Table 1 note: a, Data are ranges.
Table 2 note: For each region/side (L/R), enter a score depending on the largest cross-sectional diameter: 0, no abnormalities; 1, monofocal lesion ≤2 cm; 2, multifocal lesions each ≤2 cm; 3, any lesion >2 but <4 cm; 4, any lesion ≥4 cm; 5, entire lobe. All measurements were performed in-plane.

CONCLUSIONS
Our proposed visual rating scale for ARIA-E represents an easy-to-apply rating scale with robust characteristics in terms of inter-rater agreement. On the basis of our results, this rating scale could serve as an important and useful instrument to apply in a clinical setting to monitor imaging abnormalities associated with amyloid-lowering treatment. Given the complexity of the MR imaging characteristics and the pathophysiology of ARIA-E, further validation of this visual rating scale is needed by using larger patient groups and including raters with different degrees of expertise. Most important, the scale awaits external validation of its appropriateness to assist in determining the clinical relevance of ARIA-E findings and possible implications for dose adjustment during amyloid-lowering therapy.

FIG 5.
In this patient with widespread ARIA-E involving multiple lobes on both sides of the brain, there were discrepancies between readers regarding the extent of lesions within each affected area. Upper row shows extensive sulcal hyperintensity covering large parts of the right temporal lobe, given a score of 4 by rater 1 but only a score of 3 by rater 2. Middle row shows that a score of 2 for sulcal hyperintensity (SH) was given in the left frontal lobe by rater 1 but a score of 0 for SH by rater 2. Lower row shows that a score of 5 for sulcal hyperintensity in the right occipital lobe was given by rater 1 but only a score of 3 by rater 2.