Brain Atrophy Is Associated with Disability Progression in Patients with MS followed in a Clinical Routine

BACKGROUND AND PURPOSE: Brain atrophy assessment is not yet part of the clinical routine in multiple sclerosis. Our aim was to determine the feasibility of brain atrophy measurement and its association with disability progression in patients with MS followed in a clinical routine for 5 years.
MATERIALS AND METHODS: Data from a total of 1815 subjects (1514 with MS, 137 with clinically isolated syndrome, and 164 healthy individuals) were collected retrospectively. Of 11,794 MR imaging brain scans included in the analysis, 8423 MRIs were performed on a 3T, and 3371 MRIs, on a 1.5T scanner. All patients underwent 3D T1WI and T2-FLAIR examinations at all time points of the study. Whole-brain volume changes were measured by percentage brain volume change/normalized brain volume change using SIENA/SIENAX on 3D T1WI and percentage lateral ventricle volume change using NeuroSTREAM on T2-FLAIR.
RESULTS: Measurement of percentage brain volume change failed in 36.7% of the subjects; percentage normalized brain volume change, in 19.2%; and percentage lateral ventricle volume change, in 3.3% because of protocol changes, poor scan quality, artifacts, and anatomic variations. Annualized brain volume changes were significantly different between those with MS and healthy individuals for percentage brain volume change (P < .001), percentage normalized brain volume change (P = .002), and percentage lateral ventricle volume change (P = .01). In patients with MS, mixed-effects model analysis showed that disability progression was associated with a 21.9% annualized decrease in percentage brain volume change (P < .001) and normalized brain volume (P = .002) and a 33% increase in lateral ventricle volume (P = .004).
CONCLUSIONS: All brain volume measures differentiated MS and healthy individuals and were associated with disability progression, but the lateral ventricle volume assessment was the most feasible.

Brain atrophy assessment is an important biomarker in multiple sclerosis because of its relationship with neurodegeneration and disability progression (DP). [1][2][3] Brain atrophy develops early in the disease, 4 continues throughout its natural course, is partially independent from lesion burden, 5 is accelerated compared with normal aging, 3 and predicts development of physical and cognitive disability. 2,3,6,7 Thus, MR imaging-derived brain atrophy measurements were included in many recent phase III clinical trials as an important biomarker for determining the effect of disease-modifying treatment. 3,8,9 Evidence is mounting regarding the urgent need for incorporation of brain atrophy assessment into clinical routine and individual patient treatment monitoring. 2,3,6,7 There is also an increasing interest in and need for monitoring the effect of disease-modifying treatment on brain atrophy to make more personalized, patient-centric treatment choices. 10 However, there are numerous challenges to the measurement of brain atrophy in a clinical routine. 3,6,7 It is well-known that for reliable measurement of brain volume changes with time, patients should undergo imaging with the same scanner and without scanner/software/protocol changes. 3,11 However, this is very difficult to achieve in a clinical routine. 11 At this time, there are no long-term, large-cohort studies that have investigated the feasibility of measuring brain atrophy in a real-world setting and its association with clinical outcomes.
Against this background, the aim of this study was to investigate the feasibility of brain atrophy measurement and its association with DP in a large cohort of patients with MS and clinically isolated syndrome (CIS) followed in a clinical routine for 5 years.

MATERIALS AND METHODS
Subjects
This retrospective study, which included a collection of clinical and MR imaging data, enrolled 1815 subjects, of whom 1514 had MS, 137 had CIS, and 164 were healthy individuals (HI). The data were collected over 10 years. The inclusion criteria were the following: 1) consecutive subjects with MS and CIS and HI recruited and followed between 2006 and 2016 at an MS center; 2) age, sex, disease duration, and Expanded Disability Status Scale (EDSS) score (only for patients with MS and CIS) recorded at the first available MR imaging examination; 3) MR imaging examinations performed on 1.5T or 3T scanners; 4) 2D T2-fluid-attenuated inversion recovery (T2-FLAIR) and 3D T1-weighted imaging being part of the standard clinical routine protocol; and 5) the presence of at least 2 longitudinal MR imaging pairs in the same individual subject ≥6 months apart. Exclusion criteria were the presence of a relapse and steroid treatment in the 30 days preceding the MR imaging examination for patients with CIS and MS, pre-existing medical conditions known to be associated with brain pathology (cerebrovascular disease, positive history of alcohol abuse), and pregnancy. On-line Fig 1 and the On-line Appendix provide details of the fulfilled inclusion and exclusion criteria and procedures in study subjects. The study was approved by the Human Subjects Institutional Review Board of the University at Buffalo.

MR Imaging Acquisition and Analysis
The MR imaging examinations used in the present study were performed on either 3T or 1.5T Signa Excite HD 12.0 Twin Speed 8-channel scanners (GE Healthcare, Milwaukee, Wisconsin). During the 10 years of the study, neither scanner underwent major hardware or software changes. Optimization of scanning protocols was allowed during the study, and details are provided in the On-line Appendix.
Whole-brain volume was determined on 3D T1WI that was modified using an inpainting technique to avoid tissue misclassification. 12 At baseline, normalized brain volume (NBV) was calculated using the FSL SIENAX method (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/SIENA), 13 whereas for longitudinal changes, the structural image evaluation, with normalization of atrophy (SIENA) method (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/SIENA) was used to calculate the percentage brain volume change (PBVC), and SIENAX was used to calculate the percentage NBV change. 13 NeuroSTREAM software was used to assess baseline lateral ventricle volume (LVV) and change across time on 2D-FLAIR images. 14 MR imaging analysis and quality control were performed in a fully blinded manner.
PBVC and percentage NBV and LVV changes were calculated between the first MR imaging and the most recent follow-up MR imaging and between all available MR imaging time points when >2 longitudinal MR imaging pairs were available (On-line Table 1). PBVC and percentage NBV and LVV changes were annualized. Annualized PBVC and percentage NBV and LVV changes between serial imaging time points were also averaged on the basis of the number of all available serial MR imaging time points to obtain the annualized, cumulative yearly PBVC and percentage NBV and LVV change.
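The annualization and averaging steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes linear annualization (percentage change divided by the interval in years, with 365.25 days per year), and the function names `percent_change`, `annualize`, and `cumulative_yearly_change` are hypothetical.

```python
from datetime import date
from typing import Sequence


def percent_change(baseline: float, follow_up: float) -> float:
    """Percentage change of a volume (e.g., LVV or NBV in mL) relative to baseline."""
    return 100.0 * (follow_up - baseline) / baseline


def annualize(pct_change: float, first_scan: date, second_scan: date) -> float:
    """Divide a percentage change by the scan interval in years.

    Assumption: linear annualization with 365.25 days per year.
    """
    years = (second_scan - first_scan).days / 365.25
    if years <= 0:
        raise ValueError("second_scan must be later than first_scan")
    return pct_change / years


def cumulative_yearly_change(scan_dates: Sequence[date],
                             volumes: Sequence[float]) -> float:
    """Average the annualized changes over all consecutive scan pairs."""
    if len(scan_dates) != len(volumes) or len(volumes) < 2:
        raise ValueError("need at least 2 scans with matching dates and volumes")
    rates = [
        annualize(percent_change(volumes[i], volumes[i + 1]),
                  scan_dates[i], scan_dates[i + 1])
        for i in range(len(volumes) - 1)
    ]
    return sum(rates) / len(rates)


# Illustrative values only (not study data): 3 yearly scans, LVV in mL.
dates = [date(2010, 1, 1), date(2011, 1, 1), date(2012, 1, 1)]
lvv_ml = [20.0, 21.0, 22.05]
first_to_last = annualize(percent_change(lvv_ml[0], lvv_ml[-1]), dates[0], dates[-1])
serial_mean = cumulative_yearly_change(dates, lvv_ml)
```

Here `first_to_last` corresponds to the change between the first and most recent MR imaging, and `serial_mean` to the cumulative yearly change averaged over serial time points. For SIENA, the PBVC between 2 scans would be passed to `annualize` directly, because it is already a percentage.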

Statistical Analysis
Analyses were performed using the Statistical Package for the Social Sciences (SPSS), Version 24.0 (IBM, Armonk, New York). Differences among groups were analyzed using the χ² test, Student t test, Mann-Whitney rank sum test, and 1-way analysis of variance as appropriate. Brain volume differences among groups were calculated using analysis of covariance corrected for age, sex, and ratio of 1.5T and 3T MR imaging.
Additionally, to explore temporal associations between LVV, NBV, and PBVC and individual clinical measures (disease duration, EDSS, Multiple Sclerosis Severity Score [MSSS], and DP), univariate linear mixed-effects models with interaction terms across time were fitted and corrected for possible confounders (age, sex, field strength). LVV, NBV, and PBVC were used as dependent outcomes. Model fit was evaluated using the Akaike information criterion and included subject-level random intercept and/or time slopes. Analyses were performed in the entire study sample, as well as in the subpopulations of patients who had PBVC available between the first MR imaging and most recent follow-up. A nominal P value of ≤.05 was considered statistically significant using 2-tailed tests.
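As a reading aid, the general form of such a model can be written out as follows. This is a sketch under stated assumptions, not the exact SPSS specification used in the study: the symbols are introduced here for illustration, and the random terms are shown in their fullest form (random intercept and random time slope).

```latex
% y_{ij}: LVV, NBV, or PBVC of subject i at time point j
% t_{ij}: time since the first MR imaging examination
% x_{ij}: one clinical measure (disease duration, EDSS, MSSS, or DP)
% z_i:    confounders (age, sex, field strength)
\begin{align}
y_{ij} &= \beta_0 + \beta_1 t_{ij} + \beta_2 x_{ij}
          + \beta_3 \,(x_{ij} \times t_{ij})
          + \boldsymbol{\gamma}^{\top} \mathbf{z}_i
          + b_{0i} + b_{1i}\, t_{ij} + \varepsilon_{ij}, \\
\mathrm{AIC} &= 2k - 2 \ln \hat{L}
\end{align}
```

Under this notation, $\beta_3$ is the interaction term with time, $b_{0i}$ and $b_{1i}$ are the subject-level random intercept and time slope, $k$ is the number of estimated parameters, and $\hat{L}$ is the maximized likelihood; a lower AIC indicates a better trade-off between fit and complexity when choosing between random-effects structures.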

RESULTS
Study Sample
We enrolled 1815 subjects who met the inclusion and exclusion criteria (see Materials and Methods) (On-line Table 2). Demographic and clinical characteristics at baseline and during the follow-up are shown in On-line Table 2 and further described in the On-line Appendix.

MR Imaging Characteristics at Baseline and during Follow-Up
Of 11,794 MR imaging brain scans included in the analyses, 8423 MRIs were performed on a 3T, and 3371 MRIs, on a 1.5T scanner (On-line Table 2). The total number of MRIs from first MR imaging to most recent follow-up was 4.9 ± 3.1 for MS, 3.8 ± 1.9 for CIS, and 2.9 ± 1.1 for HI (P < .001). The cumulative number of subjects, MRIs, and time from first to most recent follow-up are shown in On-line Table 1.
Reasons for analysis failures are shown in On-line Table 3. In particular, there were no measurement failures due to scanner changes using LVV, whereas failures occurred in 175 (11.6%) patients using PBVC and in 73 (4.8%) patients using percentage NBV change. Table 1 and On-line Tables 4-6 describe brain volume measures at baseline and during the follow-up, according to the disease status (Table 1 and On-line Table 5) and MS subtype (On-line Tables 4 and 6).
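The counts reported above can be cross-checked with a few lines of Python. This is only an arithmetic sketch: the denominator of 1514 patients with MS is an assumption (it is not stated next to the percentages, but it reproduces them), and `failure_rate` is a hypothetical helper name.

```python
# Counts taken from the text; the denominator is an assumption (see lead-in).
MS_PATIENTS = 1514
SCANS_3T = 8423
SCANS_1_5T = 3371


def failure_rate(failed: int, total: int) -> float:
    """Percentage of subjects with a failed measurement, rounded to 1 decimal."""
    return round(100.0 * failed / total, 1)


total_scans = SCANS_3T + SCANS_1_5T                       # all scans analyzed
pbvc_scanner_failures = failure_rate(175, MS_PATIENTS)    # PBVC, scanner changes
nbv_scanner_failures = failure_rate(73, MS_PATIENTS)      # percentage NBV change
```

With these inputs, `total_scans` matches the 11,794 scans reported, and the two rates round to the 11.6% and 4.8% given in the text.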

Brain Volume Changes with Time
At first MR imaging, the LVV was significantly higher in patients with MS compared with those with CIS and HI, while the NBV was significantly lower (P < .001 for both, Table 1). Brain volume changes from the first MR imaging to the most recent follow-up among MS, CIS, and HI were significantly different for percentage LVV change (P < .001), percentage NBV change (P < .0001), and PBVC (P < .001). Annualized brain volume changes were significantly different among MS, CIS, and HI for percentage LVV change (P = .02), percentage NBV change (P = .007), and PBVC (P < .001).
At first MR imaging, LVV was significantly higher in patients with progressive MS, compared with those with relapsing-remitting MS (RRMS), while NBV was lower (P < .001 for both, On-line Table 4).

Longitudinal Relationship between Brain Volume Changes and Clinical Measures
Unadjusted univariate linear mixed-effects model analyses indicated that with time, longer disease duration (MS, P < .001; CIS, P < .001), higher MSSS (MS, P < .001; CIS, P < .001), and higher EDSS (MS, P < .001; CIS, P < .001) were associated with changes in LVV, NBV, and PBVC (Table 2). Patients with MS with DP had a decreased rate (−21.9%) of annualized PBVC (P < .001), an increased rate (+21.6%) of annualized LVV enlargement (P < .001), and a decreased rate (−12.5%) of annualized NBV change (P = .03) compared with patients without DP (Table 2 and On-line Fig 2, upper row). See On-line Tables 7 and 8 for additional information.
Note (Table 2): Est indicates estimate; DDY, disease duration. The intercept, in milliliters, is the predicted value of the dependent variable when all the independent variables are set to zero. The estimate represents the LVV increase and NBV decrease in milliliters per 1-unit increase of the independent measure per year (the interaction term with time). The estimate for PBVC represents the change in percentage per 1-unit increase of the independent measure per year (the interaction term with time). Volumetric (milliliters) data were fitted to random intercept and slope models, while PBVC models were fitted with random slope models.

DISCUSSION
This study provides additional insight into brain atrophy progression in a large cohort of patients with CIS and MS followed in a clinical routine as well as the relationship between the development of brain atrophy and DP. The study used data collected retrospectively over 10 years from >1800 individuals who were followed for an average of almost 5 years, and brain volume data were derived from >11,500 MR imaging examinations. Assessing brain atrophy in the clinical routine may become an important outcome for assessing the effectiveness of disease-modifying treatment. Several reports have shown it to be one of the most reliable biomarkers of neurodegeneration that correlates with physical and cognitive impairment in patients with MS. [2][3][4]6,7,[15][16][17][18] For more than a decade, randomized controlled trials have used brain atrophy measurement as a secondary or tertiary end point to determine the effectiveness of treatment. 8,9,18 However, assessing brain atrophy in a clinical routine can be challenging due to several technical factors related to image acquisition and measurement methods. 2,3,6,7 A recent multicenter, retrospective, real-world study (Multiple Sclerosis and Clinical Outcome and MR Imaging in the United States [MS-MRIUS]) investigated the feasibility of brain atrophy measurement in a clinical routine without MR imaging protocol standardization, using academic and nonacademic centers specialized in treatment and monitoring of MS. 11 The MS-MRIUS study showed that 72% of patients with MS had 2D T1WI and only 28% had 3D T1-weighted MR imaging sequences for longitudinal brain atrophy measurement. Scanner/protocol changes occurred in >50% of patients during the 16 months of follow-up.
Image contrast and image resolution are important for a reliable and optimal segmentation of brain volume, and 3D pulse sequences are preferred for measurement of brain atrophy as the criterion standard for brain volumetric imaging because of reduced partial voluming and more accurate coregistration, especially for serial imaging with time, compared with 2D imaging. 2,3,6 In the present study, the main reasons for analysis failures were changes in imaging protocol, poor scan quality, and excessive motion artifacts. Although the scanner and software did not change during a 10-year period of data collection in the present study, subjects were examined on 2 different scanners (1.5T and 3T, On-line Table 1), and minor protocol optimization changes were allowed. Thus, measurement of whole-brain volume changes was not feasible in a substantial number of subjects. These findings are supported by the recent results of the multicenter MS-MRIUS study, 11 which found that the feasibility of brain atrophy measurement was substantially lower without the uniformity of scanners, resulting in even larger numbers of failures.
As per the inclusion criteria, all subjects underwent 3D T1WI at every time point of the study; however, longitudinal (PBVC) and cross-sectional-derived (percentage NBV change) whole-brain volume analysis of at least 2 pairs of MR imaging examinations failed in 36.7% and 19.2% of subjects, respectively. The higher prevalence of examination failure with PBVC, compared with the percentage NBV change whole-brain volume analysis, is because SIENA PBVC is a longitudinal registration-based technique requiring concomitant evaluation of the 2 time points, 13 whereas NBV is a cross-sectional measure performed on every scan separately; then, percentage changes are derived statistically between the 2 time points. 13 Due to these inherent differences in the 2 methods, the decreased feasibility of PBVC-versus-NBV change measurement was also previously shown in a recent MS clinical trial, 19 and our findings from clinical routine further confirm these findings. In the current study, scanner change was not defined a priori as a failed brain atrophy assessment. It is extremely difficult to ensure consistency of hardware and protocol use in the clinical routine over mid-to-long-term follow-up, even in the controlled setting of a specialized academic MS center. In addition, the results from the present study support previous multicenter findings 11 because we were able to obtain reliable LVV measurement in >96% of the study subjects longitudinally.
There is a strong need for developing and validating simpler brain volume measures that are resistant to MR imaging scanner and protocol changes and can be used in a clinical routine. 11 The assessment of LVV presents some advantages for calculation of brain atrophy on clinical routine practice scans compared with whole-brain volume measurements. 11,14,20,21 These advantages arise mainly because the borders between the CSF-filled lateral ventricles and the surrounding tissue have high contrast, and the position of the ventricles at the center of the FOV makes them less likely to be affected by gradient distortions, coregistration and tissue-segmentation errors, incomplete head coverage, and wraparound artifacts. 3,11,14,20,22 Therefore, LVV measurement has the potential to become a meaningful and reliable measure of brain atrophy assessment when scanning protocols cannot be standardized. Several algorithms were introduced for the assessment of LVV. Most of the approaches tend to rely on research-quality scans, which are sometimes not obtained in clinical practice. 14 In contrast, NeuroSTREAM has the ability to operate with low-resolution scans, as confirmed in the present and previous studies. 11,14,20 We showed that patients with CIS and MS had higher annualized (first MR imaging to most recent follow-up MR imaging) and cumulative (using all available MR imaging examinations between different time points) brain volume changes compared with HI, using PBVC and percentage LVV change approaches. However, in a subsample of subjects who had PBVC, the percentage NBV change did not differentiate MS from HI; thus, cross-sectional-derived whole-brain volume measures are far from ideal. 2,3,6,7 Patients with CIS showed the highest annualized cumulative percentage LVV change among all 3 study groups (4.1% for CIS, 2.9% for MS, and 1.9% for HI).
One study reported an annualized percentage LVV change of 3.4%, 23 while another study showed an annualized percentage LVV change of 5.5% in patients with RRMS. 24 A greater LVV change in patients with CIS who developed clinically definite MS (CDMS), compared with those who remained stable, was found within 1 year of follow-up. 16,25 Recently, it was shown that annualized percentage LVV changes between 3.1% and 3.51% on T2-FLAIR 20 correspond to a pathologic whole-brain atrophy rate of 0.4% 26 in patients with RRMS and that this LVV pathologic cutoff performs comparably with PBVC for predicting clinical outcomes. 20 The annualized LVV rates in patients with MS and CIS, as well as in HI observed in this study, are in line with the suggested LVV pathologic cutoff. 20 On the contrary, the annualized rate of whole-brain volume change (found with both longitudinal and cross-sectional-derived approaches) in this study was somewhat above the proposed pathologic cutoff of 0.4%. 26 This result may be because most patients in the current study underwent first-generation disease-modifying treatments, which have a weak-to-modest impact on preventing brain atrophy 8,9,27 or they were not treated at all. Moreover, the mean baseline age of patients with MS and HI was around 46 years, which could also have contributed to somewhat accelerated whole-brain atrophy due to an aging effect. 2,3,6,7 We also evaluated brain atrophy in different MS disease subtypes and did not find significant differences in rates among MS disease subtypes during the follow-up. Our results are in line with evidence indicating that brain atrophy rates are independent of MS phenotype. 28,29 Using linear mixed-effects analysis, we showed that all brain volume measures were associated with disease duration, EDSS, MSSS, and DP in patients with MS and CIS.
In a subgroup of patients with MS who had PBVC available between the first MR imaging and most recent follow-up, we found that patients with MS with DP had a 33.1% higher yearly LVV increment and a 21.9% higher yearly decrease in PBVC and NBV change compared with those without DP. Similar results were observed in patients with CIS who converted to CDMS, though they did not reach significance.
Potential limitations of this study are that it did not consider potential biologic confounders that may have an impact on atrophy assessments, including diurnal fluctuations of brain volume, hydration state, and menstrual cycle 2,3,6,7 or the pseudoatrophy effect of disease-modifying treatments. However, due to the natural composition and size of the sample, we hypothesize that these confounding factors are largely averaged out when assessing group effects. Incorporating such confounds, though, will almost certainly be required when assessing individual patients, which would be the next step using the proposed MR imaging outcomes. A key strength of this study lies in the large number of subjects examined, with >11,500 MR imaging examinations and a follow-up of almost 5 years.

CONCLUSIONS
The present study is one of the first large cohort studies of brain atrophy measurement in patients with MS and CIS followed in a clinical routine. The study showed that T2-FLAIR-derived LVV measurement was the most feasible in a clinical routine. PBVC and percentage LVV change significantly differentiated patients with CIS and MS compared with HI, while all brain volume measures were independent of the disease subtype and predicted disability progression.