Reliability of Longitudinal Brain Volume Loss Measurements between 2 Sites in Patients with Multiple Sclerosis: Comparison of 7 Quantification Techniques

BACKGROUND AND PURPOSE: Brain volume loss is currently an MR imaging marker of neurodegeneration in MS. Available quantification algorithms perform either direct (segmentation-based techniques) or indirect (registration-based techniques) measurements. Because there is no reference standard technique, assessing their accuracy and reliability remains difficult. Therefore, the purpose of this work was to assess the robustness of 7 different postprocessing algorithms applied to images acquired on different MR imaging systems. MATERIALS AND METHODS: Nine patients with MS were followed longitudinally over 1 year (3 time points) on two 1.5T MR imaging systems. Brain volume changes were measured using 7 quantification algorithms: a segmentation-classification algorithm, FreeSurfer, BBSI, KN-BSI, SIENA, SIENAX, and a JI algorithm. RESULTS: Intersite variability showed that segmentation-based techniques and SIENAX provided large and heterogeneous values of brain volume change. A Bland-Altman analysis showed mean differences of 1.8%, 0.07%, and 0.79% between the 2 sites and wide agreement intervals (lengths of 11.66%, 7.92%, and 11.94%) for the segmentation-classification algorithm, FreeSurfer, and SIENAX, respectively. In contrast, registration-based algorithms showed better reproducibility, with a low mean difference of 0.45% for BBSI, KN-BSI, and JI and a mean agreement interval length of 1.55%. Although SIENA obtained a lower mean difference of 0.12%, its agreement interval of 3.29% was wider. CONCLUSIONS: Because brain atrophy estimation remains an open issue, further investigation of the accuracy and reliability of brain volume quantification algorithms is needed to measure the slow and small brain volume changes occurring in MS.

Conventional MR imaging has proved to be the best way of monitoring inflammation in MS. However, it does not provide a reliable evaluation of the neurodegenerative component of the disease. In the last decade, conventional MR imaging-based methods have been developed to provide sensitive and reproducible assessments of brain volume loss, which has increased interest in brain atrophy as an accurate index for assessing and monitoring MS progression.
Different approaches have been developed to assess brain volume loss. The available quantification algorithms rely on either direct or indirect measurements and are accordingly classified as segmentation-based or registration-based techniques, respectively. Segmentation-based techniques 1,2 rely on tissue classification (CSF, GM, and WM), performed in a semiautomated or automated way, and provide brain volume measures at a given time. Semiautomated techniques are time-consuming and less reproducible, whereas automated techniques have been found to be less accurate. 3 Segmentation-based measures such as brain parenchymal fraction 3,4 and brain to intracranial capacity ratio 5 include normalization for head size and can therefore be used in cross-sectional studies. FreeSurfer (http://surfer.nmr.mgh.harvard.edu), one of the most recent automatic segmentation-based algorithms, allows the measurement of specific anatomic brain regions. 6 Registration-based methods have been developed to improve longitudinal evaluations. They include brain edge motion analysis, as in BBSI, KN-BSI, 7,8 and SIENA (http://www.fmrib.ox.ac.uk/fsl); 9,10 voxel-based statistical analysis, as in voxel-based morphometry and statistical parametric mapping; 11 and local Jacobian determinant analysis after nonlinear matching between coregistered images. [12][13][14] However, brain volume loss estimates are affected by physiopathologic factors such as dehydration, inflammation, or steroid therapy [15][16][17][18][19][20][21][22][23][24] and by technical factors such as MR imaging contrast quality, signal-to-noise ratio, MR imaging system upgrades, field inhomogeneity, and gradient-related geometric distortions. [25][26][27][28][29] Brain volume loss estimation also depends on the reliability of the postprocessing algorithm used. Because no reference standard technique is available, the robustness of each algorithm is difficult to estimate.
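To make the distinction between direct and indirect measurements concrete, the following sketch shows how a segmentation-based (direct) pipeline turns tissue masks into a head-size-normalized fraction and into a percentage change between 2 time points. It assumes NumPy masks produced by some tissue classifier; the function names are illustrative, and approximating the intracranial volume as GM + WM + CSF is a simplification, not the definition used in the cited works.

```python
import numpy as np


def brain_parenchymal_fraction(gm, wm, csf):
    """Brain parenchymal fraction: (GM + WM) / intracranial volume.

    gm, wm, csf are same-shaped boolean masks (or partial-volume maps in
    [0, 1]). Simplifying assumption: intracranial volume = GM + WM + CSF.
    """
    parenchyma = float(np.sum(gm) + np.sum(wm))
    intracranial = parenchyma + float(np.sum(csf))
    return parenchyma / intracranial


def percent_volume_change(v_baseline, v_followup):
    """Direct (segmentation-based) longitudinal change, in percent."""
    return 100.0 * (v_followup - v_baseline) / v_baseline


# Toy 1D "image": 4 GM voxels, 4 WM voxels, 2 CSF voxels.
gm = np.zeros(10, dtype=bool)
gm[:4] = True
wm = np.zeros(10, dtype=bool)
wm[4:8] = True
csf = np.zeros(10, dtype=bool)
csf[8:] = True
bpf = brain_parenchymal_fraction(gm, wm, csf)  # 8 / 10 = 0.8
pbvc = percent_volume_change(1000.0, 990.0)    # -1.0 %
```

Because each time point is segmented independently, any segmentation error at either time point enters the difference directly, which is one reason direct measurements can be noisy for small longitudinal changes.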
Recently, simulated brain volume loss data have become available, providing images with known atrophy. 12,14,[30][31][32] Despite the extensive use of these techniques in epidemiologic series and clinical trials, their validation remains a major problem.
In this article, our objective was to compare the reliability of brain volume change measurements across 2 MR imaging acquisition sites using 7 different quantification algorithms: a segmentation-classification algorithm, FreeSurfer, BBSI, KN-BSI, SIENA, SIENAX (http://www.fmrib.ox.ac.uk/fsl), and a JI algorithm.

Patients
Nine patients were selected from a large cohort of 90 patients with MS participating in a longitudinal, twice-yearly clinical and MR imaging follow-up. Patients were recruited from the MS clinic at the Neurologic Hospital, Lyon, France. Inclusion criteria were definite MS according to the McDonald et al criteria 33 and age 18-50 years. All clinical subtypes were represented: clinically isolated syndromes, relapsing-remitting MS, secondary-progressive MS, and primary-progressive MS. Exclusion criteria were pregnancy at inclusion and any contraindication to MR imaging. Immunoactive treatments were allowed. All patients gave written informed consent to participate in the longitudinal follow-up study, which was approved by the local ethics committee. The 9 patients (2 women and 7 men; 3 with relapsing-remitting MS, 3 with secondary-progressive MS, and 3 with primary-progressive MS) were selected because they underwent MR imaging examinations on 2 different MR imaging systems at 3 separate time points over 1 year of follow-up: at inclusion and at 6 and 12 months. Mean age at inclusion was 41 years (range, 27-46 years), mean disease duration was 8 years (range, 2-18 years), and the mean Expanded Disability Status Scale score was 4.5 (range, 2-6.5). Six patients received an immunoactive treatment during the observation period.

MR Imaging Follow-Up
MR imaging examinations were performed at each of the 3 evaluations over 1 year at each radiologic site. Images were acquired at least 1 month after a clinical relapse and/or a course of steroids. At each time point, 2 MR imaging scans were performed consecutively on 2 different systems: a 1.5T Intera (Philips Healthcare, Best, The Netherlands) located in the radiology department of the Neurologic Hospital of Lyon (hereafter, MRI1), and a 1.5T Sonata system (Siemens, Erlangen, Germany) located in the MR imaging department of CERMEP-Imagerie du vivant (hereafter, MRI2). The MR imaging protocol consisted of the acquisition of 3D T1-weighted images with millimetric resolution, without gadolinium injection, using a 3D T1 fast-field echo sequence (TR, 7.56 ms; TE, 3.69 ms; matrix, 256 × 256; section thickness, 2 mm, reconstructed to 1 mm; 180 sections; scan time, 7 minutes 10 seconds) on MRI1, and an MPRAGE 3D T1 sequence (inversion recovery turbo-FLASH; TR, 1970 ms; TE, 3.93 ms; TI, 1100 ms; matrix, 256 × 256; section thickness, 1 mm; integrated parallel acquisition techniques factor, 2; 176 sections; scan time, 4 minutes 37 seconds) on MRI2. Overall, 27 acquisitions (9 patients × 3 time points) were obtained on each MR imaging system.

Image Analysis
Seven postprocessing software programs were assessed: 2 segmentation-based algorithms (a Bayesian tissue classification algorithm and FreeSurfer) and 5 registration-based algorithms (BBSI, KN-BSI, SIENA, SIENAX, and JI). The 27 image pairs were analyzed in random order, blinded to the patients' clinical characteristics and clinical status.

Preprocessing
With the exception of SIENA, SIENAX, and FreeSurfer, which include their own preprocessing pipelines, preprocessing consisted of the following steps: automatic extraction of the intracranial cavity mask at baseline; correction of intensity inhomogeneities using the N3 technique 34 at all time points; rigid registration of the follow-up scans to the baseline scan by optimizing mutual information, to correct for rotations and translations of the head inside the magnet; and differential bias field correction 35 to correct for differences in intensity inhomogeneity artifacts between scans.
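The rigid registration step maximizes the mutual information (MI) between the baseline and follow-up scans. The following sketch, assuming NumPy and 2D images for brevity, estimates MI from a joint intensity histogram and recovers a simulated translation by brute-force search. It is an illustration only: a real rigid registration optimizes 3 rotations and 3 translations with subvoxel interpolation and a numerical optimizer, and the bin count of 32 is an arbitrary choice.

```python
import numpy as np


def mutual_information(a, b, bins=32):
    """Mutual information (nats) between two same-shaped images,
    estimated from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal distribution of a
    py = pxy.sum(axis=0, keepdims=True)  # marginal distribution of b
    nz = pxy > 0                         # skip empty bins (log 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def best_integer_shift(fixed, moving, max_shift=5):
    """Toy registration: exhaustive search over integer translations along
    axis 0, keeping the shift that maximizes mutual information."""
    shifts = list(range(-max_shift, max_shift + 1))
    scores = [mutual_information(fixed, np.roll(moving, s, axis=0)) for s in shifts]
    return shifts[int(np.argmax(scores))]


rng = np.random.default_rng(0)
fixed = rng.normal(size=(64, 64))
moving = np.roll(fixed, 3, axis=0)          # simulated 3-voxel head displacement
shift = best_integer_shift(fixed, moving)   # -3: undoes the displacement
```

MI is used rather than a plain intensity difference because it only requires a consistent statistical relationship between intensities, which tolerates the contrast differences that remain between scans.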

Markov Random Field-Based Tissue Classification
The unsupervised, adaptive classification algorithm used to segment brain tissue automatically into CSF, GM, and WM is detailed elsewhere. 36 The algorithm allows unsupervised classification of the data into any number of tissue classes. It uses a statistical model that combines Bayesian distributions of brain tissue intensities with Gibbs random field-based spatial contiguity constraints. The model is unsupervised in that its parameters are estimated from the data themselves, and adaptivity is achieved by varying the size of the neighborhoods used to estimate the intensity characteristics. This allows signal intensity to vary slowly in space, accounting for MR imaging intensity nonuniformity. Each scan was processed independently and reviewed by an experienced neurologist, and classification errors were corrected by manual editing.
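The combination of intensity likelihoods and a Gibbs spatial prior can be illustrated with a deliberately simplified sketch. It assumes a 2D NumPy image, Gaussian class likelihoods with global (not locally adaptive) parameters, and a Potts prior with a hypothetical smoothing weight `beta`, optimized by a synchronous variant of iterated conditional modes (ICM). It is not the adaptive algorithm of reference 36, which estimates intensity characteristics over neighborhoods of varying size.

```python
import numpy as np


def mrf_segment(img, n_classes=3, beta=1.0, n_iter=10):
    """Toy MRF tissue classification of a 2D image.

    Energy per pixel = Gaussian negative log-likelihood of its intensity
    + beta * (number of 4-neighbours with a different label)  [Potts prior].
    All pixels are relabelled at once (synchronous ICM variant), which is
    easy to vectorize but does not guarantee a monotonic energy decrease.
    """
    # Initialize by intensity quantiles (on T1: CSF < GM < WM).
    cuts = np.quantile(img, np.linspace(0, 1, n_classes + 1)[1:-1])
    labels = np.digitize(img, cuts)
    for _ in range(n_iter):
        if any(not np.any(labels == k) for k in range(n_classes)):
            break  # a class emptied out; stop rather than divide by zero
        # Unsupervised step: re-estimate each class mean and variance.
        mu = np.array([img[labels == k].mean() for k in range(n_classes)])
        var = np.array([img[labels == k].var() for k in range(n_classes)]) + 1e-6
        mu3, var3 = mu[:, None, None], var[:, None, None]
        data = 0.5 * np.log(var3) + (img[None] - mu3) ** 2 / (2.0 * var3)
        # Potts prior: disagreement count with the current 4-neighbourhood.
        p = np.pad(labels, 1, mode="edge")
        neighbours = [p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]]
        prior = np.stack([sum((n != k).astype(float) for n in neighbours)
                          for k in range(n_classes)])
        labels = np.argmin(data + beta * prior, axis=0)
    return labels


# Synthetic "slice": three vertical bands (CSF-, GM-, WM-like) plus noise.
rng = np.random.default_rng(1)
truth = np.repeat(np.arange(3), 20)[None, :].repeat(60, axis=0)  # 60 x 60
noisy = truth.astype(float) + rng.normal(scale=0.25, size=truth.shape)
seg = mrf_segment(noisy)
accuracy = (seg == truth).mean()
```

Increasing `beta` favors spatially contiguous labels at the cost of fine structures; the adaptive method described above instead lets class intensities vary slowly across the image to absorb intensity nonuniformity.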

FreeSurfer
Volume-based subcortical segmentation and surface-based cortical reconstruction of the 3D T1-weighted images were performed using FreeSurfer software. The volume-based stream is an automated process that resamples 3D spoiled gradient-echo T1-weighted images to approximately 1 mm³ voxels for whole-brain tissue segmentation and for the tissue volumes of specific subcortical regions. The stream consists of 5 stages. 6 First, the MR imaging volumes are registered to Talairach space, and the output images are intensity-normalized. Next, the skull is automatically stripped from the 3D anatomic dataset by a hybrid method that uses both watershed algorithms and deformable surface models. After skull stripping, the output brain mask is labeled using a probabilistic atlas, in which each voxel in the normalized brain mask volume is assigned 1 of the following labels: cerebral white matter, cerebral cortex, lateral ventricle, inferior lateral ventricle, cerebellum white matter, cerebellum cortex, thalamus, caudate, putamen, pallidum, hippocampus, amygdala, accumbens area, third ventricle, fourth ventricle, brain stem, and CSF. For this study, FreeSurfer was run automatically with the default parameters.
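Once every voxel carries an atlas label, a regional volume is the number of voxels with that label multiplied by the voxel volume. The following sketch assumes the labeled volume has been loaded as an integer NumPy array; the label codes are hypothetical placeholders rather than entries of FreeSurfer's color lookup table, and FreeSurfer's own statistics tools may additionally apply partial-volume corrections that this illustration ignores.

```python
import numpy as np


def structure_volumes(label_map, voxel_size_mm, label_names):
    """Volume in mm^3 of each labelled structure: voxel count x voxel volume."""
    voxel_mm3 = float(np.prod(voxel_size_mm))
    return {name: int(np.count_nonzero(label_map == code)) * voxel_mm3
            for code, name in label_names.items()}


# Hypothetical label codes for illustration (NOT FreeSurfer's lookup table).
names = {1: "lateral ventricle", 2: "thalamus"}
seg = np.zeros((10, 10, 10), dtype=int)
seg[:2] = 1          # 200 voxels
seg[5, :5, :5] = 2   # 25 voxels
vols = structure_volumes(seg, (1.0, 1.0, 1.0), names)
# {'lateral ventricle': 200.0, 'thalamus': 25.0}
```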

BBSI and KN-BSI
BBSI is a semiautomated measure of global cerebral atrophy rates from serial MR images. 7,37 BBSI estimates the change in cerebral volume from differences in voxel intensities between 2 serial MR imaging volumes in the boundary region of the brain. It assumes that differences between registered scans near the boundaries of cerebral structures are due solely to the shifting of adjacent tissue. To compute the BBSI, the baseline and follow-up whole-brain regions are first delineated using the classification algorithm described above. A boundary region and an internal region are then derived from this pair of whole-brain regions. Next, the intensities of the baseline and follow-up images are normalized by dividing each image by the mean intensity inside its internal region. Finally, the BBSI is computed within the boundary region using a prespecified, manually chosen intensity window (I1/I2 = 0.65/0.45 for site 1 and 0.50/0.70 for site 2). KN-BSI was proposed to improve the robustness and reproducibility of boundary shift integral atrophy measurements in multisite imaging studies. 38 It brings 2 main improvements to the boundary shift integral: first, a specific intensity normalization, obtained by K-means clustering of brain tissues and linear regression between the mean intensities of the clusters; second, an intensity window computed automatically from the K-means clustering results, independently for each image.
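The boundary shift integral can be sketched in a few lines. The version below follows the classic formulation (sum over the boundary region of the difference between window-clipped baseline and follow-up intensities, divided by the window width) and assumes registered, intensity-normalized NumPy arrays. The function names, the toy profile, and the use of `np.polyfit` for the cluster-mean regression are illustrative assumptions; the K-means step and KN-BSI's automatic window selection are not reproduced.

```python
import numpy as np


def boundary_shift_integral(base, repeat, boundary, window, voxel_volume=1.0):
    """Classic BSI: V / (I1 - I2) * sum over the boundary region of
    clip(base) - clip(repeat), intensities clipped to the window.
    On T1 images (parenchyma brighter than CSF), positive = tissue loss."""
    lo, hi = sorted(window)
    b = np.clip(base[boundary], lo, hi)
    r = np.clip(repeat[boundary], lo, hi)
    return voxel_volume * float(np.sum(b - r)) / (hi - lo)


def kn_style_normalize(repeat, repeat_means, base_means):
    """Map follow-up intensities onto the baseline scale with a linear fit
    between matched tissue-cluster means (assumed to come from K-means)."""
    slope, intercept = np.polyfit(repeat_means, base_means, 1)
    return slope * repeat + intercept


# 1D toy profile: parenchyma (1.0) then CSF (0.2); the edge moves 2 voxels inward.
x = np.arange(100)
base = np.where(x < 50, 1.0, 0.2)
repeat = np.where(x < 48, 1.0, 0.2)
boundary = (x >= 40) & (x < 60)
bsi = boundary_shift_integral(base, repeat, boundary, window=(0.45, 0.65))  # about 2.0 voxels
```

Clipping to the window makes voxels deep in tissue or deep in CSF contribute nothing, so only intensity changes across the brain edge are counted, which is why the choice of window (manual in BBSI, automatic in KN-BSI) matters.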

SIENA
SIENA provides a fully automated analysis of whole-brain atrophy. 9,10 It includes 2 segmentation stages, the Brain Extraction Tool and the FMRIB Automated Segmentation Tool, from the FMRIB Software Library (http://www.fmrib.ox.ac.uk/fsl). The Brain Extraction Tool builds an intensity histogram and uses it to threshold the image into a binary (object/background) mask, from which the center of gravity of the head is determined. 10 The 2 brain images are then registered to each other using the skull images. 39,40 Next, the FMRIB Automated Segmentation Tool classifies the voxels within the Brain Extraction Tool brain mask into 3 or 4 classes according to their intensities, to distinguish CSF from brain parenchyma and background, with or without GM-WM separation. 41 The resulting mask corresponds to the brain parenchyma. Edge displacement between the 2 time points is then estimated to subvoxel accuracy by aligning the peaks of the spatial derivatives of the intensity profiles of both images. Finally, the mean edge displacement is converted into a global estimate of the percentage brain volume change between the 2 time points. Because SIENA is fully automated, it offers limited interactivity: an error in 1 of the processing steps requires restarting the whole procedure. For this study, SIENA was run with the default parameters and without manual correction.
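The last 2 SIENA steps can be illustrated in 1D. The following sketch, assuming NumPy and intensity profiles already sampled perpendicular to the brain edge in both registered images, locates the edge as the peak of the spatial derivative (refined to subvoxel precision with a 3-point parabolic fit) and converts a mean edge displacement into a percentage volume change with the geometric approximation ΔV ≈ edge area × mean displacement. This is not SIENA's code: SIENA performs its own profile matching and uses its own calibrated displacement-to-volume conversion.

```python
import numpy as np


def edge_position(profile):
    """Subvoxel location of the strongest edge in a 1D intensity profile:
    peak of |spatial derivative|, refined with a 3-point parabolic fit."""
    g = np.abs(np.gradient(np.asarray(profile, dtype=float)))
    i = int(np.argmax(g))
    if 0 < i < len(g) - 1:
        y0, y1, y2 = g[i - 1], g[i], g[i + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0:
            return i + 0.5 * (y0 - y2) / denom
    return float(i)


def pbvc_from_edge_motion(mean_displacement_mm, edge_area_mm2, brain_volume_mm3):
    """Geometric approximation: dV ~ edge area x mean normal displacement."""
    return 100.0 * mean_displacement_mm * edge_area_mm2 / brain_volume_mm3


def smooth_step(x, center, width=2.0):
    """Toy profile: bright brain at low x, dark CSF at high x."""
    return 1.0 / (1.0 + np.exp((x - center) / width))


x = np.arange(100, dtype=float)  # 1-mm samples along the edge normal
d = edge_position(smooth_step(x, 49.8)) - edge_position(smooth_step(x, 50.3))
# d is about -0.5 mm (inward edge motion)

# Sanity check on a sphere of radius 50 mm shrinking by 0.1 mm:
r = 50.0
pbvc = pbvc_from_edge_motion(-0.1, 4 * np.pi * r**2, 4 / 3 * np.pi * r**3)  # about -0.6 %
```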

SIENAX
SIENAX provides a fully automated quantification of brain volume, normalized for head size. 10 The algorithm extracts the brain and skull masks from a single time-point image and registers them to Montreal Neurological Institute 152 space; the skull images are used to determine the registration scaling, which provides the normalization for head size. Tissue segmentation is then performed to calculate the brain, GM, and WM volumes. Both normalized and unnormalized volumes are reported. For this study, SIENAX was run automatically with the default parameters.
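The head-size normalization amounts to rescaling each native volume by the volume change implied by the skull-constrained registration to the template. The following sketch assumes that this factor is the absolute determinant of the 3 × 3 linear part of the subject-to-template affine matrix, given as a NumPy array; the function names are hypothetical and do not reproduce SIENAX's implementation.

```python
import numpy as np


def volumetric_scaling(affine):
    """Volume scaling implied by a 4x4 affine (subject -> standard space):
    |det| of its 3x3 linear part."""
    return float(abs(np.linalg.det(np.asarray(affine, dtype=float)[:3, :3])))


def normalized_volume(native_volume_mm3, affine):
    """Head-size-normalized volume: native volume x volumetric scaling."""
    return native_volume_mm3 * volumetric_scaling(affine)


# A subject whose skull must be enlarged by 10% along each axis to match the template.
affine = np.diag([1.1, 1.1, 1.1, 1.0])
scale = volumetric_scaling(affine)            # about 1.331
nbv = normalized_volume(1_000_000.0, affine)  # about 1,331,000 mm^3
```

A small head (scaling above 1) therefore has its volumes inflated toward the template size, which makes values comparable across subjects in cross-sectional studies; it also explains why SIENAX values can fluctuate when the scaling estimate changes between scanners.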

3D JI Using Tensor-Based Morphometry
TBM identifies regional structural differences after nonlinear warping, either between an image and a common anatomic template (cross-sectional analysis) or between 2 images of the same subject (longitudinal analysis). [42][43][44] A Jacobian matrix (J) is obtained for each voxel by taking the spatial derivatives of the deformation field resulting from the nonlinear warping. 45 The determinant of each Jacobian matrix (det J) represents an expansion (det J > 1) or a contraction (det J < 1) of the voxel after nonlinear warping and can be displayed in 3D. These voxelwise changes, or Jacobian determinant changes, are then integrated over a brain region to obtain an estimate of the atrophy of that region. TBM requires little manual interaction and is recognized as a favorable technique for large-scale brain studies. 46 TBM techniques differ in the nonlinear registration algorithms used for the warping. 47 In this study, the nonlinear registration technique detailed in Vemuri et al 48 was used for warping, and brain atrophy was estimated by comparing each follow-up scan with the baseline scan. The expansion of the total intracranial CSF was quantified as an indirect marker of brain parenchymal volume loss, to limit the influence of inflammatory activity within the brain parenchyma.
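The Jacobian step can be sketched directly from a deformation field. Assuming the nonlinear registration (not reproduced here; the study used the method of Vemuri et al) yields, for every voxel, its mapped position as a NumPy array, the following code computes det J by finite differences and integrates det J − 1 over a region mask. Whether det J > 1 means growth from baseline to follow-up depends on the direction in which the deformation is defined, which is an assumption of this illustration.

```python
import numpy as np


def jacobian_determinant(phi, spacing=(1.0, 1.0, 1.0)):
    """det(J) per voxel for a mapping phi of shape (3, X, Y, Z), where
    phi[:, x, y, z] is the mapped position (in mm) of voxel (x, y, z)."""
    J = np.empty((3, 3) + phi.shape[1:])
    for i in range(3):
        # np.gradient returns d(phi_i)/dx_j for j = 0, 1, 2 (finite differences).
        for j, dphi in enumerate(np.gradient(phi[i], *spacing)):
            J[i, j] = dphi
    # Put the 3x3 axes last so np.linalg.det is applied voxel by voxel.
    return np.linalg.det(np.moveaxis(J, (0, 1), (-2, -1)))


def integrated_volume_change(det_j, mask, voxel_volume_mm3=1.0):
    """Volume change over a region: sum of (det J - 1) x voxel volume.
    Positive = expansion (e.g., CSF growth), negative = contraction."""
    return float(np.sum(det_j[mask] - 1.0) * voxel_volume_mm3)


# Uniform 2% linear expansion: det J should equal 1.02**3 everywhere.
grid = np.indices((10, 10, 10)).astype(float)
det_j = jacobian_determinant(1.02 * grid)
csf_mask = np.zeros((10, 10, 10), dtype=bool)
csf_mask[:5, :5, :5] = True
dv = integrated_volume_change(det_j, csf_mask)  # 125 * (1.02**3 - 1), about 7.65 mm^3
```

Integrating over a CSF mask rather than a parenchymal mask, as described above, measures atrophy indirectly as CSF expansion.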

Statistical Analysis
The statistical analysis compared the brain volume loss results obtained at each site. Intersite variability was measured by comparing the results obtained by each technique at the 2 sites, MRI1 and MRI2. The mean percentages of brain volume loss were first compared using the Wilcoxon signed rank test; a Spearman correlation coefficient was then calculated and tested. A Bland-Altman analysis was also performed. Finally, the SEM was estimated as:

Results
The means, standard deviations, and median percentages of brain volume change for the 7 techniques are presented for each time point and each site, MRI1 (Table 1) and MRI2 (Table 2). Fig 1 shows the brain volume changes obtained at both sites with the 7 techniques for each patient. The percentages differed markedly between the 2 sites for the segmentation-based techniques, namely the segmentation-classification algorithm and FreeSurfer. In contrast, the BBSI, KN-BSI, and JI methods provided fairly stable results between the 2 sites. Results were also similar between sites for SIENA, except for 1 patient (number 4), who showed a wide intersite difference. SIENAX provided variable and heterogeneous results between the 2 sites.
The Wilcoxon signed rank test comparing the results obtained at the 2 sites showed no significant difference for any of the 7 techniques. The Spearman correlation coefficients were not significant. However, the Bland-Altman analysis showed that segmentation-based techniques and SIENAX provided greater mean differences and wider agreement intervals than registration-based techniques (Fig 2). The segmentation-classification algorithm obtained a mean difference of 1.80% between the 2 sites, with a wide agreement interval (length, 11.66%). FreeSurfer provided a smaller mean difference, close to 0.07%, but still had a wide agreement interval (length, 7.92%). In contrast, registration-based techniques such as the BBSI, KN-BSI, and JI algorithms were less sensitive to the change of MR imaging system. The mean difference for these 3 techniques was 0.45%, with a mean agreement interval length of 1.55%. SIENA obtained a smaller mean difference of 0.12% but a wider agreement interval (3.29%). As estimated by the SEM and the delta values, variability was larger for segmentation-based techniques and lower for registration-based techniques (Table 3).
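The Bland-Altman quantities reported above can be computed with a short function. The following sketch assumes NumPy, paired per-patient values from MRI1 and MRI2, and that the "length" of the agreement interval is the width of the conventional 95% limits of agreement (mean difference ± 1.96 SD). The article does not state whether its mean differences are signed or absolute, so signed differences are assumed here, and the values in the usage line are invented for illustration.

```python
import numpy as np


def bland_altman(site1, site2):
    """Bland-Altman agreement for paired measurements (e.g., % brain volume
    change of the same patient and interval measured at 2 sites)."""
    d = np.asarray(site1, dtype=float) - np.asarray(site2, dtype=float)
    bias = d.mean()                                     # mean intersite difference
    sd = d.std(ddof=1)                                  # SD of the differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd   # 95% limits of agreement
    return {"bias": bias, "lower": lower, "upper": upper, "length": upper - lower}


# Invented toy values, not data from this study.
res = bland_altman([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
# bias = 0.0; length = 3.92 x SD of the differences
```

The Wilcoxon signed rank test and the Spearman correlation used alongside this analysis are available in SciPy as `scipy.stats.wilcoxon` and `scipy.stats.spearmanr`.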

Discussion
As atrophy is a slow process leading to small changes, the techniques used to quantify it must be accurate and reliable. 49 In this work, we compared brain volume changes measured on 2 MR imaging systems at 3 time points over 1 year to evaluate intersite variability and robustness. The acquisitions were performed in 9 patients with MS and postprocessed with 7 different methods. The main finding was that segmentation-based techniques provided larger and more heterogeneous values of brain volume change than registration-based techniques. Segmentation-based algorithms used semiautomatically (with manual correction) are, in theory, considered reference standard techniques, because each brain volume measurement is derived from segmented masks validated by an observer. However, because of the necessary manual corrections, these algorithms are time-consuming and less reproducible, so automatic segmentation-based algorithms are preferred. 50 FreeSurfer, one of the most recent automatic segmentation algorithms, seemed more reliable than the classification algorithm used in this work. Nevertheless, segmentation-based algorithms are influenced by the quality of the acquisitions. The BBSI, KN-BSI, JI, and, to a lesser extent, SIENA algorithms were less sensitive to the change of MR imaging system. The mean difference was close to 0.45% for BBSI, KN-BSI, and JI and 0.12% for SIENA; the mean agreement interval lengths were 1.55% and 3.29%, respectively. This is probably related to fewer errors being introduced by registration-based postprocessing. In addition, the results obtained with SIENAX were more heterogeneous and discordant, confirming that SIENAX should be reserved for cross-sectional studies.
Regardless of the quantification algorithm used, several sources of variability may interfere with the evaluation of brain volume loss at a given time. Significant physiologic changes related to age and hydration status 15,28,51 can cause brain volume variations of up to 0.70%. 16,17 In MS, inflammatory activity corresponding to the occurrence of acute lesions transiently increases brain volume. 18 Conversely, treatment with high-dose corticosteroids or disease-modifying treatments may significantly and temporarily reduce brain volume. [19][20][21][22][23]52 These physiopathologic factors have a greater impact than scan-rescan variations. 17,53 In this study, these factors are unlikely to have influenced the results, because each patient was examined on both MR imaging systems on the same day, and physiopathologic changes are not expected to occur within such a short interval. Acquisition-related factors may also affect the results of automated quantitative analysis and lead to incorrect quantification. 25 Within the same scanner platform, upgrades do not significantly modify the results, but differences in field strength may introduce a slight bias. 27,28 In this study, however, neither system was upgraded, and both had the same B0 field strength. In addition, B1 field inhomogeneity can affect the reliability of tissue segmentation because it creates image intensity distortions. However, recent studies using FreeSurfer software showed no significant changes with or without correction of B1-related intensity inhomogeneities. 27,32 Patient positioning (Z-positioning) affects morphometric analyses in both within-site and multisite evaluations. 26,29 It increased the mean absolute error of SIENA from 0.17% to 0.40%; 26 the authors proposed a gradient distortion correction, which reduced this mean absolute error to 0.15%. 26 Combining datasets obtained from MR imaging platforms of different manufacturers introduces a bias that should be considered in multisite studies. 27,28

Conclusions
In this article, we evaluated the impact of changing MR imaging systems on the results of brain volume quantification and assessed the robustness of 7 postprocessing algorithms.
Intersite variability showed that segmentation-based techniques provided larger and more heterogeneous results than registration-based techniques. Segmentation-based algorithms are influenced by the quality of the acquisitions, and because of their variability, their use in longitudinal studies might not be recommended. In contrast, this study showed better reproducibility for registration-based algorithms such as BBSI, KN-BSI, JI, and, to a lesser extent, SIENA, which provided lower and more homogeneous results. Because MR imaging acquisitions are often performed on different MR imaging systems in clinical practice, quantification algorithms that are insensitive to the quality of the acquisitions should be preferred. Nevertheless, for longitudinal follow-up, MR imaging acquisitions should be performed at the same MR imaging site. In the future, new postprocessing algorithms incorporating field inhomogeneity and geometric distortion corrections are needed to improve the accuracy of brain volume change quantification.