Predictive Utility of Marketed Volumetric Software Tools in Subjects at Risk for Alzheimer Disease: Do Regions Outside the Hippocampus Matter?

The authors assessed the prognostic efficacy of individual-versus-combined regional volumetrics in 2 commercially available brain volumetric software packages for predicting conversion of patients with mild cognitive impairment to Alzheimer disease. One hundred ninety-two subjects (mean age, 74.8 years) diagnosed with mild cognitive impairment at baseline were studied. On univariable analysis of 11 NeuroQuant and 11 Neuroreader regional volumes, hippocampal volume had the highest area under the curve for both software packages (0.69, NeuroQuant; 0.68, Neuroreader) and was not significantly different between packages. They conclude that of the multiple regional volume measures available in FDA-cleared brain volumetric software packages, hippocampal volume remains the best single predictor of conversion of mild cognitive impairment to Alzheimer disease at 3-year follow-up.
BACKGROUND AND PURPOSE: Alzheimer disease is a prevalent neurodegenerative disease. Computer assessment of brain atrophy patterns can help predict conversion to Alzheimer disease. Our aim was to assess the prognostic efficacy of individual-versus-combined regional volumetrics in 2 commercially available brain volumetric software packages for predicting conversion of patients with mild cognitive impairment to Alzheimer disease.
MATERIALS AND METHODS: Data were obtained through the Alzheimer's Disease Neuroimaging Initiative. One hundred ninety-two subjects (mean age, 74.8 years; 39% female) diagnosed with mild cognitive impairment at baseline were studied. All had T1-weighted MR imaging sequences at baseline and 3-year clinical follow-up. Analysis was performed with NeuroQuant and Neuroreader. Receiver operating characteristic curves assessing the prognostic efficacy of each software package were generated by using a univariable approach using individual regional brain volumes and 2 multivariable approaches (multiple regression and random forest), combining multiple volumes.
RESULTS: On univariable analysis of 11 NeuroQuant and 11 Neuroreader regional volumes, hippocampal volume had the highest area under the curve for both software packages (0.69, NeuroQuant; 0.68, Neuroreader) and was not significantly different (P > .05) between packages. Multivariable analysis did not increase the area under the curve for either package (0.63, logistic regression; 0.60, random forest NeuroQuant; 0.65, logistic regression; 0.62, random forest Neuroreader).
CONCLUSIONS: Of the multiple regional volume measures available in FDA-cleared brain volumetric software packages, hippocampal volume remains the best single predictor of conversion of mild cognitive impairment to Alzheimer disease at 3-year follow-up. Combining volumetrics did not add additional prognostic efficacy. Therefore, future prognostic studies in mild cognitive impairment, combining such tools with demographic and other biomarker measures, are justified in using hippocampal volume as the only volumetric biomarker.

in the clinical care of patients with memory impairment for prognosis remains challenging. 10,11 Medial temporal lobe volume assessments of MR images with visual ratings or manual or semimanual volumetric processing have been difficult to implement in a busy clinical environment due to the high interobserver variability of raters and/or the time-consuming nature of obtaining these measurements. 12 These problems have been addressed through the use of fully automated segmentation algorithms available in commercial software programs, providing the user with immediate, detailed volumetric analysis of the hippocampus and other brain regions, which is more sensitive than visual analysis. 13 Because the atrophy pattern in prodromal AD is spatially distributed, including regions beyond the hippocampus, such as the lateral and inferior temporal lobe, parietal lobe, and cingulate gyrus, 14 incorporation of such information may enhance the prognostic capability of these currently available tools. Indeed, pattern-analysis techniques, incorporating whole-brain morphologic information, 15-19 have been harnessed for this purpose and have shown a high prognostic value in individual patients. Such techniques, however, have not yet been implemented in commercially available products. The purpose of this study was to assess the prognostic efficacy of using the complete set of raw volumetric measures available in 2 fully automated, commercially available brain volumetric software packages. Such tools have been FDA 510(k) cleared for clinical use but have not yet been validated for specific diagnostic or prognostic purposes in AD. We sought to determine whether combining volumetrics by using multivariable approaches, including machine learning, 20-22 would add to the prognostic efficacy of individual measures alone, such as hippocampal volume, in predicting conversion from mild cognitive impairment (MCI) to AD.
We hypothesized that the multivariable approach would enhance the prognostic value of already existing individual measures available in commercial volumetric software.

Subjects
All subject data were available through the Alzheimer's Disease Neuroimaging Initiative (ADNI), a multicenter trial with a publicly available data base (adni.loni.usc.edu). 15 The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of the ADNI has been to test whether MR imaging, PET, other biologic markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. For up-to-date information, see www.adni-info.org.
All subjects classified as having MCI under ADNI-1 or late MCI under Alzheimer's Disease Neuroimaging Initiative-Grand Opportunities (ADNI-GO), with baseline MR imaging and a baseline and 3-year clinical assessment available on or before November 11, 2013, were selected.
Conversion of MCI to AD was based on the National Institute of Neurological Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association criteria for probable or possible AD, determined by review from a committee according to protocol guidelines. Those subjects who did not fulfill these criteria for conversion were defined as nonconverters, including subjects with MCI and those who reverted to a normal state.

Image Acquisition and Image Analysis
All subjects underwent brain MR imaging at 1.5T or 3T with sagittal 3D T1-weighted (MPRAGE) scans. Further details on the MR imaging scanner protocol and MR image acquisition are available elsewhere (http://adni.loni.usc.edu/methods/documents/mriprotocols/). Because we selected only subjects with late MCI from the ADNI-GO, all subjects were therefore previous members of ADNI-1 and had their original scan with the ADNI-1 protocol. The MPRAGE sequence was processed with NeuroQuant (NQ; CorTechs Labs, San Diego, California) (Original Version 1.0), which is a commercially available automated image-analysis software program (http://www.cortechslabs.com/neuroquant/). Separate left and right volumes, available in the generated morphometry report, were combined into a total volume for the following 11 brain regions (features): amygdala, caudate, cerebellum, cortical gray matter, forebrain parenchyma, hippocampus, inferior lateral ventricle, lateral ventricle, pallidum, putamen, and thalamus.
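The bilateral combination described above can be illustrated with a minimal Python sketch. The dictionary layout, the `combine_bilateral` helper, and the volumes are hypothetical and chosen only for illustration; they do not reflect the actual format of the NQ morphometry report.

```python
# Hypothetical per-hemisphere volumes in mL. The region names follow the text,
# but the numbers and the report layout are invented for illustration.
report = {
    "hippocampus": {"left": 3.1, "right": 3.3},
    "amygdala": {"left": 1.4, "right": 1.5},
}


def combine_bilateral(report):
    """Sum the left and right volumes into one total volume per region."""
    return {
        region: sides["left"] + sides["right"]
        for region, sides in report.items()
    }


totals = combine_bilateral(report)
```

Each resulting total would then serve as one feature in the univariable and multivariable analyses.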
In addition, the MPRAGE sequence was processed with Neuroreader (NR; Brainreader Aps, Horsens, Denmark), another commercially available software program for volumetric segmentation (http://brainreader.net/). 23 Both NQ and NR are FDA 510(k) cleared and "intended for automatic labeling, visualization and volumetric quantification of segmentable brain structures from a set of MR images" (http://www.accessdata.fda.gov/cdrh_docs/pdf6/K061855.pdf and https://www.accessdata.fda.gov/cdrh_docs/pdf14/K140828.pdf). Because the NR volumetric report yielded many more regions than the NQ report, the 11 regions yielding the best individual predictions were chosen from the NR morphometry report to keep the models equally parsimonious. Combined bilateral volumes were obtained for the following 11 brain regions (features): amygdala, cerebellum, frontal lobe, hippocampus, lateral ventricle, occipital lobe, parietal lobe, putamen, temporal lobe, thalamus, and ventral diencephalon. Technical aspects of these commercial software packages have been previously described in detail. 24,25 Of note, all ADNI studies included 2 echo-spoiled gradient echo/MPRAGE sequences performed back-to-back, the original and a repeat sequence accelerated by parallel imaging. In cases in which the original MPRAGE sequence could not be processed by the NR or NQ programs, the repeat MPRAGE sequence was used instead. If processing also failed on the repeat sequence, with either NR or NQ, the case was excluded from all further analysis. All image processing and analyses were performed by 2 authors (T.P.T. and J.I.). Sample segmentations from NQ and NR are shown in Fig 1A.

Statistical Analysis
We compared demographics between the MCI converter and nonconverter groups by using a 2-sample Student t test, assuming equal variance when appropriate. To assess the prognostic performance of individual features and of multivariable models combining the features, we used the receiver operating characteristic (ROC) curve methodology. Area under the curve (AUC) was used to summarize the performance. Comparison of AUCs was performed by using the method of DeLong et al. 26 To test whether a combination of the features would outperform models based on a single brain region, we constructed 2 multivariable models: a classic multivariable logistic regression model and a more recent machine-learning method called the Random Forest Classifier. 27 For the Random Forest Classifier, 2000 trees were used. The leave-one-out cross-validation method was used for training and testing of the multivariable models. In addition to the data-driven multivariable models, we also tested one a priori multivariable model, known as the Hippocampal Occupancy (HOC) score, defined as the ratio of hippocampal volume to the sum of hippocampal and inferior lateral ventricle volumes. This measure is thought to differentiate individuals with congenitally small hippocampi from those with degeneration. 28 The HOC score for each separate hemisphere was averaged to provide a single composite measurement. Because inferior lateral ventricle (temporal horn) volume was not available for NR, this measure was calculated only with NQ.
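The main quantities in this paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis code used in the study. The AUC is computed with the rank-based (Mann-Whitney) formulation. The HOC score follows the definition given above. The leave-one-out loop is generic, and the `fit_mean_difference` model is a deliberately simple stand-in that replaces the logistic regression and random forest models, which are not reproduced here. All function names and all volumes are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a random converter (label 1) scores higher
    than a random nonconverter (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one subject in each group")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hoc_score(hippo_left, ilv_left, hippo_right, ilv_right):
    """Hippocampal occupancy: hippocampus / (hippocampus + inferior lateral
    ventricle), computed per hemisphere and then averaged."""
    left = hippo_left / (hippo_left + ilv_left)
    right = hippo_right / (hippo_right + ilv_right)
    return (left + right) / 2.0


def leave_one_out_scores(features, labels, fit, predict):
    """Score each subject with a model trained on all the other subjects."""
    scores = []
    for i in range(len(labels)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        scores.append(predict(model, features[i]))
    return scores


def fit_mean_difference(train_x, train_y):
    """Stand-in model (an assumption, not the study's method): the direction
    from the nonconverter mean to the converter mean."""
    n_feat = len(train_x[0])

    def mean(group):
        rows = [x for x, y in zip(train_x, train_y) if y == group]
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]

    return [a - b for a, b in zip(mean(1), mean(0))]


def predict_projection(direction, x):
    """Project a subject's features onto the fitted direction."""
    return sum(d * v for d, v in zip(direction, x))


# Invented toy data: [hippocampal volume, inferior lateral ventricle volume] in mL.
volumes = [[5.8, 2.9], [6.0, 3.1], [6.9, 2.2], [7.4, 1.8], [7.0, 2.0], [6.5, 2.6]]
converted = [1, 1, 1, 0, 0, 0]

# Univariable: a smaller hippocampus suggests conversion, so negate the volume.
auc_hippocampus = auc([-v[0] for v in volumes], converted)

# Multivariable: leave-one-out scores from the stand-in model.
loo = leave_one_out_scores(volumes, converted, fit_mean_difference, predict_projection)
auc_combined = auc(loo, converted)
```

Because every subject is scored by a model that never saw that subject, the cross-validated AUC can be lower than the univariable AUC even when the combined model contains the same hippocampal feature. Comparing two AUCs for significance requires a separate test, such as the DeLong method cited above, which this sketch does not implement.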
To assess correlations between assessment of the same volume by the NQ and NR software, we used the Pearson correlation. Statistical analyses were performed with Matlab, Version 8.1.0.604 R2013a (MathWorks, Natick, Massachusetts); R statistical and computing software, Version 3.1.3 (http://www.r-project.org); and JMP, Version 11.0 (SAS Institute, Cary, North Carolina) software packages.
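For readers unfamiliar with the Pearson correlation, a minimal Python sketch follows. The `pearson_r` helper and the paired hippocampal volumes are invented for illustration and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented hippocampal volumes (mL) for the same 5 subjects from each package.
nq_hippocampus = [6.1, 6.8, 7.2, 5.9, 7.5]
nr_hippocampus = [6.0, 6.6, 7.1, 5.8, 7.3]

r_hippocampus = pearson_r(nq_hippocampus, nr_hippocampus)
```

A high correlation shows that the two packages rank subjects similarly. It does not show that the absolute volumes agree, which is why a Bland-Altman analysis is a useful complement.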

RESULTS
Initially, 281 subjects who met the inclusion criteria were identified in the ADNI databases. Eighty-four (29.9%) subjects were excluded due to failure to generate an NQ morphometry report. Five (1.8%) subjects were excluded for other reasons, including failure to generate an NR morphometry report. A total of 192 remaining subjects met the inclusion and exclusion criteria. All subjects had a 3-year clinical follow-up visit recorded. Mean follow-up was 3.05 ± 0.14 years (range, 2.47-3.63 years). All 192 included subjects were diagnosed as having MCI at baseline, and all started under the ADNI-1 protocol. At the end of Characteristics of the study populations are listed in Table 1. Two subjects did not have the 13-item Alzheimer's Disease Assessment Scale (ADAS-13) scores at baseline. There were no significant differences (P > .05) between the MCI (nonconverter) and AD (converter) groups in terms of age, sex, or education. There was a significant difference (P < .0001) in the ADAS-13 score at baseline and the Mini-Mental State Examination score at baseline between the 2 groups. The AUC for predicting conversion at baseline was 0.76 and 0.68 for the ADAS-13 and Mini-Mental State Examination, respectively.
AUC values for NQ and NR are listed in Table 2. With NQ, the most predictive feature for conversion of MCI to AD was hippocampal volume, with an AUC of 0.69. The most predictive feature for NR was also hippocampal volume, with an AUC of 0.68. The multivariable analysis did not improve on either of these. Intracranial volume normalization did not improve the performance of hippocampal volume in the linear model (AUC = 0.65 for NQ and 0.61 for NR), and adding age as a covariate to the intracranial volume-normalized hippocampal volumes did not result in improvement (AUC = 0.66 for NQ and 0.62 for NR).
With the NR, the AUC value for the hippocampus was significantly greater (P < .05) than the AUC value for the cerebellum, lateral ventricle, and the thalamus. With the NQ, the AUC value for the hippocampus was significantly greater (P < .05) than the AUC value for the caudate, cerebellum, lateral ventricle, pallidum, putamen, and thalamus.
There was no statistically significant difference in AUC values by using hippocampal volumes obtained by NR versus NQ (P = .657).
Pearson correlation coefficients are listed in Table 3. There was a statistically significant correlation between NQ and NR absolute volumes in all compatible regions tested: thalamus, lateral ventricle, hippocampus, cerebellum, putamen, and amygdala (P < .05). The scatterplot and Bland-Altman plot for hippocampal volume are shown in Fig 1B. There was a small bias for NR with respect to NQ of -0.12 mL (95% CI, -0.02 to -0.22 mL).
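The bias reported from a Bland-Altman analysis is the mean of the paired differences between the two methods. The Python sketch below shows one way to compute it, assuming a normal-approximation 95% CI for the mean difference (1.96 standard errors) and the conventional limits of agreement (1.96 standard deviations). The source does not state how its CI was derived, so those choices are assumptions. The `bland_altman` helper and the volumes are hypothetical.

```python
import math


def bland_altman(method_a, method_b):
    """Bias (mean of a - b), an approximate 95% CI of the bias, and the
    95% limits of agreement for two paired sets of measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least 2 paired measurements")
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    half_ci = 1.96 * sd / math.sqrt(n)  # normal approximation (assumption)
    return {
        "bias": bias,
        "ci95": (bias - half_ci, bias + half_ci),
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Invented hippocampal volumes (mL) for the same 5 subjects from each package.
nr_volumes = [6.0, 6.6, 7.1, 5.8, 7.3]
nq_volumes = [6.1, 6.8, 7.2, 5.9, 7.5]

agreement = bland_altman(nr_volumes, nq_volumes)
```

A negative bias here means that the first method (NR in this toy example) reads lower than the second on average. The limits of agreement describe the spread of individual differences and are always wider than the CI of the bias.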
The AUC for the HOC score was 0.64 (95% CI, 0.56-0.72). There was no statistically significant difference in AUC values by using the NQ HOC score versus NQ hippocampal volume (P = .1827).

DISCUSSION
Our study, with the complete set of raw volumetric measures available in 2 fully automated, commercially available brain volumetric software packages, confirms previous studies documenting the prognostic utility of hippocampal volume in patients with MCI. Indeed, hippocampal volumes provided the highest prognostic value of all individual regions available in both the NQ and NR volumetric reports. Multivariable analysis, including the use of a cross-validated machine-based learning classifier algorithm, which incorporated other brain regions available in these software packages, did not provide additional predictive value compared with a model based on just hippocampal volumes. The results of our study are in agreement with previous studies that have found hippocampal volumes to be most predictive of conversion of MCI to AD, with little added benefit from additional brain regions. For example, a large study of patients in the ADNI using semiautomated methods reported that hippocampal volumes were the most predictive of conversion compared with other regional and whole-brain measures. 29 Automated volumetric measurements of the hippocampus also had high predictive values for conversion. 2 A 2-year clinical follow-up study of patients with MCI with manual methods demonstrated that baseline hippocampal volumes had high predictive value. 5 Despite the differences in segmentation techniques (manual, semiautomated, and automated) among the 3 studies, all these studies reported that hippocampal volumes were the most predictive of MCI conversion.
On the other hand, a number of previous studies have implicated brain regions other than the hippocampus as either adding to the prognostic value of hippocampal volume or having prognostic value in their own right, though direct comparison with our study is limited due to differences in methodologies and primary outcomes. In one study using manual methods, a bivariate model of hippocampal volumes and follow-up changes to either ventricular volume or whole-brain volume was shown to have prognostic value in predicting conversion of MCI to AD. 7 Another study with automated methods found that amygdala and caudate nucleus volumes were predictive of MCI conversion, whereas hippocampal volumes were not. 12 A separate study with automated methods reported that deep gray matter structures, including the amygdala, thalamus, putamen, and nucleus accumbens, were predictive of conversion of MCI to AD, in addition to the hippocampus. 30 Temporal horn volumes were shown to be more predictive than hippocampal volumes in one study using semiautomated methods, 31 whereas the left lateral temporal lobe and left parietal cortex were the most predictive factors in another study using automated methods. 32 A likely reason for discrepancies with these studies is that volume loss in AD-affected regions is correlated with that of the hippocampus, and these collinear effects are therefore lost in the multivariable model when one accounts for hippocampal volume. For example, in our study, volume of the hippocampus was highly correlated with that of the amygdala (r = 0.71) and cortical gray matter (r = 0.52) for NQ, and amygdala (r = 0.75) and temporal lobe (r = 0.57) for NR. Another explanation for the differences in results, compared with our study, could be the use of different segmentation methods.
In our study, we used an automated segmentation method with NQ and NR, which may be less accurate in measuring smaller and deeper structures compared with manual or semiautomated segmentation methods. Also of note in this study, the HOC score, a composite index of hippocampal and temporal horn volume, did not outperform hippocampal volume alone. The HOC score has been advocated as a more accurate measurement of hippocampal tissue loss, which accounts for individual variations in hippocampal size by accounting for temporal horn volume. In our study population, temporal horn volume was somewhat collinear with hippocampal volume (r = -0.33), and we did not find the HOC to be more predictive of conversion than hippocampal volume alone, suggesting that the additional variance added by temporal horn volume might be redundant information and/or noise. Although temporal horn volume was a significant predictor of conversion in the univariable logistic regression model, it lost significance when hippocampal volume was added to the model and negligibly increased the AUC. Nevertheless, our results do not exclude the possibility that combining these 2 measures may be helpful in the preclinical or mild dementia phases of the disease spectrum.
Our study has several limitations. It is theoretically possible that the ROC area for hippocampal volume might have been larger if hippocampal volumes had been normalized to intracranial volume and adjusted for age. 33 Such corrections might adjust for bias across subjects since intracranial volume-normalized and age-adjusted values are already available in the volumetric reports. Nevertheless, for our primary analysis, we chose to use the raw values, in keeping with our purpose of comparing single-versus-combined measures within the same subjects, which should not be affected by normalizing by intracranial volume, a practice that could introduce additional noise into the measures. Although age may differentially affect various brain regions, we did not have age-adjusted values available for most regions in the NQ volumetric report and therefore did not include these measures in our primary analysis. Even so, secondary analyses of hippocampal volume with intracranial volume normalization, as well as including age as a covariate in the linear model, did not yield additional prognostic efficacy. Of note, 1 group, the Coalition Against Major Diseases, reported higher AUC values for the prediction of MCI conversion to AD (in 2 years) in their de novo analysis, based on automated hippocampal volume obtained by several methods: 0.7565 (LEAP; Learning Embeddings for Atlas Propagation, http://www.ixico.com/additional-information/leap-analysis), 0.7516 (NeuroQuant), 0.7536 (FreeSurfer; http://surfer.nmr.mgh.harvard.edu), and 0.7290 (HMAPS; Hippocampus Multi-Atlas Propagation and Segmentation). 34 However, the same committee also reported a range of AUC values between 0.60 and 0.77 based on their literature review, and our results are within this range.
Both the NQ and NR programs provide a large number of data values (55 for the general morphometry report through NQ, 140 for the NR report) that we did not fully include in our machine-based learning classifier program. Increasing the number of features can hurt the performance of a classifier, given a limited sample size. Initially, we found that the addition of the full dataset resulted in a worse predictive value for MCI conversion, which we speculate was due to overfitting of the training samples. This is a common phenomenon when the number of predictor variables is high compared with the number of subjects in the training set. 35 Hence, we decided to limit our analysis to the most promising features provided in the NQ and NR volumetric reports.
In some cases, we were not able to process the MR images for the baseline sequence, which excluded some patients from the study, all of whom were chosen through the ADNI data base. In total, 89 of 281 patients were excluded due to inadequate NQ or NR morphometry data. This was most often due to patient age and sex missing from the anonymized header information, a requirement of the NQ processing pipeline but a situation unlikely to occur in clinical practice because these are standard DICOM data fields. In addition, some patients had inadequate MPRAGE sequences due to technical factors, including motion, which we were not able to segment by using the NQ or NR programs; in these cases, we substituted a repeat MPRAGE sequence. Despite occasional differences in our ability to process a particular sequence, we found a strong linear correlation (r > 0.60) between NQ and NR for all compatible regions tested. Of interest, for hippocampal volume, there was a small underestimation bias for NR with respect to NQ, which may be attributable to different segmentation algorithms. Two outliers were noted, showing differences of approximately ±3 mL between software packages, which were attributable to segmentation differences.
Another limitation is that we did not incorporate biomarkers, neuropsychological assessments, or longitudinal imaging measures in our study. Rather, we sought to limit our study to testing the prognostic efficacy of 2 commercially available brain volumetric software packages in their own right, rather than by using additional data, which might confound direct comparisons. Of note, the Mini-Mental State Examination and ADAS-13 AUC scores were as high as, or higher than, those of hippocampal volume alone, in agreement with previous studies looking at the prognostic efficacy of multiple biomarkers in MCI. 36 This result is not surprising given that a dichotomous end point measure, conversion, was used to assess cognitive decline, and cognitive measures at baseline would be expected to strongly predict how close a subject is to the point of conversion. Future effort in evaluating the prognostic efficacy of these software packages should involve combining volumetric data with a suite of biomarkers and neuropsychological assessments by using machine-based learning approaches and continuous, rather than categoric, outcomes of cognitive decline. Subject factors such as financial or socioeconomic ones also remained unadjusted in the analysis. These factors would be unlikely to affect the within-subject design and were omitted to keep the comparison straightforward between single-versus-multiple volumetric outputs directly available from the imaging software packages.
Despite these limitations, our study had several strengths, including a large sample size of 192 patients through the ADNI data base. Indeed, ADNI uses a wide variety of vendor platforms, field strengths, and harmonized pulse sequences that patients are likely to encounter in future clinical settings. We also selected patients with 3-year clinical follow-up, which provided a more accurate designation of future conversion status, compared with studies with shorter follow-up duration.

CONCLUSIONS
Of the multiple regional volume measures available in current FDA-cleared brain volumetric software, hippocampal volume remains the best single predictor of MCI conversion to AD at 3-year follow-up. Combining volumetrics, by using multivariable approaches including a machine-learning classifier, does not appear to add additional prognostic efficacy. Therefore, future prognostic studies in mild cognitive impairment, combining such tools with demographic and other biomarker measures, are justified in using hippocampal volume as the only volumetric biomarker.