Combining MR Imaging, Positron-Emission Tomography, and CSF Biomarkers in the Diagnosis and Prognosis of Alzheimer Disease

BACKGROUND AND PURPOSE: Different biomarkers for AD may be complementary in the diagnosis and prognosis of AD. Our aim was to combine MR imaging, FDG-PET, and CSF biomarkers in the diagnostic classification and 2-year prognosis of MCI and AD, by examining the following: 1) which measures are most sensitive to diagnostic status, 2) to what extent the methods provide unique information in diagnostic classification, and 3) which measures are most predictive of clinical decline.
MATERIALS AND METHODS: ADNI baseline MR imaging, FDG-PET, and CSF data from 42 controls, 73 patients with MCI, and 38 patients with AD; and 2-year clinical follow-up data for 36 controls, 51 patients with MCI, and 25 patients with AD were analyzed. The hippocampus and entorhinal, parahippocampal, retrosplenial, precuneus, inferior parietal, supramarginal, middle temporal, and lateral and medial orbitofrontal cortices were used as regions of interest. CSF variables included Aβ42, t-tau, p-tau, and ratios of t-tau/Aβ42 and p-tau/Aβ42. Regression analyses were performed to determine the sensitivity of measures to diagnostic status as well as 2-year change in CDR-SB, MMSE, and delayed logical memory in MCI.
RESULTS: Hippocampal volume, retrosplenial thickness, and t-tau/Aβ42 uniquely predicted diagnostic group. Change in CDR-SB was best predicted by retrosplenial thickness; MMSE, by retrosplenial metabolism and thickness; and delayed logical memory, by hippocampal volume.
CONCLUSIONS: All biomarkers were sensitive to the diagnostic group. Combining MR imaging morphometry and CSF biomarkers improved diagnostic classification (controls versus AD). MR imaging morphometry and PET were largely overlapping in value for discrimination. Baseline MR imaging and PET measures were more predictive of clinical change in MCI than were CSF measures.

Aβ42 in neuritic plaques. 4 A full spectrum of imaging and CSF analysis methods is seldom used; thus, knowledge is limited on how they may best be combined. The ADNI, a large multisite study, was launched to enable analyses of combinations of different candidate biomarkers for AD.
Recent findings indicate that MR imaging can be used to quantify regional atrophy in MCI, distinguishing early and later preclinical stages of AD, 5 and such measures are predictive of clinical decline across 1 year. [6][7][8] A pattern of parietotemporal metabolic reductions in MCI and AD and frontal metabolic reductions later in the disease has been established through the last decades of research 1,9,10 and has recently been confirmed in ADNI PET data. 11 The relative sensitivity of FDG-PET and MR imaging morphometry to AD-related changes is, however, not well established. It has been assumed that metabolic changes associated with neocortical dysfunction may be detectable by FDG-PET before atrophy appears. Consistent with this assumption, De Santi et al 12 reported that metabolism reductions exceeded volume losses in MCI, and Mosconi et al 13 found the same in presymptomatic early-onset familial AD. However, Jagust et al 14 found that cingulate hypometabolism was a significant risk factor in addition to MR imaging measures of hippocampal atrophy, but the latter was a more statistically robust risk factor in a group of cognitively impaired but not demented elderly. 15 Different brain characteristics relevant for the understanding of MCI and AD may be captured by FDG-PET and MR imaging morphometry. For instance, a report based on ADNI data has indicated that FDG-PET and MR imaging measures may be complementary and differentially sensitive to memory in health and disease, with metabolism being the stronger predictor in healthy controls and morphometry most related to memory function in AD. 16 As for CSF-MR imaging relations, recent reports [17][18][19][20][21][22] indicate that cerebral anatomic differences are related to tau and Aβ42 and behavioral cognitive measures in AD and MCI. However, MR imaging and CSF biomarkers have not simultaneously been related and compared with information obtained by FDG-PET.
It is important to test the specific sensitivity of all biomarkers simultaneously to be able to optimize the combination of measures in diagnosis and prognosis. We investigated the following: 1) which methods are the most sensitive to established AD-related pathology, 2) to what extent the methods provide unique-versus-overlapping information, and 3) which methods are the most predictive of clinical decline across 2 years.

Materials and Methods
The raw data used in the preparation of this article were obtained from the ADNI database (www.loni.ucla.edu/ADNI). ADNI was launched in 2003 by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, the FDA, private pharmaceutical companies, and nonprofit organizations. The primary goal of ADNI has been to test whether serial MR imaging, PET, other biologic markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. The Principal Investigator of this initiative is Michael W. Weiner, of the Veterans Administration Medical Center and University of California-San Francisco. There are many coinvestigators, and subjects have been recruited from >50 sites across the United States and Canada. The ADNI has recruited 229 healthy elderly subjects, 398 patients with MCI, and 192 patients with AD to participate and be followed for 2-3 years. For up-to-date information, see www.adniinfo.org.

Sample
ADNI eligibility criteria are described at http://www.adni-info.org/index.php?option=com_content&task=view&id=9&Itemid=43. Briefly, participants were 55-90 years of age, had an informant providing an independent evaluation of functioning, and spoke English or Spanish. Subjects were willing and able to undergo test procedures, including neuroimaging and longitudinal follow-up, and all gave informed consent. Specific psychoactive medications were excluded. General inclusion/exclusion criteria of the ADNI study are as follows: 1) healthy subjects: MMSE 23 scores between 24 and 30 inclusive (no person enrolled as an NC in the present sample had an MMSE score below 26); CDR of 0, nondepressed, non-MCI, and nondemented; 2) subjects with MCI: MMSE scores between 24 and 30 inclusive (exceptions made on a case-by-case basis, but no such exceptional cases were enrolled as patients with MCI in the present sample), a memory complaint, objective memory loss measured by Wechsler Memory Scale Logical Memory II, 24 CDR of 0.5, absence of significant levels of impairment in other cognitive domains, essentially preserved activities of daily living, and an absence of dementia; and 3) mild AD: MMSE scores between 20 and 26 inclusive (exceptions made on a case-by-case basis), CDR of 0.5 or 1.0, and met the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association criteria for probable AD. 25 Only ADNI subjects for whom adequate processed and quality-checked MR imaging, FDG-PET, and CSF baseline data were available were included. This yielded a total of 153 participants. Demographics are shown in Table 1.

Standard Protocol Approvals, Registrations, and Patient Consents
The protocol was approved by the institutional review boards of participating sites. Written informed consent was obtained from all subjects or from guardians of patients.

MR Imaging Acquisition and Analysis
All scans used here were from 1.5T scanners. Data were collected across a variety of scanners with protocols individualized for each scanner, as defined at http://www.loni.ucla.edu/ADNI/Research/Cores/index.shtml, and processed as described elsewhere. 5,16 Briefly, raw DICOM MR imaging scans (including 2 T1-weighted volumes per case) were downloaded from the ADNI site (http://www.loni.ucla.edu/ADNI/Data/index.shtml), reviewed for quality, automatically corrected for spatial distortion due to gradient nonlinearity 26 and B1 field inhomogeneity, 27 registered, and averaged to improve signal intensity-to-noise ratios. Scans were segmented as described by Fischl et al, 28 yielding volumetric data for the hippocampal formation (consisting of the dentate gyrus, Cornu Ammonis fields, subiculum/parasubiculum, and the fimbria 29 ). The procedure 28,30 uses a probabilistic atlas and applies a Bayesian classification rule to assign a neuroanatomic label to each voxel. The cortical surface was reconstructed to measure thickness at each surface point by using a semiautomated approach described elsewhere. [31][32][33][34][35][36] Thickness measurements were obtained by reconstructing representations of the gray/white matter boundary 31,32 and the pial surface and then calculating the distance between those surfaces at each point across the cortical mantle. The measurement technique used here has been validated via histologic 37 and manual measurements. 38 The entire cortical surface was parcellated into numerous cortical areas. 30,39 To limit multiple comparisons, we selected candidate regions of interest on the basis of previous MR imaging and PET findings 1,5,7,11,16,[40][41][42][43][44] indicating sensitivity to AD-related pathology: the hippocampi, and the entorhinal, parahippocampal, retrosplenial, precuneus, inferior parietal, supramarginal, middle temporal, and lateral and medial orbitofrontal gyri.
In the parcellation method used here, 39 the entire cingulate cortex was defined and divided into 4 separate regions, including the rostral and caudal anterior cingulate, the posterior cingulate, and the isthmus cingulate, the latter referred to here as the retrosplenial cortex for consistency with other published studies. 5,16,40 The retrosplenial region may also be referred to as the isthmus of the cingulate or caudal posterior cingulate area in other contexts. For a depiction of the exact regions of interest used, see Fig 1.

FDG-PET Acquisition and Analysis
Subjects were scanned after a 4-hour fast (water only). Plasma glucose had to be ≤180 mg/dL for FDG to be injected. An intravenous catheter was placed in 1 arm for injection of 18F-FDG. Imaging began at 30 minutes postinjection, and the scan was acquired as six 5-minute frames. For each subject, FDG-PET frames were averaged and registered to the corresponding distortion-corrected and intensity-normalized MR imaging volume. PET activity for each subject was sampled onto the subject's reconstructed cortical surface, averaged within each region of interest, and normalized to activity within the pons. 45
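The regional averaging and reference-region normalization described above can be sketched as follows. This is a minimal illustration only, assuming that "normalized to activity within the pons" means dividing the mean regional activity by the mean pons activity; the function name and the sample values are hypothetical and are not taken from the ADNI processing pipeline.

```python
from statistics import mean

def normalized_roi_uptake(roi_samples, pons_samples):
    """Mean activity within a region of interest divided by mean pons activity.

    Assumption: normalization is a simple ratio to the reference region.
    """
    return mean(roi_samples) / mean(pons_samples)

# Hypothetical activity values sampled on the cortical surface (arbitrary units).
retrosplenial = [1.30, 1.25, 1.35]
pons = [1.00, 1.05, 0.95]
print(round(normalized_roi_uptake(retrosplenial, pons), 3))
```

A ratio of this kind is unitless, which is what allows regional values to be compared across subjects scanned on different systems.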

CSF Acquisition and Analysis
CSF samples obtained by lumbar puncture were examined for t-tau, p-tau, and Aβ42 by using an immunoassay method. 46 The measurements were performed by L. Shaw and J. Trojanowski of the ADNI Biomarker Core at the University of Pennsylvania School of Medicine. We analyzed the following CSF biomarkers for the present article: Aβ42 (202 ± 56, 159 ± 51, 136 ± 39 pg/mL for NC, MCI, and AD, respectively), t-tau (68 ± 28, 100 ± 65, 125 ± 67 pg/mL for NC, MCI, and AD, respectively), and p-tau (26 ± 17, 36 ± 19, 45 ± 23 for NC, MCI, and AD, respectively). The ratio of t-tau to Aβ42 (t-tau/Aβ42; 0.37 ± 0.21, 0.74 ± 0.67, 0.98 ± 0.56 for NC, MCI, and AD, respectively) and the ratio of p-tau to Aβ42 (p-tau/Aβ42; 0.16 ± 0.16, 0.26 ± 0.19, 0.36 ± 0.22 for NC, MCI, and AD, respectively) were also included. A 1-way analysis of variance on the residual CSF values after age and sex were regressed out showed significant (P < .001) main effects of group on all variables. Post hoc tests controlling for multiple comparisons showed significant (P < .05) differences between NC and MCI, NC and AD, and MCI and AD, with a few exceptions where trends (P < .10) were observed (differences in t-tau between MCI and AD, p-tau between NC and MCI, and t-tau/Aβ42 between MCI and AD).

Clinical and Cognitive Measures
Change scores were calculated by subtracting baseline scores from scores obtained at the 2-year follow-up. In addition to CDR-SB 47 and MMSE, 23 delayed recall on the Wechsler Memory Scale-Revised 48 was included. This test requires the subject to recall a story read by the examiner after a 30-to 40-minute delay and is sensitive to the episodic memory deficits in MCI.
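The sign convention of the change scores can be illustrated with a short sketch. The function name and the example values are hypothetical and are not taken from the study data; the only rule used is the one stated above, follow-up minus baseline.

```python
def change_score(baseline, follow_up):
    """Return the 2-year change: follow-up score minus baseline score."""
    return follow_up - baseline

# Hypothetical participant (illustrative values only).
mmse_change = change_score(baseline=27, follow_up=24)       # negative: MMSE declined
cdr_sb_change = change_score(baseline=1.5, follow_up=3.0)   # positive: CDR-SB increased
print(mmse_change, cdr_sb_change)
```

Under this convention, clinical worsening appears as a negative change on MMSE and delayed logical memory but as a positive change on CDR-SB, which should be kept in mind when reading the signs of the correlations and regression coefficients.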

Statistics
A repeated-measures general linear model with the 10 regions of interest × hemisphere (left, right) × diagnostic group (NC, MCI, AD) with age and sex as covariates showed no significant effect of hemisphere across regions of interest (F[1,148] = 1.530, P = .218) and no interaction of hemisphere × diagnostic group (F[2,148] = 0.847, P = .431). Hence, values were averaged across hemispheres, effects of age and sex were regressed out, and the standardized residuals were used in the analyses. Correlation analyses with MR, FDG-PET, and CSF measures were run to assess their covariance. To select the measures yielding the most explained variance for each method, we entered the values in 3 separate logistic stepwise regressions by using MR, PET, and CSF measures, respectively, predicting NC versus AD. The selected MR, PET, and CSF variables were then entered simultaneously in multimethod stepwise logistic regression analyses predicting NC versus AD and NC versus MCI. Next, the variables identified by the NC-versus-AD classification analysis were correlated with 2-year follow-up CDR-SB, MMSE, and delayed logical memory change scores in the MCI group and were entered as predictors in stepwise regression analyses with the respective behavioral change scores as the dependent variables.
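The step of regressing out age and sex and keeping standardized residuals can be sketched as below. This is an illustration under stated assumptions, not the authors' code: it fits an ordinary least-squares model with an intercept, age, and sex (coded 0/1), and scales the residuals by their sample standard deviation. The function names and the data are hypothetical.

```python
from statistics import stdev

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def standardized_residuals(values, age, sex):
    """Regress values on age and sex (with intercept); return residuals / their SD."""
    rows = [[1.0, a, s] for a, s in zip(age, sex)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(3)]
    beta = solve_linear_system(xtx, xty)
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, values)]
    sd = stdev(resid)  # assumption: scale by the sample SD of the residuals
    return [e / sd for e in resid]

# Hypothetical hippocampal volumes (arbitrary units), ages, and sex codes.
volumes = [3.9, 3.4, 3.6, 2.9, 3.1, 2.7]
ages = [62, 70, 66, 78, 74, 81]
sexes = [0, 1, 0, 1, 1, 0]
print([round(z, 2) for z in standardized_residuals(volumes, ages, sexes)])
```

The resulting values have a mean of zero and are uncorrelated with age and sex by construction, so group differences in the later regressions cannot be attributed to those covariates.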

Results
Correlation analyses in the MCI group for morphometry and metabolism for the 10 regions of interest and the 5 CSF variables showed no significant (P < .05, corrected for 10 region-of-interest comparisons) correlations among CSF variables and morphometry or metabolism in any region of interest, whereas moderate correlations were found between morphometric and metabolic measures for the hippocampus and entorhinal, retrosplenial, and inferior parietal regions (on-line Table). Table 2 shows the results of the separate logistic stepwise regressions predicting NC-versus-AD classification on the basis of MR imaging, FDG-PET, and CSF measures. Hippocampal volume and entorhinal and retrosplenial thickness, for MR imaging, were included in the final model, yielding an overall classification accuracy of 85.0% and approximately 71% explained variance (Nagelkerke R²). Entorhinal, retrosplenial, and lateral orbitofrontal metabolism, for FDG-PET, were included in the final model, yielding an overall classification accuracy of 82.5% and approximately 62% explained variance. For CSF, the ratio of t-tau/Aβ42 was the single unique predictor, yielding an overall classification accuracy of 81.2% and approximately 52% explained variance. Thus, hippocampal volume; entorhinal and retrosplenial thickness; entorhinal, retrosplenial, and lateral orbitofrontal metabolism; and t-tau/Aβ42 ratio were entered in a logistic regression analysis to classify NC versus AD, and the results are shown in Table 3.
In the final model, hippocampal volume, retrosplenial thickness, and t-tau/Aβ42 ratio were included as predictors, yielding an overall classification accuracy of 88.8% and approximately 78% explained variance. Figure 2 depicts the ROC curves for these variables when using 1 (hippocampal volume) versus a combination of 2 (hippocampal volume and t-tau/Aβ42 ratio) and all 3 variables (hippocampal volume, t-tau/Aβ42 ratio, and retrosplenial thickness) shown to be unique predictors of NC-versus-AD classification. Predicted values from logistic regressions were used for calculation of the ROC curves. Statistical comparisons of the AUCs of these classifiers were performed by using the method of Hanley and McNeil. 49 This approach yielded a significant difference (P < .05) between the AUC for hippocampal volume alone and the AUCs for each of the 2 combinations (hippocampal volume and t-tau/Aβ42 ratio; and hippocampal volume, t-tau/Aβ42 ratio, and retrosplenial thickness). The difference between the AUCs for hippocampal volume and t-tau/Aβ42 ratio versus hippocampal volume, t-tau/Aβ42 ratio, and retrosplenial thickness in combination was clearly smaller and not significant (P > .05). Note, however, that not all meaningful differences in AUCs (eg, in terms of sensitivity versus specificity causing the curves to cross) are necessarily captured as statistically significant. The same set of predictor variables was entered in an analysis to predict diagnostic classification for NC and MCI, which revealed that hippocampal volume and t-tau/Aβ42 ratio were unique predictors, yielding an overall classification accuracy of 79.1% and approximately 40% explained variance. Table 4 shows correlations for each variable included in the regression models and the cognitive change scores (CDR-SB, MMSE, delayed logical memory).
In MCI, baseline retrosplenial thickness correlated with a 2-year change in CDR-SB and MMSE, where a thicker cortex was associated with less CDR-SB elevation and less MMSE reduction. Retrosplenial and entorhinal metabolism correlated negatively with MMSE change. Hippocampal volume correlated positively with delayed logical memory. There were no significant correlations with clinical change measures for t-tau/Aβ42 in MCI. T tests of the Fisher z-transformed correlation coefficients showed that the t-tau/Aβ42 ratio correlated significantly less strongly (P < .05) with CDR-SB and MMSE change than did retrosplenial thickness, and it correlated significantly less strongly with MMSE change than did entorhinal and retrosplenial metabolism. Lateral orbitofrontal metabolism also correlated significantly less strongly with change in CDR-SB and delayed logical memory than did retrosplenial thickness and hippocampal volume.
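The comparison of two AUCs estimated on the same cases can be sketched with the critical ratio usually attributed to Hanley and McNeil, z = (A1 − A2) / sqrt(SE1² + SE2² − 2·r·SE1·SE2). This is a hedged illustration, not a reproduction of the reported test: the AUCs and SEs are those given in the Fig 2 caption, but the correlation r between the two AUC estimates is not reported in the text, so the value used below is a placeholder assumption and the function name is hypothetical.

```python
from math import sqrt

def auc_difference_z(auc1, se1, auc2, se2, r):
    """z statistic for the difference between two AUCs from the same cases.

    r is the correlation between the two AUC estimates; it must be supplied
    by the analyst (placeholder here, not reported in the text).
    """
    return (auc1 - auc2) / sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)

# AUCs and SEs as given in the Fig 2 caption; r = 0.5 is an assumed placeholder.
z = auc_difference_z(0.950, 0.022, 0.900, 0.033, r=0.5)
print(round(z, 2))
```

Because both classifiers are evaluated on the same participants, the AUC estimates are positively correlated; ignoring that correlation (r = 0) inflates the denominator and makes the comparison more conservative.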
In the stepwise regression analysis predicting CDR-SB change, only retrosplenial cortical thickness was included as a unique predictor (y = 1.231 − 0.731x, P = .002), explaining 18% of the variance. In predicting MMSE change, retrosplenial metabolism was included in the first step (y = −1.193 + 1.534x₁, P = .002 for x₁, R² = 0.22), and retrosplenial thickness was added in the second (y = −1.197 + 1.177x₁ + 0.776x₂, P = .009 for x₁ and .042 for x₂, R² = 0.29). Only hippocampal volume was included as a predictor of delayed logical memory change (y = 0.240 + 1.669x₁, P = .003 for x₁, R² = 0.17). The regression plots for CDR-SB and MMSE change predicted from retrosplenial thickness and metabolism and delayed logical memory predicted from hippocampal volume are shown in Fig 3. There was 1 outlier for the MMSE change score, with a 13-point decline. Without this outlier, only retrosplenial metabolism was included in the model for predicting MMSE change (y = −1.023 + 1.091x₁, P = .004 for x₁, R² = 0.16), but a trend was observed for retrosplenial thickness (P = .079).
Table 2, were included in the set of predictor variables (ie, for MR imaging, hippocampal volume and retrosplenial and entorhinal thickness; for PET, entorhinal, retrosplenial, and lateral orbitofrontal metabolism; and for CSF, the ratio of t-tau to Aβ42). R² is Nagelkerke R².
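The fitted equations reported for the MCI group can be applied to an individual as in the sketch below. The coefficients are those given in the text (CDR-SB change: y = 1.231 − 0.731x; MMSE change, second step: y = −1.197 + 1.177x₁ + 0.776x₂; delayed logical memory change: y = 0.240 + 1.669x₁); the function names are hypothetical, and the inputs are assumed to be the age- and sex-adjusted standardized residuals described in the Statistics section.

```python
def predict_cdr_sb_change(retrosplenial_thickness_z):
    """Predicted 2-year CDR-SB change from standardized retrosplenial thickness."""
    return 1.231 - 0.731 * retrosplenial_thickness_z

def predict_mmse_change(retrosplenial_metabolism_z, retrosplenial_thickness_z):
    """Predicted 2-year MMSE change from the two-predictor model (outlier included)."""
    return (-1.197
            + 1.177 * retrosplenial_metabolism_z
            + 0.776 * retrosplenial_thickness_z)

def predict_logical_memory_change(hippocampal_volume_z):
    """Predicted 2-year change in delayed logical memory from hippocampal volume."""
    return 0.240 + 1.669 * hippocampal_volume_z

# Hypothetical patient with MCI whose measures all lie 1 SD below the adjusted mean.
print(round(predict_cdr_sb_change(-1.0), 3))
print(round(predict_mmse_change(-1.0, -1.0), 3))
print(round(predict_logical_memory_change(-1.0), 3))
```

Because these models explain only about 17%-29% of the variance in change scores, such point predictions carry wide uncertainty for any single patient.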

Fig 2.
Comparison of ROC curves for using 1 versus a combination of 2 and all 3 variables shown to be unique predictors of NC-versus-AD classification. Yellow is the predicted probability based on hippocampal volume alone (AUC = 0.900, SE = 0.033). Blue is the predicted probability based on hippocampal volume and t-tau/Aβ42 ratio (AUC = 0.950, SE = 0.022). Red is the predicted probability based on hippocampal volume, t-tau/Aβ42 ratio, and retrosplenial cortical thickness (AUC = 0.961, SE = 0.018).
Table 2, were included in the set of predictor variables (ie, for MR imaging, hippocampal volume and retrosplenial and entorhinal thickness; for PET, entorhinal, retrosplenial, and lateral orbitofrontal metabolism; and for CSF, the ratio of t-tau:Aβ42). b P < .05, corrected for 7 comparisons.

Discussion
Morphometry, metabolism, and CSF biomarkers were all sensitive to diagnostic status. The best classification accuracy of NC versus AD was obtained by MR imaging morphometry measures (hippocampal volume, entorhinal and retrosplenial cortical thickness). However, classification accuracies close to those obtained by MR imaging were also obtained by FDG-PET (entorhinal, retrosplenial, and lateral orbitofrontal metabolism) and CSF measures (t-tau/Aβ42 ratio). In the multimodal analysis, FDG-PET measures appeared to provide largely redundant information, whereas hippocampal volume, retrosplenial thickness, and the t-tau/Aβ42 ratio were unique predictors of diagnostic status. In particular, the inclusion of the CSF biomarker in addition to MR imaging hippocampal volume did result in a significant improvement in classification in terms of AUC. Thus, the combination of MR imaging morphometry and CSF biomarkers yielded the highest diagnostic classification accuracy. Contrary to this finding, in the prediction of clinical change during 2 years, FDG-PET and MR imaging morphometry were the best predictors. However, with the exception of retrosplenial metabolism and thickness in the prediction of change in MMSE scores, the 2 measures were largely redundant. Thus, it seems that the benefits of including both MR imaging morphometry and FDG-PET are modest in predicting clinical decline in MCI. Whereas CSF biomarkers added to the diagnostic accuracy at baseline, they did not predict 2-year clinical decline in the current MCI group. This finding may be somewhat surprising because previous studies have found decreased CSF Aβ42 and/or tau or tau/Aβ42 levels to be predictive of future dementia in patients with MCI. 2 Several factors may have contributed to the discrepancies. First, the ongoing ADNI study may have a more heterogeneous MCI group than some of the previously published CSF studies.
As pointed out by Hansson et al, 50 participants included in CSF studies have generally been highly selected, for example, by inclusion of only patients with MCI who progress to AD. In ADNI, the ultimate end point is not known for many patients with MCI. Further, studies have often used dichotomized variables for CSF values and prognosis. 50,51 A stable/conversion dichotomization involves clinical judgment, which may vary from physician to physician and demands long follow-up intervals impractical for clinical trials. It may be advantageous to identify other preselection criteria, biomarkers, or clinical measures of decline than conversion. Therefore, it is important to relate the biomarkers to easily administered continuous behavioral measures. Most interesting, another study investigating continuous variables 52 did not find any association between MMSE change and change in CSF levels of either Aβ42, tau, or p-tau (r = 0.18, −.03, and −.07, respectively). This does not mean that CSF measures are not related to clinical change. The CSF tau/Aβ42 ratio did correlate in the expected negative direction with change in logical delayed memory in the present sample, but the effect size was too modest to reach significance. Select MR imaging morphometry and FDG-PET measures at baseline were significantly more sensitive to 2-year change in CDR-SB and MMSE than were CSF measures. Both cortical thickness and metabolism of parietal regions of interest served as unique predictors of clinical decline, indicating that even though FDG-PET did not contribute uniquely to diagnostic classification when MR imaging morphometric variance was accounted for, some additional prognostic information can be obtained by combining the 2 imaging modalities.
While the present findings show that the different biomarkers all were sensitive to diagnostic group, a question of great interest is whether the findings regarding specific measures can be applied on an individual subject basis. McEvoy et al 7 recently reported that semiautomated individually specific quantitative MR imaging methods identical to those used here can be used to identify a pattern of atrophy in MCI that is predictive of conversion to AD after 1 year. Hence, in light of the present findings also indicating somewhat superior sensitivity of such MR imaging morphometry measures compared with other biomarkers, it does seem that these measures are prime candidates to be used on an individual basis (eg, to enrich clinical trials). However, as seen from Fig 3, while the MR imaging morphometry measures evaluated here do predict 2-year change in screening and memory parameters among patients with MCI, the regression plots also show considerable scatter. Hence, while these measures can yield individual prognostic information, this will be associated with considerable uncertainty, and at present, any such estimate must be made with great caution.
The present results are limited by a number of factors: Participants were selected on the basis of willingness and ability to undergo MR imaging and PET scanning and lumbar puncture and may thus not be fully comparable with other samples. However, imaging is an integral part of the ADNI protocol, so participants did enter with the intention of having brain scans performed, and approximately half of the ADNI participants have also agreed to have CSF samples drawn. 53 In terms of age, MMSE score, Aβ42, t-tau, p-tau, and ratios of t-tau/Aβ42 and p-tau/Aβ42, the subgroups studied in the present article do appear to be representative of the larger ADNI sample. The mean values for these indices in the present sample appear very similar to those reported by Shaw et al 53 for 410 participants with CSF measures, and all the present mean values for age, MMSE, and CSF measures for NC, MCI, and AD deviate less than one-fifth of the SDs from the means reported by Shaw et al for the larger groups. Still, the present sample may, of course, not be fully representative of the general population. Furthermore, the multisite design of the ADNI is likely to add some noise in data collection. Finally, the ADNI study is still ongoing, and the ultimate status of the current MCI group is unknown. That being said, the present study involving multiple sites and 2 years of follow-up likely represents a more realistic model for current clinical trial designs than longer interval single-site studies.

Conclusions
Each of the biomarkers demonstrated potential to inform diagnosis and/or prognosis and enrich clinical trials. As a single classifier, MR imaging morphometry (hippocampal volume) was the most sensitive to diagnostic group, but the inclusion of CSF biomarkers (t-tau/Aβ42) did result in significant improvement of classification (NC/AD). Still, both quantitative MR imaging morphometry and regional metabolism as assessed by coregistered FDG-PET data provided better prediction of clinical decline than did CSF biomarkers. MR imaging morphometry showed somewhat superior diagnostic and prognostic sensitivity and is the least invasive, least expensive, and most widely available method. MR images are often routinely required as part of the diagnostic work-up, so a broader application of MR imaging morphometry may be feasible and useful.