Abstract
BACKGROUND AND PURPOSE: Patients with mild cognitive impairment (MCI) are at risk for developing Alzheimer disease (AD). To diagnose AD at an early stage, one must develop highly specific and sensitive tools to identify it among at-risk subjects. The purpose of this study was to evaluate and compare the ability of fluorodeoxyglucose–positron-emission tomography (FDG-PET), single-photon emission CT (SPECT), and structural MR imaging to predict conversion to AD in patients with MCI.
MATERIALS AND METHODS: Relevant studies were identified with MEDLINE from January 1990 to April 2008. Meta-analysis and meta-regression were done on the diagnostic performance data for each technique from eligible studies. We estimated and compared the weighted summary sensitivities, specificities, likelihood ratios (LRs), and summary receiver operating characteristic curves of each imaging technique.
RESULTS: Twenty-four eligible studies were included, with a total of 1112 patients. FDG-PET performed statistically better in LR+ and odds ratio (OR), whereas no statistical difference was found in pooled sensitivity, specificity, and LR− for each technique. No statistical difference was confirmed between SPECT and MR imaging. The Q* index estimates for FDG-PET, SPECT, and structural MR imaging were respectively 0.86, 0.75, and 0.76. In meta-regression, statistical significance was found only between technique and log OR, with a regression coefficient of −0.575.
CONCLUSIONS: This meta-analysis showed that FDG-PET performs slightly better than SPECT and structural MR imaging in the prediction of conversion to AD in patients with MCI; parallel performance was found between SPECT and MR imaging.
The prevalence of Alzheimer disease (AD) doubles steadily every 5–10 years for individuals older than 60 years of age.1 Before symptoms are manifest for clinical diagnosis of probable AD, histologic changes of AD may present in asymptomatic individuals, which are followed by mild cognitive impairment (MCI). MCI is a heterogeneous entity characterized by differences in cognitive profile and clinical progression, possibly due to the interplay of genetic, physiologic, and environmental factors.2 The outcome of MCI is uncertain: Many subjects remain stable or even revert to a normal state, whereas others present progressive neurodegeneration and eventually develop AD or other dementias.3 Many patients have been classified as having MCI (or related synonyms) (Table 1), all referring to individuals “whose cognitive impairment, particularly memory impairment, is beyond that expected in people of their age and educational level, but is not severe enough to reach criteria for dementia, delirium, or organic amnesia syndrome.”3,4 To diagnose AD at an early stage, one must develop highly specific and sensitive tools to identify it among at-risk subjects, such as patients with MCI. One approach to evaluate predictive ability would be to collect longitudinal data on a sample of subjects and retrospectively analyze the initial characteristics of those who eventually convert.
To assess progression and identify patients with progressive MCI, several noninvasive imaging methods are available; however, there is no consensus on the ideal one. Although the ability of imaging techniques to predict progression of MCI has been investigated, the results are based on studies that differ in population characteristics, methodology, follow-up interval, and reference standards, and some of these studies are too small to evaluate predictive value reliably. A meta-analysis was therefore needed. By combining relevant evidence from similar studies, for example a series of effect sizes such as the odds ratio (OR) and risk ratio, statistical power is increased and more precise estimates can be obtained. Effects of confounding factors would be lessened by pooling results from all studies, making the results more applicable to the wider population. Most important, meta-analysis provides a framework for the assessment of between-study heterogeneity, that is, the methodologic, epidemiologic, clinical, and biologic dissimilarity across the various studies.5 Furthermore, studies comparing fluorodeoxyglucose–positron-emission tomography (FDG-PET), single-photon emission CT (SPECT), and structural MR imaging within the same patient population are unlikely to be performed because of increased cost and longer work-up time.
The aim of our meta-analysis was to evaluate and compare the ability of noninvasive imaging methods (FDG-PET, SPECT, and structural MR imaging) to predict conversion to AD in patients with MCI from studies reported in the peer-reviewed literature.
Materials and Methods
Identification and Eligibility of Relevant Studies
MEDLINE through OVID was used to identify studies. The search strategy was based on a combination of terms: 1) positron-emission tomography or tomography, emission-computed, single-photon OR MR imaging; 2) mild cognitive impairment AND Alzheimer Disease AND predict*. Searches were limited to human subjects. Abstracts were read by 2 reviewers (Y.Y., Z.-X.G.) for all articles retrieved. For relevant abstracts, full articles were obtained and examined. References of retrieved articles were screened for additional studies. Further articles citing these articles were also examined by using the Social Sciences Citation Index (1998 to present) through the Web of Science.
The following inclusion criteria were adopted to control for variables that may introduce bias or explain heterogeneity of the results, on the basis of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool6: 1) published between January 1990 and April 2008; 2) peer-reviewed; 3) original studies; 4) reported in English; 5) including at least 10 subjects; 6) patients at baseline who could not be classified as healthy or demented but were cognitively impaired; 7) longitudinal studies that retrospectively analyzed the initial characteristics of those who were progressive and those who remained stable; 8) not visually rated; 9) either clinical diagnosis (AD based on the National Institute of Neurologic and Communicative Disorders and Stroke-Alzheimer Disease and Related Disorders Association criteria,7 dementia based on the Diagnostic and Statistical Manual criteria [DSM-IV]8) or histopathologic diagnosis used as the reference standard; 10) sufficient data provided either directly or indirectly through a 2 × 2 table to enable calculation of point estimates and 95% confidence intervals (CIs) for the operating characteristics; 11) the largest or the most recent article for reports with overlapping patients; 12) for PET studies, only those using [18F] fluorodeoxyglucose (18F-FDG) as tracer, and for MR imaging, only structural MR imaging; and 13) only articles in which the answer yes was given for more than 9 of the 14 questions in the QUADAS quality assessment tool.6 Studies in which the results of different diagnostic methods were presented in combination and could not be separated were excluded.
Data Extraction
In each report, 2 investigators (Y.Y., Z.-X.G.) extracted and recorded data on author names, year of publication, number of patients analyzed, age, male-female distribution, years of education, score of the Mini-Mental State Examination (MMSE) at baseline, inclusion and exclusion criteria, reference standard, follow-up interval, image and data analytic methods, the regions of cerebral analysis, and a 2 × 2 contingency table. We also extracted the following imaging features: for FDG-PET and SPECT, the tracer used (FDG-PET tests exclusively selected 18F-FDG as tracer) and amount of tracer; for MR imaging, magnetic field strength. To resolve disagreement between reviewers, a third reviewer (W.-S.W.) assessed all discrepant items, and the majority opinion was used for analysis.
Data Synthesis and Statistical Analysis
Stata, Version 8.2 (Stata, College Station, Tex) and Meta-DiSc (Javier Zamora, Boston, Mass)9 were used for statistical analysis. For each study, we constructed a 2 × 2 contingency table in which all participants were classified as having positive or negative imaging results at baseline and as being cognitively progressive or stable during the follow-up interval. To calculate the log OR (the difference between the log odds of the true-positive rate and the log odds of the false-positive rate), we added 0.5 to each cell in any 2 × 2 table containing a zero cell. For each technique, we estimated the weighted summary sensitivity, specificity, positive and negative likelihood ratios (LR+, LR−), and OR and described a set of operating characteristics across eligible studies by constructing a summary receiver operating characteristic (ROC) curve. We defined the point of maximum joint sensitivity and specificity on a symmetric ROC curve as Q*, which is a global measure of test accuracy, similar to the area under the ROC curve (AUC). To determine whether these values were significantly affected by heterogeneity between individual studies, we performed meta-regression analysis. We considered covariates to be explanatory if their regression coefficients were statistically significant (P < .05). Publication bias was examined by using funnel plots with the log OR plotted against the standard error of the log OR in each study.
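As an illustration of the per-study quantities described above, the following minimal Python sketch derives the operating characteristics from a single 2 × 2 table. It is not the Stata or Meta-DiSc code used in this study; the function name, the dictionary keys, and the example counts are hypothetical, and applying the 0.5 correction to all derived measures (not only the log OR) is an assumption of the sketch.

```python
import math

def diagnostic_metrics(tp, fp, fn, tn):
    """Operating characteristics of one study from its 2 x 2 table.

    tp: positive imaging, progressive   fp: positive imaging, stable
    fn: negative imaging, progressive   tn: negative imaging, stable
    """
    # Continuity correction: add 0.5 to every cell if any cell is zero.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    odds_ratio = (tp * tn) / (fp * fn)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "odds_ratio": odds_ratio,
        "log_or": math.log(odds_ratio),
        # Standard error of the log OR, the x-axis value of a funnel plot.
        "se_log_or": math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn),
    }

# Hypothetical counts, not taken from any included study.
example = diagnostic_metrics(tp=16, fp=3, fn=2, tn=17)
```

Plotting `log_or` against `se_log_or` for every study would give the points of a funnel plot of the kind used here to examine publication bias.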
Results
Eligible Studies
Tables 2–4 summarize the included studies, classified according to technique.10–33 Our search strategy identified 326 articles. After we ruled out the obviously irrelevant abstracts, 66 studies were left, and their full texts were obtained, 19 of which satisfied all inclusion criteria.11,13,14,16–21,23–28,30–33 Forty-seven articles were excluded for the following reasons: 1) enrolling exclusively demented (n = 16) or healthy subjects (n = 3); 2) cross-sectional studies (n = 4) or longitudinal studies not following clinical diagnosis or histopathologic diagnosis as a reference standard (n = 1); 3) presented results from a combination of other diagnostic modalities such as neuropsychological tests that could not be differentiated (n = 3); 4) not presenting sufficient data to create a 2 × 2 contingency table (n = 16); 5) review or symposium (n = 3); and 6) clinical trials of medicine (n = 1).
Although the imaging analytic method was not available in the study by Silverman et al,15 it was stated that to avoid bias, at the time the scan findings were reported, readers were blinded to clinical follow-up data of scanned patients; therefore, the study was also included. Our manual search of the reference lists of retrieved articles and several reviews and articles citing these articles identified 5 articles that met all inclusion criteria.10,12,15,22,29 Although the article of Huang et al, published in 2003,22 overlapped another eligible study published in 2002,23 the end point of patients in the latter was diagnosis of AD, which was more relevant to the aim of our meta-analysis than the former with the end point of dementia; thus, analyses were also done including the article in 2002. Therefore, 24 eligible studies10–33 (6 FDG-PET, 8 SPECT, and 10 MR imaging) were finally included in the meta-analysis.
Some enrolled patients were diagnosed with synonyms of MCI, as mentioned before. Because these criteria were similar to those of MCI, we accepted these patients as having MCI (Table 1). Four studies restricted their included patients to those having amnestic MCI,11,13,21,32 the most common form of MCI with a higher risk of progressing to AD (10%–30% per year).5 Although other forms of MCI were not so tightly associated with memory disturbance and might progress to disorders other than AD, we still treated these studies as eligible, taking into account the limited number of included studies. To evaluate and explain the heterogeneity, we performed subgroup analyses between studies enrolling patients of different diagnoses. Although most articles adopted the diagnosis of AD as an end point, progression was evaluated according to patients’ changes in MMSE scores in the study of Silverman et al.15 In 4 other studies, patients progressed to AD as well as to other types of dementia.17,22,27,29 Taking into account the relatively small proportion of dementia other than AD (non-AD-dementia/all subjects, 16.9%), we still included these studies. Subgroup analyses were executed afterward to evaluate the contribution of the 4 studies to the heterogeneity. To allow for both between- and within-study variation, we selected a random-effects model, whose results were more conservative than those of a fixed-effects model.
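The difference between fixed- and random-effects pooling can be sketched as follows. The text does not state which random-effects estimator was used; the DerSimonian-Laird method below is an assumption chosen for illustration, and the function name and inputs (per-study log ORs and their variances) are hypothetical.

```python
import math

def pool_log_or(effects, variances):
    """Pool per-study log ORs with fixed- and random-effects weights.

    The random-effects part uses the DerSimonian-Laird estimate of the
    between-study variance tau^2 (an assumption; the source does not
    name the estimator).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]              # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effects mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re
    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum_w),
        "random": random,
        "random_se": math.sqrt(1.0 / sum_w_re),
        "tau2": tau2,
    }
```

Whenever the estimated between-study variance is greater than zero, the random-effects standard error exceeds the fixed-effects one, which is the sense in which the random-effects results are more conservative.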
Data Synthesis
For each technique, the weighted summary of sensitivity, specificity, LR, OR, and their 95% CIs, P value for heterogeneity, and I2 value are summarized in Table 5. No 95% CI of OR and LR included 1, confirming the diagnostic value of all modalities. Although no statistically significant difference was found (P > .05) among the techniques in pooled sensitivity, specificity, and LR−, FDG-PET had the highest pooled OR and LR+ (P < .05). No statistical difference was confirmed between SPECT and MR imaging. The summary ROC curves, Q* index, and AUC for FDG-PET, SPECT, and MR imaging are shown in Fig 1. The Q* index estimates for FDG-PET, SPECT, and MR imaging were 0.86, 0.75, and 0.76, respectively. No significant difference was found in analyses including the article of Huang et al in 2002.23
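For readers unfamiliar with the Q* index, the sketch below shows one common way to obtain it, assuming a Moses-Littenberg linear model D = a + bS, where D is the difference and S the sum of the logits of the true- and false-positive rates. The source states only that Q* was defined on a symmetric summary ROC curve; the model choice, the unweighted least-squares fit, and the function names are assumptions made for illustration.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def fit_sroc(tpr_fpr_pairs):
    """Unweighted least-squares fit of D = a + b * S.

    D = logit(TPR) - logit(FPR) is the log OR of a study;
    S = logit(TPR) + logit(FPR) reflects its positivity threshold.
    Rates must lie strictly between 0 and 1.
    """
    d = [logit(t) - logit(f) for t, f in tpr_fpr_pairs]
    s = [logit(t) + logit(f) for t, f in tpr_fpr_pairs]
    n = len(d)
    mean_d = sum(d) / n
    mean_s = sum(s) / n
    sxx = sum((si - mean_s) ** 2 for si in s)
    sxy = sum((si - mean_s) * (di - mean_d) for si, di in zip(s, d))
    b = sxy / sxx if sxx > 0 else 0.0
    a = mean_d - b * mean_s
    return a, b

def q_star(a):
    """Point on the curve where sensitivity equals specificity.

    There S = 0, so D = a and logit(sensitivity) = a / 2.
    """
    return 1.0 / (1.0 + math.exp(-a / 2.0))
```

On a symmetric curve (b = 0), the intercept a is the common log OR, so Q* equals √OR / (1 + √OR) and rises with the OR.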
I2 is an index for heterogeneity: I2 = (Q − [k − 1]) / Q × 100%, where Q is the χ2 value of heterogeneity and k is the number of studies included. Along with P < .05 for heterogeneity, I2 > 50% further indicates heterogeneity between studies. The heterogeneity in the LR− test of FDG-PET and the LR+ test of SPECT was highly statistically significant (P < .001 and I2 > 80%), confirming that there was strong evidence of between-study heterogeneity (Table 5). To assess possible explanations for the heterogeneity, we applied single-factor meta-regression analysis by adding the publication year, age, male-female distribution, follow-up interval, years of education, and mean score of the MMSE at baseline separately as covariates. No apparent relationships were found between these variables and log OR (P > .05). Statistical significance (P = .027) was found only between technique and log OR, with a regression coefficient of −0.575. Subgroup analyses for diagnoses at baseline (MCI versus synonyms of MCI) and patients’ end points (exclusively AD versus dementia including AD) were also executed regardless of technique, due to the limited number of studies. The OR of studies evaluating patients converting to AD was higher than that of studies evaluating patients converting to dementia (P = .03), whereas the sensitivity, specificity, LR+, and LR− showed no statistical difference (P > .05). No significant difference was found between studies differing in inclusive diagnoses. Fig 2 demonstrates the funnel plots of all modalities. Marked asymmetry, with studies missing from the bottom left quadrant, suggests a publication bias; additional studies with an OR of approximately 1 might have been conducted but not published because of unfavorable results.34
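The I2 formula above can be computed directly from per-study effect estimates, as in this minimal sketch. The inputs (log ORs and their variances) and the function names are hypothetical, and truncating negative values to 0% is a common convention assumed here rather than something stated in the text.

```python
def cochran_q(effects, variances):
    """Cochran's Q: inverse-variance weighted squared deviations
    of the study effects from their fixed-effects pooled mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))

def i_squared(q, k):
    """I2 = (Q - (k - 1)) / Q x 100%, truncated at 0 (assumed convention)."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100.0

# Hypothetical log ORs and variances for three studies.
q = cochran_q([1.0, 2.0, 3.0], [0.25, 0.25, 0.25])
i2 = i_squared(q, k=3)
```

With these made-up inputs, Q = 8 on k − 1 = 2 degrees of freedom, giving I2 = 75%, which would exceed the 50% threshold used in this study.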
Discussion
This study aimed to estimate and compare the ability of FDG-PET, SPECT, and structural MR imaging to predict conversion to AD in patients with MCI across a number of published studies. Our meta-analysis, including data from 1112 patients, showed that FDG-PET had moderately better concordance with follow-up results for the prediction of conversion. Approximately 88.9% of the patients with progressive MCI were scanned as positive by FDG-PET, whereas 84.9% of stable patients had negative FDG-PET at baseline. In addition, this study aimed to formally assess the heterogeneity in these studies. The heterogeneity in the LR− test of FDG-PET and the LR+ test of SPECT was highly statistically significant for the between-study random effect (P < .001 and I2 > 80%); however, formal meta-regression analyses showed no significant associations between the mean score of the MMSE at baseline, age, years of education, male-female distribution, follow-up interval, and log OR; statistical significance was found only between technique and log OR. We performed subgroup analyses, respectively, between studies enrolling MCI and synonyms of MCI and between studies adopting different end points (exclusively AD and dementia including AD); still, a statistical difference was found only in OR in the latter comparison. However, it was recognized that estimates of such effects may be imprecise if only a small number of studies were available. It might be that these relationships did exist; meta-regression could detect such associations only if there was sufficient variability in the explanatory variable between studies. On the other hand, the only significant difference found in subgroup analyses might be just by chance due to the small ratio of studies adopting non-AD patients and the small ratio of non-AD patients in these studies.
The analyses we performed to detect heterogeneity were far from sufficient; this might explain the mostly negative results in the heterogeneity tests. Although meta-regression analyses were performed, some results were difficult to assess definitively because certain variables, such as years of education, were available in only half of the eligible articles. Other possible factors, such as administered medications, genetic status,35,36 drop-out rate, and interval between clinical diagnosis and imaging, were not discussed because few eligible studies reported them. Another source of difference lay in the reliability of the image-analysis technique. Although we required the eligible studies to have adopted semiquantitative analytic techniques such as region-of-interest, statistical parametric mapping, and 3D stereotactic surface projection to avoid bias from prior knowledge of the clinical diagnosis, the parameters differed among studies. Similar differences occurred in scanning methodology and acquisition protocol, especially in MR imaging. Variables such as TR, TE, FOV, flip angle, section thickness, and matrix size differed among studies, but there were too many variables in the methodology and acquisition protocol of MR imaging to include. Subgroup analyses were also impossible due to the limited number of studies in each subgroup.
This study has several limitations. A notable one was the small number of studies included, which restricted the statistical analysis we could perform and thus impaired our ability to explain heterogeneity in the meta-analysis. We attempted to include as many studies as possible; however, achieving an adequately homogeneous sample of studies required the exclusion of many studies. It was possible that publication bias in this field restricted the publication of studies with less-promising results, because the funnel plots suggested a lack of studies with an OR value of approximately 1. Another limitation might be that we included only studies published in English, which might invoke the so-called Tower of Babel bias.37 This bias refers to the possibility that investigators working in a language other than English send only studies with positive results to international journals. Although limiting the publication language to English would exclude certain lower-quality studies, it could also mean that studies with negative results were left out.
The limited number of studies included led to another limitation: It prohibited subgroup analyses and gave a low explanatory power to heterogeneity tests. Therefore, our use of tests to guide us toward the relevant subset was not entirely satisfactory, and further analyses of variables within each technique group, such as tracer, region of interest, and magnetic field strength, were impossible.
Conversely, compromises were required to ensure a sufficient sample of studies. For example, the variability in follow-up interval was considerable between included studies and was deemed a further limitation. The reported conversion rate of MCI to AD was 12.0% per year.38 It is conceivable that a relatively higher percentage of patients with baseline MCI would progress to dementia after a longer follow-up interval and thus affect the predictive value of the imaging technique. Although meta-regression analysis demonstrated no significant contribution of follow-up interval to heterogeneity, we could not exclude the possibility that such heterogeneity did exist and went undetected because of the small sample size in meta-regression.
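The dependence of the proportion of converters on follow-up length can be illustrated with a short calculation. The sketch assumes a constant annual conversion rate applied independently each year, which is a simplification not claimed by the source; the function name is hypothetical.

```python
def cumulative_conversion(annual_rate, years):
    """Expected fraction converted after `years` of follow-up,
    assuming a constant annual conversion rate (a simplification)."""
    return 1.0 - (1.0 - annual_rate) ** years

# Using the reported 12.0% per year as the assumed constant rate.
after_1_year = cumulative_conversion(0.12, 1)   # 0.12
after_3_years = cumulative_conversion(0.12, 3)  # about 0.32
```

Under this assumption, a study with a 3-year follow-up would expect roughly 32% converters compared with 12% for a 1-year study, so the same baseline scan would be judged against quite different outcome distributions.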
Finally, a significant limitation was related to the fact that there were no gold standards for the progression or the interpretation of PET, SPECT, or MR imaging, though high rates of interobserver concordance were assured in the latter. In some respects, a region-of-interest approach may enhance reliability, though the selection of region of interest may influence the results. A recent study39 found that atrophy of the left lateral temporal lobe and left parietal cortex independently predicted conversion to dementia; however, in our meta-analysis, we did not further compare the diagnostic value of each region of interest or its influence on heterogeneity because very few eligible studies were available for each subgroup. Automated deformation-based analysis has been used to detect a specific pattern of brain atrophy in AD, but it lacked an established model to derive the individual risk of AD in patients with MCI.32
Conclusions
This meta-analysis assessed and compared the diagnostic ability of the noninvasive imaging methods currently used to predict conversion to AD in patients with MCI. It demonstrated that FDG-PET is a useful supplement to current surveillance techniques, with a predictive accuracy better than that of SPECT or MR imaging.
Acknowledgments
We thank Professor Jian-Chao Bian of the School of Public Health, Fudan University, for his statistical analysis advice and reviewers for their insightful comments, which certainly helped us to improve this work.
Footnotes
This research was funded by a grant from the Science and Technology Commission of Shanghai Municipality Funds (No. 07DJ14005).
References
- Received August 6, 2008.
- Accepted after revision September 10, 2008.
- Copyright © American Society of Neuroradiology