A Meta-Analysis on the Diagnostic Performance of 18F-FDG and 11C-Methionine PET for Differentiating Brain Tumors

SUMMARY: 18F-FDG-PET has been widely used in patients with brain tumors. However, the reported sensitivity and specificity of 18F-FDG-PET for brain tumor differentiation varied greatly. We performed this meta-analysis to systematically assess the diagnostic performance of 18F-FDG-PET in differentiating brain tumors. The diagnostic performance of 11C-methionine PET was assessed for comparison. Relevant studies were searched in PubMed/MEDLINE, Scopus, and China National Knowledge Infrastructure (until February 2013). The methodologic quality of eligible studies was evaluated, and a meta-analysis was performed to obtain the combined diagnostic performance of 18F-FDG and 11C-methionine PET with a bivariate model. Thirty eligible studies, including 5 studies with both 18F-FDG and 11C-methionine PET data, were enrolled. Pooled sensitivity, pooled specificity, and area under the receiver operating characteristic curve of 18F-FDG-PET (n = 24) for differentiating brain tumors were 0.71 (95% CI, 0.63–0.78), 0.77 (95% CI, 0.67–0.85), and 0.80, respectively. Heterogeneity was found among 18F-FDG studies. Subsequent subgroup analysis revealed that the disease status was a statistically significant source of the heterogeneity and that the sensitivity in patients with recurrent brain tumor was markedly higher than that in patients with suspected primary brain tumors. Pooled sensitivity, pooled specificity, and area under the receiver operating characteristic curve of 11C-methionine PET (n = 11) were 0.91 (95% CI, 0.85–0.94), 0.86 (95% CI, 0.78–0.92), and 0.94, respectively. No significant statistical heterogeneity was found among 11C-methionine studies. This meta-analysis suggested that 18F-FDG-PET has limited diagnostic performance in brain tumor differentiation, though its performance may vary according to the status of the brain tumor, whereas 11C-methionine PET has excellent diagnostic accuracy in brain tumor differentiation.

Commonly, glucose use in malignant brain tumors is increased, and 18F-FDG-PET performs well in identifying high-grade gliomas. 1 However, because of the high physiologic glucose metabolism in normal brain tissue, the diagnostic accuracy of 18F-FDG-PET in brain tumors is reduced, especially in low-grade brain tumors, which typically have lower levels of glucose metabolism. 2,3 Moreover, 18F-FDG uptake in brain tumors varies greatly and might not be closely associated with the malignant grade, 3 making 18F-FDG-PET unreliable in differentiating brain tumors. 4 A number of 18F-FDG-PET studies for differentiating brain tumors, including suspected primary brain tumor (SPBT) and/or suspected recurrence of brain tumors after treatment (SRBT), have shown a wide range of sensitivity and specificity, from 0.25 to 1.00 and 0.22 to 1.00, respectively. 1,2, This great disparity of diagnostic values causes confusion about the application of 18F-FDG-PET for brain tumor differentiation. Therefore, although 18F-FDG-PET remains the dominant approach for brain tumor imaging, a systematic assessment of the diagnostic performance of 18F-FDG-PET for brain tumor differentiation is imperative.
In view of the limitations of 18F-FDG, amino acid and amino acid analog PET tracers have been developed. PET imaging with these tracers improved the ability to differentiate brain tumors because of high tumor uptake and low uptake in normal brain tissue. 28 Among these tracers, 11C-methionine (MET) is one of the most extensively investigated. 11C-MET accumulates extensively in proliferating tumors by the mechanism of increased amino acid transport and protein synthesis. Several reports confirmed that 11C-MET PET differentiated brain tumors with high sensitivity and specificity. 29,30 A recent meta-analysis concluded that 18F-FDG and 11C-MET PET have moderately good overall accuracy for diagnosing recurrent glioma. 31 However, other types of brain tumors with high clinical incidence, such as SPBTs and nongliomas, have not been well studied so far. A thorough understanding of the diagnostic effectiveness of 18F-FDG and 11C-MET PET for differentiating brain tumors could be a valuable reference in clinical practice. Therefore, we performed a meta-analysis to comprehensively investigate the diagnostic performance of 18F-FDG and 11C-MET PET for differentiating brain tumors with various statuses and histologic types.

Identification and Eligibility of Studies
A systematic search was performed in the MEDLINE, Scopus, and China National Knowledge Infrastructure databases from January 1991 to February 2013, restricted to human studies in English and Chinese. The detailed search strategies are presented in the On-line Appendix. To identify additional potential studies, we also screened the references of the retrieved studies.
Studies using 18F-FDG and/or 11C-MET for the assessment of SPBT or SRBT were enrolled. Inclusion criteria were the following: 1) the purpose of the study was to differentiate SPBT or SRBT, 2) the study population consisted of a minimum of 10 patients, 3) histology or clinical follow-up was used as a reference standard, and 4) the reported primary data were sufficient to calculate both sensitivity and specificity. Exclusion criteria were the following: 1) the PET tracer not being 18F-FDG or 11C-MET; 2) animal or in vitro studies; 3) abstracts, systematic reviews, editorials, letters, comments, and case reports; and 4) studies for staging, searching for metastasis, and evaluating the therapeutic response of definitely diagnosed brain tumors.

Data Extraction and Study Quality
Two experienced nuclear medicine physicians (C.Z., Y.Z.) screened titles, abstracts, and full texts of eligible studies independently. Characteristics of eligible studies were extracted to 2 predefined forms, including the first author's name, year of publication, study country of origin, study design, sample size, mean or median age of patients, male to female ratio, type of brain tumor, disease status, PET tracers, prior imaging tests, reference standard, clinical or radiologic follow-up after PET, PET scanner type, injected dose, time of scanning after injection, scan time, analysis method for diagnostic performance, positive criteria of visual assessment, and cutoff value of quantitative parameters. After data extraction, discrepancies were resolved by consensus and discussion.
The methodologic quality of eligible studies was assessed by using 14 items of the Quality Assessment of Diagnostic Accuracy Studies. 32 Each item of the Quality Assessment of Diagnostic Accuracy Studies was described as yes (score = 2), unclear (score = 1), and no (score = 0). The total score was summarized from all the items with a range of 0-28. Two experienced reviewers (C.Z., Y.Z.) independently evaluated the quality of selected studies, and disagreements were resolved by discussion.
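The scoring rule described above can be sketched in a few lines. This is only an illustration of the arithmetic (yes = 2, unclear = 1, no = 0, summed over 14 items); the function name `quadas_total` and the example ratings are hypothetical and are not taken from the study's data.

```python
# Scoring rule as stated in the text: yes = 2, unclear = 1, no = 0, over 14 items.
ITEM_SCORES = {"yes": 2, "unclear": 1, "no": 0}
N_ITEMS = 14


def quadas_total(answers):
    """Return the total quality score (0-28) for one study.

    `answers` is a list of 14 ratings, each "yes", "unclear", or "no".
    """
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answers, got {len(answers)}")
    return sum(ITEM_SCORES[a.lower()] for a in answers)


# Hypothetical study: 10 items rated yes, 3 unclear, 1 no.
example = ["yes"] * 10 + ["unclear"] * 3 + ["no"]
print(quadas_total(example))  # 10*2 + 3*1 + 1*0 = 23
```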
True-positive, false-positive, true-negative, and false-negative values for 18F-FDG or 11C-MET PET in differentiating brain tumors were extracted for each eligible study to construct a contingency table.
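As a minimal sketch of how per-study sensitivity and specificity follow from such a contingency table, the snippet below uses a small data class. The class name `TwoByTwo` and the counts are hypothetical illustrations, not data from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """2x2 contingency table for one study (PET result vs reference standard)."""
    tp: int  # PET positive, tumor present
    fp: int  # PET positive, tumor absent
    tn: int  # PET negative, tumor absent
    fn: int  # PET negative, tumor present

    def sensitivity(self):
        # Proportion of patients with tumor that PET classified as positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        # Proportion of patients without tumor that PET classified as negative.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for a single study of 32 patients.
table = TwoByTwo(tp=15, fp=3, tn=9, fn=5)
print(table.sensitivity(), table.specificity())  # 0.75 0.75
```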

Data Analysis
Pooled estimates for sensitivity, specificity, likelihood ratios (LRs) with corresponding 95% confidence intervals, and area under the receiver operating characteristic curve (AUC) for 18F-FDG and 11C-MET PET were analyzed for the primary meta-analysis on the basis of the bivariate mixed-effects regression model. 33 The bivariate model uses a random effects approach for both sensitivity and specificity, which allows for heterogeneity beyond chance as a result of clinical and methodologic differences between studies, and the bivariate model is a more valid statistical model for a diagnostic meta-analysis. 34 The LRs indicate by how much a given test would raise or lower the probability of having a disease. Generally, a good diagnostic test may have LR+ above 5 and LR− below 0.2. The AUC is the average true-positive rate over the entire range of false-positive rate values and serves as a global measure of test performance. The guidelines for the interpretation of intermediate AUC values are the following: low accuracy, 0.5 ≤ AUC ≤ 0.7; moderate accuracy, 0.7 < AUC ≤ 0.9; or high accuracy, 0.9 < AUC ≤ 1. 35 To graphically describe the results, we plotted the hierarchic summary receiver operating characteristic (HSROC) curves. The HSROC curve is a recommended standard method for diagnostic meta-analysis. 36 Heterogeneity among the studies was checked by using the χ²-based Q-test and the I² statistic. 37,38 The existence of significant heterogeneity was assumed with a P value < .05 for the Q-test and/or an I² statistic ≥ 50%. If significant heterogeneity was observed, subgroup analysis by using meta-regression was adopted to explore a potential source of heterogeneity by calculating the I² statistics. The covariates investigated included study design, imaging method, analysis method for diagnostic performance, malignant grade of brain tumor, disease status, and histology.
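The LR thresholds and AUC categories above can be illustrated with a short sketch. It uses the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The function names are hypothetical, and plugging in the pooled point estimates from the Summary is only an illustration: the LRs and confidence intervals reported in the study come from the bivariate model, which this snippet does not reproduce.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from a sensitivity/specificity pair.

    Undefined when specificity is exactly 1 (LR+) or 0 (LR-).
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def auc_category(auc):
    """Classify an AUC using the cutoffs quoted in the text."""
    if not 0.5 <= auc <= 1.0:
        raise ValueError("AUC outside the range covered by the guideline")
    if auc <= 0.7:
        return "low"
    if auc <= 0.9:
        return "moderate"
    return "high"


# Pooled point estimates quoted in the Summary, used here purely as inputs.
for tracer, sens, spec, auc in [("18F-FDG", 0.71, 0.77, 0.80),
                                ("11C-MET", 0.91, 0.86, 0.94)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    meets_rule = lr_pos > 5 and lr_neg < 0.2
    print(tracer, round(lr_pos, 2), round(lr_neg, 2),
          auc_category(auc), meets_rule)
```

Under these inputs, only the 11C-MET estimates satisfy both the LR+ > 5 and LR− < 0.2 rule of thumb.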
The malignant grades of brain tumors in the studies were assigned according to the classification of World Health Organization. 39 The stability of our analysis model was tested by 1-way sensitivity analysis if heterogeneity existed. We excluded each study in turn and checked how the new summary diagnostic values could be influenced by the removed one.
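The leave-one-out procedure can be sketched as follows. This is a simplified stand-in: it pools sensitivity by summing counts across studies rather than refitting the bivariate mixed-effects model used in the study, and the study names and counts are hypothetical.

```python
def pooled_sensitivity(studies):
    """Naive pooled sensitivity: total true-positives over total diseased.

    Assumption: simple count pooling, NOT the bivariate model from the text.
    """
    tp = sum(s["tp"] for s in studies)
    fn = sum(s["fn"] for s in studies)
    return tp / (tp + fn)


def leave_one_out(studies):
    """Recompute the pooled estimate with each study removed in turn."""
    results = {}
    for i, left_out in enumerate(studies):
        rest = studies[:i] + studies[i + 1:]
        results[left_out["name"]] = pooled_sensitivity(rest)
    return results


# Hypothetical per-study counts.
studies = [
    {"name": "A", "tp": 18, "fn": 6},
    {"name": "B", "tp": 9, "fn": 3},
    {"name": "C", "tp": 4, "fn": 8},
]

print("all studies:", round(pooled_sensitivity(studies), 3))
for name, value in leave_one_out(studies).items():
    print("without", name, round(value, 3))
```

A large shift in the estimate when one study is dropped would flag that study as influential; small shifts suggest a stable summary.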
We also performed a direct comparison of the diagnostic values of 18F-FDG and 11C-MET PET from the 5 studies with the same population of patients 22-26 to diminish the potential bias induced by pooling data from all the studies, though the data subset was smaller.
Publication bias was tested by using the linear regression method and funnel plot of Deeks et al. 40 A P value < .05 in this linear regression indicates potential publication bias.
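A rough sketch of the regression behind a Deeks-style funnel plot is given below, assuming the commonly described form: the log diagnostic odds ratio of each study is regressed on 1/√ESS, weighted by the effective sample size ESS = 4·n1·n2/(n1 + n2), where n1 and n2 are the numbers of patients with and without tumor. The function name, the 0.5 continuity correction, and the example tables are assumptions for illustration; the study itself used Stata, and the P value (from a t distribution with k − 2 degrees of freedom) is not computed here.

```python
import math


def deeks_regression(tables, correction=0.5):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS).

    `tables` is a list of (tp, fp, fn, tn) tuples, one per study.
    Returns (slope, t_statistic) for the slope coefficient.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        # Assumed continuity correction when any cell is zero.
        if 0 in (tp, fp, fn, tn):
            tp, fp, fn, tn = (c + correction for c in (tp, fp, fn, tn))
        n_diseased = tp + fn
        n_non_diseased = fp + tn
        ess = 4.0 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)

    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 studies")

    # Weighted least squares for a single predictor.
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("all studies have the same effective sample size")
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else float("nan")
    return slope, t_stat


# Hypothetical (tp, fp, fn, tn) tables for four studies.
hypothetical = [(18, 4, 6, 12), (9, 2, 3, 10), (4, 5, 8, 7), (25, 3, 5, 20)]
slope, t_stat = deeks_regression(hypothetical)
print(slope, t_stat)
```

A slope far from zero (small studies showing systematically different odds ratios from large ones) is what the test treats as a sign of potential publication bias.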
Statistical analyses were performed with STATA 12.0 (StataCorp, College Station, Texas). The commands used are presented in the On-line Appendix. A P value < .05 was considered statistically significant. All P values were 2-sided.

Study Characteristics, Quality, and Publication Bias
According to the search strategies, the electronic search yielded 1579 articles: 895 from PubMed, 190 from Scopus, and 494 from the China National Knowledge Infrastructure. After we screened article types, titles, and abstracts, 76 studies remained and the full-text versions were reviewed. After we reviewed full texts, 48 studies were excluded and 2 studies identified from the reference lists of other eligible studies were included (Fig 1). Finally, 30 eligible studies were enrolled, including 19 for 18F-FDG-PET, 1,2,5-21 6 for 11C-MET PET, 29,30,41-44 and 5 for both. 22-26 The characteristics of the studies are summarized in On-line Tables 1 and 2. The quality of included studies was assessed on the basis of the Quality Assessment of Diagnostic Accuracy Studies (On-line Table 3). The overall quality of the included studies was considered acceptable for most of the items. The total score varied from 13 to 24 in 18F-FDG studies and from 18 to 22 in 11C-MET studies. The proportion of studies with a total score of >20 in 18F-FDG studies (11/24) was apparently lower than that in 11C-MET studies (9/11); this difference indicated the overall higher quality of 11C-MET studies. A common poor-quality item (item 6) in most studies was the failure to use the same reference standard.
We found no significant evidence of publication bias in both 18

Heterogeneity of 18F-FDG-PET Studies and Sensitivity Analysis
The sensitivity and specificity of 18F-FDG-PET for brain tumor differentiation across 24 eligible studies ranged from 0.25 to 1.00 and 0.22 to 1.00, respectively. The test of heterogeneity revealed significant statistical heterogeneity (Q-value for sensitivity = 83.23, P = .00, I² = 72.37%; Q-value for specificity = 57.11, P = .00, I² = 59.73%).
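The relation between the Q-value and I² can be checked with a short sketch, assuming the usual definition I² = max(0, (Q − df)/Q) × 100% with df = k − 1 for k studies. The function name is hypothetical; the inputs are the Q-values quoted above for the 24 18F-FDG studies, and the reported I² values are consistent with this formula.

```python
def i_squared(q, n_studies):
    """I² (in percent) from Cochran's Q, assuming df = n_studies - 1."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Q-values quoted in the text for the 24 18F-FDG studies.
print(round(i_squared(83.23, 24), 2))  # 72.37 (sensitivity)
print(round(i_squared(57.11, 24), 2))  # 59.73 (specificity)
```

Both values exceed the 50% threshold that the Data Analysis section uses to declare significant heterogeneity.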
We excluded 1 study from the overall pooled analysis each time to check the influence of the removed dataset on the summary estimates. When a single study was excluded, the new pooled sensitivity and specificity remained close to those obtained with all eligible studies (On-line Table 4).

Heterogeneity of 11C-MET PET Studies
The sensitivity and specificity of 11C-MET PET for brain tumor differentiation across 11 eligible studies ranged from 0.75 to 1.00 and 0.6 to 1.00, respectively. The test of heterogeneity indicated no significant statistical heterogeneity (Q-value for sensitivity = 12.81, P = .23, I² = 21.92%; Q-value for specificity = 10.98, P = .36, I² = 8.89%).

Subgroup Analyses of 18F-FDG and 11C-MET PET Studies
Metaregression was performed for 18F-FDG-PET studies to explore the potential source of heterogeneity. The results of subgroup meta-analyses are shown in Table 1. The source of the heterogeneity among 18F-FDG-PET studies was not observed with respect to study design, imaging method, malignant grade of brain tumor, and histology (P > .05). However, the disease status had a statistically significant influence on the heterogeneity (I² = 77.72; P = .01). The sensitivity in SPBT (0.43; 95% CI, 0.28-0.59) was markedly lower than that in SRBT (0.75; 95% CI, 0.67-0.81).
Metaregression was not performed for 11C-MET PET studies because no statistically significant heterogeneity was found. However, the sensitivity, specificity, and LRs in subgroups by study design, PET measurement, imaging method, malignant grade, disease status, and histologic finding were also calculated and are listed in Table 2. No apparent difference was observed among the subgroups.

DISCUSSION
We analyzed 24 18F-FDG-PET studies with an accumulated population of 857 patients for differentiating brain tumors, including SPBT and/or SRBT. The meta-analysis showed that 18F-FDG-PET has moderately good pooled sensitivity (0.71; 95% CI, 0.63-0.78) and specificity (0.77; 95% CI, 0.67-0.85) for differentiating brain tumors. In the assessment of intracranial masses and the recurrence of brain tumors with 18F-FDG-PET, a positive 18F-FDG lesion without the presence of tumor (false-positive) often indicates inflammatory tissue or other nontumor tissues and thus limits the specificity. 3,45 On the other hand, absent or decreased 18F-FDG uptake in pathologically identified brain tumors (false-negative) reflects the lower levels of glucose metabolism and is usually highly influenced by the high physiologic glucose metabolism in surrounding normal brain tissue, leading to a decrease of sensitivity. 11 The relatively low pooled sensitivity and specificity of 18F-FDG-PET for differentiating brain tumors demonstrated in our meta-analysis indicate a considerably high incidence of both false-positives and false-negatives.
In the subgroup analyses of 18F-FDG studies, the disease status was identified as the only possible source of heterogeneity. We found that the sensitivity of 18F-FDG-PET was the worst (0.43; 95% CI, 0.3-0.58) when applied to the patients with SPBT. However, because no study on only SPBT was found, the SPBT data were extracted from 3 eligible studies on brain tumors with various statuses. As a result, the number of patients with SPBTs in subgroup analysis was limited, and the reliability of the subgroup analysis might be impaired to some extent. In subgroup analysis by malignant grade, low-grade brain tumors showed slightly less sensitivity (0.60; 95% CI, 0.35-0.81) compared with high-grade ones (0.74; 95% CI, 0.68-0.80). These results were consistent with those in previous reports by other investigators indicating that 18F-FDG-PET was less effective in low-grade brain tumors, 2,29 though this difference was not a statistically significant source of heterogeneity (P = .46) in our analysis. In subgroup analysis by histology, the patients with glioma showed sensitivity and specificity similar to that of pooled data, suggesting that the type of brain tumor has no apparent influence on the diagnostic performance of 18F-FDG-PET in brain tumor differentiation, regardless of the grading of glioma.
Qualitative assessment was used for image interpretation in most of the eligible 18F-FDG studies, but there were various criteria for visual assessment as shown in On-line Table 2. These different criteria for visual assessment in 18F-FDG studies may inevitably bring bias to our pooled data. Moreover, Tripathi et al 24 reported that even within the same 18F-FDG study, interobserver agreement for visual interpretation was not good. The difficulty in discriminating the lesion from surrounding normal brain tissue in some patients and the subjectivity of visual assessment by the interpreter may together contribute to the low diagnostic accuracy and require us to interpret the pooled results cautiously when using qualitative evaluation in 18F-FDG-PET. Quantitative assessments were adopted in four 18F-FDG-PET studies 20,22,24,26 that used a lesion-to-normal tissue ratio or standard uptake value as the criterion and showed better sensitivity but worse specificity than qualitative assessment. However, this difference was not a statistically significant source of heterogeneity (P = .09) in our analysis. The setting of the cutoff value in quantitative assessment may greatly influence the results of diagnostic estimates and consequently affect the reliability of direct comparison of the diagnostic values by quantitative and qualitative assessment. The limited number of patients analyzed with quantitative methods may also bring bias to the subgroup analysis by the method of assessment. In addition, because of the limited ability to distinguish lesion from normal brain tissue in 18F-FDG-PET, quantitative or semiquantitative methods for the interpretation of 18F-FDG-PET images for evaluating brain tumors were also unreliable and could not provide additional information compared with visual assessment. 1,10,27
On the basis of the results of our meta-analysis, 18F-FDG-PET does not appear to be an ideal approach for differentiating brain tumors, and we do not recommend its routine use for this purpose because a considerable number of cases would be missed. On the other hand, the meta-analysis of eleven 11C-MET PET studies demonstrated excellent pooled sensitivity (0.91; 95% CI, 0.85-0.94) and specificity (0.86; 95% CI, 0.78-0.92) for differentiating brain tumors. The overall higher diagnostic accuracy of 11C-MET PET (AUC = 0.94) over 18F-FDG-PET (AUC = 0.80) for brain tumor differentiation is likely due to the high uptake in tumor and low accumulation in normal brain tissue. 25,41 The sensitivity and specificity of 11C-MET PET for differentiating brain tumors in the subgroups by various conditions showed higher values than those in pooled and subgroup 18F-FDG-PET analyses, indicating the superiority of 11C-MET PET over 18F-FDG-PET and the stability of the diagnostic effectiveness of 11C-MET PET in patients with various tumor types. This result was further verified by the direct comparison of the diagnostic values of 18F-FDG and 11C-MET PET from the 5 studies. 22-26 After removing the potential bias caused by pooling all the data, the overall diagnostic accuracy of 11C-MET PET (AUC = 0.96) in the pooled data of the 5 studies was much higher than that of 18F-FDG-PET (AUC = 0.81). Although there were 4 eligible 11C-MET PET studies that included patients with different types of brain tumors, 23,29,41,42 no heterogeneity was observed. Therefore, the results of the meta-analysis in 11C-MET PET studies are more convincing, though there were fewer patients in 11C-MET PET studies than in 18F-FDG-PET studies. On the basis of our meta-analysis, it is always preferable to use 11C-MET PET instead of 18F-FDG-PET for differentiating brain tumors if possible.
However, although 11C-MET appears to be a promising tracer for brain tumor differentiation, the use of 11C-MET PET is restricted to PET centers with a cyclotron because of the short half-life of 11C (20 minutes) and the rapid catabolism of 11C-MET. 25 As substitutes for 11C-MET and 18F-FDG, 18F-labeled PET tracers (half-life = 110 minutes) such as O-(2-[18F]fluoroethyl)-L-tyrosine (18F-FET), 8 3,4-dihydroxy-6-18F-fluoro-L-phenylalanine (18F-FDOPA), 46 and 3′-deoxy-3′-18F-fluorothymidine (18F-FLT) 18,20 have been developed and applied in brain tumor imaging. Among these 18F-labeled tracers, 18F-FET and 18F-FDOPA show superiority over 18F-FDG in brain tumor differentiation, especially in low-grade brain tumors, 1,47 on the basis of the advantage of their high uptake in tumor tissue and low uptake in normal brain tissue. A meta-analysis of 18F-FET PET studies in patients with SPBT has demonstrated the high performance of 18F-FET PET with a pooled sensitivity and specificity of 0.82 and 0.76, respectively. 48 However, these values were not as good as our results for 11C-MET PET. As for 18F-FLT PET, Choi et al 7 reported that although 18F-FLT PET is useful for evaluating tumor grade and cellular proliferation in brain tumors, it is not useful enough for differentiating tumors from nontumorous lesions. Because the PET studies using 18F-FDOPA for brain tumor differentiation are still insufficient, further systematic evaluation using this tracer is expected in the future.
The recent meta-analysis by Nihashi et al 31 on the diagnostic accuracy of PET for recurrent glioma reported that 18F-FDG-PET had moderately good sensitivity and specificity in either pooled glioma with different grades (sensitivity = 0.77, specificity = 0.78) or high-grade glioma (sensitivity = 0.79, specificity = 0.70).
The subgroup analysis in our study showed similar sensitivity and specificity of 18F-FDG-PET in the glioma group. However, Nihashi et al reported a pooled sensitivity as low as 0.7 for 11C-MET PET for high-grade gliomas, which is considerably different from our results. We believe our results are more reliable because we enrolled more eligible studies with more abundant data and investigated more brain tumor types.
However, a few limitations of this study should be addressed. First, although the overall sample size was as large as 857 in 18F-FDG-PET and 416 in 11C-MET PET settings, there was still incomplete data collection as reflected by the failure to access the full text of an eligible study. 49 In addition, the insufficient data in certain subtypes of brain tumors may impair the reliability of the analysis. We searched potential sources for heterogeneity in 18F-FDG-PET studies and found that disease status was a significant source. Although the use of the bivariate model could at least correct this issue to some extent, the results should be interpreted cautiously. Second, methodologic quality might be a source of heterogeneity in 18F-FDG-PET studies. We found that some included studies had low methodologic quality with few Y scores (≤6 items) in the Quality Assessment of Diagnostic Accuracy Studies. 5,10,13 In addition, a number of eligible studies included patients with various types of brain tumors. 1,7,9,10,14,16,17,23 Because the characteristics of different brain tumors may vary greatly (eg, brain lymphoma usually presents with higher 18F-FDG uptake than normal brain tissue), 50 the composition of different brain tumors might bring bias by a nonrepresentative patient spectrum in 18F-FDG-PET studies. Third, some characteristics of the eligible studies were not in good agreement. For example, the starting time of PET imaging after tracer injection differed widely, especially in the 18F-FDG-PET studies. The differences in study characteristics may also contribute to the heterogeneity of our meta-analysis.

CONCLUSIONS
Our meta-analysis shows that 18F-FDG-PET has limited diagnostic performance in brain tumor differentiation. However, 11C-MET PET has excellent diagnostic performance in brain tumor differentiation and should be considered as a preferential approach for this purpose if available. Because of the inconvenience of the supply of 11C, other 18F-labeled tracers such as 18F-FDOPA could be considered as potential alternatives for brain tumor differentiation and deserve systematic evaluation once sufficient relevant studies have accumulated.