Diagnostic Accuracy of PET for Recurrent Glioma Diagnosis: A Meta-Analysis

These authors compared the diagnostic accuracy of PET with that of CT and MRI in the diagnosis of recurrent glioma in 26 previously published articles. PET studies with either FDG or carbon methionine were obtained once glioma recurrence was suspected on CT and/or MRI. Diagnostic accuracies were heterogeneous and studies did not compare PET with other imaging modalities. Despite these limitations, PET with both tracers appears to have a moderately good accuracy as an add-on test for diagnosing recurrent glioma.
BACKGROUND AND PURPOSE: Studies have assessed PET by using various tracers to diagnose disease recurrence in patients with previously treated glioma; however, the accuracy of these methods, particularly compared with alternative imaging modalities, remains unclear. We conducted a meta-analysis to quantitatively synthesize the diagnostic accuracy of PET and compare it with alternative imaging modalities.
MATERIALS AND METHODS: We searched PubMed and Scopus (until June 2011), bibliographies, and review articles. Two reviewers extracted study characteristics, validity items, and quantitative data on diagnostic accuracy. We performed meta-analysis when ≥5 studies were available.
RESULTS: Twenty-six studies were eligible. Studies were heterogeneous in treatment strategies and diagnostic criteria of PET; recurrence was typically suspected by CT or MR imaging. The diagnostic accuracies of 18F-FDG (n = 16) and 11C-MET PET (n = 7) were heterogeneous across studies. 18F-FDG PET had a summary sensitivity of 0.77 (95% CI, 0.66–0.85) and specificity of 0.78 (95% CI, 0.54–0.91) for any glioma histology; 11C-methionine PET had a summary sensitivity of 0.70 (95% CI, 0.50–0.84) and specificity of 0.93 (95% CI, 0.44–1.0) for high-grade glioma. These estimates were stable in subgroup and sensitivity analyses. Data were limited on 18F-FET (n = 4), 18F-FLT (n = 2), and 18F-boronophenylalanine (n = 1). Few studies performed direct comparisons between different PET tracers or between PET and other imaging modalities.
CONCLUSIONS: 18F-FDG and 11C-MET PET appear to have moderately good accuracy as add-on tests for diagnosing recurrent glioma suspected by CT or MR imaging. Studies comparing alternative tracers or PET versus other imaging modalities are scarce. Prospective studies performing head-to-head comparisons between alternative imaging modalities are needed.

noninvasive neuroimaging modalities are needed to better guide the management of patients with suspected recurrence.
PET is a promising molecular neuroimaging technique that provides metabolic tumor information complementing the CT and MR imaging examinations. 8 Several studies have evaluated PET by using various tracers (eg, 18F-FDG, 11C-MET, 18F-FET, or 18F-FLT) as a test for aiding the differential diagnosis of suspected glioma recurrence. 18F-FDG is the most widely used tracer; its uptake correlates with the amount of glucose consumption and the local metabolic rate within the glioma lesion. 9 Uptake of 18F-FDG in high-grade glioma is typically similar to or less than that in normal gray matter; uptake in low-grade glioma is similar to that in white matter. 8,10 Due to the low contrast between tumor and healthy brain tissue with 18F-FDG, however, more specific tracers have been developed. Amino acid tracers such as 11C-MET and 18F-FET offer higher contrast than 18F-FDG based on the increased intracellular amino acid use and extracellular matrix production of tumor cells. 8,10 Further, uptake of 18F-FLT correlates well with the activity of thymidine kinase-1, a cytosolic enzyme with high concentration in proliferating cells but low concentration in resting cells. 8,10 Because cell proliferation rates are higher in malignant glioma cells compared with scar tissue, 18F-FLT can also differentiate tumor recurrence from treatment-induced necrosis.
Studies evaluating novel PET tracers have small sample sizes and use heterogeneous designs, making interpretation of published data difficult. Furthermore, the comparative effectiveness of alternative imaging modalities such as advanced MR imaging techniques (eg, perfusion MR imaging or MR spectroscopy) is currently uncertain. We performed a systematic review to provide a comprehensive summary and quantitative synthesis of information on the diagnostic accuracy of PET by using various tracers to diagnose disease recurrence in patients with previously treated glioma. We also aimed to compare PET with other imaging modalities for differentiating recurrent or progressive glioma from treatment-induced necrosis, when used as add-on tests to conventional MR imaging.

Search Strategy, Study Eligibility, and Data Abstraction
We searched the Medline and Scopus databases (from inception through June 30, 2011) with no language restriction. The complete search strategies are presented in the On-line Appendix. To complement our database searches, we examined the reference lists of eligible studies and relevant review articles.
Two reviewers (T.N., T.T.) independently screened abstracts and further examined full-text articles of potentially eligible citations. Studies that assessed PET by using any tracer for differentiating disease recurrence from treatment-induced necrosis in patients with suspected glioma recurrence after any form of treatment were eligible. We included both prospective and retrospective studies, and we considered pathologic confirmation with or without clinical follow-up as the reference standard. We included only English language publications that evaluated at least 10 patients; smaller studies do not provide meaningful estimates of accuracy. We excluded studies that did not provide adequate information to allow the calculation of sensitivity and specificity.
We also excluded editorials, comments, letters to the editor, and review articles.
One of 2 reviewers (T.N., T.T.) extracted descriptive data from each eligible study, which were verified by a second reviewer. We extracted the following information from eligible studies: first author, year of publication, journal, patient demographic and clinical characteristics, therapeutic interventions, technical specifications of PET, and interpretation of PET results. Two reviewers (I.J.D., T.T.) independently extracted quantitative data regarding imaging results and final diagnoses. Discrepancies were resolved by consensus. When studies performed a direct comparison between different imaging modalities (eg, 18F-FDG PET versus thallium-201 [201Tl] SPECT), we extracted data on accuracy for all imaging tests investigated.
We took particular care to identify publications with at least partially overlapping populations by comparing authors, centers, recruitment periods, patient demographic characteristics, and glioma histologies. We included all relevant publications in qualitative synthesis but only included studies with nonoverlapping patient populations in meta-analyses, to avoid double counting of evidence. Specifically, when multiple publications with potentially overlapping patient populations were available, we only included the publication with the largest sample size in the meta-analysis.
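The selection rule for overlapping publications can be illustrated with a minimal sketch. The `Publication` record, the `cohort` key (standing in for the reviewers' judgment that publications share patients), and the example entries are hypothetical and are not taken from the reviewed studies; tie-breaking between equally sized publications is also an assumption, since the text does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    label: str       # hypothetical identifier for a publication
    cohort: str      # hypothetical key: publications judged to share patients
    n_patients: int  # sample size reported by the publication


def select_for_meta_analysis(publications):
    """Keep only the largest publication from each group of overlapping cohorts."""
    largest = {}
    for pub in publications:
        current = largest.get(pub.cohort)
        # Assumption: on a tie, the publication seen first is kept.
        if current is None or pub.n_patients > current.n_patients:
            largest[pub.cohort] = pub
    return sorted(largest.values(), key=lambda p: p.label)


# Invented example: two publications from the same center overlap.
pubs = [
    Publication("A-2005", "center-1", 21),
    Publication("A-2008", "center-1", 35),
    Publication("B-2009", "center-2", 14),
]
selected = select_for_meta_analysis(pubs)
print([p.label for p in selected])
```

All three publications would still appear in the qualitative synthesis; only the meta-analysis input is reduced.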

Validity Assessment
To assess the validity and reporting quality of studies, we evaluated 14 items that were considered relevant to the review topic on the basis of the Quality Assessment of Diagnostic Accuracy Studies instrument. 11,12 The complete operational definition of each item is available from the authors on request. For comparative studies of diagnostic tests, we extracted the proportion of study participants receiving each comparator test. We operationally defined an "optimal direct comparison" as the performance of both tests at the same time point in at least 90% of eligible patients. This cutoff was chosen to limit the potential for patient selection and disease progression bias. Two reviewers (T.N., T.T.) independently assessed study quality, and discrepancies were resolved by consensus.
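As a small illustration of the operational definition above, the 90% criterion can be written as a single check. The function name and arguments are hypothetical; the sketch covers only the proportion requirement and assumes the same-time-point condition has already been verified.

```python
def is_optimal_direct_comparison(n_received_both: int, n_eligible: int) -> bool:
    """Return True when both tests were performed in at least 90% of eligible patients."""
    if n_eligible <= 0:
        raise ValueError("n_eligible must be positive")
    # Integer arithmetic avoids floating-point rounding at exactly 90%.
    return 10 * n_received_both >= 9 * n_eligible


print(is_optimal_direct_comparison(18, 20))  # 18/20 = 90%
print(is_optimal_direct_comparison(17, 20))  # 17/20 = 85%
```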

Data Synthesis
For each study, we constructed a 2 × 2 contingency table consisting of true-positive, false-positive, false-negative, and true-negative results. Patients were categorized according to whether they were test positive or negative (on the basis of imaging) and whether they had relapsed glioma by the reference standard. We extracted results of visual and quantitative assessments separately. When a study reported test results at multiple time points during clinical follow-up, we only recorded the results of the test performed closest to the completion of treatment (ie, the first instance of PET performance after recurrence was suspected). Also, in studies in which histologic results were negative and clinical follow-up results were also reported, we planned to only consider the clinical status as the reference standard because it is more important from the patient's perspective. However, no cases of discrepant results between pathologic and clinical reference standards were reported in the studies we reviewed.
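The cross-classification described above can be sketched as follows. This is a minimal illustration, not the authors' software: the `TwoByTwo` class, the `build_table` helper, and the 10-patient cohort are hypothetical, and the patient-level data are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying the imaging result against the reference standard."""
    tp: int  # test positive, recurrence confirmed
    fp: int  # test positive, no recurrence
    fn: int  # test negative, recurrence confirmed
    tn: int  # test negative, no recurrence

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def build_table(patients):
    """Tally (test_positive, has_recurrence) boolean pairs into a 2 x 2 table."""
    tp = fp = fn = tn = 0
    for test_positive, has_recurrence in patients:
        if test_positive and has_recurrence:
            tp += 1
        elif test_positive:
            fp += 1
        elif has_recurrence:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Invented illustrative cohort of 10 patients (not data from any reviewed study).
example_patients = (
    [(True, True)] * 6      # 6 true positives
    + [(False, True)] * 2   # 2 false negatives
    + [(True, False)] * 1   # 1 false positive
    + [(False, False)] * 1  # 1 true negative
)
table = build_table(example_patients)
print(table, table.sensitivity(), table.specificity())
```

In this invented cohort, sensitivity is 6/8 = 0.75 and specificity is 1/2 = 0.5.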
We recorded the counts of true-positive, false-positive, false-negative, and true-negative results based on the cutoff values specified by each study (when reported). When studies did not specify cutoff values but did report numeric data of quantitative assessment for each enrolled patient, we used the following methods to construct a 2 × 2 table of test results. For 18F-FDG, we determined the optimal cutoff threshold for defining positive and negative scans by ROC analysis. For 11C-MET, the main analysis used a cutoff value of 1.5 for the tumor-to-normal reference ratio or similar indices to define positive (>1.5) and negative (≤1.5) results, as recommended by experts 9; a sensitivity analysis used the optimal cutoff threshold determined by ROC analysis.
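The two dichotomization rules can be sketched as below. The fixed 1.5 cutoff follows the text; the criterion for an "optimal" ROC cutoff is not stated in the text, so maximizing the Youden index (sensitivity + specificity − 1) is an assumption made here for illustration. Function names and the patient-level ratios are hypothetical.

```python
def dichotomize(ratios, cutoff=1.5):
    """Label each tumor-to-normal ratio positive (> cutoff) or negative (<= cutoff)."""
    return [ratio > cutoff for ratio in ratios]


def youden_cutoff(values, has_recurrence):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Assumption: 'optimal' means the Youden index; the source does not say.
    Requires at least one patient with and one without recurrence.
    """
    n_pos = sum(has_recurrence)
    n_neg = len(has_recurrence) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(v > cutoff and d for v, d in zip(values, has_recurrence))
        tn = sum(v <= cutoff and not d for v, d in zip(values, has_recurrence))
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict comparison keeps the lowest cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Invented ratios and reference-standard results for six patients.
ratios = [1.2, 1.5, 1.6, 2.1, 0.9, 1.8]
recurrence = [False, False, True, True, False, True]

print(dichotomize(ratios))
print(youden_cutoff(ratios, recurrence))
```

Note that a ratio of exactly 1.5 is classified as negative, matching the ≤1.5 rule in the text.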
We calculated sensitivity and specificity for each study with their corresponding 95% CIs. We obtained summary estimates of sensitivity and specificity with their corresponding 95% CIs by using bivariate random effects meta-analysis with the exact binomial likelihood, when ≥5 studies were available (because of model complexity, at least 5 studies are required for estimation). 13,14 Summary positive and negative likelihood ratios were calculated from the summary sensitivity and specificity estimates. We assessed between-study heterogeneity visually, by plotting sensitivity and specificity separately in forest plots, and also in the ROC space. We constructed summary ROC curves and confidence regions for summary sensitivity and specificity when appropriate. 13,15 For each ROC curve, we estimated the Q* statistic, the point on the curve where sensitivity and specificity are equal, as a global measure of diagnostic accuracy. When a study reported results based on both visual and quantitative assessments of PET imaging, visual assessment was preferred over quantitative assessment for 18F-FDG PET and quantitative assessment was preferred over visual assessment for 11C-MET PET because these were the respective assessment methods used in the majority of cases. Alternative approaches (ie, by using quantitative assessment for 18F-FDG PET and visual assessment for 11C-MET PET) were explored in sensitivity analyses.
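The step from summary sensitivity and specificity to likelihood ratios uses the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below applies them to the 18F-FDG point estimates quoted in the abstract (0.77 and 0.78); the resulting ratios are plain arithmetic on those point estimates, shown for illustration, and are not values reported in this text. The function name is hypothetical, and the confidence intervals are not propagated.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LR+, LR-) from a sensitivity/specificity pair.

    LR+ is undefined when specificity equals 1, and LR- when it equals 0.
    """
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr, negative_lr


# 18F-FDG PET summary point estimates as quoted in the abstract.
fdg_lr_pos, fdg_lr_neg = likelihood_ratios(0.77, 0.78)
print(round(fdg_lr_pos, 2), round(fdg_lr_neg, 2))  # prints: 3.5 0.29
```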
To explore heterogeneity, we performed a subgroup analysis limited to high-grade gliomas only. To further explore whether study-level characteristics could explain between-study heterogeneity, we performed univariate (single predictor) meta-regression analyses by using the bivariate model (2 outcomes, sensitivity and specificity, modeled jointly). We assessed the following, a priori selected, covariates: year of publication, study design (prospective versus retrospective), study size, relapse rate, proportion of use of temozolomide, and type of reference standard (pathology only versus pathology and clinical follow-up).
Analyses were conducted by using STATA 11.1/SE (StataCorp, College Station, Texas) and Meta-Analyst, Version 3.0 β (Tufts Medical Center, Boston, Massachusetts). All tests were 2-sided, and statistical significance was defined as a P value < .05.

Study Selection and Characteristics
Our PubMed and Scopus searches identified 2808 and 3835 citations, respectively, of which 48 were considered potentially eligible and were retrieved in full text for further assessment. We identified an additional potentially eligible article by perusing the reference lists of relevant review articles. After full text review, 23 publications were excluded and 26 studies were considered eligible for this review (Fig 1).  A complete list of excluded studies along with reasons for exclusion is provided in the Appendix.
Studies used variable diagnostic criteria both for visual and quantitative assessments (On-line Table 4). The 2 indexes most commonly used in quantitative assessments were maximum standardized uptake values within the region of interest of the suspected lesions and the ratio of uptake in the suspected lesion to that in a reference area. No study explicitly reported how the region of interest was specified. Studies typically reported pairs of sensitivity and specificity on the basis of the optimal cutoff values estimated by ROC analysis. Only 1 study reported inter-rater agreement when multiple interpreters were involved in the interpretation of PET results. 25

Assessment of Validity
No study adequately reported all 14 items relevant to study validity that we assessed (On-line Table 5). Reporting was particularly poor regarding the following items: blinding of interpreters of the index and reference standard tests, whether the decision to perform biopsy was based on PET results, and whether additional treatments were applied during clinical follow-up. Most studies had a retrospective design and did not clearly report whether consecutive patients were included. Six of 26 studies (23%) adopted a reference standard comprising biopsy only without clinical follow-up.

Meta-Regression Analyses
We performed meta-regression analyses only for studies evaluating 18F-FDG PET (both for all glioma histologies and for high-grade glioma alone) because this was the only tracer evaluated in >10 studies. In meta-regression analyses, none of the covariates (year of publication, study design, sample size, relapse rate, proportion of use of temozolomide, or type of reference standard) had a statistically significant effect on test performance (all P values > .05).

Comparisons among Different PET Tracers
Six studies 20,25,31,36,39,40 reported on 9 comparisons among pairs of different PET tracers. In all except 1 study, both tests were performed in at least 90% of study participants. Generally, test comparisons showed trade-offs between sensitivity and specificity (consistent with diagnostic threshold effects), suggesting that different tracers may have broadly similar diagnostic accuracy (On-line Fig 3).

Comparisons between PET and Other Imaging Modalities
Six studies 17,21,27,32,33,39 reported on 10 comparisons between 18F-FDG PET and other imaging tests. Five comparisons were between 18F-FDG PET and advanced MR imaging techniques; only 1 of these comparisons involved >90% of study participants. No study explicitly stated how patients were selected for additional diagnostic testing, suggesting that selection bias may have affected results. ROC plots did not show any clear pattern (On-line Fig 3).
Four studies 20,30,38,39 reported on 8 comparisons between 11C-MET PET and other imaging tests. Among 4 comparisons between 11C-MET PET and advanced MR imaging techniques, only 1 involved >90% of study participants. Studies again did not report why patients were excluded from additional testing. No consistent pattern regarding comparative diagnostic accuracy was evident from these studies (On-line Fig 3).

DISCUSSION
This systematic review suggests that PET by using 18F-FDG or 11C-MET has moderately good overall accuracy for diagnosing disease recurrence, independent of histologic grade, among patients with glioma for whom recurrence was suspected by conventional anatomic imaging tests such as CT or MR imaging. These results are mainly based on visual assessment for 18F-FDG, and quantitative assessment for 11C-MET; however, various diagnostic criteria and thresholds were adopted across studies. Evidence on other tracers is sparse. Furthermore, evidence is limited regarding comparisons among different PET tracers, as well as for the comparison of PET with non-PET imaging modalities.
18F-FET and 18F-FLT are relatively new PET tracers and have been in clinical use only for the past decade. Their diagnostic accuracy has been assessed only in a limited number of referral centers. Although promising pilot data have been reported, further validation is needed. The studies we reviewed typically focused on individual imaging modalities and did not allow a complete comparative evaluation of the available imaging technologies. Furthermore, most studies retrospectively and jointly assessed low- and high-grade gliomas treated with heterogeneous treatment strategies and adopted heterogeneous methodologies for confirming disease relapse with diverse follow-up protocols.
Our findings expand on those of previous narrative reviews 9,44 by providing a quantitative overview of the diagnostic accuracy of PET for differentiating between disease recurrence and treatment-induced changes. Additionally, our work provides a comprehensive review of studies directly comparing different PET tracers and those comparing PET with alternative imaging tests in this clinical setting.
Several limitations of the available evidence are worth noting. Most studies had limited internal and external validity; therefore, reported accuracy estimates may not be replicable or relevant to other clinical settings. Also, few studies used treatment strategies that would be consistent with the current standards of care (ie, surgery alone for low-grade glioma and temozolomide-based multimodality therapy for high-grade glioma). Thus, our results may be less applicable to current clinical practice. In addition, our analyses on comparative evidence are based on a limited number of studies. Furthermore, studies comparing PET with MR imaging modalities are based mostly on selected patients; therefore, our results should be interpreted with caution. Finally, no studies of 18F-choline or 3,4-dihydroxy-6-18F-fluoro-L-phenylalanine PET 9 fulfilled our inclusion criteria, and we did not consider noncomparative studies of non-PET imaging modalities.
Pseudoprogression is a clinically benign phenomenon characterized by the appearance and subsequent stabilization (or spontaneous regression) of enhancing lesions on routine MR imaging, within 2 months after completion of concurrent treatment with temozolomide and radiation. 6 It is unclear whether our results are directly applicable to the use of PET for differentiating between pseudoprogression and true tumor progression; few patients in our review had been treated with temozolomide-based therapies, and most patients had been evaluated with PET at a later time than when pseudoprogression is typically suspected.
Future studies of PET for the evaluation of glioma relapse should use prospective designs, focus on clinically relevant patient populations treated with standardized protocols, and avoid potential biases in evaluating test accuracy. An important research priority is the assessment of test performance for distinguishing pseudoprogression from true tumor progression in the context of temozolomide-based treatment for high-grade glioma. Given the current emphasis on the comparative effectiveness of health care interventions, research efforts should focus on the relative benefits and risks of competing imaging modalities in real-world clinical settings. 45 Data on head-to-head comparisons among individual imaging modalities (ie, comparisons among different PET tracers [eg, 18F-FDG PET versus 11C-MET PET] and comparisons of PET versus novel MR imaging techniques [eg, MR spectroscopy, diffusion-weighted imaging, and perfusion-weighted imaging]) and more complex testing strategies (eg, PET alone versus PET plus another non-PET technique) are particularly needed.

CONCLUSIONS
Both 18F-FDG and 11C-MET PET have moderately good overall accuracy for detecting recurrent glioma in patients with suspected recurrence following active treatment. Data on other PET tracers, though seemingly promising, are scarce. Comparative evidence is generally limited, and whether a specific PET tracer outperforms others or whether PET is superior to alternative imaging modalities remains unclear. Prospective comparative studies are needed to elucidate the optimal imaging strategy for evaluating patients with suspected recurrent glioma.