Introduction

High-grade gliomas (HGG) are the most common primary brain tumours in adults and have low survival rates [1]. Current standard therapy consists of surgical gross total or subtotal resection followed by concomitant chemoradiotherapy (CCRT) and adjuvant chemotherapy with temozolomide (TMZ) [2]. Decisions about continuation or discontinuation of treatment for individual patients with high-grade gliomas depend on adequate imaging. Similarly, identification of new active drugs often depends on assessment of an objective response rate, which is established by changes in the tumour seen on imaging [3].

Traditionally, response assessment in HGG is based on contrast (gadolinium) enhanced T1-weighted MRI. However, contrast enhancement reflects disruption of the blood-brain barrier and therefore does not measure tumour activity specifically [4]. In many situations, changes in enhancement do not correlate with response. Up to 50% of patients show pseudo-progression, in which an increase in contrast enhancement reflects treatment-induced changes rather than tumour progression [5].

To overcome the limitations of anatomical T1-weighted MRI with gadolinium, more advanced imaging techniques have been employed in patients with HGG [4]. Diffusion-weighted MRI is frequently performed in routine clinical practice to image changes in cytoarchitecture and cellular density [6, 7]. Newer MRI-based methods can additionally identify tumour-induced neovascularization (perfusion-weighted MRI) and changes in metabolite concentrations (magnetic resonance spectroscopy) [6–8].

Many small studies have shown the potential usefulness of the different advanced techniques for assessment of treatment response in HGG [6–8]. However, a systematic review and meta-analysis demonstrating the diagnostic accuracy of the anatomical and all advanced MRI techniques is lacking.

To this end, we conducted a systematic review and meta-analysis to provide an overview of the diagnostic accuracy of treatment response assessment in HGG patients. We hypothesized that advanced MRI techniques show a higher diagnostic accuracy than anatomical MRI techniques in patients treated for HGG.

Methods

This systematic review and meta-analysis was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) criteria [9]. The AMSTAR guidelines and the Cochrane handbook for reviews of diagnostic test accuracy were also used [10].

Search strategy

See electronic supplementary material.

Selection criteria

Studies including HGG patients who received first-line standard therapy according to the Stupp protocol and underwent anatomical or advanced MRI were included [2]. Studies were included if 2x2 tables could be extracted. The definitive diagnosis, either treatment induced changes or tumour progression, was established by histological follow-up, imaging follow-up, clinical follow-up, or a combination of these.

Reasons for exclusion were other intracranial malignancies, metastases, and brainstem or optic gliomas. Studies among paediatric patients (<18 years) and case reports were also excluded. Studies that were conducted before 2005 were excluded because TMZ was not incorporated in standard therapy before 2005, and TMZ might increase the occurrence of treatment related imaging changes [7, 11]. Finally, studies that used an MRI field strength <1.5 Tesla were excluded as this does not represent current clinical practice.

Study selection, data extraction, and quality assessment

After duplicates were eliminated, studies were screened for eligibility based on title, abstract, and subsequently on full text by two authors independently (BD, AH). Data from the included studies were extracted with the use of a data extraction form. Extracted data contained true positives, false positives, true negatives, false negatives, and general characteristics. General characteristics included total number of patients, study design, mean age and age range of patients, gender, tumour histology, selection criteria of included patients, reference standard (histology/imaging/clinical follow-up), MRI characteristics and time-point of progression on MRI, and the cut-off value of the index test. If 2x2 tables could not be generated, the authors were requested to provide these data. Study quality was assessed according to the quality assessment of diagnostic accuracy studies (QUADAS-2) [12].

Statistical analysis

Sensitivity and specificity with 95% confidence intervals (CI) were calculated for all MRI modalities in RevMan 5.3 (Cochrane collaboration, Copenhagen, Denmark). Formal statistical measures of study heterogeneity are not recommended, because they are univariate and do not account for heterogeneity explained by phenomena such as positivity threshold effects [13]. Heterogeneity was therefore assessed by visual inspection of the generated forest plots. We evaluated whether the following factors could explain heterogeneity: study design, mean age of patients, WHO type, cut-off value of the index test, type of follow-up, and time point of progression on MRI (see also Table 1). We performed subgroup analysis (≥5 studies) to explore and explain heterogeneity in test characteristics. Moreover, we evaluated whether outliers could be explained by study or patient characteristics, and we performed sensitivity analysis without outliers to evaluate the robustness of the results.
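The per-study sensitivity and specificity named at the start of this subsection follow directly from each extracted 2x2 table. The sketch below is illustrative only: the function names and the example counts are hypothetical, and the Wilson score interval is an assumed choice of confidence interval, since the text does not state which interval method RevMan applied.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the interval method is chosen for illustration;
    the source does not specify the method used by RevMan.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity with approximate 95% CIs from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts, not taken from any included study.
result = accuracy_from_2x2(tp=18, fp=3, fn=2, tn=12)
print(result)
```

Here true progression is taken as the positive class, so sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP).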

Table 1 Characteristics of the included studies

Bivariate random effects models were used, because heterogeneity is to be expected in diagnostic test accuracy studies [49]. Pooled estimates of sensitivity, specificity, positive likelihood ratios, and negative likelihood ratios with 95% CI were calculated for each index test with five or more studies, using the MIDAS module for meta-analysis of diagnostic test accuracy studies in STATA/SE 12.1 (College Station, TX, USA).
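For orientation, the positive and negative likelihood ratios are related to sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below shows only this point relationship with hypothetical input values; it does not reproduce the bivariate random effects model or the confidence intervals produced by the MIDAS module, and the function name is an assumption.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from point estimates.

    Both inputs are proportions strictly between 0 and 1.
    """
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical values chosen for illustration only.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```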

To provide insight into the potential clinical consequences, we established a hypothetical cohort of 100 HGG patients with imaging suggestive of progression for each MRI technique. We calculated 2x2 tables from the mean tumour prevalence according to the reference standard and the pooled sensitivity and specificity of each MRI modality, and we present the number of misclassifications, false positives, and false negatives. The hypothetical tumour prevalence was based on the mean tumour prevalence of the cohort studies included in this meta-analysis.

Results

A total of 1371 unduplicated studies were identified through our electronic database search (Fig. 1). After selection based on title and abstract, the remaining studies underwent full-text eligibility assessment. Full text assessment resulted in the identification of 45 relevant studies [14–48, 50–59]. We requested data to generate 2x2 tables from ten studies, but none of the authors could provide the requested data, resulting in no unpublished data in this meta-analysis. Thus, final inclusion consisted of a total of 35 studies in this systematic review of which four (11%) were abstracts only [25, 26, 37, 38]. The study characteristics of the included and excluded studies are shown in Table 1 and Table 2, respectively.

Fig. 1

Flow chart of included studies. Abbreviations: ADC = apparent diffusion coefficient; ASL = arterial spin labelling; DCE = dynamic contrast enhanced; DSC = dynamic susceptibility contrast; MRI = magnetic resonance imaging; MRS = magnetic resonance spectroscopy

Table 2 Characteristics of the excluded studies

The included studies consisted of 1174 patients with a mean age of 51.6 years of whom 61.3% were male (Table 3). The initial lesion was proven to be WHO type III in 11% (N = 124) and WHO type IV in 81% (N = 951). The remaining 8% (N = 99) was unspecified HGG. Mean tumour prevalence of the 34 cohort studies was 60% (range 31–85%). One case-control study was not taken into account for the calculation of the tumour prevalence [42]. Histological follow-up was used in 43% of patients (N = 502), imaging follow-up in 35% of patients (N = 406), clinical follow-up in <1% of patients (N = 3), and a combination of follow-up methods was used in 22% of patients (N = 263).

Table 3 General characteristics of included patients

Several of the included studies analysed multiple MRI modalities; therefore, a total of five anatomical MRI studies (N = 166) [23, 29, 39, 44, 47], seven apparent diffusion coefficient (ADC) studies (N = 204) [14, 15, 24, 25, 30, 33, 41], 18 dynamic susceptibility contrast (DSC) studies (N = 708) [15–17, 19, 20, 22–28, 30, 31, 37, 38, 40, 45], five dynamic contrast enhanced (DCE) studies (N = 207) [18, 21, 32, 40, 42], two arterial spin labelling (ASL) studies (N = 102) [20, 40], and nine magnetic resonance spectroscopy (MRS) studies (N = 203) [22, 24, 34–36, 40, 43, 46, 48] were included.

Methodological quality of included studies

See electronic supplementary material and Fig. 2.

Fig. 2

Quality assessment of included studies. The risk of bias in four different domains and concerns about applicability are shown for the included studies, each graded as high risk, unclear risk, or low risk

Main findings

The forest plots and pooled results are demonstrated in Fig. 3 and Table 4, respectively. The anatomical MRI forest plot (five studies, 166 patients) shows a high variation in both sensitivity and specificity, with wide confidence intervals for three studies [23, 29, 44]. The wide confidence intervals of two references could be explained by the small sample size [23, 29]. The moderate methodological quality might explain the wider confidence intervals in the other study [44]. Anatomical MRI showed a pooled sensitivity and specificity of 68% (95%CI 51–81) and 77% (95%CI 45–93), respectively.

Fig. 3

Forest plots with diagnostic accuracy of different MRI techniques. Diagnostic accuracy and the 2x2 table are displayed with true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). Sensitivity and specificity with the 95% confidence intervals (CI) are given

Table 4 Pooled accuracy of MRI techniques

Sensitivity and specificity were both homogeneous in the forest plot of the ADC (seven studies, 204 patients); however, the confidence intervals are rather wide for the specificity. For ADC pooled sensitivity and specificity were 71% (95%CI 60–80), and 87% (95%CI 77–93), respectively. One abstract was included in this group [25], but sensitivity analysis excluding this study showed comparable sensitivity (75%, 95%CI 65–83) and specificity (85%, 95%CI 72–93) [15].

The sensitivity of the DSC (18 studies, 708 patients) is homogeneous with small confidence intervals. The specificity showed slightly more variability with wider confidence intervals. DSC showed a sensitivity of 87% (95%CI 82–91) and specificity of 86% (95%CI 77–91). This group included four abstracts [25, 26, 37, 38]. Sensitivity analysis excluding these studies showed an unchanged sensitivity of 87% (95%CI 81–92) and a slightly higher specificity of 89% (95%CI 80–95).

The confidence interval of the specificity of one study for the DCE (five studies, 207 patients) was also wide without clear reason [32], but the other studies showed small confidence intervals in both the sensitivity and specificity. For DCE the pooled sensitivity was slightly higher compared to the DSC with a sensitivity and specificity of 92% (95%CI 73–98) and 85% (95%CI 76–92), respectively.

For ASL, too few studies (two studies, 102 patients) were included in the meta-analysis for pooled accuracy estimate calculation. ASL showed a sensitivity range of 52–79% and a specificity range of 64–82%.

The forest plot of the MRS (nine studies, 203 patients) was overall homogeneous and showed small confidence intervals, with one exception in the specificity, possibly due to a moderate methodological quality, as blinding was not assured for the interpretation of either the MRI or the reference standard [36]. MRS showed the highest pooled sensitivity and specificity with 91% (95%CI 79–97) and 95% (95%CI 65–99), respectively. Sensitivity analysis excluding this study [36] showed that it had only a minor influence on the results, altering the group sensitivity and specificity to 92% (95%CI 78–97) and 96% (95%CI 74–100).

Study design, mean age of patients, WHO type, cut-off value of the index test, type of follow-up, and time point of progression on MRI (see also Table 1) were evaluated as covariates, but none could explain the differences in sensitivity and specificity between the studies.

To provide insight into the clinical implications of the investigated MRI techniques, we also calculated the number of missed patients with true progression and the total number of misclassifications in a hypothetical cohort of 100 HGG patients. We used the tumour prevalence found in the current analysis (60%) and the pooled sensitivity and specificity of each MRI technique. With anatomical MRI, 19 recurrent tumours would be missed. For ADC and DSC this would be 17 and eight missed tumours, respectively. Both DCE and MRS would result in the fewest missed cases of progression (N = 5). Anatomical MRI would show a total of 28 misclassified patients. This would be 22, 14, and 11 for ADC, DSC, and DCE, respectively. MRS would result in the lowest number of misclassifications, with a total of seven out of the 100 patients being misclassified.
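The arithmetic behind this hypothetical cohort can be sketched as follows. The prevalence and the pooled sensitivities and specificities are the values reported in the text; the function name, the dictionary layout, and the rounding of false negatives and false positives to the nearest whole patient are assumptions made for illustration.

```python
def hypothetical_cohort(sens_pct, spec_pct, prevalence_pct=60, n=100):
    """Missed tumours (FN), false alarms (FP) and total misclassifications.

    Assumption: counts are rounded to the nearest whole patient.
    """
    with_progression = n * prevalence_pct / 100
    without_progression = n - with_progression
    fn = round(with_progression * (100 - sens_pct) / 100)
    fp = round(without_progression * (100 - spec_pct) / 100)
    return {"FN": fn, "FP": fp, "misclassified": fn + fp}


# Pooled sensitivity and specificity (%) as reported in the text.
pooled = {
    "Anatomical MRI": (68, 77),
    "ADC": (71, 87),
    "DSC": (87, 86),
    "DCE": (92, 85),
    "MRS": (91, 95),
}

cohort = {name: hypothetical_cohort(sens, spec) for name, (sens, spec) in pooled.items()}
for name, counts in cohort.items():
    print(name, counts)
```

Under these assumptions the sketch gives the same numbers of missed tumours and total misclassifications as quoted in the paragraph above.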

Discussion

This meta-analysis, including 35 studies, is the first to pool the results of all diagnostic MRI techniques in HGG patients following treatment. We demonstrated that all advanced MRI techniques showed a higher diagnostic accuracy than anatomical MRI in the differentiation between treatment induced changes and true progression. Among the advanced MRI techniques, MRS showed the highest diagnostic accuracy followed by perfusion MRI.

Diffusion derived ADC values showed the lowest accuracy of all advanced MRI techniques; however, diffusion-weighted imaging (DWI) is currently the most widely available. We showed that the other advanced MRI techniques had higher diagnostic accuracy in the differentiation between true progression and treatment induced changes. Therefore, we suggest incorporating other advanced MRI techniques in treatment assessment in HGG in addition to DWI. This is supported by several studies that showed that diagnostic accuracy could be significantly enhanced by a combination of two or more advanced MRI techniques [60, 61]. Most importantly, adding MRS to perfusion-weighted techniques increased the diagnostic accuracy up to 90% in one study [40].

With a pooled sensitivity and specificity of 91% and 95%, respectively, we found MRS to be the most promising advanced MRI technique for the treatment response assessment in HGG. MRS, however, has several limitations. First, the voxel sizes are relatively large possibly leading to partial volume effects between recurrent tumour and treatment induced changes [4]. Detection of smaller lesions on MRS is, therefore, challenging. Secondly, due to low metabolite concentrations, a considerable number of acquisitions are required, resulting in long scan times [7]. Finally, MRS is technically challenging because of the need to exclude signal contamination from tissues adjacent to the tumour, such as lipids (from the scalp) and water (from the ventricles). Surgical clips also disrupt the local field homogeneity and may affect the quality of the data. These limitations challenge the incorporation of MRS in daily practice; however, a multivoxel technique should be feasible to perform in most clinics.

Various metabolic ratios were used in the MRS studies. In this meta-analysis we were unable to differentiate between the various metabolite ratios in MRS, because of the variability of the included ratios. Moreover, three of the included studies did not specify the investigated metabolite ratio [35, 43, 46]. However, five out of the nine included studies identified choline/creatine ratio as the best predictor in the differentiation between true progression and treatment induced changes [22, 24, 40, 43, 48]. One study reported similar results for choline/creatine and lactate/choline ratios, with the latter showing a slightly higher accuracy [34]. Furthermore, the included studies used various thresholds, or did not specify the used thresholds. Only one study used a considerably low cut-off value of 1.07, possibly explaining the low specificity of this study [40].

Among the perfusion techniques, DSC is the most widely used method. However, DSC is a dynamic parameter and values can vary over time. As yet, there is no consensus about the optimum time point. Furthermore, steroids, which are regularly prescribed if clinical deterioration due to true progression or treatment effects is present, are known to influence DSC measures. Finally, there is no automatic post-processing method for identifying regions of interest, and the analysis is thus highly operator dependent [4]. This operator-dependent variability is also reflected in our meta-analysis by the different rCBV thresholds among studies (range 0.71–3.7).

DCE showed the highest diagnostic accuracy among the perfusion techniques in the differentiation between treatment induced changes and true progression in this meta-analysis. At present, DCE is not widely used in a clinical setting, primarily due to the complicated quantification of the DCE parameters. Although DCE MRI has limited temporal resolution, its spatial resolution is higher than that of DSC MRI. This makes DCE more accurate in mixed lesions showing both true progression and treatment induced changes [7].

Although ASL is a completely non-invasive and quantitative method, its limited availability remains its largest limitation [8]. We could only identify two ASL studies and, therefore, it is not possible to make reliable judgments on the diagnostic accuracy of ASL in differentiating between true progression and treatment induced changes.

In our hypothetical cohort of 100 patients, ADC showed fewer misclassifications than anatomical MRI and could thus provide guidance towards the definitive diagnosis. ADC is a quantifiable measurement and can be obtained quickly and easily [4]. However, the reliability of ADC can be affected by oedema and the formation of fibrosis in treatment induced changes [6].

A limitation that should also be noted is the inclusion of four abstracts. Inclusion of abstracts helps to limit publication bias. However, the quality and extent of information provided in abstracts are limited, and abstracts have not undergone the full peer review process that full articles have.

Another possible limitation is that not all studies applied the same reference test. However, either histology or imaging follow-up was performed in all except three patients to provide definite diagnosis. Although we considered both histological follow-up and imaging follow-up to be reliable diagnostic methods, the reliability may not be equivalent. According to the Response Assessment in Neuro-Oncology (RANO) criteria, the development of pseudo-progression is limited to the first 3 months after CCRT [3]. However, it is suggested that 30% of pseudo-progression cases occur after more than three months post-CCRT [62]. Therefore, the accuracy of the reference test could differ between the included studies depending on the follow-up duration. However, no difference could be seen between early follow-up studies and studies that were conducted more than three months after CCRT.

Large multicentre longitudinal prospective trials are needed to define the optimum time for assessment of metabolic and physiological MRI parameters using advanced techniques. These should be in relation to histopathological changes in HGG, treatment effects, and patient outcomes. This would allow for testing all techniques in the same population, which would overcome one major limitation of the current meta-analysis: only indirect comparisons were possible, because a direct comparison between tests in a meta-analysis can only be performed if both contain >10 studies. These new prospective trials should also use standardised cut-off values, although these might remain arbitrary because of the heterogeneity in the biological activity of HGG and the use of different MRI systems. Advice on the cut-off values and ratios for the anatomical and advanced MRI sequences that most precisely distinguish post-therapy changes from tumour progression is currently hindered by the high variability of the cut-offs and variables used. However, it would be a valuable guideline for the clinician in daily practice. The latter could be addressed using normalised cut-off values. Despite these possible limitations, implementation in clinical practice would be an important step towards accurate treatment decisions for HGG patients.

Conclusion

Our meta-analysis demonstrated a clear advantage of advanced MRI techniques for differentiation between true progression and treatment-induced changes in patients with HGG. All advanced MRI techniques showed a higher diagnostic accuracy than anatomical MRI. MRS showed the highest diagnostic accuracy followed by perfusion MRI. Although diffusion imaging with ADC values is currently the most commonly used advanced technique, it showed the lowest diagnostic accuracy of all advanced MRI techniques. This study supports the wider use of other advanced MRI techniques for assessment of treatment response in patients with HGG.