Introduction

Glioblastoma is the most common primary brain tumor in the United States, with a reported incidence of approximately 3.19 cases per 100,000 person-years (~9400 cases per year) [1]. Median overall survival (OS) for patients with newly diagnosed glioblastoma ranges from 12–18 months when treated with the current standard of care [2, 3], and fewer than 10% of patients survive beyond 5 years after diagnosis [4]. The poor prognosis associated with glioblastoma underscores the ongoing need for the development and characterization of new therapeutic regimens.

Important measures of treatment efficacy in phase 2 and 3 trials of patients with glioblastoma include OS, radiographic response, and duration of tumor control (ie, progression-free survival [PFS]) [5, 6]. Although OS is considered the gold standard clinical end point, it does not directly measure the impact of a specific regimen because of the confounding effects of salvage therapy and other variables. As a consequence, both radiographic response rate (RR) and PFS are valuable end points when attempting to isolate the relative efficacy of a given treatment and to understand the nature of on-study progression [5, 7]. These surrogate measures of tumor burden, however, have well-documented limitations, including the potential for variability, the potential for false-positive signals, and the discordance in radiograph interpretation between observers [8]. As a result, methodologies and techniques that are used to determine tumor response and progression continue to evolve, with the goal of minimizing inherent errors and enhancing accuracy. Neuro-oncologists have also included additional measures (eg, neurologic examinations, use of steroids) in response assessments to strengthen their value. The continued refinement of response assessments is particularly important in the context of an increasing number of agents that are evaluated in patients with glioblastoma.

Historical Means of Response Assessment in Gliomas and the Macdonald Criteria

Prior to 1990, the primary means of assessing response to therapy in patients with glioblastoma were the Levin criteria and World Health Organization (WHO) oncology response criteria [9, 10]. The Levin criteria involve qualitative imaging assessments that account for a number of factors, including edema and mass effect [9]. The WHO response criteria use contrast-enhanced computed tomography to measure tumor area by multiplying the maximal cross-sectional enhancing tumor diameters [10]. These criteria were useful but were limited in clinical practice by the subjective variability of interpreting radiographs, by poorly defined response designations, and by subsequent observations that contrast enhancement can be affected by factors unrelated to the tumor [5].

The Macdonald criteria were subsequently proposed and adopted to improve on previous assessment methodologies, specifically by standardizing definitions of radiographic response. Because contrast enhancement may be nonspecific or indeterminate in nature, additional measures were included to account for the impact of corticosteroid use and to ensure that clinical assessments (specifically, neurologic status) were considered when applying response designations. The Macdonald criteria classify responses into 4 categories: complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD), borrowing terminology familiar to solid-tumor oncology [11] (Table 1). As the most frequently used means of response assessment in glioblastoma, these more objective criteria facilitated comparison of RR and PFS across clinical trials [12••].

Table 1 Response assessment criteria in the first-line treatment of glioblastoma

The Response Evaluation Criteria in Solid Tumors (RECIST) guidelines for solid tumors (including gliomas) used unidimensional measurements of target lesions but were not specific for brain tumors and did not take the use of steroids or clinical status into account [13, 14]. However, comparisons of unidimensional (diameter), bidimensional (area), and volumetric measurements of T1 gadolinium-enhanced and T2-weighted MRI scans in a small series of high-grade gliomas that were not treated with antiangiogenic agents have generally shown good agreement between the various methods of tumor assessment [15, 16]. A retrospective analysis also demonstrated strong concordance between different methods of assessment, including RECIST guidelines and the Macdonald criteria, when measuring response to and progression on irinotecan and bevacizumab combination therapy in relapsed glioblastoma [17•].

Modern Challenges to Response Assessment

Over the 2 decades since the publication of the Macdonald criteria, improvements in imaging technology, the impact of new therapeutic agents on underlying disease, and other factors have prompted the need for corresponding advances in response assessment of glioblastoma [12••, 18, 19]. Most apparent has been the need to broaden the scope of the Macdonald criteria to incorporate non–contrast-enhancing components of tumor, account for the variability in contrast-enhancing lesions, and define the timeframe of radiographic scans.

The evaluation of non–contrast-enhancing lesions using T2-weighted or fluid-attenuated inversion recovery (FLAIR) MRI sequences is critical for measuring infiltrative or diffuse tumor growth. Pathologically, glioblastoma includes both infiltrative and vascularized components [20, 21]. Moreover, diffuse radiographic patterns of disease are apparent in glioblastoma, even at diagnosis, although local patterns of disease and recurrence predominate [20]. Finally, infiltrative tumor growth, potentially occurring via cooption of preexisting vasculature [22, 23], is a radiologic and histopathologic finding after treatment with agents that inhibit vascular endothelial growth factor (VEGF-A) [20, 24–26]. Consequently, nonenhancing regions of the tumor, which are not included in response designations per Macdonald criteria, should be considered to accurately assess the extent of biologically and clinically relevant glioblastoma [27]. That said, variations in T2-weighted or FLAIR images have proven difficult to quantitatively assess, and no standardized threshold for determination of PD has been established.

Another challenge with response criteria is related to assessment variability of contrast-enhancing tumor. On the basis of their mechanisms of action, agents that target VEGF-A or its receptor (ie, bevacizumab, aflibercept, and cediranib) affect vascular permeability [26, 28, 29], which, in turn, influences both the leakage of gadolinium into the brain and the extent of contrast enhancement on imaging. Furthermore, many tumor-extrinsic factors also modulate vascular permeability [30–34], and the effects of radiation (or chemoradiation) are of particular relevance in the setting of newly diagnosed glioblastoma. Increased contrast enhancement or edema that occurs during or after treatment can mimic early tumor progression on MRI scans [35•, 36]. This increase, termed pseudoprogression, has been observed in 10%–30% of all patients with glioblastoma who are undergoing their first post-therapy MRI examination following radiotherapy with concurrent temozolomide [37, 38] and in 30%–48% of patients who exhibit progression within 1 month of the end of radiotherapy [39, 40]. The finding appears to be most common in patients with tumors that have a methylated MGMT promoter [37]. The observed increase in contrast enhancement often subsides without modifying therapy, but it does complicate the interpretation of results in clinical trials and in general practice because pseudoprogression may prompt the discontinuation of otherwise effective adjuvant therapy and/or the improper assignment of a PFS event [5]. Clinical researchers now frequently consider pseudoprogression when designing clinical trials to ensure patients enrolled in trials in recurrent settings are real progressors, but the effect of pseudoprogression on response criteria and PFS in the frontline setting has only recently begun to be addressed.

Finally, specifications on the timing of baseline and follow-up radiographic scans are needed because of their relationship to interpretation. Imaging at a number of time points may be informative, including immediate post-treatment and mid-radiotherapy scans, but it may not be appropriate for making clinical decisions. Many studies require a new baseline scan at study entry, and, although of critical importance, such scans may be difficult to measure and interpret because of differences between the 2-to-4–day postoperative and preradiotherapy scans, the variability in the 2-to-6–week postoperative interval related to surgery and normal healing, the prevalent use of corticosteroids, and the variability of contrast associated with perioperative ischemia, which evolves over time [19, 27, 41]. Furthermore, very early progression in the time between surgery and the start of radiotherapy has been described, which may affect further evaluation.

Ultimately, it is essential to have an appropriate baseline assessment and a consistent approach to pseudoprogression in newly diagnosed glioblastoma to accurately determine the time of tumor progression with regard to time of clinical trial enrollment and/or start of drug therapy.

Modified Response Criteria in Relapsed Glioblastoma

In recognition of the mechanism of antiangiogenic therapies and the potential challenges with response assessment [39, 42, 43], pivotal studies of bevacizumab, a humanized monoclonal antibody targeting VEGF-A, used modified criteria in the setting of recurrent disease. The BRAIN study, initiated in 2006, was the first randomized, multicenter phase 2 trial to evaluate the effectiveness and safety of bevacizumab with or without irinotecan in patients with relapsed glioblastoma [44]. In BRAIN, response assessment of contrast-enhancing lesions was based on WHO radiographic criteria, with an additional requirement of stable or decreasing doses of corticosteroids for determination of a response. A confirmatory MRI scan was performed 4 or more weeks after an observed response. While non-contrast-enhancing lesions were considered nontarget lesions for tumor evaluation, any new area of nonenhancing T2 or FLAIR signal consistent with tumor was considered progressive disease. Furthermore, clinical progression could be used as an indicator of progressive tumor in the absence of radiographic documentation [44].

In a subsequent phase 2 trial of single-agent bevacizumab in recurrent glioblastoma, MRI scans were evaluated according to both modified Macdonald and Levin criteria [45]. The study required that a scoring of SD or PR be accompanied by stable or decreasing doses of corticosteroids, as well as by stable or improved areas of T2/FLAIR abnormality. RRs of 35% and 71% were reported according to the modified Macdonald and Levin criteria, respectively. The authors suggested that the Levin criteria were a superior method for assessing VEGF inhibition in situ because they allow for early decreases in enhancement, edema, and mass effect rather than for reductions in the diameter of the enhancing tumor. Moreover, these early decreases assessed with the Levin criteria were better correlated to PFS than response evaluated with Macdonald criteria, although this observation is limited given that it was the result of a small single-center analysis [45].

Modified Response Criteria in Newly Diagnosed Glioblastoma: The Phase 3 AVAglio Study

The initiation of several trials evaluating bevacizumab in patients with newly diagnosed glioblastoma, specifically the phase 3 AVAglio study, prompted further modifications of assessment criteria in an attempt to improve accuracy and reproducibility. In the AVAglio study (BO21990, NCT00943826), one modification was to standardize all aspects of radiological response assessment, including image acquisition, timing of scheduled assessments, evaluation techniques, and centralized review. The AVAglio study, which began enrollment in 2009, is a randomized, placebo-controlled, phase 3 trial investigating the effectiveness and safety of radiotherapy and temozolomide with or without bevacizumab following surgical resection or biopsy in newly diagnosed glioblastoma [46]. The coprimary end points of the study are OS and investigator-assessed PFS; secondary end points are independent review facility (IRF)-assessed PFS, 1- and 2-year survival rates, safety, and health-related quality of life. Objective RR is being evaluated as an exploratory end point.

The AVAglio protocol incorporated a number of adaptations to the Macdonald criteria, including the expansion of the radiographic element to encompass assessment of nonenhancing tumor and more specific definitions of both neurologic examinations and changes in corticosteroid use that impact response assessment (Table 1). Neurologic examination and the mini-mental state examination (MMSE) are used to assess the patient’s neurologic function relative to the last disease assessment, with neurologic function reported as improved, unchanged, or worsened. Similarly, corticosteroid use is reported as increased, unchanged, or decreased at each evaluation. All radiologic assessments are made using MRI, and index (contrast-enhancing) and nonindex (both small contrast-enhancing and nonenhancing) lesions selected at baseline are assessed consistently at each subsequent time point. Timing of radiologic assessments and treatment administration is shown in Fig. 1.

Fig. 1

Overview of treatment and radiologic assessment schedule in AVAglio. Disease assessment consists of radiologic assessment, neurologic examination, including the mini-mental state examination, and determination of corticosteroid use. Cy cycle, d days, HRQoL health-related quality of life, PD progressive disease, w weeks

According to the radiographic aspects of imaging criteria in the AVAglio protocol, index lesions are defined as all measurable lesions (ie, contrast-enhancing lesions with clear borders and having both diameters ≥10 mm) identified at baseline. Investigators are instructed to select index lesions based on their size (ie, lesions with the longest cross-sectional diameters) and their suitability for accurate, repeat measurement. Radiologic assessments for index lesions are categorized as (1) CR, which is defined as the disappearance of all index lesions that is sustained for ≥4 weeks; (2) PR, which is defined as a ≥50% decrease of all index lesions (sum of the product of the greatest diameters) that is sustained for ≥4 weeks; (3) SD, which is defined as no sufficient decrease or increase of index lesions to qualify as a PR or PD; and (4) PD, which is defined as a ≥25% increase of all index lesions or any new index lesion relative to baseline (Table 1).
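The index-lesion rules above can be sketched in code. The following Python snippet is a minimal illustration only and is not part of the AVAglio protocol: the function names, the representation of a lesion as a pair of perpendicular diameters in millimeters, and the comparison of every time point against baseline are assumptions made for the example. The requirement that a CR or PR be sustained for ≥4 weeks and the "clear borders" element of measurability are not modeled.

```python
from typing import List, Tuple

# Hypothetical representation: (longest diameter, perpendicular diameter) in mm.
Lesion = Tuple[float, float]


def is_measurable(lesion: Lesion, min_diameter_mm: float = 10.0) -> bool:
    """A lesion counts as measurable here if both diameters are >= 10 mm."""
    return min(lesion) >= min_diameter_mm


def sum_of_products(lesions: List[Lesion]) -> float:
    """Sum of the products of the two diameters over all index lesions (mm^2)."""
    return sum(a * b for a, b in lesions)


def classify_index_lesions(
    baseline: List[Lesion],
    current: List[Lesion],
    new_index_lesion: bool = False,
) -> str:
    """Single-time-point category for index lesions, relative to baseline."""
    base = sum_of_products(baseline)
    if base <= 0:
        # Patients without baseline index lesions are handled separately in the text.
        raise ValueError("baseline must contain at least one index lesion")
    if new_index_lesion:
        return "PD"  # any new index lesion
    curr = sum_of_products(current)
    if curr == 0:
        return "CR"  # disappearance of all index lesions (confirmation not modeled)
    if curr <= 0.5 * base:
        return "PR"  # >=50% decrease (confirmation not modeled)
    if curr >= 1.25 * base:
        return "PD"  # >=25% increase
    return "SD"      # neither PR nor PD
```

For example, a single baseline lesion of 20 × 15 mm (300 mm²) that shrinks to 10 × 10 mm (100 mm²) would be scored as PR at that time point under this sketch, whereas growth to 20 × 20 mm (400 mm²) would be scored as PD.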

Nonindex lesions include contrast-enhancing lesions that are too small or irregular in shape to be considered measurable, as well as all nonenhancing lesions consistent with tumor. In AVAglio, nonindex lesions are evaluated qualitatively for evidence of progression, and individual nonindex lesions are recorded as being present, absent, or unable to assess at each time point. Radiologic findings for nonindex lesions are categorized as (1) CR, which is defined as the disappearance of all nonindex lesions that is sustained for ≥4 weeks; (2) SD, which is defined as showing no significant change in nonindex lesions; and (3) PD, which is defined as any unequivocal increase of existing nonindex lesions or any new nonindex lesion (Fig. 2).
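The qualitative nonindex rules can be sketched in the same way. This Python snippet is a hedged illustration, not the protocol's implementation: the function name and the string labels are hypothetical, the reader's qualitative judgments are passed in as flags, the ≥4-week confirmation of CR is not modeled, and returning "unable to assess" whenever any single lesion cannot be assessed is an assumption of the example.

```python
from typing import List

ALLOWED_STATUSES = {"present", "absent", "unable to assess"}


def classify_nonindex_lesions(
    statuses: List[str],
    unequivocal_increase: bool = False,
    new_lesion: bool = False,
) -> str:
    """Single-time-point category for nonindex lesions.

    statuses: one entry per baseline nonindex lesion, as recorded by the reader.
    unequivocal_increase / new_lesion: the reader's qualitative judgments.
    """
    if not statuses:
        # Patients without baseline lesions are handled separately in the text.
        raise ValueError("at least one baseline nonindex lesion is required")
    unknown = [s for s in statuses if s not in ALLOWED_STATUSES]
    if unknown:
        raise ValueError(f"unknown status values: {unknown}")
    if unequivocal_increase or new_lesion:
        return "PD"
    if "unable to assess" in statuses:
        return "unable to assess"  # assumption: one unassessable lesion blocks a category
    if all(s == "absent" for s in statuses):
        return "CR"  # disappearance of all nonindex lesions (confirmation not modeled)
    return "SD"
```

Because there is no PR category for nonindex lesions, a lesion that is still present without unequivocal increase is scored as SD here.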

Fig. 2

Illustration of a disease assessment of progressive disease based on non–contrast-enhancing lesions in a 35-year-old male with left frontal glioblastoma. T1 contrast magnetic resonance imaging (MRI) obtained immediately after completion of chemoradiation and bevacizumab (a), 6 months later while continuing chemotherapy and bevacizumab (b), and 12 months after completion of chemoradiation and bevacizumab while continuing chemotherapy and bevacizumab (c) shows no evidence of contrast-enhancing tumor progression. T2 MRI obtained immediately after completion of chemoradiation and bevacizumab (d) and 6 months later while continuing chemotherapy and bevacizumab (e) shows no tumor progression, but MRI obtained 12 months after completion of chemoradiation and bevacizumab while continuing chemotherapy and bevacizumab (f) shows clear evidence of T2 non–contrast-enhancing tumor progression with architectural distortion of the left lateral ventricle (arrow)

A categorization of unable to assess is used for situations in which both index and nonindex lesions cannot be reliably measured for technical reasons (eg, radiograph is not comparable to baseline or is of poor quality), but not for situations of possible doubt of interpretation.

Patients who have undergone gross total resection and have neither contrast-enhancing (index lesions) nor nonenhancing lesions (nonindex lesions) on baseline MRI are followed for recurrence. If no signs of progression are observed, according to MRI, the radiologic assessment is categorized as no change. The appearance of any index or nonindex lesion consistent with tumor is categorized as PD.

To summarize, by incorporating a T2/FLAIR imaging component and qualitative assessment of all nonindex lesions, the criteria set forth in AVAglio attempt to account for changes in non–contrast-enhancing lesions and difficult-to-measure residual disease.

Overall Implementation of Response Criteria in AVAglio

Importantly, the AVAglio protocol and imaging guidelines serve to make the modified Macdonald criteria operational in very specific terms. Information on neurologic status, corticosteroid use, and radiologic assessment is interpreted in an attempt to arrive at a dichotomous overall disease assessment of PD or non-PD, allowing the investigator to readily assess atypical cases not normally covered under standard response criteria (Table 2). As a result, patients are not penalized with premature cessation of treatment because of a too-strict application of response criteria.

Table 2 Definitions of progressive disease, pseudoprogression, and nonprogressive disease in AVAglio

All MRI scans used by the investigator for the radiologic evaluation of overall tumor response in participating patients are centrally reviewed by an IRF, facilitating an unbiased analysis of PD. Thus, under strict and explicit guidelines, both investigators and IRF reviewers are able to follow the integrated response algorithm based on the assessment of both index and nonindex lesions, with an objective of reducing the variability observed thus far in radiologic response designation (Table 3).

Table 3 Integrated response assessment algorithm in AVAglio

Assessment of Pseudoprogression in AVAglio

The AVAglio protocol also standardizes the assessment of possible pseudoprogression. The potential occurrence of pseudoprogression is evaluated at the first disease assessment after radiotherapy (ie, 4 weeks after radiotherapy; see Table 2). If the investigator observes a ≥25% increase in index lesions and/or unequivocal progression of existing nonindex lesions relative to the baseline disease assessment, then the recommended assessment is pseudoprogression, and treatment is continued under the provision that a confirmatory scan be performed 2 months later (ie, 12 weeks after radiotherapy) (Fig. 3). If the confirmatory scan shows further tumor progression as compared to the previous MRI, the assessment is PD, and the patient is discontinued from further treatment. In such cases, the date of the first disease assessment after radiotherapy is considered to be the date of PD. Conversely, if the confirmatory scan shows SD or PR as compared to the previous MRI, the initial assessment is confirmed to be pseudoprogression, and the patient is continued on treatment. In cases of confirmed pseudoprogression, the measurements of the first MRI after radiotherapy are used as the new baseline (Fig. 4); as a consequence, patients with confirmed pseudoprogression are excluded from the response analysis population.
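The two-step handling of suspected pseudoprogression described above can be sketched as a small decision routine. This Python snippet is illustrative only: the names `suspect_pseudoprogression`, `resolve_suspected_pseudoprogression`, and `Outcome` are hypothetical, and confirmatory results other than PD, SD, or PR are deliberately rejected because the text does not say how they are handled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    assessment: str                        # "PD" or "pseudoprogression"
    continue_treatment: bool
    pd_date: Optional[date]                # set only when PD is confirmed
    rebaseline_to_first_scan: bool         # first post-radiotherapy MRI becomes baseline
    excluded_from_response_analysis: bool


def suspect_pseudoprogression(index_increase_pct: float,
                              nonindex_unequivocal_progression: bool) -> bool:
    """First assessment after radiotherapy (4 weeks), compared with baseline."""
    return index_increase_pct >= 25.0 or nonindex_unequivocal_progression


def resolve_suspected_pseudoprogression(first_scan_date: date,
                                        confirmatory_vs_first: str) -> Outcome:
    """Confirmatory scan (12 weeks after radiotherapy), compared with the first scan."""
    if confirmatory_vs_first == "PD":
        # Further progression: PD is backdated to the first post-radiotherapy assessment.
        return Outcome("PD", False, first_scan_date, False, False)
    if confirmatory_vs_first in ("SD", "PR"):
        # Confirmed pseudoprogression: continue treatment and rebaseline.
        return Outcome("pseudoprogression", True, None, True, True)
    raise ValueError("result not covered by the described algorithm")
```

In this sketch, the date returned for confirmed PD is the date of the first post-radiotherapy scan rather than the date of the confirmatory scan, mirroring the backdating rule in the text.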

Fig. 3

Decision-making flow chart in patients with signs of pseudoprogression in AVAglio. PD progressive disease

Fig. 4

Illustration of pseudoprogression in a 52-year-old patient with left parietal malignant glioma. T1 contrast-enhancing magnetic resonance imaging (MRI) shows a gross total resection (a). Four weeks after completion of chemoradiation, MRI showed a new area of contrast enhancement surrounding the collapsed surgical cavity (b). Two months after initiation of maintenance temozolomide, MRI showed diminution of the contrast enhancing area (c), which was further diminished after an additional months (d)

Patterns of Progression

In addition to the assessment of response, an exploratory end point in AVAglio is the assessment of patterns of tumor progression. Progression patterns are categorized as local (focus of enhancement or nonenhancing tumor with mostly distinct or well-defined borders), multifocal (>1 enhancing or nonenhancing tumor with intervening areas of normal brain signal), or distant disease (a single new focus of enhancement or nonenhancing tumor centered outside a 30 mm margin around the primary site or margin of the resection cavity). Once the tumor pattern is determined (local, multifocal, or distant), the diffuse (infiltrative) pattern is reported as either present or not present. Scans are categorized for tumor pattern at each disease assessment, and an additional MRI is performed 9 weeks after the determination of PD for all patients to further assess this radiographic end point.

The Response Assessment in Neuro-Oncology (RANO) Working Group Criteria

After the initiation of the AVAglio study, the RANO Working Group published updated response criteria to implement a consistent approach for future trials in high-grade glioma [12••]. Because the RANO criteria were only recently published, they have yet to be validated in a large clinical trial. It is notable that the majority of the steps taken by the AVAglio study to improve assessment methodology are also addressed in the RANO criteria (Table 1). Both criteria consider (without measuring) nonenhancing lesions via T2/FLAIR imaging in assessing response, and both also contain specific guidelines to aid reviewers in distinguishing pseudoprogression from true PD [12••].

There are, however, several nuances of the AVAglio criteria and the proposed RANO criteria that warrant further discussion. In the RANO guidelines, patients must be “stable or improved clinically” to qualify for a CR, PR, or SD designation, but the RANO guidelines do not include a precise recommendation on how clinical deterioration is to be measured, although it is suggested that KPS, Eastern Cooperative Oncology Group performance status, or WHO performance score may be considered when making clinical evaluations [12••]. By contrast, when AVAglio specifies that patients exhibiting a response or SD must also have stable or improved neurologic symptoms, specific instructions on how neurologic status is to be evaluated are included in the protocol; namely, by regularly scheduled neurologic examinations and MMSEs, which take place concomitantly with radiologic assessments.

Pseudoprogression is also handled differently in the guidelines proposed by RANO than in those by AVAglio. The RANO criteria suggest that within the first 12 weeks after the completion of radiotherapy, “progression can only be determined if the majority of new enhancement is outside of the radiation field or if there is pathologic confirmation of progressive disease” [12••]. In AVAglio, a prespecified follow-up scan 12 weeks after radiotherapy is required upon an initial observation of increased contrast enhancement to confirm or rule out pseudoprogression. Although this measure precludes the conduct of restaging scans at an earlier time point, it serves to standardize the review process by integrating pseudoprogression into the response assessment algorithm.

A more recent publication from the RANO Working Group has outlined a separate but related set of guidelines for response assessment criteria in postoperative cases of newly diagnosed glioblastoma [47]. These guidelines apply to surgically delivered therapies (eg, brachytherapy, implanted chemotherapy wafers), radiosurgery, and immunotherapy, which are not specifically addressed in the AVAglio study protocol since these treatments were not planned in AVAglio.

Conclusions

In the 20 years since their initial publication, the Macdonald criteria have performed admirably and supported many crucial advances in the treatment of patients with glioblastoma and other high-grade gliomas. Most importantly, the criteria helped to standardize the interpretation of data and allowed for better cross-trial comparison of certain results [12••]. However, advances in imaging capabilities, coupled with newly available therapies that affect contrast enhancement, have been the driving forces behind the development of updated criteria capable of improving response assessments in modern clinical trials for glioblastoma.

The application of MRI and other imaging modalities has facilitated the use of imaging end points, such as PFS, as surrogates for OS in clinical trials [18, 48]. Current imaging techniques allow for more accurate measurement of contrast-enhancing lesions, while incorporating information on nonenhancing lesions into overall response assessments. Despite having strong concordance, criteria that integrate FLAIR appear to reduce response rates and PFS relative to criteria that only consider contrast enhancement in relapsed disease [17•]. Given the relatively short survival time of patients with glioblastoma, accurate response measurements are needed to ensure the continuation of effective therapy and, conversely, the timely termination (or modification) of ineffective therapy in nonresponding patients [3, 27].

Optimally, the PFS end point should be reinforced by other measures of therapeutic response to reflect an overall clinical benefit. Performance status is one such measure, in which benefit is evaluated on the basis of an improvement in or the maintenance of the patient’s status. Although clear guidelines have not been established, the duration of functional independence (KPS >70) could be considered. Neurocognitive status may be another important component in defining overall clinical benefit. The AVAglio study used neurological examination and MMSE as global assessments of neurocognitive function. While the MMSE is a well-validated screening tool for assessing dementia and cognitive decline [49], use of more adaptive neurocognitive assessment tools may provide additional sensitivity. However, such tools may be difficult to implement in a large, multicenter randomized trial setting. Corticosteroid use can also serve as an objective measure, but different criteria employ different methodologies to evaluate corticosteroid dosing during response assessment (ie, which interval to use for analyses and what constitutes an increase or decrease in dosing). Lastly, quality-of-life measures that consider patients and caregivers may be taken into account to supplement or reinforce radiologic results [50].

The AVAglio study protocol proactively adapted the existing Macdonald criteria to address specific limitations and, subsequently, provided a more precise and operational assessment of response and progression than permitted by the previous criteria. These criteria were developed and initiated prior to those proposed by the RANO Working Group. Although the 2 sets of criteria are very similar with respect to their major advances (ie, addressing pseudoprogression and the integration of a T2/FLAIR component), the criteria set forth in AVAglio include operational elements whose incorporation avoids some of the ambiguity found in the RANO criteria and guidelines. This allows for more consistent and rigorous application of the AVAglio criteria, which is especially important in a multicenter trial. Results of the AVAglio study may serve to validate and critically assess the modifications to assessment contributed by study investigators.

A potential complication of the AVAglio algorithm related to the observation of pseudoprogression is the issue of rebaselining. Since the first MRI following treatment initiation is used as the new baseline scan in cases of confirmed pseudoprogression, it is necessary to exclude such patients from response analyses. Indeed, the increase in the sum of the product of the greatest tumor diameters associated with pseudoprogression in the new baseline scan could lead to an artificial assessment of PR during subsequent assessments. Therefore, adaptations may be necessary to prevent a decline in the population eligible for response analysis, particularly in studies in which the sample size is smaller than in AVAglio.

Further evolution of both the AVAglio and RANO response criteria is expected, and future refinements may improve measurement techniques, allow for the ability to assess nonenhancing lesions more objectively, or allow for the categorization of delayed (continuing beyond 12 weeks after radiotherapy) or late (starting beyond the 12-week time window) pseudoprogression. Future modifications may also include some measure of multimodal imaging (ie, perfusion, diffusion, single-photon emission computed tomography, or positron emission tomography) on a case-by-case basis to aid investigators in determining response in certain situations.