Evaluation of the Implementation of the Response Assessment in Neuro-Oncology Criteria in the HERBY Trial of Pediatric Patients with Newly Diagnosed High-Grade Gliomas

BACKGROUND AND PURPOSE: HERBY was a Phase II multicenter trial set up to establish the efficacy and safety of adding bevacizumab to radiation therapy and temozolomide in pediatric patients with newly diagnosed non–brain stem high-grade gliomas. This study evaluates the implementation of the radiologic aspects of HERBY. MATERIALS AND METHODS: We analyzed multimodal imaging compliance rates and scan quality for participating sites, adjudication rates and reading times for the central review process, the influence of different Response Assessment in Neuro-Oncology criteria on the final response, the incidence of pseudoprogression, and the benefit of incorporating multimodal imaging into the decision process. RESULTS: Multimodal imaging compliance rates were the following: diffusion, 82%; perfusion, 60%; and spectroscopy, 48%. Neuroradiologists' responses differed for 50% of scans, requiring adjudication, with a total average reading time per patient of approximately 3 hours. Pseudoprogression occurred in 10/116 (9%) cases, 8 in the radiation therapy/temozolomide arm and 2 in the bevacizumab arm (P < .01). Increased target enhancing lesion diameter was a reason for progression in 8/86 cases (9.3%) but never the only radiologic or clinical reason. Event-free survival was predicted earlier in 5/86 (5.8%) patients by multimodal imaging (diffusion, n = 4; perfusion, n = 1). CONCLUSIONS: The addition of multimodal imaging to the response criteria modified the assessment in a small number of cases, determining progression earlier than structural imaging alone. Increased target lesion diameter, accounting for a large proportion of reading time, was never the only reason to designate disease progression.

The recent Avastin in Glioblastoma (AVAglio) clinical trial investigated the use of bevacizumab (BEV) plus radiation therapy (RT) and temozolomide (TMZ) compared with a placebo plus RT-TMZ in adult patients with newly diagnosed glioblastoma. 1 This was subsequently investigated in a pediatric patient population. The Study of Bevacizumab (Avastin) in Combination with Temozolomide and Radiotherapy in Paediatric and Adolescent Participants with High-Grade Glioma (HERBY) (BO25041; clinicaltrials.gov NCT01390948) was a Phase II, open-label, randomized, multicenter, comparator study set up to establish the efficacy and safety of the addition of BEV to RT and TMZ in patients between 3 and 18 years of age with newly diagnosed non-brain stem high-grade glioma (HGG). 2 The radiologic aspects of the HERBY trial were expanded compared with AVAglio in a number of aspects. In HERBY, the determination of progression, recurrence, or response was mandated on the basis of meeting predefined clinical and radiographic criteria, as defined by the Response Assessment in Neuro-Oncology (RANO) criteria, 3,4 assessed by a central site-independent radiology review. In addition, changes in the tumor on MR diffusion and perfusion imaging were evaluated and correlated with the structural imaging review.
In addition to the conventional MR imaging (T1WI, contrast-enhanced T1WI, and T2WI/FLAIR sequences) required for RANO, the optional acquisition of MR diffusion imaging, perfusion imaging, and proton spectroscopy was requested, to allow analysis of potential additional impact on the efficacy outcome measures of the trial.
To implement the radiologic assessment for HERBY, the trial steering group instigated the following:

Central Radiologic Review Committee
A Centralized Radiologic Review Committee (CRRC) was formed to oversee and advise on the MR imaging acquisition and analysis aspects of the HERBY trial. It consisted of a number of international expert pediatric neuroradiologists, an imaging physicist, and sponsor representatives.

MR Imaging Acquisition
MR imaging was requested following a standardized protocol (which can be accessed in the Supplementary Materials Section of Jaspan et al 5 ). Images were provided by the sites to the contract research organization of the trial, ICON Medical Imaging, and all radiologic reviews were conducted using their MIRA platform (ICON Medical Imaging, Dublin, Ireland).
For each patient, a baseline postoperative MR imaging scan was acquired no later than 72 hours following the operation, in addition to a first scan before the start of treatment and then subsequent scans every 3 months for 3 years after randomization or the unscheduled end of study due to an event-free survival (EFS) event. The imaging schedule is further detailed in Table 1 and Fig 1. Where available, preoperative imaging was requested.
Each participating imaging center was issued with both structural and multimodal (MM) imaging manuals to instruct them in protocol-specific image acquisition requirements, necessary documentation and data transfer instructions, data archiving and shipping, and the query resolution process for any clerical discrepancies and/or noncompliant data.

Central Radiology Reviews
Three image-review processes were implemented, performed by a selection of the 5 expert pediatric neuroradiologists on the CRRC: these were eligibility reviews, early progression reviews, and retrospective central efficacy radiology reviews. Consensus training was undertaken by the neuroradiologists in advance.
Eligibility Review. This optional review could be requested by the local site, for 1 of the neuroradiologists on the CRRC to assess whether the postoperative MR imaging showed findings consistent with newly diagnosed localized HGG but excluding gliomatosis cerebri (or multifocal HGG). The postoperative scans must not have shown evidence of substantial surgically related intracranial bleeding. Patients may have had either measurable or assessable-but-nonmeasurable disease and would still qualify for enrollment.
Early Progression Review. To aid the local site in its assessment of early tumor-related enhancement on the first postcontrast MR imaging following commencement of treatment, a pathway to seek advice from 1 of the neuroradiologists on the CRRC was implemented, to advise on identification of early tumor progression compared with early treatment effects (pseudoprogression). This optional review could be sought for any neuroimaging acquired between (and including) the first postoperative scan and those acquired up to the end of the 12-week period following completion of the first cycle of treatment (ie, up to 23 weeks after commencing treatment) (Fig 1). The opinion was provided within 2 weeks of receiving the request and was nonbinding; the site investigator determined whether treatment should continue if the local opinion differed from the advice given.
FIG 1. Treatment and imaging schedule for HERBY. Four weeks after the operation, patients are randomized (R) and chemoradiotherapy commences for 6 weeks followed by a 4-week break. Multiple cycles of adjuvant treatment are then initiated, indicated by C1 through C12, with 4 weeks per cycle. Blue arrows indicate radiation therapy; purple blocks, temozolomide; and red arrows, bevacizumab treatment.
Central Efficacy Radiology Review. Pairs of the expert pediatric neuroradiologists on the CRRC were randomly assigned to cases to assess the structural MR imaging, in parallel but separately, according to the RANO criteria (Fig 2). When there were discrepant radiologic findings, a third pediatric neuroradiologist from the CRRC adjudicated. Following image review, an independent pediatric oncologist reviewed supportive clinical data and corticosteroid dosage and provided the final status for that time point. The structural MR imaging review produced a definitive response level per time point for each patient and determined the earliest occurrence of tumor progression or recurrence in support of the primary end point of the trial of event-free survival. On completing the structural MR imaging review, the same central reviewers performed an additional evaluation combining diffusion and perfusion MR imaging findings (when available) with the structural assessment to determine whether they would alter the structural MR imaging review response. The incorporation of the multimodal imaging into the RANO assessment followed that proposed previously, both with respect to the evaluation of the multimodal data (Supplementary Material in Jaspan et al, 2016, 5 ) and its incorporation into the overall response decision (Fig 3).
This article evaluates the implementation of the radiologic aspects of HERBY, summarized above. To assess these, the aims of this work were as follows:
1) To evaluate the implementation of the RANO criteria in a Phase II trial of pediatric patients with HGG.
2) To assess the feasibility of obtaining multimodal imaging of adequate quality from multiple sites.
3) To assess the effect of including diffusion and perfusion imaging into the response criteria. 5

MATERIALS AND METHODS
The aims of this work were addressed by evaluating the following:

Compliance Rates and Data Quality
A pretrial assessment indicated the type of data expected from each participating site because multimodal MR imaging was not available at all sites. This was then compared with the actual collected data available for central review to determine compliance rates. Data quality was evaluated by the CRRC, with scans labeled as optimal, readable but not optimal, or not readable.

Early Progression Review
The number of requests for early progression reviews by a neuroradiologist from the CRRC was assessed, as an indication of the value of this process. In addition, of these requests, the proportion in which the opinion of the central read was adopted by the local site was ascertained.

Adjudication Rates
In cases in which there were discrepancies between reviewers 1 and 2 undertaking a structural MR imaging review, an additional neuroradiologist (reviewer 3, blinded to reviewer identities) adjudicated by selecting the preferred opinion, establishing the final CRRC decision for that patient. Adjudication rates for this trial were defined as the percentage of cases in which the date of progression or recurrence differed between reviewers 1 and 2.
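The adjudication step and the rate derived from it can be sketched in a few lines. This is a minimal illustration only, assuming each read is reduced to a single response label per scan; the function names, the label strings, and the toy data are hypothetical and are not taken from the trial's review platform. The same comparison would apply if the paired values were per-patient progression dates rather than scan-level labels.

```python
from typing import Optional


def final_response(r1: str, r2: str, adjudicator_pick: Optional[str] = None) -> str:
    """Return the final response for one scan read by two reviewers.

    If the two reads agree, that response stands. If they differ, a third
    reviewer must select one of the two opinions (assumed workflow).
    """
    if r1 == r2:
        return r1
    if adjudicator_pick not in (r1, r2):
        raise ValueError("adjudicator must select one of the two discrepant reads")
    return adjudicator_pick


def adjudication_rate(reads: list[tuple[str, str]]) -> float:
    """Percentage of paired reads in which reviewers 1 and 2 disagree."""
    if not reads:
        raise ValueError("no reads supplied")
    discrepant = sum(1 for r1, r2 in reads if r1 != r2)
    return 100.0 * discrepant / len(reads)


# Hypothetical paired reads (reviewer 1, reviewer 2) for four scans.
toy_reads = [("SD", "SD"), ("PD", "SD"), ("PR", "PR"), ("SD", "PD")]
toy_rate = adjudication_rate(toy_reads)
```

Restricting the adjudicator to one of the two existing opinions mirrors the description above, in which reviewer 3 selects the preferred opinion rather than issuing a new one.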

Reading Times
Average reading times per scan were estimated post-trial by the neuroradiologists involved and were multiplied by the number of scans per patient and the number of reviewers.

Breakdown of RANO Decision Components
The proportion of cases in which enhancing or nonenhancing tumor (ie, measurable or nonmeasurable) was the factor that determined progression was calculated to determine which imaging sequence was most influential in the RANO criteria and how it compared with clinical findings.

RANO and Multimodal Imaging
The number of times that radiologic evaluation of perfusion or diffusion data changed the EFS time point (ie, the EFS incorporating diffusion and/or perfusion) was calculated to determine the potential influence of adding these imaging findings to the RANO criteria. In addition, a subjective score of the multimodal imaging influence in each case was recorded by each reader on a scale of 1 to 5, with the highest score indicating the most influence in the decision.
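The count described here amounts to comparing two EFS values per patient. The following is a minimal sketch, assuming EFS is expressed in days for both the structural-only and the structural-plus-multimodal assessment; the function names and the example values are hypothetical and do not represent trial data.

```python
from collections import Counter


def classify_efs_shift(structural_days: int, multimodal_days: int) -> str:
    """Label how the multimodal EFS compares with the structural EFS."""
    if multimodal_days < structural_days:
        return "earlier"
    if multimodal_days > structural_days:
        return "later"
    return "unchanged"


def summarize_shifts(pairs: list[tuple[int, int]]) -> Counter:
    """Count earlier, later, and unchanged EFS assignments across patients."""
    return Counter(classify_efs_shift(s, m) for s, m in pairs)


# Hypothetical (structural EFS, multimodal EFS) pairs in days.
toy_pairs = [(300, 300), (300, 210), (180, 180), (250, 340)]
toy_summary = summarize_shifts(toy_pairs)
```

The paired differences underlying these labels are also the input that a related-samples test would use.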

Central Radiology Review Committee versus Local Investigator
Discrepancies in the EFS determined by the CRRC and local investigators were calculated.

Pseudoresponse and Pseudoprogression
Within the first 12 weeks after completion of radiation therapy, evidence of progression was designated as "pseudoprogression." This was later revised after assessment of the subsequent scan by the CRRC according to the criteria of Chinot et al 4 (On-line Table 1). The CRRC would then assign patient status as stable disease, confirmed pseudoprogression, or true progressive disease.

Statistical Analysis
EFS distributions were compared with the related-samples Wilcoxon signed rank test, using SPSS Statistics for Windows, Version 23 (IBM, Armonk, New York). A χ² test of independence was performed to examine the relation between the treatment arm and both adjudication rates and pseudoprogression. Values of P < .01 were considered statistically significant.
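For readers unfamiliar with the test of independence, the calculation for a 2 x 2 table (for example, treatment arm by presence or absence of an outcome) can be sketched as follows. This is an illustrative sketch only: it computes the Pearson statistic without a continuity correction, which may differ from the variant reported by SPSS, and the counts in the example are hypothetical rather than trial data. The function name is an assumption introduced here.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Returns (statistic, p_value). No continuity correction is applied,
    and every expected cell count is assumed to be greater than zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows are arms, columns are outcome present / absent.
toy_table = [[12, 38], [3, 47]]
toy_statistic, toy_p = chi_square_2x2(toy_table)
```

The closed form for the P value holds only for a 2 x 2 table (1 degree of freedom); larger tables would need the general chi-square distribution.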

RESULTS
Between October 2011 and February 2015, one hundred seventy-four patients were screened, and 121 were randomized to receive treatment (RT/TMZ, n = 59; BEV + RT/TMZ, n = 62). Of these 174, three were children younger than 3 years of age with recurrent HGGs who were recruited at the request of the European Medicines Agency but were not included in this analysis. One patient was excluded following identification of metastatic disease in the spine, with a second exclusion due to gliomatosis. All 121 patients underwent an operation (total/near-total resection, n = 60; other resection, n = 39; biopsy, n = 22). Five randomized patients did not receive treatment (RT/TMZ: withdrew consent, n = 3; BEV + RT/TMZ: failed to meet eligibility criteria, n = 1; withdrew consent, n = 1). Overall, 116 patients (RT/TMZ, n = 56; BEV + RT/TMZ, n = 60) received study treatment at 50 sites. For a more detailed description of the trial see Grill et al. 2 Preoperative imaging, though not part of the initial HERBY protocol and performed according to the site standard of care, was available in 91/116 (78%) patients (MR imaging, n = 89; CT, n = 2). Postoperatively, there were 623 centrally reviewed MR imaging scans, with an average of 4.9 scan time points acquired per patient during the trial (range, 1-15).
A total of 50/85 sites (59%) successfully recruited patients and acquired imaging data (5 sites withdrew and 4 sites subsequently joined the study after the survey). The compliance rates for structural and MM imaging for these sites can be seen in On-line Table 2. All 30 sites that had committed to return diffusion imaging did return it at least once for each patient (100%); of 28 that had committed to return perfusion imaging 23 did (82%), and of 31 that had committed to return spectroscopy 22 did (71%). For all sites in the trial that offered to provide MM imaging, the percentage of scans actually acquired was 82% for diffusion, 60% for perfusion, and 48% for spectroscopy.
In terms of data quality, from the structural data available for central review, there were 10/623 (1.6%) cases for which both reviewers thought they could not provide a response on the basis of the quality of data available (unreadable scans). From the MM imaging data available for central review, there were 95 (22.5%) time points at which there were nonevaluable scans (55 diffusion and 89 perfusion). There were 20/116 cases in which scanners with different magnetic field strengths were used to scan the same patient. Of these, the change occurred in 5 cases between pre- and postoperative scans (4%), in 7 cases when the patient did not progress during the trial (6%), and in 8 cases (7%) when the scanner change occurred between the time points immediately before and at progression.
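The site-level compliance percentages quoted above follow directly from the reported counts, as the short sketch below reproduces. The function and variable names are introduced here for illustration; only the counts come from the text.

```python
def compliance_rate(returned: int, committed: int) -> float:
    """Percentage of committed sites that actually returned a modality."""
    if committed <= 0:
        raise ValueError("committed must be positive")
    return 100.0 * returned / committed


# (sites that returned the modality, sites that had committed to it)
site_counts = {
    "diffusion": (30, 30),
    "perfusion": (23, 28),
    "spectroscopy": (22, 31),
}

# Rounded to whole percentages, as reported in the text.
site_rates = {
    modality: round(compliance_rate(returned, committed))
    for modality, (returned, committed) in site_counts.items()
}
```

Note that these are site-level figures; the scan-level acquisition percentages (82%, 60%, and 48%) cannot be recomputed from the counts given in the text.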

Early Progression Review
In 19 patients (23 scans), advice was sought from the CRRC regarding imaging performed either at week 10 (52%) or at the end of cycle 3 (48%). There were 5/11 patients who had an imaging event suggestive of progression documented on the week 10 scan, but for whom the local investigator decided to continue with treatment. Following cycle 3 scan reviews, treatment for 8/11 patients was discontinued, including in all those for whom an imaging event had been documented at week 10.

Adjudication Rates
On an individual scan basis, of 613 structural imaging assessments, 304 (49.6%) were not adjudicated, while 309 (50.4%) required adjudication due to a divergent response of a neuroradiologist. On a per-patient basis, adjudication corresponding to the primary trial end point (ie, the number of all cases that were adjudicated for the date of progression or recurrence) was undertaken in 17/116 patients (14.7%).

Reading Times
The average time to read a scan was estimated at 15 minutes. With 2 reviewers and an adjudication rate of 50% and an average of 4.9 scans per patient in the trial, the average RANO reading time per patient was just over 3 hours.
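The per-patient figure can be reproduced with simple arithmetic. The sketch below assumes that an adjudicated read takes the same time as a primary read, so that each scan receives on average 2 primary reads plus 0.5 adjudication reads; the text does not spell out the exact formula, and the function name is hypothetical.

```python
def reading_time_hours(
    minutes_per_read: float,
    scans_per_patient: float,
    reviewers: int,
    adjudication_rate: float,
) -> float:
    """Estimate total central reading time per patient, in hours.

    adjudication_rate is the fraction of scans that need a third read
    (assumed to take as long as a primary read).
    """
    reads_per_scan = reviewers + adjudication_rate
    total_minutes = minutes_per_read * scans_per_patient * reads_per_scan
    return total_minutes / 60.0


# Values reported in the text: 15 minutes, 4.9 scans, 2 reviewers, 50%.
hours_per_patient = reading_time_hours(15, 4.9, 2, 0.5)
```

Under these assumptions the estimate comes to roughly 3.06 hours (183.75 minutes) per patient.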

Breakdown of RANO Decision Components
Of a total of 86 cases of progression or recurrence, clinical reasons were reported in 41 cases (47.7%), and radiologic reasons, in 78 (90.7%). A breakdown of the different reasons for progression can be seen in Table 2.
The increase in diameter of the target-enhancing lesion was a reason for progression in 8 of 86 cases (9.3%). In these 8 cases, there were always other coexisting reasons for progression at the same visit, either clinical (neurologic deterioration with stable or increased corticosteroid use, n = 4) or radiologic (unequivocal nontarget progression, n = 3; new lesions, n = 7).

RANO and Multimodal Imaging
The addition of diffusion/perfusion data changed the structural imaging EFS, with 1/86 (1.2%) patients for whom it occurred later and 5/86 (5.8%) for whom it occurred earlier. These differences in EFS were not significantly different according to a Wilcoxon signed rank test: structural imaging EFS median = 300 days, diffusion/perfusion EFS median = 288 days; Z = 21.0, P = .28. These 5 patients would have been classified as having an event, on average, 95 days earlier on the basis of inclusion of MM imaging, which mostly corresponded to the previous scan visit, though for 2 patients, it was evident from even earlier scans. Of these 5 patients, there was only 1 case in which an earlier EFS would have been called on the basis of perfusion imaging alone. In the remaining cases, either an event was also apparent on diffusion images (n = 4) and/or there was no perfusion imaging available (n = 2).
The subjective influence of multimodal imaging is reported in Table 3. As perceived by the readers, for most time points that included multimodal scans (>65%), this additional evaluation had little-or-no influence on the reader's decision.

RANO Central Read versus Local Investigator
There were 34 patients of 86 (39.5%) for whom the EFS determined by local investigators was later than that determined by the CRRC and 4 patients (4.7%) for whom it was earlier. A Wilcoxon signed rank test indicated that EFS determined by local investigators (median = 322 days) was statistically significantly different from EFS determined by the CRRC (median = 288 days, Z = 943.0, P < .01).

Pseudoprogression
Initially, 33 cases (28.4%) were designated pseudoprogression at the week 10 scan. Of those, 19 were deemed at a later visit to have progressed or recurred and thus were retrospectively assigned as progressive/recurrent disease, while 14/116 (12.1%) were deemed to have been stable disease and therefore assigned as true pseudoprogression. A post hoc analysis indicated that leptomeningeal spread and evolution of distant lesions would further bring down the number of true pseudoprogression cases to 10/116 (8.6%). No confirmed cases of pseudoresponse were identified in this study cohort.
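The proportions in this paragraph can be checked against the reported counts. The following sketch uses only the numbers given in the text; the variable and function names are introduced here for illustration.

```python
def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100.0 * count / total, 1)


treated_patients = 116
initially_designated = 33      # designated pseudoprogression at week 10
reassigned_progression = 19    # later deemed progressive/recurrent disease

# Cases that remained stable and were kept as pseudoprogression.
confirmed = initially_designated - reassigned_progression

# Count remaining after the post hoc analysis described above.
confirmed_post_hoc = 10

summary = {
    "initial": pct(initially_designated, treated_patients),
    "confirmed": pct(confirmed, treated_patients),
    "post_hoc": pct(confirmed_post_hoc, treated_patients),
}
```

The three values agree with the 28.4%, 12.1%, and 8.6% quoted in the paragraph.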

DISCUSSION
The HERBY study is one of the largest Phase II, open-label, randomized, international pediatric high-grade glioma trials that has been undertaken. The primary end point was to evaluate whether the addition of BEV to RT/TMZ would significantly increase the event-free survival (as determined by radiologic evaluation of the imaging by a panel of 5 experienced pediatric neuroradiologists who formed the CRRC) in children with newly diagnosed non-brain stem high-grade gliomas. Structural imaging was analyzed by the CRRC using the now-established adult-based RANO criteria 3 and subsequently re-evaluated alongside multimodal imaging 5 with the prespecified aim to evaluate these criteria in the pediatric age group. While the quality of the structural imaging was generally high, the number of cases with consistently acquired multimodal imaging was relatively low and of variable quality, reflecting the reality of clinical practice in a wide range of centers investigating and managing children with these tumors. The number of cases with evaluable MR spectroscopy was too low to allow valid inclusion in this process. In addition, the European Medicines Agency only required that diffusion-weighted imaging and perfusion imaging be used for this study. The poor compliance rates for the multimodal imaging arm of the study reflected the overestimation of local site capability/commitment for acquiring these sequences (particularly the case for MR spectroscopy).
Note: SPD indicates sum of products of diameters. (a) New lesions may be enhancing or nonenhancing (T2WI/FLAIR). New enhancing lesions do not need to meet the size criteria for being considered measurable but must, in the best judgment of the reviewer, be true tumor lesions rather than benign or incidental findings. (b) At baseline, any radiologic evidence of disease beyond the designated target lesion may be identified as nontarget lesions. These include enhancing T1WI nonmeasurable and nonenhancing lesions on FLAIR/T2WI.
Lack of preoperative imaging availability largely related to cases in which children were initially investigated in a nonspecialist hospital and subsequently transferred to the local primary treatment center without electronic transfer of the imaging to the clinical research organization of the study. The pretreatment imaging characteristics of the tumor provide important correlative information for the pathologic and, increasingly, molecular evaluation of the tumor type and in the future may help in individualizing treatment. Thus, inclusion of preoperative imaging should be considered a prerequisite for any future such studies.
Lack of adherence to the study imaging protocol was a concern for this trial, as has been the case for previous multicenter studies. Adoption, at a national level, of standardized imaging protocols that have been proposed in both North America 6 and Europe 7 and electronic dissemination of trial-specific scanner protocols using these agreed sequences offer the potential for ensuring consistent high-quality imaging, reducing the variability of response assessment and improving the validity and comparability of research in this field.
The logistics of undertaking independent analysis of imaging within a large treatment trial must be taken into consideration in the study design. After an initial training period, all 5 CRRC neuroradiologists undertook the study reads, working independently with a trial monitor. Each read required, on average, 15 minutes per time point. In the HERBY study, this involved a range of read times from 0.5 to 3.8 hours per patient, with an average of 1.3 hours per radiologist per patient. Read times for trials involving more imaging time points could be significantly higher. The early progression review was instituted to support local investigators who may value advice on imaging features. This was found to be a useful resource in 19 of 121 cases (16%) and could be considered in future study designs.
The EFS was assigned earlier by the CRRC than by local investigators on average by 1 month. This difference is shorter than, but similar to, findings in adult trials (2-3 months). 8,9 Measured lesion diameters were never the sole reason for determining a radiologic event and were always accompanied by either unequivocal progression of a nontarget lesion or the appearance of a new lesion. There were some cases (7%) in which the same patient was scanned with a different magnetic strength scanner near the point of progression. Although this could subtly affect the enhancement pattern, we believe the variability of these effects would be far less than the interobserver variability associated with drawing postoperative ROIs around poorly defined lesions and the related diameter measurements. In addition, even if there were subtle differences in radiologic interpretation due to changes in scanner magnetic field strength, because we found that change in lesion diameter was never the sole reason for assigning progression, the field strength change would not influence the assignment of progression. Lesion diameter measurements account for a large proportion of the reading time; however, we found that they did not influence the final response-assessment decision in this trial. Volumetric evaluation of tumor size was not part of the initial trial methodology but will be evaluated in subsequent imaging analysis of the HERBY cohort.
The addition of MM imaging to the response criteria only modified the assessment in a small number of cases, and in most of these cases, progression was determined earlier than by the assessment of structural imaging alone. The acquired rates of MM imaging were lower than indicated by the initial site survey responses. Diffusion was the most commonly acquired technique, which was also the technique that most influenced the modified response assessment. Unfortunately, due to the inconsistent provision of diffusion and perfusion data, we cannot validate the pathway used in HERBY for incorporating diffusion and perfusion assessment into the final radiologic response. 5 However, the relatively low acquisition of diffusion and perfusion in practice across the contributing sites is, in itself, an important finding. In addition, in the cases in which multimodal imaging was provided, the observation that it modified the radiologic response of that imaging time point in only a small number of cases is also of note.
Imaging assessment of postsurgical and subsequent treatment surveillance in pediatric high-grade gliomas is challenging in view of the heterogeneous and poorly enhancing imaging characteristics of these tumors. This challenge can be further confounded by pseudoprogression (a local inflammatory reaction after RT and TMZ, resulting in increased enhancement in the early post-radiation therapy imaging, followed by radiologic improvement without therapy modification) and pseudoresponse. Interobserver variability in determining the date of progression can be as high as 40%-50%, 1 and a similar figure was observed in this study. No confirmed cases of pseudoresponse were identified in the HERBY study. Potential pseudoprogression, occurring within 12 weeks of initiation of treatment, was present in 33/116 cases. Of these 33 cases, 23 showed imaging features of continued tumor growth or development of distant lesions and were therefore subsequently re-assigned as progressive/recurrent disease. In the remaining 10/116 (8.6%) cases, follow-up imaging showed that the tumor had stabilized or regressed; therefore, the previous time point was maintained as pseudoprogression. This differs from the proportion of pseudoprogression generally reported in adult HGG studies: 31% 10 and 48%, 11 though the AVAglio study reported only 6%. 9 The greater biologic variation of these tumors in children, with a higher proportion of centrally located thalamic tumors and a higher proportion of poorly or nonenhancing tumors, may, in part, explain this contrast with adult HGG cases.
The protocol for HERBY was approved in 2011 and incorporated contemporary radiologic evaluation to assess the response, as reported here. Since then, a number of groups have suggested modifications and improvements to the RANO criteria 12-15 and their implementation in clinical trials. Reardon et al 16 provided guidance on incorporating imaging criteria into glioblastoma clinical trials and identified a number of the issues also found in the HERBY trial, as well as highlighting the need to pay attention to early progression and early response, as implemented in HERBY. Ellingson et al 17 proposed modifications to the imaging protocol used for RANO, especially focusing on 3D MR imaging acquisitions and the volumetric parameters that can be calculated from them, as well as the promise of using subtraction maps of post- and precontrast imaging to increase lesion conspicuity. These are incorporated into more detailed proposed response assessments. While this article focuses on reporting results from the HERBY trial as performed, the newer proposed assessments can also be applied retrospectively to the HERBY imaging data, which may better characterize the imaging assessment. In addition to these modified and more quantitative metrics derivable from structural MR imaging, other quantitative metrics can be extracted from the multimodal imaging, 12,18-23 building on the qualitative radiologic assessment performed here, 5 to inform on their value in pediatric response assessment. This will form the basis for ongoing evaluation of data from the HERBY study.

CONCLUSIONS
This work evaluated the practical implementation of the use of RANO and RANO plus multimodal imaging to inform the end point of a large multinational trial of high-grade brain tumors in a pediatric cohort. Thus, the results reported provide an indication for future studies on practical issues. These include the following:
• Appropriate radiologic resources needed to implement RANO (1.3 hours per radiologist per patient).
• The expected compliance to the MR imaging protocol, which could be improved by incorporating radiology-specific site initiation to include evidence of imaging compliance in advance of opening the site.
• Adjudication rates (of approximately 50%).
• The variability of the RANO criteria when comparing retrospective central assessment with that performed locally (earlier event identification by an average of 1 month).
Of note, the finding that no assessment of progression or recurrence was dependent on the evaluation of the lesion diameters alone has implications in the use of this particular quantitative metric. In addition, the effect of implementing a new proposed pathway for incorporating multimodal imaging assessment into the structural RANO 5 criteria was assessed and was found to indicate earlier progression or recurrence (by an average of 95 days) in only 5/86 cases. These findings will inform the development of future radiology-focused response-assessment criteria in pediatric high-grade gliomas, in particular that the measurement of tumor diameters and compliance of diffusion or perfusion are not of primary importance, because we found that simple metrics derived from these made little difference in the determination of the time point of radiology-defined progression.