Quantitative Delta T1 (dT1) as a Replacement for Adjudicated Central Reader Analysis of Contrast-Enhancing Tumor Burden: A Subanalysis of the American College of Radiology Imaging Network 6677/Radiation Therapy Oncology Group 0625 Multicenter Brain Tumor Trial

BACKGROUND AND PURPOSE: Brain tumor clinical trials requiring solid tumor assessment typically rely on the 2D manual delineation of enhancing tumors by ≥2 expert readers, a time-consuming step with poor interreader agreement. As a solution, we developed quantitative dT1 maps for the delineation of enhancing lesions. This retrospective analysis compares dT1 with 2D manual delineation of enhancing tumors acquired at 2 time points during the posttherapeutic surveillance period of the American College of Radiology Imaging Network 6677/Radiation Therapy Oncology Group 0625 (ACRIN 6677/RTOG 0625) clinical trial. MATERIALS AND METHODS: Patients enrolled in ACRIN 6677/RTOG 0625, a multicenter, randomized Phase II trial of bevacizumab in recurrent glioblastoma, underwent standard MR imaging before and after treatment initiation. For 123 patients from 23 institutions, both 2D manual delineation of enhancing tumors and dT1 datasets were evaluable at weeks 8 (n = 74) and 16 (n = 57). Using dT1, we assessed the radiologic response and progression at each time point. Percentage agreement with adjudicated 2D manual delineation of enhancing tumor reads and association between progression status and overall survival were determined. RESULTS: For identification of progression, dT1 and adjudicated 2D manual delineation of enhancing tumor reads were in perfect agreement at week 8, with 73.7% agreement at week 16. Both methods showed significant differences in overall survival at each time point. When nonprogressors were further divided into responders versus nonresponders/nonprogressors, the agreement decreased to 70.3% and 52.6%, yet dT1 showed a significant difference in overall survival at week 8 (P = .01), suggesting that dT1 may provide greater sensitivity for stratifying subpopulations. CONCLUSIONS: This study shows that dT1 can predict early progression comparably with the standard method while offering the potential for substantial time and cost savings for clinical trials.

Measurements of glioblastoma volume are important for clinicians to assess treatment response and guide appropriate therapy, both in daily practice and in clinical trials. Contrast-enhanced MR imaging is the most widely used approach and the focus of recent consensus brain tumor imaging protocol recommendations. 1 However, although contrast-enhanced MR imaging has excellent spatial resolution, even slight variations in image-acquisition parameters or vendor platforms can greatly impact image quality, lesion conspicuity, and measurement of tumor volume. 2 These problems are compounded by the fact that glioblastoma is histopathologically and radiographically heterogeneous in appearance, with geographically irregular margins, variable contrast enhancement, and regions of central necrosis or cystic changes. 3 Furthermore, assessment of posttreatment tumor volume can be confounded by the presence of blood products that appear bright on postcontrast MR imaging and that mimic contrast-enhancing tumor, 4 or by therapies that reduce blood-brain barrier permeability and contrast agent extravasation. For example, bevacizumab, 5 used to treat recurrent glioblastoma, can decrease contrast agent extravasation independent of its effect on tumor biology. 3,6 These challenges contribute to large interobserver differences (up to 50%-60%) in assessing tumor burden and evaluating treatment responses that impact both daily practice and clinical trials. 4,7 As a solution, difference maps, created from the subtraction of precontrast from postcontrast images, have been used to highlight regions of contrast enhancement. 8 However, unlike x-ray angiography or CT, pixel values in MR images can vary widely due to multiple factors, even for identical pulse sequences and tissue types; this variation can result in nonenhancing regions appearing in the subtraction image.
In response, we developed quantitative dT1 images of contrast enhancement, which eliminate much of the normal variability in image contrast due to MR imaging system instabilities, field strength, slight differences in imaging parameters (TR, TE, and so forth), and sources of bright signal apparent on precontrast T1WI. 9,10 Because dT1 images are quantitative, delineation of enhancing lesions can be automated by applying the same predetermined threshold across time points and patients.
Consequently, the dT1 tool has the potential to cause a paradigm shift in how brain tumor burden is assessed. This study compares the use of dT1 technology for semiautomatic lesion identification with the accepted standard that relies on expert readers to manually delineate enhancing lesions. The approach was to determine whether the semiautomatic determination of ROIs using dT1 images would compare with the adjudicated reads from the American College of Radiology Imaging Network 6677/Radiation Therapy Oncology Group 0625 central reader study (as reported in the primary article 11 ) with regard to association with patient outcome.

MATERIALS AND METHODS
The Radiation Therapy Oncology Group (RTOG, now NRG Oncology), in collaboration with the American College of Radiology Imaging Network (ACRIN, now Eastern Cooperative Oncology Group [ECOG]-ACRIN), both funded by the National Cancer Institute, conducted a prospective, randomized, Phase II multicenter trial (ACRIN 6677/RTOG 0625) of bevacizumab in recurrent glioblastoma multiforme. Each participating institution obtained institutional review board approval before subject accrual and conducted the trial in compliance with the Health Insurance Portability and Accountability Act. Informed consent was obtained for all subjects.

Patients
A total of 123 patients from 23 institutions with recurrent histologically proved glioblastoma or gliosarcoma were enrolled in the ACRIN 6677/RTOG 0625 trial. Detailed inclusion and exclusion criteria are available on the RTOG Web site (https://www.rtog.org/ClinicalTrials/ProtocolTable/StudyDetails.aspx?study=0625).
All patients were treated with bevacizumab (10 mg/kg IV on days 1 and 15 of a 28-day cycle) in combination with either temozolomide or irinotecan. 11 Of the 123 patients enrolled, 107 patients met the inclusion criteria, defined as having imaging beyond baseline and progression data. Of these, 105 datasets could be analyzed by central reader analysis of 2D manual delineation of enhancing tumor (2D-T1), which required having an interpretable baseline image and at least 1 additional interpretable time point. Of the 105 datasets, matched pre- and postcontrast T1 images were available for 83 patients, enabling the creation of dT1 images. A matched dataset is defined as one for which the same imaging sequence and the same scanning options (eg, flow-compensation is either on or off for both) are used for both the pre- and postcontrast T1WI. Slight differences in TR and TE between the pre- and postcontrast images are acceptable. When we restricted attention to the weeks 8 and 16 time points, 74 patients were evaluable for comparison between the same 2D-T1 and dT1 image datasets at week 8 and 57 patients were evaluable for comparison at week 16. 11 For both pre- and postcontrast T1WI, all sites were required to collect the data using a spin-echo sequence with the following parameter ranges: TE/TR = minimum (<15 ms)/400-600 ms, FOV = 220-240 mm, phase FOV = 75%, slice thickness/gap = 5/1 mm, matrix = 256 × 256, NEX = 1. The imaging protocol remained fixed at each site and across all time points. Following IV injection of 0.1 mmol/kg of a standard gadolinium-based contrast agent (the brand used was dictated by the preference of each site), axial 2D spin-echo (2D-T1) and 3D volumetric T1WI postcontrast images were acquired. Patients participating in the optional advanced component of the trial had dynamic contrast-enhanced, dynamic susceptibility contrast, and/or spectroscopic MR imaging at baseline, week 2, and after every 2 cycles of treatment.
Results from these advanced imaging cohorts were previously reported. 12-14 A complete listing of all MR imaging parameters for this protocol can be found on the ACRIN Web site (https://www.acrin.org/PROTOCOLSUMMARYTABLE/PROTOCOL6677/6677ImagingMaterials.aspx).

Image Analysis
Central Reader Analysis. As previously described, 11 all local imaging was transmitted to ACRIN for central review by 2 primary readers and 1 adjudicator, each with neuroradiology Certificates of Added Qualification and 8, 6, and 3 years, respectively, of postfellowship experience. For each distinct contrast-enhancing target lesion (≥1-cm diameter, ≥1 cm from other enhancing lesions), the largest diameter of contrast enhancement and its maximum perpendicular diameter were measured. A 2D tumor area was computed by summing, over all lesions, the product of the largest diameter and its maximum perpendicular diameter. Pre- and postcontrast images were reviewed simultaneously to exclude blood products from 2D measurements. For all evaluable patients, images at each available time point were presented in random order to both primary readers, who then independently made 2D-T1 measurements. After completing measurements for all time points, the primary readers were unblinded to the order of examinations. Consistent with the Macdonald and Response Assessment in Neuro-Oncology (RANO) criteria, each reader determined the time of progression on 2D-T1 when there was a >25% increase with respect to the nadir in maximal cross-sectional enhancing areas or the appearance of any new measurable enhancing tumor. Radiologic response was defined as a ≥50% decrease with respect to baseline, confirmed on the subsequent time point. Steroid dosage and clinical status were unavailable to the readers for this study. The adjudicator settled discordant times to progression between the readers by opining on the most correct times to progression. FLAIR images were not used to determine outcomes for either the 2D-T1 or dT1 analysis.
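The measurement and classification rules described above can be illustrated with a minimal Python sketch. The function names and the example diameters are hypothetical and are not taken from the trial; the sketch also omits the requirement that a response be confirmed on the subsequent time point.

```python
def bidimensional_area(lesions):
    """Sum, over all target lesions, of largest diameter x perpendicular diameter.

    `lesions` is a list of (largest_diameter_cm, perpendicular_diameter_cm) pairs.
    """
    return sum(longest * perpendicular for longest, perpendicular in lesions)


def classify_time_point(area, baseline, nadir, new_lesion=False):
    """Apply the thresholds stated in the text to a single time point.

    Progression: >25% increase relative to the nadir, or any new measurable
    enhancing tumor. Response: >=50% decrease relative to baseline
    (confirmation on the next time point is NOT modeled here).
    """
    if new_lesion or area > 1.25 * nadir:
        return "progression"
    if area <= 0.5 * baseline:
        return "response"
    return "neither"


# Hypothetical example: one lesion shrinking from 3 x 2 cm to 2 x 1 cm.
baseline_area = bidimensional_area([(3.0, 2.0)])   # 6.0 cm^2
followup_area = bidimensional_area([(2.0, 1.0)])   # 2.0 cm^2
status = classify_time_point(followup_area, baseline_area, nadir=baseline_area)
```

Here the nadir is taken to be the smallest area observed before the time point being classified, which is an assumption about how the bookkeeping would be done in practice.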

Creation of dT1 Maps
The dT1 method quantitatively compares calibrated pre- (T1) and postcontrast anatomic images, in which the calibration rule was machine-learned from input data of a given type (eg, T1WI spin-echo). 9,10 Specifically, learning the calibration rule (historically referred to as the "standardization step") 15,16 requires the determination of mean intensity values at predefined landmarks, which correspond to percentiles in the distribution of pixel values, using a dataset of training images. This training step is performed only once. Next, each new input image of a given type is transformed to the standardized space (ie, calibrated) using a piecewise linear-intensity mapping function. The result is a constant dynamic range for the calibrated images so that for a given tissue type, it is possible to establish fixed gray-level windows without the need for a per-case window level adjustment. 16 For routine analysis, pre- and postcontrast 2D-T1 images were coregistered using a rigid mutual-information cost function, followed by application of the machine-learned calibration rule to each T1-weighted image. The calibrated-registered precontrast T1WI was subtracted from the calibrated-registered postcontrast T1WI, resulting in a dT1 image. Figure 1 illustrates the superior conspicuity of a glioblastoma with dT1 compared with a simple difference map constructed from noncalibrated images.
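The calibration-then-subtraction idea can be sketched in Python with NumPy as follows. This is an illustrative approximation of percentile-landmark standardization, not the implementation used in the IB Delta Suite software: the landmark percentiles, the output scale, and all function names are assumptions, and the images are assumed to be already coregistered.

```python
import numpy as np

# Assumed landmark percentiles and standard scale; the trained values used
# for the study are not given in the text.
PERCENTILES = (1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99)
SCALE = (0.0, 10000.0)


def learn_landmarks(training_images):
    """Training step (run once): mean landmark positions on the standard scale."""
    mapped = []
    for image in training_images:
        lm = np.percentile(image, PERCENTILES)
        # Map this image's lowest and highest landmarks onto the standard scale.
        unit = (lm - lm[0]) / (lm[-1] - lm[0])
        mapped.append(SCALE[0] + unit * (SCALE[1] - SCALE[0]))
    return np.mean(mapped, axis=0)


def calibrate(image, standard_landmarks):
    """Piecewise linear mapping of an image's own landmarks onto the standard ones.

    Intensities outside the lowest/highest landmark are clamped by np.interp.
    """
    lm = np.percentile(image, PERCENTILES)
    return np.interp(image, lm, standard_landmarks)


def delta_t1(pre, post, standard_landmarks):
    """Calibrated postcontrast minus calibrated precontrast image."""
    return calibrate(post, standard_landmarks) - calibrate(pre, standard_landmarks)
```

Because each image is mapped through its own percentile landmarks, a global gain or offset change between acquisitions largely cancels after calibration, which is the property that allows a fixed threshold to be applied to the difference image.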
Because dT1 maps are quantitative, delineation of enhancing lesions can be semiautomated by choosing a fixed threshold and applying it consistently across time points and patients. The threshold of 3000 (calibrated units) was determined by an experienced (>20 years) neuroradiologist (S.D.R.), as previously described. 10 Briefly, dT1 voxels were spatially correlated with raw dynamic susceptibility contrast MR imaging data. Voxels with no visually discernible dynamic susceptibility contrast MR imaging signal (ie, a lack of perfused tissue) were used to confirm a lack of contrast agent-perfused tissue and, thus, a lack of contrast agent enhancement. A threshold of 3000 was found to reliably make this distinction and is now routinely applied to dT1 images for the semiautomatic determination of contrast agent-enhancing ROIs. Note that the perfusion signal was used only for the initial determination of the threshold; its collection and use are not required for the routine use of dT1 maps. Generation of dT1 images was built into the IB Delta Suite software (Imaging Biometrics, Elm Grove, Wisconsin) used for this study.
A nonexpert reader (ie, an engineer with <4 years of MR imaging experience at the time of annotation) blinded to the central reader analyses coarsely defined the bounding region on each image slice using the dT1 maps. Care was taken to exclude the choroid plexus, vessels, and scalp. All pixels within the bounding region that were above 3000 were included as the final enhancing-tumor ROI. No manipulation of the tumor ROI was performed beyond identification of the initial bounding region. An experienced neuroradiologist (S.D.R.), blinded to the central reader results, reviewed and approved the final ROIs for any difficult cases. This approach mimics the current practice of having technologists preprocess data and radiologists perform a final review.
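The thresholding step lends itself to a short NumPy sketch. The threshold of 3000 is the value stated in the text; the function names, the array layout, and the voxel dimensions used for the volume calculation are assumptions made for illustration only.

```python
import numpy as np

DT1_THRESHOLD = 3000  # calibrated units, as stated in the text


def enhancing_roi(dt1, bounding_mask, threshold=DT1_THRESHOLD):
    """Voxels inside the coarse bounding region whose dT1 value exceeds the threshold."""
    mask = np.asarray(bounding_mask, dtype=bool)
    return mask & (np.asarray(dt1) > threshold)


def roi_volume_ml(roi, voxel_size_mm):
    """ROI volume in mL for a given (x, y, z) voxel size in mm (assumed input)."""
    return float(roi.sum()) * float(np.prod(voxel_size_mm)) / 1000.0


# Hypothetical single-slice example with four voxels.
dt1 = np.array([[[2500.0, 3500.0], [4000.0, 3000.0]]])
bounding = np.array([[[True, True], [False, True]]])
roi = enhancing_roi(dt1, bounding)
volume = roi_volume_ml(roi, voxel_size_mm=(1.0, 1.0, 5.0))
```

In this toy example only one voxel is both inside the bounding region and strictly above 3000; the voxel at 4000 is excluded because it lies outside the region drawn by the reader.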

Statistical Analysis
2D-T1 results were reported previously. 11 Using dT1, for each post-baseline time point, we measured the dT1 volume against the nadir value, and progression and response (R) were determined as described above for the central reader analysis. If neither the progression nor R criteria were satisfied, the time point was classified as nonresponder/nonprogressor (NR-NP). Agreement between the adjudicated 2D-T1 assessments and the dT1 assessments at weeks 8 and 16 was determined using a simple percentage agreement, as well as the Krippendorff α statistic. The latter statistic corrects for chance agreement 17 ; when the methods agree perfectly, α = 1, and when the methods agree as if chance had produced the results, α = 0.
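As an illustration of the two agreement measures, the following Python sketch computes percentage agreement and the Krippendorff α for nominal categories in the special case of 2 methods with no missing assessments. The function names and the example labels are hypothetical; the study's computations were performed in SAS or R, not with this code.

```python
from collections import Counter


def percent_agreement(a, b):
    """Percentage of cases on which the two methods assign the same category."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def krippendorff_alpha_nominal(a, b):
    """Krippendorff alpha for nominal data, 2 raters, no missing values.

    alpha = 1 - D_o / D_e, where D_o is the observed disagreement and D_e
    the disagreement expected by chance from the pooled category counts.
    """
    n_values = 2 * len(a)                          # total pairable values
    # Each disagreeing case contributes two off-diagonal coincidences.
    observed = 2 * sum(x != y for x, y in zip(a, b))
    counts = Counter(a) + Counter(b)               # pooled category totals
    # Sum over c != k of n_c * n_k.
    expected = n_values ** 2 - sum(c * c for c in counts.values())
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1.0 - (n_values - 1) * observed / expected


# Hypothetical labels: P = progressor, N = nonprogressor.
method_1 = ["P", "P", "N", "N"]
method_2 = ["P", "N", "N", "N"]
agreement = percent_agreement(method_1, method_2)
alpha = krippendorff_alpha_nominal(method_1, method_2)
```

With these toy labels, the methods agree on 3 of 4 cases, while α is lower because part of that agreement would be expected by chance.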
For both 2D-T1 and dT1, separate landmark analysis sets were created for progression by weeks 8 and 16, and association with overall survival (OS) was reported using Kaplan-Meier curves with the log-rank test.
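A landmark analysis of this kind can be sketched in plain Python. The sketch below is a minimal Kaplan-Meier estimator with a helper that builds the landmark groups; the record fields, function names, and the choice to measure survival from the landmark are assumptions, and the log-rank test is not implemented.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    `events` is True for a death and False for a censored observation.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve


def landmark_groups(patients, landmark_weeks):
    """Split patients still followed at the landmark by progression status.

    Each patient is a dict with hypothetical keys 'os_weeks', 'died', and
    'progressed_by_landmark'. Survival time is restarted at the landmark.
    """
    groups = {True: ([], []), False: ([], [])}
    for p in patients:
        if p["os_weeks"] <= landmark_weeks:
            continue  # died or censored before the landmark: excluded
        times, events = groups[p["progressed_by_landmark"]]
        times.append(p["os_weeks"] - landmark_weeks)
        events.append(p["died"])
    return groups
```

A typical use would be to call `landmark_groups` once per landmark (week 8, week 16) and pass each group's times and events to `kaplan_meier`; the group comparison itself would be left to a statistics package.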
Statistical computations were performed using SAS Version 9.4 software (SAS Institute, Cary, North Carolina) or R Version 3.4.4 software (R project; http://www.r-project.org/), with P < .05 considered statistically significant.

RESULTS
An example of a dT1 image created from a study patient is shown in Fig 2. The dT1 map, shown in color and gray-scale (Fig 2C, -D), clearly highlights the enhancing tumor without being confounded by the bright signal on precontrast T1WI (Fig 2A) or subtle enhancement on the postcontrast T1WI (Fig 2B).
Given the perfect agreement for progression at week 8 between dT1 and the adjudicated 2D-T1 reads, the Kaplan-Meier curves for both methods were identical (Table 1 and Fig 3A), with a significant difference in OS (P < .0001). While 2D-T1 did not further distinguish between R and NR-NP (P = .35), there was a significant difference in OS between R and NR-NP for dT1 (P = .01; Table 1 and Fig 3B).
At week 16, a highly significant difference in OS was observed between progressors and nonprogressors for both 2D-T1 and dT1 (P < .0001 and P = .006; Table 2 and Fig 3C). No difference in OS was observed between R and NR-NP for either method (P = .73 and P = .61; Table 2 and Fig 3D).

DISCUSSION
The results of this study support the integration of dT1 into central reader analysis for the delineation of contrast-enhancing brain tumor. The dT1 method was comparable with expert reads for determination of early tumor progression and proved superior for further distinguishing R versus NR-NP at the week 8 time point. While agreement between the methods decreased at week 16, both methods showed a significant difference in OS based on progression status.
The result that dT1 proved better for stratifying subpopulations may be explained in several possible ways. First, due to the process of standardization (calibration) followed by subtraction, dT1 provides a consistent and objective delineation of enhancing lesions. It is less confounded by both systematic differences (vendor platforms, slight variations in imaging settings) and the subjectivity (interreader differences) that influences current approaches. This feature, in turn, can result in an improved sensitivity to enhancing lesions that may not be apparent on postcontrast images, as illustrated in Fig 4. In addition, the superiority of using dT1 may be explained by the demonstrated benefit of volumetric measurements over standard bidimensional approaches for measuring tumor size 18 and the application of a fixed physiology-based threshold to dT1 images to determine enhancing tumor burden.
Before 2010, the Macdonald criteria were widely used to assess treatment response of high-grade gliomas 19 and included the 2D measurement of enhancing tumor in conjunction with a clinical assessment and corticosteroid dose. Tumor progression on FLAIR and the recognition that contrast enhancement is nonspecific prompted the development of the updated RANO criteria, which added FLAIR to the Macdonald criteria. 20 However, FLAIR-based criteria also have important limitations and remain controversial. 21 In fact, the parent study did not find a statistically significant survival time reduction among the isolated FLAIR progressors compared with nonprogressors. 11 Even so, we are not suggesting that dT1 replace RANO as the standard assessment criteria. Rather, the results of this study show that dT1 has the potential to replace the current approach for delineating enhancing-lesion volumes, which is one aspect of the RANO assessment.
Measurement of the contrast-enhancing lesion remains central to the assessment of treatment response and was the focus of a recent effort to standardize imaging protocols for tumor-volume assessment. 1 Even as new imaging biomarkers, such as those derived from perfusion- or diffusion-weighted MR imaging, are proving useful for the biologic assessment of tumor response, the analysis of such biomarkers depends on the accurate delineation of enhancing tumor. Therefore, it is necessary to be able to process these pre- and postcontrast T1WI data in a robust manner for both routine care and clinical trials. The standard approach for lesion segmentation is the labor-intensive and time-consuming manual delineation of contrast-enhancing lesions by expert readers. Due to the subjective nature of this approach, clinical trials rely on multiple expert readers and involve additional readers to adjudicate cases for which there is disagreement. In a study that enlisted 8 board-certified radiologists to measure high-grade tumor diameters, substantial interreader disagreement was demonstrated with a rate of consensus regarding tumor progression of only 45% and only moderate reproducibility. 22 This lack of agreement necessitates frequent adjudication. The primary study, from which this secondary analysis obtained its data, reported a 43% adjudication rate when using 2D-T1 and 42% for 3D-T1. 11 Even more detrimental, the turnaround time for central analysis may preclude certain study designs that require assessment of progression within 48 hours, for example.
By comparison, the dT1 technology can be used by nonradiologists, as demonstrated in the present study, and requires only seconds to identify enhancing lesions. Because it is semiautomated, dT1 overcomes the subjectivity that confounds current methods and, therefore, has the potential to provide greater consistency in lesion identification across time points and patients. These capabilities derive from the unique standardization (calibration) algorithm incorporated into the process of creating dT1 maps. 10,15,23 The standardization algorithm serves to diminish slight differences in TE and TR settings for a given sequence type 16 and thus lessens the importance of such variations that can result in differences in lesion conspicuity. Finally, the standardization algorithm, which has also been trained for use in the creation of relative CBV maps, resulted in substantial improvement in repeatability 24 as well as consistency across time. 23 Therefore, it is expected that dT1 should also result in greater repeatability, a hypothesis that should be tested in prospective studies. The dT1 images are different from the subsequently developed, yet possibly better known, Gaussian-normalized difference maps. 8 Gaussian-normalized maps require the determination of a new normalization for each patient and image, both pre- and postcontrast, which raises questions about consistency across time points and patients. dT1 uses the same calibration and threshold for each patient and image, enabling consistent quantification and automation across time points, patients, and sites.
Simple difference images, which eliminate some confounding bright signal on precontrast T1WI, are limited by variations in sequence parameter settings (eg, Fig 1). Furthermore, simple difference maps are not quantitative, thereby precluding the ability to automate lesion identification and resulting in little improvement over current methods. Consequently, the dT1 technology has the greatest likelihood of offering a substantial improvement over similar tools, with a greater potential for automation and clinical use.
Also of particular interest is that, in this study, only dT1 could predict differences in outcome for the NR-NP tumors at week 8. This greater sensitivity may result from more accurate and possibly more sensitive delineation of enhancing tumor (Fig 4) by dT1, free of precontrast bright signal, or it may be attributed to the physiologically motivated threshold used with dT1. Thus, a quantitative dT1-determined lesion may more accurately reflect active brain tumor. However, whether this same threshold should be used for other contrast-enhancing tumor types is unknown and will be the topic of future studies.
A limitation of this study is that only 1 nonradiologist determined the enhancing ROIs using dT1. A separate study characterizing the interreader agreement using the dT1 technology is warranted. Also, it is likely that all cases, particularly those with more complicated lesions, will still require expert review. Yet expert sign-offs are routine, and improving the initial step of tumor delineation with dT1 should improve the time efficiency of the radiologists' workflow.
Another perceived limitation is that fewer datasets could be analyzed with dT1 compared with 2D-T1. However, this limitation is not intrinsic to the dT1 method. Rather, it is because this is a retrospective analysis of data that were not collected for the purpose of creating dT1 images. Although dT1 is amenable to slight variations in parameter settings such as TE and TR and works well across vendor platforms and field strengths, it requires that the same sequence be used to collect both the pre- and postcontrast T1WI. Consequently, the issue of limited application is not of concern for prospective clinical trials.
A final limitation is that the response assessment using dT1 did not explicitly include the appearance of new lesions as an indicator of progression. However, for all cases included in this substudy, no new measurable lesions of >1 mL were present at the time of review. Thus, all statements of progression were made on the basis of the findings at the primary tumor site only. Future studies will explicitly include the presence of new lesions as an additional criterion to determine progression.
Overall, the potential impact of dT1 technology is far-reaching, given the approximately 117,000 new diagnoses of primary brain tumor per year, 25 the >300,000 patients living with brain tumors who undergo repeat imaging follow-ups as part of their standard of care, and the 475 active clinical studies for glioblastoma multiforme (https://clinicaltrials.gov/). Therefore, the potential impact of the dT1 technology for daily practice and clinical trials is immense.

CONCLUSIONS
This study shows that dT1 can predict early progression comparable with the standard method, may be superior for substratification, and offers the potential for substantial time and cost savings for clinical trials.

ACKNOWLEDGMENTS
We acknowledge Imaging Biometrics LLC (IQ-AI Ltd) for technical support, software development, and assistance with data analysis. No direct funding support was provided by Imaging Biometrics LLC for this study. The data used in this article came from a legacy ACRIN grant. My involvement in this project is part of my duties as imaging Co-Chair of the Brain Tumor Working Group at ECOG-ACRIN. My involvement is supported by the grant to ECOG-ACRIN from the National Cancer Institute*; UNRELATED: Consultancy: Blue Earth Diagnostics, Comments: I was compensated for participation in a medical advisory board meeting concerning the PET agent fluciclovine. This has minimal, if any, relevance to the submission; Grants/Grants Pending: National Institutes of Health, Comments: I have grants in the planning stages that concern segmentation for brain tumor images obtained in clinical trials*; OTHER RELATIONSHIPS: My lab has developed a method of performing difference imaging for measuring tumor enhancement that could be compared with this method at some point. *Money paid to the institution.