Tumor Response Assessment in Diffuse Intrinsic Pontine Glioma: Comparison of Semiautomated Volumetric, Semiautomated Linear, and Manual Linear Tumor Measurement Strategies

BACKGROUND AND PURPOSE: 2D measurements of diffuse intrinsic pontine gliomas are limited by variability, and volumetric response criteria are poorly defined. Semiautomated 2D measurements may improve consistency; however, their impact on tumor response assessments is unknown. The purpose of this study was to compare manual 2D, semiautomated 2D, and volumetric measurement strategies for diffuse intrinsic pontine gliomas.
MATERIALS AND METHODS: This study evaluated patients with diffuse intrinsic pontine gliomas enrolled in a Phase I/II trial (NCT02607124). Clinical 2D cross-product values were derived from manual linear measurements (cross-product = long axis × short axis). By means of dedicated software (mint Lesion), tumor margins were traced, and the maximum cross-product and tumor volume were automatically derived. Correlation and bias between methods were assessed, and response assessment per measurement strategy was reported.
RESULTS: Ten patients (median age, 7.6 years) underwent 58 MR imaging examinations. Correlation and mean bias (95% limits of agreement) of percentage change in tumor size from prior examinations were the following: clinical and semiautomated cross-product, r = 0.36, −1.5% (−59.9%, 56.8%); clinical cross-product and volume, r = 0.61, −2.1% (−52.0%, 47.8%); and semiautomated cross-product and volume, r = 0.79, 0.6% (−39.3%, 38.1%). Stable disease, progressive disease, and partial response rates per measurement strategy were the following: clinical cross-product, 82%, 18%, 0%; semiautomated cross-product, 54%, 42%, 4%; and volume, 50%, 46%, 4%, respectively.
CONCLUSIONS: Manual 2D cross-product measurements may underestimate tumor size and disease progression compared with semiautomated 2D and volumetric measurements.

Most therapeutic trials for DIPG have assessed treatment response with Response Assessment in Neuro-Oncology, MacDonald, or World Health Organization criteria (or modifications), which use a 2D measurement of tumor size on MR imaging, allowing comparison with historical data. [8][9][10][11] While studies have demonstrated good correlation between 2D and volumetric measurements in high-grade gliomas, comparative data on tumor size measurement strategies are lacking for patients with brain stem gliomas. [12][13][14] Furthermore, high interobserver variability among 2D measurements in DIPG has been observed. 15 With the increasing availability and capability of tumor segmentation software, semiautomated tumor volumetry is a potentially useful assessment tool that may be more sensitive to tumor response and enable earlier determination of treatment efficacy. However, research is needed to define therapeutic end point criteria before volumetry can be incorporated into clinical trials.
The purpose of this study was to compare 3 methods of DIPG tumor measurement: traditional manual 2D measurements, semiautomated software-assisted 2D measurements, and semiautomated tumor volumes. A secondary aim was to explore the implications of using software-assisted 2D measurements and tumor volumes for response assessment, compared with the standard manual 2D method.
The purpose of this study was to compare methods of DIPG tumor measurement, including traditional manual 2D measurements and semiautomated, software-assisted 2D measurements and tumor volumes. A secondary aim was to explore the implications of using software-assisted 2D measurements and tumor volumes for response assessment, compared with the standard manual 2D method.

MATERIALS AND METHODS
The institutional review board at Cincinnati Children's Hospital Medical Center approved this study as part of a Phase I/II drug trial of ribociclib following radiation therapy in patients with newly diagnosed DIPG (NCT02607124). Patients between 1 and 30 years of age with nonbiopsied DIPG, retinoblastoma gene mutation-positive DIPG, or retinoblastoma gene mutation-positive high-grade glioma were prospectively recruited to undergo MR imaging before and following drug therapy between April 2016 and November 2017 as part of the treatment protocol. 16 Written informed consent was obtained. MR imaging examinations were typically obtained before cycles 3, 5, 7, 9, and 11.

Tumor Measurements
For clinical purposes per the study protocol, 2D tumor measurements were made by 1 of 7 fellowship-trained pediatric neuroradiologists at Cincinnati Children's Hospital Medical Center and included the largest diameter in the axial plane (long axis) and a measurement perpendicular to the long axis (short axis). Clinical cross-product (CP) was calculated by multiplying the long and short axes (Fig 1). For this study, a single reviewer (J.L.L.), a pediatric neuroradiologist with 25 years of experience who was blinded to the clinical measurements, manually segmented each entire tumor by tracing tumor margins on axial T2WI and T2-FLAIR images using proprietary, clinically available software (mint Lesion, Version 3.4.5; Mint Medical). From the outlined tumor, the largest axial diameter (long axis) and the measurement perpendicular to the long axis (short axis) were automatically derived. Semiautomated CP was calculated by multiplying the long and short axes. Tumor volume was also automatically derived from the outlined tumor (Fig 1).
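The software-derived measurements described above can be approximated in a few lines. The sketch below is not mint Lesion's algorithm, which is proprietary and not described here. It assumes each traced outline is a closed polygon of (x, y) vertices in millimeters on one axial section, takes the long axis as the largest vertex-to-vertex distance, takes the short axis as the outline's extent perpendicular to that axis (a simplification of the largest perpendicular diameter), and estimates volume by summing section areas times a hypothetical section thickness. All function names are illustrative.

```python
import math
from itertools import combinations

Contour = list[tuple[float, float]]  # closed polygon, (x, y) vertices in mm


def polygon_area(contour: Contour) -> float:
    """Area enclosed by a traced outline (shoelace formula), in mm^2."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(contour, contour[1:] + contour[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def long_and_short_axes(contour: Contour) -> tuple[float, float]:
    """Long axis = largest vertex-to-vertex distance (O(n^2) search);
    short axis = outline extent measured perpendicular to the long axis."""
    p, q = max(combinations(contour, 2), key=lambda pair: math.dist(*pair))
    long_axis = math.dist(p, q)
    # Unit vector perpendicular to the long axis.
    nx = -(q[1] - p[1]) / long_axis
    ny = (q[0] - p[0]) / long_axis
    projections = [x * nx + y * ny for x, y in contour]
    return long_axis, max(projections) - min(projections)


def semiautomated_measurements(sections: list[Contour], thickness_mm: float):
    """Return (maximum axial cross-product in mm^2, volume in mm^3)."""
    max_cp = 0.0
    volume = 0.0
    for contour in sections:
        long_axis, short_axis = long_and_short_axes(contour)
        max_cp = max(max_cp, long_axis * short_axis)  # keep the largest CP
        volume += polygon_area(contour) * thickness_mm  # section area x thickness
    return max_cp, volume


# Hypothetical example: a 10 x 10 mm square outline on two 5-mm sections.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
cp, vol = semiautomated_measurements([square, square], thickness_mm=5.0)
```

Because every section is searched for its own long and short axes, the maximum CP can come from a different section, or lie in a different orientation, than the one a reader chose manually.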

Classification of Tumor Response
Per the study protocol, the following imaging-related tumor response criteria (TRC) were used to categorize each follow-up examination with regard to tumor size: complete response, complete disappearance of all tumor and mass effect, maintained for 81 weeks; partial response (PR), ≥50% decrease from baseline; stable disease (SD), <50% decrease from baseline and <25% increase from the prior lowest tumor size; and progressive disease (PD), ≥25% increase from the prior lowest tumor size. 11,17 These criteria were applied to both the 2D and volumetric data. Additionally, we applied a second set of TRC to the volume data only, following previously published recommendations: PR, ≥65% decrease from baseline; SD, <65% decrease from baseline and <40% increase from the prior lowest tumor size; and PD, ≥40% increase from the prior lowest tumor size. 18,19 These published volumetric TRC are extrapolated from linear values using a spheric tumor model. Currently, there are no published prospective studies regarding TRC for tumor volumes in pediatric brain tumors. 18
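The size-based criteria above can be expressed as a small classifier, and the spheric-model extrapolation can be checked arithmetically: for a sphere, CP scales with diameter squared and volume with diameter cubed, so a CP ratio R corresponds to a volume ratio of R^1.5. This is a minimal sketch under stated assumptions: function names are hypothetical, complete response is omitted because it depends on disappearance of mass effect and durability rather than size alone, and PD is tested before PR (the protocol text does not state a precedence).

```python
def percent_change(current: float, reference: float) -> float:
    return 100.0 * (current - reference) / reference


def classify_response(current, baseline, nadir, pr_decrease=50.0, pd_increase=25.0):
    """Return 'PD', 'PR', or 'SD' for one follow-up examination.

    PD is checked first (assumed precedence); the nadir is the prior lowest size."""
    if percent_change(current, nadir) >= pd_increase:
        return "PD"
    if -percent_change(current, baseline) >= pr_decrease:
        return "PR"
    return "SD"


def classify_series(sizes, pr_decrease=50.0, pd_increase=25.0):
    """Classify each follow-up size (CP or volume); sizes[0] is the baseline."""
    baseline = sizes[0]
    return [
        classify_response(sizes[i], baseline, min(sizes[:i]), pr_decrease, pd_increase)
        for i in range(1, len(sizes))
    ]


def spheric_volume_ratio(cp_ratio: float) -> float:
    """Sphere: CP ~ diameter^2 and volume ~ diameter^3, so V ratio = CP ratio^1.5."""
    return cp_ratio ** 1.5


# Protocol 2D thresholds translated to volume under the spheric model.
pr_volume = 100 * (1 - spheric_volume_ratio(0.50))  # about 64.6% decrease
pd_volume = 100 * (spheric_volume_ratio(1.25) - 1)  # about 39.8% increase
```

The translated thresholds round to the 65% and 40% values quoted above. With these functions, a 28% increase over the nadir is PD under the protocol thresholds but SD under the volumetric ones, e.g. `classify_response(128, 100, 100, pr_decrease=65, pd_increase=40)` returns `"SD"`.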

Statistical Analysis
Continuous data were summarized as means and SDs or medians and ranges; categoric data were summarized as counts and percentages. Correlation coefficients were used to compare clinical and semiautomated CP measurements and the percentage change in tumor size (compared with a prior examination) among the 3 measurement strategies. Bland-Altman analyses were performed to assess bias among measurement strategies. Descriptive statistics were used to describe percentage change in tumor size per TRC classification and measurement strategy. P values < .05 were considered statistically significant for inference testing. Correlation coefficients were classified by the following definitions: 0-0.19, very weak; 0.20-0.39, weak; 0.40-0.59, moderate; 0.60-0.79, strong; and 0.80-1.0, very strong. 20 Analyses were performed using MedCalc for Windows (MedCalc Software).

RESULTS
Ten patients were included, all with DIPGs. The median age at baseline MR imaging was 7.6 years (range, 3.9-20.1 years), and 6 patients (60%) were female. The median number of follow-up
FIG 3. Sample case demonstrating clinical 2D measurements (clinical CP) and semiautomated 2D and volumetric measurements (semiautomated CP) during the treatment course. In this case, per protocol, imaging progression based on clinical CP (A) was called after cycle 8 (33.2% increase in CP). With semiautomated CP (B), progressive disease would have been called (based solely on imaging) after cycle 2 (28% increase). This difference is due to the choice of a different section for the maximum transaxial dimension and a slightly different measurement orientation (B, cycle 2). Subsequently, however, on the basis of the protocol comparison with the smallest CP during treatment (baseline), stable disease would have been called. Note that although the CP increased 28% (PD) after cycle 2, the tumor volume increased only 9% (SD). Such discrepancies were common when comparing treatment-response strategies.

Comparison of Treatment Response Assessment between Tumor Measurement Strategies
Classification of percentage change in tumor size from the baseline or nadir examination, per the protocol TRC, yielded 25/50 (50%) cases that were classified concordantly across all 3 tumor measurement methods. Concordance was 32/50 (64%) between clinical and semiautomated CP, 30/50 (60%) between clinical CP and volume, and 38/50 (76%) between semiautomated CP and volume. Frequencies of SD, PD, and PR by tumor measurement strategy are reported in Table 1. No examination or method demonstrated a complete response. Of note, 34% (14/41) of time points classified as SD by clinical CP were classified as PD by semiautomated CP. The mean percentage change in tumor size between each follow-up examination and the baseline or nadir examination, per response category and tumor measurement strategy, is reported in Table 2.
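The concordance figures above are simple agreement proportions over per-examination labels. A minimal sketch follows, using hypothetical labels for 4 examinations rather than study data; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def concordance(labels: dict[str, list[str]]):
    """Pairwise and all-method agreement of per-examination response labels."""
    methods = list(labels)
    n = len(labels[methods[0]])
    pairwise = {
        (a, b): sum(x == y for x, y in zip(labels[a], labels[b])) / n
        for a, b in combinations(methods, 2)
    }
    # An examination is concordant across all methods if every label matches.
    all_agree = sum(len(set(row)) == 1 for row in zip(*labels.values())) / n
    return pairwise, all_agree


def crosstab(a: list[str], b: list[str]) -> Counter:
    """Counts of (label by method a, label by method b) pairs."""
    return Counter(zip(a, b))


# Hypothetical labels for four follow-up examinations (not study data).
labels = {
    "clinical CP": ["SD", "SD", "PD", "SD"],
    "semiautomated CP": ["SD", "PD", "PD", "SD"],
    "volume": ["SD", "PD", "SD", "SD"],
}
pairwise, all_agree = concordance(labels)
table = crosstab(labels["clinical CP"], labels["semiautomated CP"])
```

A proportion such as 14/41 corresponds to `table[("SD", "PD")]` divided by the number of examinations labeled SD by clinical CP; in this toy example that is 1/3.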

DISCUSSION
Recent research has aimed to discover new imaging biomarkers of DIPG and to improve existing ones. Such work is relevant to the standardization of clinical trial end points and to improved detection of treatment effect. Historically, DIPGs have been measured using MacDonald, World Health Organization, or Response Assessment in Neuro-Oncology criteria (or modifications). These criteria use a 2D measurement strategy in which 2 perpendicular measurements are made on the image with the largest observed cross-sectional area of tumor. However, these standard tumor measurements were not developed for DIPG and may not be well suited to it, and there remains an overall lack of standardization in DIPG measurements.
2D and 3D DIPG tumor measurements have demonstrated poor interobserver agreement, possibly related to the tumors' infiltrative nature, indistinct borders, and heterogeneous appearance. 15 Furthermore, 1D, 2D, and 3D tumor measurements of DIPG have not correlated well with clinical outcomes. [21][22][23][24] Other surrogate imaging biomarkers of DIPG have been explored, including metabolic ratios by MR spectroscopy, tumor perfusion by dynamic susceptibility contrast MR imaging, and pontine size by conventional MR imaging. [24][25][26][27][28] Recently, studies have investigated tumor volume measurement strategies (automated, semiautomated, or manual) and have found them promising, with improved inter- and intraobserver agreement. 18,29,30 It has been hypothesized that a volume measurement strategy may be more appropriate for DIPG, given its often complex morphology.
In this study, we demonstrated a strong correlation and a relatively small bias between manual and semiautomated (mint Lesion-derived) CP values. However, the correlation of percentage change in tumor size (the metric by which treatment response is assessed) between these 2 strategies was weak. In some cases, the percentage change in tumor size differed greatly across the clinical CP, semiautomated CP, and volumetric measurement strategies. Generally, the mint Lesion-derived measurements (semiautomated CP and volume) classified tumors as PD more often than the clinical CP strategy, which more often classified tumors as SD.
There are several potential explanations for these discrepancies. DIPG can have variable, nonspheric morphology, and some shapes are not well approximated by a single CP measurement. In addition, tumor growth is constrained to some degree by anatomic boundaries and may be less constrained in certain directions. More pronounced growth along the cranial-caudal axis, for example, may not be captured by measurements made only in the transverse plane. Furthermore, clinical CP measurements are likely biased by the radiologist's tendency to use a section location and measurement orientation similar to those used on prior examinations. This bias may result in underestimation of tumor size if greater interval growth occurred at a different axial section or in a different orientation than the manual measurements. Our results are consistent with this bias: mint Lesion automatically derives the largest CP, which may explain why these measurements were, on average, 2.5 mm² larger than the clinical CP measurements. Our results have important implications for the use of segmentation software in clinical practice. Further study is needed to determine which method is a better indicator of clinical outcome.
Table 1: Cross-tabulation of tumor response classifications (PR, SD, PD) among measurement strategies; concordance, 64%, 60%, and 76%.
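The cranial-caudal argument above can be made concrete with an idealized ellipsoidal tumor, using the standard ellipsoid volume (π/6) × a × b × c for full diameters a, b, and c. The dimensions below are hypothetical, and real DIPGs are rarely this regular; the point is only that growth along an axis the axial CP never samples changes volume without changing CP.

```python
import math


def ellipsoid_metrics(ap_mm: float, lr_mm: float, cc_mm: float) -> tuple[float, float]:
    """Axial cross-product and volume of an idealized ellipsoidal tumor.

    ap_mm, lr_mm: full axial diameters; cc_mm: full cranial-caudal length."""
    cross_product = ap_mm * lr_mm  # measured on the widest (central) axial section
    volume = math.pi / 6 * ap_mm * lr_mm * cc_mm
    return cross_product, volume


def percent_change(new: float, old: float) -> float:
    return 100.0 * (new - old) / old


baseline = ellipsoid_metrics(30, 30, 30)
# Hypothetical growth confined to the cranial-caudal axis: 30 mm -> 40 mm.
craniocaudal = ellipsoid_metrics(30, 30, 40)
cp_change = percent_change(craniocaudal[0], baseline[0])   # 0%
vol_change = percent_change(craniocaudal[1], baseline[1])  # about +33.3%
```

Under the protocol TRC, this examination would be SD by CP but PD by volume (≥25%), and SD again under the 40% volumetric threshold, so the call depends on both the measurement strategy and the criteria applied.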
We found the strongest correlation in percentage change in tumor size between the 2 methods that used mint Lesion software (semiautomated CP and volume). This is not surprising because the same segmentations (traced by 1 reviewer on all slices) were used for both strategies. Although the bias between these techniques was very low in magnitude (0.6%), the limits of agreement remained wide (−39%, 38%), which could be related to our small sample size. Furthermore, we did not have adequate power to define new cutoff values for tumor response-assessment classes using tumor volume measurements. If volumetric measurements of DIPG are to be used for assessing tumor response, identifying their relationships with linear measurements (to inform comparisons with prior studies and treatment trials) and with clinical progression is critical to ensure maximum utility. Volumetric end points derived from spheric/elliptic mathematic models using cranial-caudal and transverse dimensions may not be applicable to tumors with nonspheric growth patterns. 30 Further research using larger groups of subjects and treatment time points is needed.
Studies comparing true volumetric and 2D measurements of brain tumors are few, particularly in children. One study of low-grade pediatric gliomas showed that 20% of MR imaging examinations demonstrated discordant response assessments between 2D and volumetric tumor measurement strategies. 18 Shah et al, 13 in a study of adult patients with glioblastomas, also demonstrated a 20% discordance rate in response assessments when comparing 2D and volumetric measurements. These discordance rates are similar to the 24% discordance rate between 2 techniques (clinical CP and volume) demonstrated in our study.
Our study was limited by several factors. First, our sample of patients with DIPG was small, which is inherent to a single-center study of a disease with low prevalence. Second, although the clinical trial protocol standardized the timing of MR imaging examinations, the actual timing and total number of MR imaging examinations varied across patients, often because additional MR imaging examinations were performed when patients had a change in clinical status. Third, volumetric tumor measurements using the mint Lesion software were made by only 1 reviewer; thus, we cannot draw conclusions regarding interobserver variability of the tumor measurements with this technique. Fourth, the software used for 2 of the measurement strategies in our study (semiautomated CP, volume) may not be widely available, potentially limiting the application of our results. Finally, because the clinical outcome data for this trial remain unpublished, our study did not correlate tumor measurements or response assessment with clinical outcomes. This would be an important area of future investigation.

CONCLUSIONS
We have shown that correlation of change in DIPG tumor size among 3 measurement strategies is variable, with the strongest correlation observed between semiautomated 2D and volumetric strategies and the weakest correlation observed between clinical 2D and semiautomated 2D strategies. The conventional method of manual 2D (cross-product) tumor measurement likely underestimates tumor size and disease progression compared with semiautomated 2D measurements. Application of semiautomated 2D and volumetric measurements in therapeutic trials will alter response assessment compared with standard 2D measures in DIPG. Further research is needed to outline relationships among these methods, clinical signs of progression, and survival.