Abstract
BACKGROUND AND PURPOSE: 2D measurements of diffuse intrinsic pontine gliomas are limited by variability, and volumetric response criteria are poorly defined. Semiautomated 2D measurements may improve consistency; however, the impact on tumor response assessments is unknown. The purpose of this study was to compare manual 2D, semiautomated 2D, and volumetric measurement strategies for diffuse intrinsic pontine gliomas.
MATERIALS AND METHODS: This study evaluated patients with diffuse intrinsic pontine gliomas enrolled in a Phase I/II trial (NCT02607124). Clinical 2D cross-product values were derived from manual linear measurements (cross-product = long axis × short axis). With dedicated software (mint Lesion), tumor margins were traced, and the maximum cross-product and tumor volume were automatically derived. Correlation and bias between methods were assessed, and response assessment per measurement strategy was reported.
RESULTS: Ten patients (median age, 7.6 years) underwent 58 MR imaging examinations. Correlation and mean bias (95% limits of agreement) of percentage change in tumor size from prior examinations were as follows: clinical and semiautomated cross-product, r = 0.36, −1.5% (−59.9%, 56.8%); clinical cross-product and volume, r = 0.61, −2.1% (−52.0%, 47.8%); and semiautomated cross-product and volume, r = 0.79, 0.6% (−39.3%, 38.1%). Stable disease, progressive disease, and partial response rates per measurement strategy were as follows: clinical cross-product, 82%, 18%, 0%; semiautomated cross-product, 54%, 42%, 4%; and volume, 50%, 46%, 4%, respectively.
CONCLUSIONS: Manual 2D cross-product measurements may underestimate tumor size and disease progression compared with semiautomated 2D and volumetric measurements.
ABBREVIATIONS:
- CP = cross-product
- DIPG = diffuse intrinsic pontine glioma
- PD = progressive disease
- PR = partial response
- SD = stable disease
- TRC = tumor response criteria
Diffuse intrinsic pontine gliomas (DIPGs) comprise 80% of brain stem gliomas, which, in turn, account for 10%–20% of central nervous system tumors in children.1,2 DIPG carries a dismal prognosis, with a mean survival of 11 months and 1-, 2-, and 5-year overall survival rates of 42%, 10%, and 2%, respectively.3 The standard of care for DIPG is involved field radiation therapy, which lengthens survival by an average of 3–4 months.3,4 Recent work has shown that up to 80% of DIPGs have a pathognomonic point mutation in histone H3.3 (H3F3A) (65% of tumors) or histone H3.1 (HIST1H3B) (25% of tumors), the latter conferring longer survival in most studies that have examined it.3 Chemotherapeutic agents have failed to demonstrate efficacy, and survival has not improved in the past 4 decades.3,5-7 At the time of writing, 69 interventional research studies for DIPG were listed on ClinicalTrials.gov.
Most therapeutic trials for DIPG have assessed treatment response with Response Assessment in Neuro-Oncology, MacDonald, or World Health Organization criteria (or modifications thereof), which use a 2D measurement of tumor size on MR imaging and allow comparison with historical data.8-11 While studies have demonstrated good correlation between 2D and volumetric measurements in high-grade gliomas, comparative data on tumor size measurement strategies in patients with brain stem gliomas are lacking.12-14 Furthermore, high interobserver variability of 2D measurements in DIPG has been observed.15 With the increasing availability and capability of tumor segmentation software, semiautomated tumor volumetry is a potentially useful assessment tool that may be more sensitive to tumor response and enable earlier determination of treatment efficacy. However, research is needed to define therapeutic end point criteria before such measurements are incorporated into clinical trials.
The purpose of this study was to compare methods of DIPG tumor measurement, including traditional manual 2D measurements and semiautomated, software-assisted 2D measurements and tumor volumes. A secondary aim was to explore the implications of using software-assisted 2D measurements and tumor volumes for response assessment, compared with the standard manual 2D method.
MATERIALS AND METHODS
The institutional review board at Cincinnati Children’s Hospital Medical Center approved this study as part of a Phase I/II drug trial of ribociclib following radiation therapy in patients with newly diagnosed DIPG (NCT02607124). Patients between 1 and 30 years of age with nonbiopsied DIPG, retinoblastoma gene mutation–positive DIPG, or retinoblastoma gene mutation–positive high-grade glioma were prospectively recruited to undergo MR imaging before and following drug therapy between April 2016 and November 2017 as part of the treatment protocol.16 Written informed consent was obtained. MR imaging examinations were typically obtained before cycles 3, 5, 7, 9, and 11.
Imaging Protocol
Imaging examinations were performed on 3T scanners (Signa Architect, GE Healthcare). The imaging protocol included volumetric T1WI and T2-FLAIR (1-mm isotropic with axial and coronal reformations; FOV = 256 mm, matrix = 256 × 256, slice thickness = 1 mm), axial T2WI (FOV = 220 mm, matrix = 512 × 224, slice thickness = 3 mm contiguous), and DTI and SWI sequences. IV gadoterate meglumine (Dotarem; Guerbet) was injected (rate = 1 mL/s, dose = 0.1 mmol/kg). Postcontrast T1WI (volumetric, 1 mm) and axial T1-FLAIR (FOV = 220 mm, matrix = 320 × 288, slice thickness = 3 mm contiguous) sequences were obtained.
Tumor Measurements
For clinical purposes per the study protocol, 1 of 7 fellowship-trained pediatric neuroradiologists at Cincinnati Children’s Hospital Medical Center made 2D tumor measurements: the largest diameter in the axial plane (long axis) and a measurement perpendicular to the long axis (short axis). Clinical cross-product (CP) was calculated by multiplying the long and short axes (Fig 1).
Sample case demonstrating measurement methods. A, Manual clinical transaxial (2D) measurements. The largest dimension is identified on axial images, and the perpendicular short-axis dimension is measured; these values are reported in the clinical radiology report and used to derive tumor response. B, Semiautomated 2D measurements. Tumor margins are traced (red outline) on each image, and the mint Lesion software package automatically derives 2D measurements (largest long-axis dimension and perpendicular short-axis dimension, blue lines) along with tumor volume. C, Semiautomated 2D measurements and volumes obtained during the treatment course. Imaging was performed after cycles 2, 4, 6, and 8.
For this study, a single reviewer (J.L.L.), a pediatric neuroradiologist with 25 years of experience who was blinded to the clinical measurements, manually segmented each entire tumor by tracing tumor margins on axial T2WI and T2-FLAIR images using proprietary, clinically available software (mint Lesion, Version 3.4.5; Mint Medical). From the outlined tumor, the software automatically derived the largest axial diameter (long axis) and the measurement perpendicular to the long axis (short axis). Semiautomated CP was calculated by multiplying the long and short axes. Tumor volume was also automatically derived from the outlined tumor (Fig 1).
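The derivation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the mint Lesion algorithm, which the study does not describe. The traced segmentation is assumed to be available as a list of per-slice sets of (row, col) pixel indices with a known in-plane pixel spacing and slice thickness; the long axis is taken as the largest distance between tumor pixel centers on a slice; the short axis is approximated by the tumor's extent perpendicular to that line; the semiautomated CP is the largest per-slice product; and volume is voxel count times voxel size. The function names are hypothetical.

```python
import math
from itertools import combinations


def slice_axes(pixels, spacing_mm):
    """Return (long_axis_mm, short_axis_mm) for one axial slice.

    pixels: set of (row, col) indices inside the traced tumor margin.
    Long axis: largest distance between two tumor pixel centers.
    Short axis: tumor extent projected onto the direction perpendicular to
    the long axis (an approximation of the perpendicular diameter).
    """
    if len(pixels) < 2:
        return 0.0, 0.0
    pts = [(r * spacing_mm, c * spacing_mm) for r, c in pixels]
    p, q = max(combinations(pts, 2), key=lambda pair: math.dist(*pair))
    long_axis = math.dist(p, q)
    # Unit vector along the long axis; (-uy, ux) is perpendicular to it.
    ux, uy = (q[0] - p[0]) / long_axis, (q[1] - p[1]) / long_axis
    perp = [-x * uy + y * ux for x, y in pts]
    return long_axis, max(perp) - min(perp)


def semiautomated_measures(mask_slices, spacing_mm, thickness_mm):
    """Return (maximum cross-product in mm^2, volume in mm^3).

    mask_slices: hypothetical list of per-slice pixel sets, in axial order.
    """
    best_cp = 0.0
    for pixels in mask_slices:
        long_axis, short_axis = slice_axes(pixels, spacing_mm)
        best_cp = max(best_cp, long_axis * short_axis)
    n_voxels = sum(len(pixels) for pixels in mask_slices)
    volume = n_voxels * spacing_mm * spacing_mm * thickness_mm
    return best_cp, volume


# Toy example: a 3 x 3 pixel slice plus a single-pixel slice.
square = {(r, c) for r in range(3) for c in range(3)}
cp, volume = semiautomated_measures([square, {(1, 1)}], spacing_mm=1.0, thickness_mm=1.0)
print(round(cp, 6), volume)  # 8.0 10.0
```

Measuring between pixel centers slightly underestimates edge-to-edge diameters (a 3-pixel row spans 2 pixel spacings), and the brute-force pairwise search is quadratic in the number of pixels per slice; both are simplifications chosen for readability.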
Classification of Tumor Response
Per the study protocol, the following imaging-related tumor response criteria (TRC) were used to categorize each follow-up examination with regard to tumor size: complete response, complete disappearance of all tumor and mass effect, maintained for at least 8 weeks; partial response (PR), ≥50% decrease from baseline; stable disease (SD), <50% decrease from baseline and <25% increase from the prior lowest tumor size; and progressive disease (PD), ≥25% increase from the prior lowest tumor size.11,17 These criteria were applied to 2D and volumetric data. Additionally, we applied a second set of TRC to the volume data only, following previously published recommendations: PR, ≥65% decrease from baseline; SD, <65% decrease from baseline and <40% increase from the prior lowest tumor size; and PD, ≥40% increase from the prior lowest tumor size.18,19 These published volumetric TRC were extrapolated from linear values using a spheric tumor model. To date, no prospective studies of TRC for tumor volumes in pediatric brain tumors have been published.18
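The response rules above can be expressed as a small classifier. This Python sketch is an illustration, not the trial's software. It assumes sizes arrive as a chronological list whose first element is the baseline, treats the "prior lowest tumor size" as the minimum of the baseline and all earlier follow-ups, gives PD precedence if a value meets both the PD and PR thresholds (the criteria do not say), and omits complete response. The hypothetical helper `cp_ratio_to_volume_ratio` shows the spheric-model arithmetic behind the volumetric thresholds: CP scales with the square of the radius and volume with its cube, so a volume ratio equals the CP ratio raised to the 3/2 power.

```python
def classify_series(sizes, pr_cut=0.50, pd_cut=0.25):
    """Label each follow-up size as PR, SD, or PD.

    sizes[0] is the baseline measurement; sizes[1:] are follow-ups in order.
    Defaults follow the protocol TRC (PR: >=50% decrease from baseline;
    PD: >=25% increase from the prior lowest size). Pass pr_cut=0.65 and
    pd_cut=0.40 for the volumetric TRC. Complete response (which also needs
    loss of mass effect and 8 weeks of confirmation) is not modeled.
    """
    baseline = sizes[0]
    nadir = baseline  # prior lowest size, baseline included
    labels = []
    for size in sizes[1:]:
        if size >= nadir * (1 + pd_cut):
            labels.append("PD")  # assumption: PD takes precedence over PR
        elif size <= baseline * (1 - pr_cut):
            labels.append("PR")
        else:
            labels.append("SD")
        nadir = min(nadir, size)  # update only after classifying this exam
    return labels


def cp_ratio_to_volume_ratio(cp_ratio):
    """Spheric model: CP scales with r**2 and volume with r**3."""
    return cp_ratio ** 1.5


print(classify_series([100, 90, 120, 50, 70]))  # ['SD', 'PD', 'PR', 'PD']
print(round(1 - cp_ratio_to_volume_ratio(0.50), 3))  # 0.646
print(round(cp_ratio_to_volume_ratio(1.25) - 1, 3))  # 0.398
```

With this scaling, a 50% CP decrease corresponds to a volume decrease of about 65%, and a 25% CP increase to a volume increase of about 40%, which matches the published volumetric cut points.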
Statistical Analysis
Continuous data were summarized as means and SDs or medians and ranges; categoric data were summarized as counts and percentages. Correlation coefficients were used to compare clinical and semiautomated CP measurements and the percentage change in tumor size (compared with a prior examination) among the 3 measurement strategies. Bland-Altman analyses were performed to assess bias among measurement strategies. Descriptive statistics were used to describe percentage change in tumor size per TRC classification and measurement strategy.
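The correlation and bias analyses can be sketched with the Python standard library. The study used MedCalc and does not name the correlation coefficient, so Pearson's r is an assumption here; the Bland-Altman limits use the conventional bias ± 1.96 SD of the paired differences. Differences are computed as first argument minus second, so with clinical measurements passed first, a negative bias means the clinical values are smaller. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def pct_change_from_prior(sizes):
    """Percentage change of each examination relative to the one before it."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(sizes, sizes[1:])]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x, y):
    """Mean bias of (x - y) and its 95% limits of agreement (bias +/- 1.96 SD)."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD; needs at least 2 paired values
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

In this setup, one would compute `pct_change_from_prior` separately for each strategy's size series and then pass the paired percentage changes to `pearson_r` and `bland_altman`.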
P values < .05 were considered statistically significant for inference testing. Correlation coefficients were classified by the following definitions: 0–0.19, very weak; 0.20–0.39, weak; 0.40–0.59, moderate; 0.60–0.79, strong; and 0.80–1.0, very strong.20 Analyses were performed using MedCalc for Windows (MedCalc Software).
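The descriptive categories above map directly onto a lookup. In this sketch the bins are treated as half-open intervals (for example, 0.195 counts as very weak), an assumption because the published ranges leave gaps between 0.19 and 0.20 and similar boundaries; `correlation_strength` is a hypothetical name.

```python
def correlation_strength(r):
    """Label |r| with the descriptive categories used in this study."""
    magnitude = abs(r)
    bins = ((0.20, "very weak"), (0.40, "weak"), (0.60, "moderate"), (0.80, "strong"))
    for upper, label in bins:
        if magnitude < upper:
            return label
    return "very strong"


print(correlation_strength(0.36), correlation_strength(0.79))  # weak strong
```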
RESULTS
Ten patients were included, all with DIPGs. The median age at baseline MR imaging was 7.6 years (range, 3.9–20.1 years), and 6 patients (60%) were female. The median number of follow-up MRIs was 4 (range, 2–9), with a total of 58 examinations reviewed for this study. There were 50 follow-up response assessment time points after baseline with a mean time between baseline and last follow-up MR imaging of 177 ± 98 days (range, 56–411 days). One patient (with 2 follow-up MRIs) had 2 discrete tumors, each measured and analyzed separately. Case examples of tumor measurements over the study course are illustrated in Figs 2 and 3.
Sample case demonstrating clinical 2D measurements (clinical CP) and semiautomated 2D and volumetric measurements (semiautomated CP) during the treatment course. Note that in this case, although there were differences in orientation of the measurements with the semiautomated process, response classification was the same compared with manual clinical CP measurements.
Sample case demonstrating clinical 2D measurements (clinical CP) and semiautomated 2D and volumetric measurements (semiautomated CP) during the treatment course. In this case, per protocol, imaging progression based on clinical CP (A) was called after cycle 8 (33.2% increase in CP). With semiautomated CP (B), progressive disease would have been called (based solely on imaging) after cycle 2 (28% increase). This difference resulted from the choice of a different section for the maximum transaxial dimension and a slightly different measurement orientation (B, cycle 2). Subsequently, however, stable disease would have been called, because the protocol compares each examination with the smallest CP during treatment (here, the baseline). Note that although the CP increased 28% (PD) after cycle 2, the tumor volume increased only 9% (SD). Such discrepancies were common when comparing treatment-response strategies.
Correlation and Bias between Tumor Measurement Strategies
Clinical and semiautomated tumor CP measurements were strongly correlated (r = 0.74, P < .001), with a mean bias of −2.5 mm² (95% limits of agreement, −9.5, +4.4 mm²) (Figs 4 and 5). The percentage change in tumor size showed strong, statistically significant correlations between clinical CP and volume (r = 0.61, P < .001) and between semiautomated CP and volume (r = 0.79, P < .001). Correlation and mean bias of percentage change in tumor size between clinical and semiautomated CP measurement strategies are depicted in Figs 6 and 7. The percentage change from prior examinations showed a weak, statistically significant correlation between clinical and semiautomated CP measurements (r = 0.36, P = .011), with a mean bias of −1.5% (95% limits of agreement, −59.9%, +56.8%).
Clinically derived tumor CP versus semiautomated software–derived tumor CP for all time points with a linear trendline (r = 0.74, P < .0001).
Bland-Altman plot demonstrating bias between clinical and semiautomated tumor CP for all time points. The solid line indicates the mean bias between techniques. Dashed lines indicate ±2 SDs of the mean (95% limits of agreement). Overall, clinical CP was smaller than semiautomated CP (mean bias, −2.5 mm²). Outliers (>1.96 SDs) were predominantly noted in larger tumors.
Correlation of percentage change from prior examination in clinical CP versus semiautomated CP (r = 0.36, P = .011).
Bland-Altman plot demonstrating bias between the percentage change in clinical and semiautomated tumor CP from a prior examination. The solid line indicates the mean bias between techniques. Dashed lines indicate ±2 SDs of the mean (95% limits of agreement). Overall, the percentage change in clinical CP between time points was smaller than the percentage change in semiautomated CP (mean bias, −1.5%; 95% limits of agreement, −59.9%, +56.8%).
Comparison of Treatment Response Assessment between Tumor Measurement Strategies
Classification of the percentage change in tumor size from the baseline or nadir examination, per the protocol TRC, yielded 25/50 (50%) time points that were classified concordantly across all 3 tumor measurement methods. Concordance was 32/50 (64%) between clinical and semiautomated CP, 30/50 (60%) between clinical CP and volume, and 38/50 (76%) between semiautomated CP and volume. Frequencies of SD, PD, and PR by tumor measurement strategy are reported in Table 1. No examination or method demonstrated a complete response. Of note, 34% (14/41) of time points classified as SD by clinical CP were classified as PD by semiautomated CP. The mean percentage change in tumor size from baseline or nadir to follow-up examination per response category and tumor measurement strategy is reported in Table 2.
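The concordance figures above are simple counts over time points. This sketch assumes each measurement strategy's classifications are available as a list of labels aligned by time point; the function names and example labels are hypothetical, not study data. `reclassified` computes the kind of proportion quoted for SD-to-PD reclassification (14/41).

```python
def concordance(*label_series):
    """Count time points where every method assigned the same response label.

    Each argument is one method's list of labels (e.g. "SD", "PD", "PR"),
    aligned by time point. Returns (concordant_count, total).
    """
    total = len(label_series[0])
    agree = sum(1 for labels in zip(*label_series) if len(set(labels)) == 1)
    return agree, total


def reclassified(labels_a, labels_b, from_label, to_label):
    """Among time points method A labeled from_label, count those method B
    labeled to_label. Returns (count, denominator)."""
    b_for_a = [b for a, b in zip(labels_a, labels_b) if a == from_label]
    return sum(1 for b in b_for_a if b == to_label), len(b_for_a)


clinical = ["SD", "SD", "PD", "SD"]
semiauto = ["SD", "PD", "PD", "PD"]
print(concordance(clinical, semiauto))  # (2, 4)
print(reclassified(clinical, semiauto, "SD", "PD"))  # (2, 3)
```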
Response assessment classifications of 50 MR imaging time points for 3 tumor measurement strategies
Descriptive statistics of the percentage change in tumor size from baseline or nadir to follow-up examination (assigned per clinical protocol) for each response assessment classification per tumor measurement strategy
DISCUSSION
Recent research has aimed to discover and improve imaging biomarkers of DIPG. Such work is relevant to the standardization of clinical trial end points and to improved detection of treatment effect. Historically, DIPGs have been measured using MacDonald, World Health Organization, or Response Assessment in Neuro-Oncology criteria (or modifications thereof). These criteria use a 2D measurement strategy in which 2 perpendicular measurements are made on the image with the largest observed cross-sectional area of tumor. However, these standard tumor measurements were not developed for DIPG and may not be best suited to it, and DIPG measurement remains poorly standardized overall.
2D and 3D DIPG tumor measurements have demonstrated poor interobserver agreement, possibly related to the tumors’ infiltrative nature, indistinct borders, and heterogeneous appearance.15 Furthermore, 1D, 2D, and 3D tumor measurements of DIPG have not correlated well with clinical outcomes.21-24 Other surrogate imaging biomarkers of DIPG have been explored, including metabolic ratios by MR spectroscopy, tumor perfusion by dynamic susceptibility contrast MR imaging, and pontine size by conventional MR imaging.24-28 Recently, studies have investigated tumor volume measurement strategies (automated, semiautomated, or manual) and have shown them to be promising tools with improved inter- and intraobserver agreement.18,29,30 It has been hypothesized that a volumetric measurement strategy may be more appropriate for DIPG, given its often complex morphology.
In this study, we demonstrated a strong correlation and relatively small bias between manual and semiautomated (mint Lesion–derived) CP values. However, the correlation of percentage change in tumor size (the metric by which treatment response is assessed) between these 2 strategies was weak. In some cases, the percentage change in tumor size differed substantially across the clinical CP, semiautomated CP, and volumetric measurement strategies. Generally, the mint Lesion–derived measurements (semiautomated CP, volume) more often classified tumors as PD, whereas the clinical CP strategy more often classified them as SD.
There are several potential explanations for these discrepancies. DIPG can have a variable, nonspheric morphology, and some shapes are not well approximated by a single CP measurement. In addition, tumor growth is constrained to some degree by anatomic boundaries and may be less constrained in certain directions. More pronounced growth along the cranial-caudal axis, for example, may not be captured by measurements made only in the transverse plane. Furthermore, clinical CP measurements are likely biased by the radiologist’s tendency to use a section location and measurement orientation similar to those used on prior examinations. This bias may result in underestimation of tumor size if greater interval growth occurred on a different axial section or in a different orientation than the manual measurements. Our results also support this explanation: mint Lesion automatically derives the largest CP, which may explain why its measurements were, on average, 2.5 mm² larger than the clinical CP measurements. Our results have important implications for the use of segmentation software in clinical practice. Further study is needed to determine which method is a better indicator of clinical outcome.
We found the strongest correlation in percentage change of tumor size between the 2 methods using mint Lesion software (semiautomated CP and volume). This is not surprising because both strategies were derived from the same segmentations (1 reviewer traced tumor margins on all slices). Although the bias between these techniques was very low in magnitude (0.6%), the limits of agreement remained wide (−39%, +38%), which could be related to our small sample size. Furthermore, we did not have adequate power to define new cutoff values for tumor response–assessment classes using tumor volume measurements. If volumetric measurements are to be used for assessing DIPG tumor response, identifying their relationships with linear measurements (to inform comparisons with prior studies and treatment trials) and with clinical progression is critical to ensure maximum utility. Volumetric end points derived from spheric/elliptic mathematical models using cranial-caudal and transverse dimensions may not be applicable to tumors with nonspheric growth patterns.30 Further research using larger groups of subjects and treatment time points is needed.
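The caveat about model-based volumes can be made concrete with the ellipsoid formula V = (π/6) · d1 · d2 · d3 for three orthogonal diameters. In this sketch the formula is exact for a sphere, but for a hypothetical box-shaped lesion (an extreme nonspheric shape chosen only for illustration, not study data) the same three diameters underestimate the true volume by nearly half. `ellipsoid_volume` is a hypothetical name.

```python
import math


def ellipsoid_volume(d1, d2, d3):
    """Ellipsoid volume from three orthogonal diameters: (pi / 6) * d1 * d2 * d3."""
    return math.pi / 6.0 * d1 * d2 * d3


# A sphere of diameter 10 mm: the model reproduces the true volume.
sphere_true = 4.0 / 3.0 * math.pi * 5.0 ** 3
print(round(ellipsoid_volume(10, 10, 10) / sphere_true, 6))  # 1.0

# A hypothetical box-shaped lesion, 10 x 10 x 30 mm (true volume 3000 mm^3):
# the ellipsoid estimate recovers only about 52% of the true volume.
print(round(ellipsoid_volume(10, 10, 30) / 3000.0, 3))  # 0.524
```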
Studies correlating true volumetric and 2D measurements of brain tumors are few, particularly in children. One study of pediatric low-grade gliomas showed that 20% of MR imaging examinations had discordant response assessments between 2D and volumetric tumor measurement strategies.18 Shah et al,13 in a study of adult patients with glioblastomas, also demonstrated a 20% discordance rate in response assessments when comparing 2D and volumetric measurements. These discrepancy rates are quite similar to the 24% discrepancy rate between the 2 techniques (clinical CP and volume) in our study.
Our study was limited by several factors. First, our sample of patients with DIPG was small, which is inherent to a single-center study of a disease with low prevalence. Second, although the clinical trial protocol standardized the timing of MR imaging examinations, the actual timing and total number of examinations varied across patients, often because additional examinations were performed when a patient’s clinical status changed. Third, volumetric tumor measurements using the mint Lesion software were made by only 1 reviewer; thus, we cannot draw conclusions regarding interobserver variability of the tumor measurements with this technique. Fourth, the software used for 2 of the measurement strategies in our study (semiautomated CP, volume) may not be widely available, potentially limiting the application of our results. Finally, because the clinical outcome data for this trial remain unpublished, our study did not correlate tumor measurements or response assessment with clinical outcomes. This would be an important area of future investigation.
CONCLUSIONS
We have shown that correlation of change in DIPG tumor size among 3 measurement strategies is variable, with the strongest correlation observed between semiautomated 2D and volumetric strategies and the weakest correlation observed between clinical 2D and semiautomated 2D strategies. The conventional method of manual 2D (cross-product) tumor measurement likely underestimates tumor size and disease progression compared with semiautomated 2D measurements. Application of semiautomated 2D and volumetric measurements in therapeutic trials will alter response assessment compared with standard 2D measures in DIPG. Further research is needed to outline relationships among these methods, clinical signs of progression, and survival.
Footnotes
This work was supported, in part, by Novartis Pharmaceuticals, A Phase I/II Study of Ribociclib, a CDK4/6 Inhibitor, Following Radiation Therapy. ClinicalTrials.gov Identifier: NCT02607124.
Additional support: The Cure Starts Now Foundation, Hope for Caroline Foundation, Julian Boivin Courage for Cures Foundation, Abbie’s Army, Michael Mosier Defeat DIPG Foundation, Reflections of Grace Foundation, The Cure Starts Now Australia, Brooke Healey Foundation, Soar With Grace Foundation, Jeffrey Thomas Hayden Foundation, Cure Brain Cancer Foundation, The Jones Family Foundation, Musella Foundation, Pray, Hope, Believe Foundation, Smiles for Sophie Foundation, Benny’s World, Love Chloe Foundation, Aiden’s Avengers, A Cure from Caleb Society, The Operation Grace White Foundation, Ryan’s Hope, Wayland Villars DIPG Foundation, American Childhood Cancer Organization, Juliana Rose Donnelly Trust, Sheila Jones and Friends, The Ellie Kavalieros DIPG Research Fund, Voices Against Brain Cancer, The DIPG Collaborative.
Disclosures: Mariko DeWire-Schottmiller—RELATED: Grant: The DIPG Collaborative and additional funding sources in the comments, Comments: The Cure Starts Now Foundation, Hope for Caroline Foundation, Julian Boivin Courage for Cures Foundation, Abbie’s Army, Michael Mosier Defeat DIPG Foundation, Reflections of Grace Foundation, The Cure Starts Now Australia, Brooke Healey Foundation, Soar With Grace Foundation, Jeffrey Thomas Hayden Foundation, Cure Brain Cancer Foundation, The Jones Family Foundation, Musella Foundation, Pray, Hope, Believe Foundation, Smiles for Sophie Foundation, Benny’s World, Love Chloe Foundation, Aiden’s Avengers, A Cure from Caleb Society, The Operation Grace White Foundation, Ryan’s Hope, Wayland Villars DIPG Foundation, American Childhood Cancer Organization, Juliana Rose Donnelly Trust, Sheila Jones and Friends, The Ellie Kavalieros DIPG Research Fund, Voices Against Brain Cancer, and The DIPG Collaborative*; Other: Novartis Pharmaceuticals.* Maryam Fouladi—RELATED: Grant: Cincinnati Children’s Hospital Medical Center, Comments: Novartis-supported clinical trial.* *Money paid to the institution.
- Received December 20, 2019.
- Accepted after revision February 26, 2020.
- © 2020 by American Journal of Neuroradiology