MR Imaging–Based Radiomic Signatures of Distinct Molecular Subgroups of Medulloblastoma

BACKGROUND AND PURPOSE: Distinct molecular subgroups of pediatric medulloblastoma confer important differences in prognosis and therapy. Currently, tissue sampling is the only method to obtain information for classification. Our goal was to develop and validate radiomic and machine learning approaches for predicting molecular subgroups of pediatric medulloblastoma. MATERIALS AND METHODS: In this multi-institutional retrospective study, we evaluated MR imaging datasets of 109 pediatric patients with medulloblastoma from 3 children's hospitals from January 2001 to January 2014. A computational framework was developed to extract MR imaging–based radiomic features from tumor segmentations, and we tested 2 predictive models: a double 10-fold cross-validation using a combined dataset consisting of all 3 patient cohorts and a 3-dataset cross-validation, in which training was performed on 2 cohorts and testing was performed on the third independent cohort. We used the Wilcoxon rank sum test for feature selection with assessment of area under the receiver operating characteristic curve to evaluate model performance. RESULTS: Of 590 MR imaging–derived radiomic features, including intensity-based histograms, tumor edge-sharpness, Gabor features, and local area integral invariant features, extracted from imaging-derived tumor segmentations, tumor edge-sharpness was most useful for predicting sonic hedgehog and group 4 tumors. Receiver operating characteristic analysis revealed superior performance of the double 10-fold cross-validation model for predicting sonic hedgehog, group 3, and group 4 tumors when using combined T1- and T2-weighted images (area under the curve = 0.79, 0.70, and 0.83, respectively). With the independent 3-dataset cross-validation strategy, select radiomic features were predictive of sonic hedgehog (area under the curve = 0.70–0.73) and group 4 (area under the curve = 0.76–0.80) medulloblastoma. 
CONCLUSIONS: This study provides proof-of-concept results for the application of radiomic and machine learning approaches to a multi-institutional dataset for the prediction of medulloblastoma subgroups.

wingless-type [WNT], group 3, and group 4) with specific subgroups conferring important prognostic and therapeutic differences. [2][3][4] For example, patients with WNT-pathway-activated tumors have favorable outcomes with a nearly 90% 5-year survival rate, while patients with group 3 tumors have <50% overall survival. 5 These divergent prognostic outcomes have propelled the recognition of these 4 subgroups, reflected in the recent revision of the World Health Organization classification of MB. 6 These molecular subgroups now drive risk-stratification, clinical outcome modeling, and novel therapeutic development. 7,8 Subtyping of tumors is frequently performed on tissues obtained from surgical resection but can also be performed from tissues obtained from a single biopsy. Even single biopsies of MB can yield accurate information for subtyping because of the presence of spatially homogeneous transcriptomes in MBs, in contrast to other tumor types such as high-grade gliomas. 9 However, surgical sampling is invasive and confers added risk to patients. In addition, despite the increasing clinical utility of MB subtyping, the translation of these genomic insights into clinical practice has been limited by extensive cost and a lack of access to sophisticated methods for accurate and expedient subgroup/subtype analyses. 3 Radiomics is an emerging discipline that can link imaging features to tumor genotype and serves as a promising approach to identify surrogate biomarkers that can accurately reflect tumor genomics. 10 Radiomic strategies have been extensively investigated in multiple cancer types, including non-small cell lung cancer, 11 glioblastoma, 12-16 hepatocellular carcinoma, 17 prostate cancer, 18 and breast cancer. 19 However, few studies have applied radiomics to MB; in those that have, the focus has been on the qualitative characterization of these tumors on MR imaging. 
[20][21][22][23][24][25][26] Specifically, these studies have shown that tumor location and enhancement patterns differ across MB subgroups. [20][21][22][23][24]26 For example, group 3 and group 4 MBs often arise in the midline, SHH tumors occur most frequently in the cerebellar hemispheres, and WNT tumors occur in both the midline and the cerebellar peduncle/cerebellopontine angle cistern locations. 20, [22][23][24] Moreover, absence of enhancement is predictive of group 4 tumors, 23 while extensive enhancement in non-WNT/SHH tumors is predictive of poorer overall and event-free survival. 21 While qualitative image features of MB subgroups can provide useful clinical insight, they are subject to interobserver variability and do not capture all the multidimensional data that are acquired by MR imaging.
To date, the use of a quantitative imaging approach for the predictive analysis of MB subgroups has not yet been well-developed. In this multi-institutional study, we aimed to develop and validate radiomic and machine-learning methods to identify computational MR image signatures that are predictive of distinct molecular subgroups of MB. The discovery and establishment of noninvasive and surrogate imaging markers of MB subgroups can provide clinicians with a window into the genomics of these tumors, which can ultimately be helpful for clinical prognostication and informing management.

Patients
This multicenter retrospective study was approved by the institutional review board or research ethics board from each of the 3 participating academic institutions: Lucile Packard Children's Hospital (Stanford University, Palo Alto, California), Boston Children's Hospital (Boston, Massachusetts), and the Hospital for Sick Children (Toronto, Ontario, Canada). Because this was a retrospective study, informed consent was waived. Interinstitutional data agreement was obtained for data-sharing. All patients with de novo and histologically confirmed MBs were identified from the medical record database of each institution from January 2001 to January 2014. These patients were further screened using the following inclusion criteria: availability of high-quality preoperative MR imaging as determined by experienced pediatric neuroradiologists, neurosurgeons, and neuro-oncologists and the availability of molecular subgroup information or the availability of tumor tissue for molecular subtyping. A total of 109 patients were included across the 3 institutions (Lucile Packard Children's Hospital, n = 32; Boston Children's Hospital, n = 28; the Hospital for Sick Children, n = 49), comprising 64 males and 45 females; mean age, 8.56 ± 5.75 years; range, 1-18 years (Table 1). Clinicopathologic information including age, sex, histology diagnosis, and molecular subgroups, if available, was obtained from the medical record.

Radiomic Feature-Extraction Methodology
We developed a computational framework to capture a variety of phenotypic characteristics of the tumor. A total of 590 MR imaging-based radiomic features were extracted from the ROIs on T2-weighted and contrast-enhanced T1-weighted MR images, respectively. The primary types of radiomic features included intensity-based histograms, tumor edge-sharpness, Gabor features, and local area integral invariant (LAII) (all features used are described in On-line Tables 1 and 2). The Daube on Histogram features were based on Daubechies wavelet decomposition. The Quantitative Image-Feature Engine 28 offers additional detailed definitions of the extracted radiomic features (On-line Table 1). Z score normalization was applied to each feature to standardize the range of all image features.
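The feature-level z score step can be illustrated with a minimal sketch. It assumes features are arranged as one row per patient and one column per feature, and it uses the population standard deviation; neither detail is stated in the text, and the toy values are invented for illustration.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature (column) to zero mean and unit variance."""
    scaled_columns = []
    for col in zip(*rows):
        mu = mean(col)
        sigma = pstdev(col)  # population SD: an assumption, not stated in the text
        if sigma == 0:
            # A constant feature carries no information; map it to zeros.
            scaled_columns.append([0.0] * len(col))
        else:
            scaled_columns.append([(v - mu) / sigma for v in col])
    # Transpose back to one row per patient.
    return [list(row) for row in zip(*scaled_columns)]


# Toy matrix: 4 patients x 2 hypothetical features on very different scales.
features = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]]
scaled = zscore_columns(features)
```

After scaling, both toy columns become identical, which shows how normalization removes differences in units and range between feature types.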

Statistical Analysis
Statistical analysis was conducted with Python software (2.7.14, https://www.python.org/). A nonparametric Wilcoxon rank sum test was used for feature selection, and a support vector machine (SVM) classifier was used for prediction. Statistical significance levels were all 2-sided, with statistical significance set at P < .05. Receiver operating characteristic (ROC) curve analysis was used to evaluate prediction of each molecular subgroup of MB.
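For readers less familiar with ROC analysis, the AUC for one subgroup against the rest can be computed directly from classifier scores. The sketch below uses the pairwise-comparison definition of the AUC; the scores and labels are invented, and the function name is hypothetical rather than part of the study's software.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case outscores a random negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Invented classifier scores; label 1 = subgroup of interest, 0 = all other subgroups.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 8 of 9 positive/negative pairs are ordered correctly
```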

Feature Selection, Radiomics, and Machine Learning Approach
The feature-selection method was applied to select the most discriminative features within a 10-fold cross-validation evaluation strategy (see "Model Evaluation"). Specifically, we used the Wilcoxon rank sum test 29 on individual features and sorted them by the acquired P values. After cross-validation analysis, the top k (k = 5, 10, 15, 20, 30, 40, 50, 100, 200, and 300) features with the smallest P values were selected in the training set. We then assessed the predictive power of selected radiomic features on the validation set.
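The ranking step can be sketched as follows. This is an illustrative implementation, not the study's code: it assumes a one-subgroup-versus-rest labeling, uses the normal approximation of the rank sum test without tie or continuity correction, and runs on invented toy data with hypothetical function names.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank sum P value (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first group
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - expected) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_top_k(rows, labels, k):
    """Rank feature columns by P value and keep the indices of the k smallest."""
    scored = []
    for j, col in enumerate(zip(*rows)):
        pos = [v for v, lab in zip(col, labels) if lab == 1]
        neg = [v for v, lab in zip(col, labels) if lab == 0]
        scored.append((rank_sum_p(pos, neg), j))
    return [j for _, j in sorted(scored)[:k]]


# Toy data: 6 patients x 3 features; feature 1 separates the two classes perfectly.
rows = [
    [0.5, 1.0, 3.0],
    [0.1, 2.0, 1.0],
    [0.9, 3.0, 5.0],
    [0.2, 7.0, 4.0],
    [0.8, 8.0, 6.0],
    [0.4, 9.0, 2.0],
]
labels = [0, 0, 0, 1, 1, 1]
top_two = select_top_k(rows, labels, 2)
```

In a cross-validation setting, the selection would be run on the training folds only, so that the held-out samples do not influence which features are kept.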
We applied the SVM classifier using a double 10-fold cross-validation strategy for testing the performance of the model in predicting the 4 main MB molecular subgroups. SVM tackles high-dimensional data classification by weighting features and using a Gaussian radial basis function kernel. During the training process, to avoid potential overfitting, we determined the optimal parameters of the SVM classifier and the optimal number of image features using an internal 10-fold cross-validation, testing a range of selected feature counts (top 5, 10, 15, 20, 30, 40, 50, 100, 200, and 300 features). Next, the trained model with the best area under the receiver operating characteristic curve (AUC) value was used for testing unseen samples in an outer 10-fold cross-validation strategy to determine the test set performance.
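The structure of the double (nested) cross-validation can be sketched in plain Python. This is a skeleton under stated assumptions: `fit_and_score` is a hypothetical callback standing in for training the SVM on the top-k features and returning a validation AUC, only the feature count k is tuned here (the SVM parameters would be tuned in the same inner loop), and the dummy scorer exists only to make the example executable.

```python
import random


def k_fold_indices(n_samples, n_folds, seed):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def nested_cv(n_samples, candidate_ks, fit_and_score, n_folds=10, seed=0):
    """Inner loop chooses k; outer loop estimates performance on unseen samples."""
    outer_scores = []
    for outer_test in k_fold_indices(n_samples, n_folds, seed):
        held_out = set(outer_test)
        outer_train = [i for i in range(n_samples) if i not in held_out]
        # Inner folds are positions within outer_train, so outer_test stays unseen.
        inner_folds = k_fold_indices(len(outer_train), n_folds, seed + 1)
        best_k, best_auc = None, -1.0
        for k in candidate_ks:
            aucs = []
            for fold in inner_folds:
                val = [outer_train[i] for i in fold]
                val_set = set(val)
                train = [i for i in outer_train if i not in val_set]
                aucs.append(fit_and_score(train, val, k))
            mean_auc = sum(aucs) / len(aucs)
            if mean_auc > best_auc:
                best_k, best_auc = k, mean_auc
        # Refit on all outer-training samples with the chosen k, then test once.
        outer_scores.append(fit_and_score(outer_train, outer_test, best_k))
    return sum(outer_scores) / len(outer_scores)


CANDIDATE_KS = [5, 10, 15, 20, 30, 40, 50, 100, 200, 300]


def dummy_score(train_idx, test_idx, k):
    """Placeholder for SVM training and AUC scoring; pretends k = 40 is optimal."""
    if set(train_idx) & set(test_idx):
        raise ValueError("training and test samples overlap")
    return 1.0 - abs(k - 40) / 300.0


mean_outer_auc = nested_cv(109, CANDIDATE_KS, dummy_score)
```

Keeping model selection inside the inner loop is what prevents the outer test folds from leaking into the choice of k.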

Model Evaluation
Two validation schemes were incorporated to evaluate the predictive performance of extracted radiomic features. To determine the generalization accuracy of the predictive models, we first performed a double 10-fold cross-validation on a single dataset containing all 3 patient cohorts (Fig 1). To validate the model across different institutions, we next tested an evaluation strategy in which we trained the model using the combined dataset from 2 institutions; then, we tested the model on data from the third independent institution. This process was repeated 3 times with each institutional cohort serving once as the test set (Fig 1), allowing us to evaluate truly predictive radiomic features across clinical sites with different vendors and imaging parameters. The overall model performance was assessed using the average of the 3 iterations and by determination of the AUC. Table 2 summarizes the mean AUCs for prediction of the MB subgroups using the double 10-fold cross-validation and 3-dataset cross-validation strategies on solely T1-weighted, solely T2-weighted, and combined T1- and T2-weighted image datasets.
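The 3-dataset scheme amounts to a leave-one-institution-out split. The sketch below shows only the index bookkeeping, using the cohort sizes reported in the Patients section; the short site labels and the function name are illustrative assumptions.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx) so that each site is tested once."""
    for held_out in sorted(set(sites)):
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        test_idx = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train_idx, test_idx


# One site label per patient, in cohort order (32 + 28 + 49 = 109 patients).
sites = ["Stanford"] * 32 + ["Boston"] * 28 + ["Toronto"] * 49

# Record the training and test set sizes of each of the 3 iterations.
split_sizes = {
    site: (len(train_idx), len(test_idx))
    for site, train_idx, test_idx in leave_one_site_out(sites)
}
```

Each iteration trains on 60 to 81 patients and tests on the remaining cohort, so the test set always comes from an institution the model has not seen.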

Model Evaluation
The double 10-fold cross-validation strategy, which combines all institutional cohorts into 1 dataset, showed that SVMs resulted in the best performance for predicting molecular subgroups. ROC analysis revealed superior performance of this model for predicting the SHH, group 3, and group 4 tumors, particularly when using extracted quantitative data from both T1- and T2-weighted images (AUC = 0.79, 0.70, and 0.83, respectively) (Fig 2). In contrast, the model was not strongly predictive of WNT tumors, despite using all the different image types (AUC = 0.45-0.63). With the 3-dataset cross-validation strategy, computational features extracted from T1-weighted images and the combined dataset from T1- and T2-weighted images were predictive of SHH (AUC = 0.73 and 0.70, respectively) and group 4 (AUC = 0.76 and 0.80, respectively) tumors (Table 2). In addition, while the mean AUC for predicting WNT tumors using T2-weighted images was good (0.72), there were institutional differences in performance (Stanford, AUC = 0.90; Boston, AUC = 0.49; Toronto, AUC = 0.76), suggesting that more training samples of the WNT group are needed to yield stable prediction outcomes.
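As a quick arithmetic check of the WNT result on T2-weighted images, the three institutional AUCs quoted above average to the reported mean, while their spread shows why the mean alone is not a stable summary. The variable names are illustrative.

```python
# Per-institution AUCs for WNT prediction on T2-weighted images, as quoted in the text.
site_auc = {"Stanford": 0.90, "Boston": 0.49, "Toronto": 0.76}

mean_auc = sum(site_auc.values()) / len(site_auc)
auc_range = max(site_auc.values()) - min(site_auc.values())

print("mean AUC = %.2f, range = %.2f" % (mean_auc, auc_range))
```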

Identification of Discriminative Radiomic Features
To identify discriminative radiomic features for predicting the 4 main molecular subgroups of MB (Fig 4) within our study population, we analyzed the results of selected features for all tested models. On-line Table 3 shows the best number of features for each institutional cohort and the number of overlapping features that were selected in all 3 cross-validation loops (see On-line Table 4 for a complete list of feature categories and the number of overlapped features in each category). We observed that the prediction of SHH was the most robust across all institutions because the optimal feature number for the 3 cross-validation loops was the same (40 features), which represented a small subset of all 590 features (6.8%). Of all the features evaluated, there were 4 leading categories: lesion area, edge-sharpness, LAII, and histogram features (On-line Fig 2), with edge-sharpness features being the most important for predicting SHH and group 4.
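The notion of overlapping features can be made concrete with a set intersection across the 3 cross-validation loops, alongside the 40-of-590 fraction quoted above. The feature names below are invented placeholders, not entries from the On-line Tables.

```python
def stable_features(selected_per_loop):
    """Return the features selected in every cross-validation loop, sorted by name."""
    feature_sets = [set(selected) for selected in selected_per_loop]
    return sorted(set.intersection(*feature_sets))


# Hypothetical selections from the 3 leave-one-institution-out loops.
loops = [
    ["edge_sharpness_t1", "laii_t2", "hist_mean_t1"],
    ["edge_sharpness_t1", "hist_mean_t1", "gabor_3_t2"],
    ["hist_mean_t1", "edge_sharpness_t1", "laii_t2"],
]
overlap = stable_features(loops)

# Fraction of the full feature set used by the SHH model (40 of 590 features).
shh_fraction_percent = round(100.0 * 40 / 590, 1)
```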

DISCUSSION
In this study, we developed and validated radiomic and machine learning approaches to identify individual categories of MR imaging-based radiomic features that predict distinct biologic subgroups of MB. Our first method using the double 10-fold cross-validation scheme allowed the prediction of SHH and group 4 tumors using combined information extracted from T1- and T2-weighted sequences, which are frequently acquired as part of the brain tumor MR imaging protocol of any institution. The second method using an independent 3-dataset cross-validation scheme showed the potential for applying our computational pipeline to datasets from outside institutions. In keeping with the results of our first method, this approach yielded good predictive performance for SHH and group 4 tumors using combined T1 and T2 datasets. However, both models performed comparatively less robustly in predicting WNT and group 3 tumors, perhaps related to the smaller amount of available imaging data for these specific subgroups and greater molecular heterogeneity across group 3 tumors. Several brain tumors are known to have spatial molecular 1,9,30,31 and imaging 13,22,23,25,26 heterogeneity. With regard to MB, the identification of 4 molecular subgroups in the past decade has deepened our understanding of the underlying biology of this tumor and the correlation of a specific tumor genotype with different clinical outcomes. 2,3,5 A recent study analyzing multiple biopsies within MB showed that a single biopsy can accurately and reliably subtype MB due to its spatially homogeneous transcriptomes, in contrast to the markedly heterogeneous genomic landscape of glioblastomas; however, actionable somatic mutations found in a single biopsy of MB were infrequently clonal across the entire tumor, which underscores the true molecular heterogeneity of this tumor. 
9 In fact, Cavalli et al 32 have further identified 12 distinct subtypes across the 4 core MB subgroups, each with differing clinical presentations, prognosis, and copy-number mutations. Thus, the complex molecular heterogeneity of MB subgroups and the paucity of imaging data available for individual subgroups in this study may help to explain the performance of our models for predicting WNT and group 3 tumors. Additionally, prior studies have shown tumor location to be a unique factor, particularly for predicting WNT tumors, [22][23][24] and radiomic analysis of isolated tumor volume may be another explanation for our model performance. Thus, the incorporation of qualitative semantic features such as tumor location into our model may improve its performance, particularly for the prediction of WNT medulloblastomas, which is important clinically because this subgroup is associated with the best prognosis and may not need the aggressive therapies used to treat other subgroups. 4 Because MR imaging has the capacity to capture the structure and physiology of an entire tumor, it can be an invaluable tool for noninvasively evaluating tumoral genetic heterogeneity. 33 The spatial variations in genetic and molecular expression of MBs can manifest as different imaging phenotypes on MR imaging, with varying degrees of intratumoral enhancement, hemorrhage, and signal intensity on T1- and T2-weighted images. 16,[21][22][23]25,26 Radiomic studies, most of which have focused on adult glioblastomas, have shown success in linking quantitative imaging features with key mutations as well as clinical outcomes. 12-14, [34][35][36] For example, 1 study proposed that a distinct glioblastoma subtype, specifically, a rim-enhancing cluster found to upregulate the vascular endothelial growth factor receptor signaling pathway, is more likely to respond to upfront antiangiogenic therapy. 
12 Many genetic factors (eg, MYC, MYCN, OTX2, CDK6, SNCAIP, and ACVR1) contribute to the 4 main MB subgroups and associated prognostic differences; even within the SHH and group 3 subgroups, more granular sub-subgroups have emerged with significant differences in the rate of metastases and 5-year survival. 32,37 Given molecular complexities that pose challenges in MB subclassification, there is an important future role for a high-performance, image-based biomarker that either predicts such unique molecular groups or subgroups of MB or provides a more robust tumor risk-stratification scheme for treatment decision-making independent of molecular grouping or subgrouping.
Furthermore, a rapid, low-risk, and inexpensive platform for classifying MB that is feasible with radiomic and machine learning algorithms can potentially enable more widespread tumor subtyping in clinical institutions that may have limited histopathologic and genomic resources. While immunohistochemistry markers (GAB1, β-catenin, filamin A, and YAP1) are currently used in some institutions to identify SHH and WNT tumors, identifying specific group 3 or group 4 tumors remains expensive and difficult without the application of gene expression or methylation profiling. 38,39 In this study, we show a relatively high performance for predicting group 4 MB that is feasible with a computational analysis scheme.
This study has limitations. Because this was a retrospective and multi-institutional study, there was heterogeneity in MR image data, including the use of different scanner vendors and imaging parameters. However, in clinical practice, different scanner vendors at different field strengths and different imaging protocols are used daily for tumor diagnosis and surveillance; thus, a predictive model that incorporates such technical variations results in a more practical clinical translation of radiogenomic strategies. A recent study showed that radiomic features varied considerably on T1-weighted images generated by different pulse sequences and parameters. 40 In this study, we chose to retain differences in imaging protocols from 3 different cohorts to assess the robustness of radiomic features extracted from multi-institutional data. To facilitate evaluation of classifiers, after feature extraction, we performed feature-level normalization (z score) across patients to help with the predictive performance of the machine learning classifier. Future studies may need to look into other strategies to overcome the heterogeneity of data, including normalizing the degree of T1-weighting (eg, normalizing images to the signal intensity of the tumor) and weighting the importance of specific features or image sequences (eg, T1 versus T2) as input in a predictive radiogenomic model. 40 In addition, while this study used T1 contrast-enhanced and T2-weighted images for feature discrimination and model development, incorporating additional image sequences, such as diffusion, permeability, or T2* perfusion, could further boost model performance. Despite these challenges, our study showed that radiomic strategies can be used to extract discriminating computational features and create a machine learning-based prediction model for pediatric MB subgroups.

CONCLUSIONS
We present proof-of-concept results for the application of radiomics and machine learning using multi-institutional data for the prediction of distinct MB molecular subgroups. High-throughput quantitative features were extracted from contrast-enhanced T1- and T2-weighted images and linked to 4 core subgroups of MB. Model performance for the prediction of SHH and group 4 was more robust than for WNT and group 3. Future investigations using a larger sample size for all subgroups, particularly WNT (because we had the fewest WNT cases in this study), are needed to improve classifier training and evaluation during cross-validation. The use of other imaging sequences such as diffusion-weighted, permeability, and T2* perfusion imaging may also yield additional radiomic features and help to improve performance. Computational analyses of MR imaging offer a wealth of opportunities to noninvasively characterize tumors, which can have an important role in the clinical and treatment decision-making processes for pediatric MB.