Radiomics in Brain Tumor: Image Assessment, Quantitative Feature Descriptors, and Machine-Learning Approaches

SUMMARY: Radiomics describes a broad set of computational methods that extract quantitative features from radiographic images. The resulting features can be used to inform imaging diagnosis, prognosis, and therapy response in oncology. However, major challenges remain for methodologic developments to optimize feature extraction and provide rapid information flow in clinical settings. Equally important, to be clinically useful, predictive radiomic properties must be clearly linked to meaningful biologic characteristics and qualitative imaging properties familiar to radiologists. Here we use a cross-disciplinary approach to highlight studies in radiomics. We review brain tumor radiologic studies (eg, imaging interpretation) through computational models (eg, computer vision and machine learning) that provide novel clinical insights. We outline current quantitative image feature extraction and prediction strategies with different levels of available clinical classes for supporting clinical decision-making. We further discuss machine-learning challenges and data opportunities to advance radiomic studies.

Most radiologic data are reported in qualitative and subjective terms. Radiomics 1,2 in neuro-oncology seeks to improve the understanding of the biology and treatment of brain tumors by extracting quantitative features from clinical imaging arrays. These data can then be "mined" with machine-learning methods and validated as quantitative imaging biomarkers 3 to characterize intratumoral dynamics throughout the course of treatment. The recent growth of cancer imaging analytic methods 4-6 has produced novel insights into early indicators of treatment response, risk factors, and subsequent tailoring of optimal treatment strategies. 2,5,7,8 Image-based computational models are, thus, becoming an important enabling technology that permits identification, analysis, and validation of extracted quantitative features. In this review, we discuss available methodologies in radiomics that can be used as predictive markers for diagnosis, prognosis, and therapeutic planning in the context of adult brain tumors. We also address the interpretive challenges that emerge from the computationally based data generated by radiomic methods. While statistical correlations between computational features and clinical outcomes exist, this approach will likely not gain wide clinical acceptance until the quantitative metrics are better linked to traditional imaging features and the underlying biology.
Radiomics incorporates several important disciplines, including radiology (eg, imaging interpretation), computer vision (eg, quantitative feature extraction), and machine learning (eg, classifier evaluation). A central goal is the identification of quantitative imaging indicators that predict important clinical outcomes, including prognosis and response or resistance to a specific cancer treatment. Here, we discuss recent studies in the development of radiomics with the following goals: 1) understanding the functionality of clinical imaging as a necessary prerequisite for developing radiomic models; 2) extracting quantitative image features with computer vision techniques that can be used to exploit tumor imaging traits; 3) identifying radiomic signatures shown to be surrogate markers of underlying molecular properties of tumors, enabling a noninvasive means to characterize biologic activities of cancer 9 ; and 4) performing predictive analysis with machine-learning techniques to classify clinical outcomes and assess the physiologic status of cancer. 10 Through this convergence of radiology, computer vision, and machine-learning techniques, radiomics provides a mechanism for multidisciplinary research on brain tumors.

Clinical MR Imaging Assessment of Brain Tumors
MR imaging permits noninvasive characterization of mesoscopic features (ie, the "radiologic phenotype") of brain tumors and is an indispensable tool for early tumor detection, monitoring, and diagnosis. 11 Radiomic analysis is built on the central hypothesis that tumor imaging reflects the underlying morphology and dynamics of smaller-scale biologic phenomena, including gene expression patterns, tumor cell proliferation, and blood vessel formation. 12 MR imaging plays an essential role in the management of patients with glioblastoma for 3 important reasons. First, MR imaging has an excellent capacity for the detection of soft-tissue contrast by providing superior anatomic information (eg, spatial location). Second, different MR imaging sequences can be sensitive to key components of tumor physiology, such as blood flow and cellular density, and can distinguish regions of the tumor that contain different environments (eg, variations in blood flow) that are likely to affect local cellular phenotypes and genotypes. Third, MR imaging can noninvasively and nondestructively interrogate the tumor repeatedly to assess response to treatment and can, therefore, be integrated into therapeutic strategies. Understanding these image-based features is critical because they represent a key data resource in radiomic analysis. 1
Contrast enhancement in MR imaging using gadolinium-based contrast agents is an important and useful feature in evaluating brain tumors. 13 The tumor zone that enhances following gadolinium injection typically defines the tumor region that is well-perfused with high tumor cell density but also one in which there is a breakdown of the blood-brain barrier. Compared with noncontrast imaging, contrast-enhanced images are often used to provide a delineation of gross tumor margins and allow earlier detection of additional small metastatic lesions. In general, tumor sizes based on these images are used for monitoring tumor response to therapy. 13 Thus, radiomic models for brain tumor analysis 14-16 often focus on contrast-enhanced sequences.
Spatial heterogeneity of brain tumors is well-recognized in MR imaging. Different MR imaging sequences exploit various biomedical properties of brain tumors more effectively than other imaging modalities (eg, CT can only show differences in electron density). Postgadolinium T1-weighted images can show enhancing regions (characterized as T1-shortening or T1 high signal) within the tumor due to gadolinium leakage from the intravascular space into the tumor because of a disrupted blood-brain barrier. Consequently, necrosis and solid tumors can be visually distinguished. In addition, T2-weighted sequences are sensitive to water tissue content and can be used to estimate cellular density and the presence of edema. Next, fluid-attenuated inversion-recovery sequences are frequently used in conjunction with T2-weighted images to provide a better distinction between edema and solid tumor. 17 In addition, diffusion-weighted imaging allows characterization of tissue cellularity based on the free diffusion of water molecules along structural tissue pathways in different tissue types. 18 Advanced MR imaging methods, including perfusion, proton density-weighted, fast spin-echo, and short tau inversion-recovery imaging, have also been applied to depict specific tissue contrast. 19 Therefore, mining radiomic data from these advanced imaging arrays will likely offer additional information with respect to tissue discrimination, treatment measurement, and clinical usefulness.

Quantitative Image Feature Extraction
While the quality, resolution, and flexibility of MR imaging technology have greatly increased in past decades, the interpretation of images remains largely descriptive, subjective, and nonquantitative. Thus, the central goal of radiomics is the development of image analytic techniques that can reproducibly extract objective, quantitative data from MR imaging scans. Linking these quantitative features and underlying tissue dynamics that govern tumor growth and response to therapy has the potential to rapidly expand the scope of cancer imaging research. 1 Here, we focus on several related computer vision techniques that are particularly useful in quantitative cancer imaging science.

Computational Image Descriptors
Radiomics relies on computational techniques in computer vision to extract many quantitative features from radiologic images. 20 The extracted quantitative features are typically within a defined ROI that could include the whole tumor or specific regions within it. Computational image descriptors quantify visual characteristics at different scales from ROIs, which can be readily translated into radiologic image analysis pertaining to tumor volumetric shapes and visual appearance dynamics. For example, the scale-invariant feature transform (SIFT) 21,22 is computed through key point detection using a difference of Gaussian function and local image gradient measurement with radius and scale selections (as illustrated in Fig 1). This permits a quantitative measurement of the tumor shape so that subtle variations during treatment (ie, increasingly round or increasingly elliptic) can be observed and quantified. Several recent studies have demonstrated the accuracy and reproducibility of computational image extraction approaches to capture characteristics of tumor shape and texture information from brain tumor MR imaging. 23-25 Thus, these approaches have the potential for large-scale, rapid throughput and reproducible evaluation and may be applied to routine clinical imaging studies that are widely available.
Here, we describe 2 primary image feature extraction strategies with local- or global-level computations in the context of computer vision. First, local-level feature extraction provides an image descriptor used to compare a pixel being tested with its immediate pixel neighborhood. 26 This allows identification of a small, but biologically important, tumor niche area (a small number of pixels) within an otherwise homogeneous, larger tumor region. This can be achieved, for example, with local binary patterns (LBP). 27 These are local image descriptors sensitive to small monotonic gray-level differences 28 that may not be apparent to a human observer. In contrast, global-level feature extraction emphasizes the quantification of the overall composition of an entire ROI. For example, a computational descriptor 29 was designed to develop a low-dimensional representation of the image, emphasizing spatial structure variations (eg, roughness, openness, and expansion). In addition, high-order statistical features, known as texture features, 30 have been widely applied to brain cancer and other cancer imaging analyses. 14,31 Examples of texture features include gray-level co-occurrence matrices 32 and gray-level size zone matrices, 33 which examine the spatial relationships of pixels through a series of statistical measures. Histogram of oriented gradients (HOG) 34 features have also proved to be efficient feature descriptors for quantifying image-gradient statistics with multiple directions not obvious to radiologists. A recent study 35 suggested that co-occurring gradients in MR imaging were useful for distinguishing brain tumor subtypes.
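As a minimal sketch of the local and texture descriptors described above, the following pure-Python example computes an 8-neighbor local binary pattern code and a gray-level co-occurrence matrix with its contrast statistic on a toy 4 × 4 ROI. The function names, the neighbor ordering, the horizontal pixel offset, and the toy intensities are illustrative assumptions and are not taken from the cited studies.

```python
def lbp_code(img, r, c):
    """8-neighbor local binary pattern code for the interior pixel (r, c)."""
    center = img[r][c]
    # Neighbor offsets, clockwise from the top-left pixel (an arbitrary choice).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A bit is set when the neighbor is at least as bright as the center.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


def glcm(img, levels, dr=0, dc=1):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r][c]][img[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def glcm_contrast(p):
    """Contrast statistic: sum over (i, j) of p[i][j] * (i - j) ** 2."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2
               for i in range(size) for j in range(size))


# Toy ROI with 4 gray levels (hypothetical values, not real MR data).
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

lbp_features = lbp_histogram(roi)
contrast = glcm_contrast(glcm(roi, levels=4))
```

Because each LBP bit depends only on the ordering of a neighbor and the center pixel, the histogram is unchanged by any strictly increasing rescaling of intensities, whereas the co-occurrence statistics depend on how intensities are quantized into gray levels.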
Despite these advances, computational image descriptors may be suboptimal because feature extraction inherently distills a complex dataset of more than a million voxels per MR imaging sequence into a handful of numeric descriptors. A strong radiomic feature must satisfy 2 requirements. First, the proposed descriptor must be able to capture distinctive patterns correlated with the clinical outcomes of patients. Second, the descriptor must be stable under various image-acquisition parameters. Although MR imaging signals exhibit tumor geometric shapes, appearances, and voxelwise variations with underlying biologic characteristics at molecular, tissue, and organ levels, 12 the potential dynamics and temporal variations in blood flow increase the difficulty in acquiring useful radiomic features. Thus, test-retest and interobserver stability 36 are strongly suggested for measuring robust computational image features in radiomic studies. Figure 1 highlights several hand-crafted computational descriptors that capture different visual characteristics of brain tumor MR imaging. Despite the promise of computational image descriptors, the underlying biologic meaning of these features and their links to promising therapies and patient outcome prediction have not been fully explored.
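One way to quantify the test-retest stability mentioned above is an agreement statistic computed between two measurements of the same feature on the same patients. The sketch below uses Lin's concordance correlation coefficient as an assumed choice; the text does not prescribe a particular statistic, and the feature values and the 0.85 cutoff are hypothetical.

```python
from statistics import fmean


def concordance_ccc(test, retest):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(test)
    mean_t, mean_r = fmean(test), fmean(retest)
    var_t = sum((a - mean_t) ** 2 for a in test) / n
    var_r = sum((b - mean_r) ** 2 for b in retest) / n
    cov = sum((a - mean_t) * (b - mean_r) for a, b in zip(test, retest)) / n
    # Penalizes both poor correlation and a systematic shift between scans.
    return 2 * cov / (var_t + var_r + (mean_t - mean_r) ** 2)


def is_stable(test, retest, cutoff=0.85):
    """Flag a feature as stable; the cutoff is an illustrative assumption."""
    return concordance_ccc(test, retest) >= cutoff


# Hypothetical feature values from a first scan and a repeat scan.
scan_1 = [1.0, 2.0, 3.0, 4.0]
scan_2 = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated but shifted by 1

ccc_shifted = concordance_ccc(scan_1, scan_2)
ccc_identical = concordance_ccc(scan_1, scan_1)
```

The shifted example shows why agreement is a stricter criterion than correlation: the two scans are perfectly correlated, yet the constant offset lowers the coefficient below 1.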

Biologically Inspired Feature Descriptor
Biologically inspired feature descriptors build on specific biologic hypotheses that transfer the recognized radiology knowledge into quantitative representation, as opposed to pure computational approaches for feature extraction. Understanding disease characteristics is necessary to propose biologically inspired features because they can be disease-specific. For example, a recent study 37 suggested that MR imaging-derived pharmacokinetic features (eg, extracellular space per unit volume of tissue) were potential biomarkers for separating outcomes of treatment with concurrent radiation therapy and chemotherapy.
Biologically inspired MR imaging features can be used to define organ-level tumor data variation and distribution, offering an opportunity to observe spatial variations and temporal evolution of tumors. 12 For example, a spatial distance measurement 38 was defined to quantitatively explore brain tumor heterogeneity. The proposed spatial distances suggested that the variations among biologically defined tumor subregions can reflect distinct prognostic information. Also, early temporal changes and spatial heterogeneity during radiation therapy in heterogeneous regions of high and low perfusion in gliomas might predict different physiologic responses to radiation therapy. 39 Other work 16 has proposed a novel concept of imaging habitats that quantifies distinctive tumor subregions by their local contrast enhancement, edema, and cellularity in MR imaging. Moreover, a recent study measured the relationships between MR signal and cell density using radiographically localized biopsies, 40 revealing that T2-FLAIR and ADC sequences were inversely correlated with cell density. The Table highlights biologically inspired features and their biomedical mechanisms. 42 These biologically inspired features are quantitative rather than qualitative semantic features annotated by radiologists to describe the tumor environment. 43 The VASARI semantic feature set (https://wiki.cancerimagingarchive.net/display/Public/VASARI+Research+Project), for example, is used to describe the morphology of brain tumors (eg, tumor location, shape, and geometric properties) on contrast-enhanced MR imaging. 7

Imaging and Genomics in Glioblastoma
Imaging genomics, also known as radiogenomics, is a growing field that studies the association between imaging biomarkers and genomic characteristics of a disease. 7,44,45 Inherent in this definition is the goal of enabling noninvasive imaging assessment as a surrogate for molecular signatures that were previously only available through molecular testing. A handful of studies 7,44,45 identified associations between quantitative image features and gene expression profiles of glioblastoma (eg, TP53, EGFR, NF1, and IDH1) and its molecular subtypes (eg, classical, mesenchymal, proneural, and neural). Additional studies indicated that quantitative MR imaging features derived from entire tumor volumes can be used to identify glioblastoma subtypes with distinct molecular pathway activities. 15,46 The value of such whole-tumor comparisons is limited by the spatial variations in both the imaging features and molecular tumor properties of tumor cells in, for example, glioblastoma multiforme. However, this extensive intratumoral heterogeneity also provides a compelling research opportunity if spatial characteristics in MR imaging, largely governed by mesoscopic tumor properties (eg, blood flow and cell density), could be used to define the spatial distribution of glioblastoma molecular subtypes within the same tumor. 47 To better define imaging-to-genomic relationships, the development of subregional imaging analysis enabling spatial characterization is biologically valuable (Fig 2) because it provides a means to characterize molecular variations in the spatially distinct tumor fragments.

Machine Learning in Radiomics
Machine learning offers an approach for discovering predictive radiomic features. Here, the investigator does not begin with an a priori biologic hypothesis. Instead, the parameter space is searched for an imaging feature statistically associated with clinical outcome. Before one evaluates machine-learning models, a specification for the medical diagnostic task is needed so that models can be appropriately trained. For example, supervised, unsupervised, and semisupervised learning models are fundamental learning strategies used in accordance with the different levels of available clinical outcome labels. In supervised learning, the goal is to learn from a certain portion of training samples with known class labels and to predict classes or numeric values for unknown patterns from large and noisy datasets. 47 Conversely, unsupervised learning finds the natural structure from data without having any prior labels. As a hybrid setting, semisupervised learning needs only a small portion of labeled training data. The unlabeled data samples, instead of being discarded, are also used in the learning process. More recently, the rise of deep learning as a new frontier in machine learning has advanced large-scale medical image analysis. 48 We describe these learning strategies and highlight specific clinical applications in the context of brain tumors.

Supervised Learning
Supervised learning is a primary learning scheme that has been applied in recent radiomic studies. 16,23 Supervised learning is conceptually divided into 2 phases. First, training samples with available class labels are used to build a classifier by finding a set of parameters to define a decision boundary among classes. Second, the learned classifier is used to predict class labels of unknown testing samples. Notably, the selection of classifiers depends on the desired properties of the classifier, including convergence and modeling assumptions. 49 An example of this approach is a study 50 that showed the selection of machine-learning classifiers for supervised radiomic applications in detecting radiomic biomarkers. More examples include tumor-subtype classification and survival time prediction, and these are discussed in clinical applications below.
Fig 2. Linking subregional imaging to molecular profiles in glioblastoma. In this example, tumor subregions (B) are defined by jointly clustering on contrast-enhanced T1WI and T2WI (A). These subregions correspond to red (high T1WI and high T2WI), yellow (high T1WI and low T2WI), blue (low T1WI and high T2WI), and pink (low T1WI and low T2WI) areas. The defined tumor subregions enable quantitative spatial characterization, offering a means to noninvasively assess specific molecular activities (C) with enriched molecular pathways (D).
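The 2 phases of supervised learning described in this section, training on labeled samples and then predicting labels for unseen samples, can be sketched with a nearest-centroid classifier. This classifier is chosen here only for brevity and is not the method of the cited studies; the 2-element feature vectors and the "long"/"short" survival labels are hypothetical.

```python
def fit_centroids(features, labels):
    """Training phase: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Prediction phase: assign the class whose centroid is closest."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


# Hypothetical radiomic feature vectors with known outcome labels.
train_x = [[0.2, 1.0], [0.3, 1.2], [0.9, 2.8], [1.1, 3.0]]
train_y = ["long", "long", "short", "short"]

model = fit_centroids(train_x, train_y)
predicted = [predict(model, x) for x in [[0.4, 1.3], [1.0, 2.5]]]
```

Here the decision boundary is the set of points equidistant from the 2 class centroids; other classifiers differ mainly in how that boundary is parameterized and fitted.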

Unsupervised Learning
Without knowing any prior labels of data, unsupervised learning algorithms group the data according to their similarity. For example, a knowledge-based unsupervised fuzzy clustering approach 51 was proposed to automate brain tumor segmentation. It showed that tumor and healthy intracranial regions could be grouped using this clustering algorithm through a rule-based expert system. The growth of clinical imaging results in the onerous task of manually annotating tumors on imaging volumes. Therefore, there has been growing interest in exploring scalable algorithms for annotating large volumes of tumor imaging data 15 and evaluating human interrater variability. 52
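To illustrate grouping by similarity without labels, the sketch below clusters voxels by their paired intensities on 2 sequences using plain k-means. This is a simpler stand-in for the knowledge-based fuzzy clustering cited above, not a reproduction of it; the normalized T1WI and T2WI intensities and the initial cluster centers are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def kmeans(points, centers, iterations=10):
    """Plain k-means with caller-supplied initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, [nearest(p, centers) for p in points]


# Hypothetical voxels as (T1WI, T2WI) intensities scaled to [0, 1].
voxels = [(0.9, 0.8), (0.8, 0.9), (0.9, 0.1), (0.8, 0.2),
          (0.1, 0.9), (0.2, 0.8), (0.1, 0.1), (0.2, 0.2)]
initial = [(1, 1), (1, 0), (0, 1), (0, 0)]  # high/low combinations

centers, labels = kmeans(voxels, initial)
```

With these inputs, the 4 clusters correspond to the high/low combinations of the 2 sequences, which is one simple way to define candidate tumor subregions before any outcome label is used.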

Semisupervised Learning
Semisupervised learning is designed specifically for tasks for which it is difficult to obtain class labels for certain patients (eg, the estimation of tumor progression). In other words, semisupervised learning overcomes a limitation of conventional supervised learning, which cannot use data with missing labels in training. Semisupervised models have great potential to enable predictive analysis with incomplete clinical labels in training. For example, 1 study 53 predicted states of brain tumor prognosis using a semisupervised learning model. With <26% of the available staging labels, a discriminative analysis was performed using the staging labels of patients with glioblastoma.
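One simple way to use unlabeled samples during training is self-training, in which a classifier repeatedly assigns a pseudo-label to the unlabeled sample it is most confident about and then retrains. The sketch below is an assumed illustration of that idea with a nearest-centroid rule; it is not the model used in the cited study, and the 1-dimensional feature values and the "stable"/"progressed" labels are hypothetical.

```python
def centroid(rows):
    """Mean feature vector of a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def self_train(labeled, unlabeled):
    """Pseudo-label every unlabeled sample, most confident first.

    labeled: list of (features, label) pairs; unlabeled: list of features.
    Returns the pseudo-labeled pairs in the order they were assigned.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    assigned = []
    while pool:
        classes = sorted({y for _, y in labeled})
        centers = {y: centroid([x for x, lab in labeled if lab == y])
                   for y in classes}
        # Confidence proxy: distance to the closest class centroid.
        best = min(pool,
                   key=lambda x: min(sq_dist(x, c) for c in centers.values()))
        label = min(classes, key=lambda y: sq_dist(best, centers[y]))
        labeled.append((best, label))  # the pseudo-label joins the training set
        assigned.append((best, label))
        pool.remove(best)
    return assigned


# Only 2 patients carry a label; 4 others are unlabeled.
known = [([0.0], "stable"), ([10.0], "progressed")]
unknown = [[1.0], [2.5], [8.0], [9.5]]

pseudo = self_train(known, unknown)
```

A practical caveat of self-training is that an early wrong pseudo-label is never revisited, so confidence thresholds or held-out validation are usually needed in real studies.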

Deep Learning
Recently, deep learning 54,55 has emerged as a powerful technique that defines a network architecture, concatenating multiple neural-like processing layers with multiple levels of abstraction. Deep learning methods have achieved record-breaking performances for numerous computer vision applications when the number of available training samples is sufficiently large. 56 Convolutional neural networks, 57 for example, are deep learning models that incorporate concatenated convolutional layers and pooling layers, followed by fully connected layers to learn the high-level representation of input data. A growing set of studies has shown superior results in the field of medical image analysis by applying convolutional neural network models. 48,58-60 A recent study 61 specifically introduced a convolutional neural network-based approach for brain tumor segmentation. With the designed network architecture containing multiple small 3 × 3 kernels and deep layers, the model achieved strong segmentation performance (Dice score = 0.88) on the Brain Tumor Segmentation Challenge 2013 database (http://braintumorsegmentation.org/).
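The Dice score reported above measures the overlap between a predicted segmentation and a reference segmentation as twice the size of their intersection divided by the sum of their sizes. A minimal sketch follows; the function name and the toy binary masks are hypothetical, and returning 1.0 for 2 empty masks is a convention assumed here.

```python
def dice_score(predicted, reference):
    """Dice overlap between two binary masks given as nested lists."""
    pred = [v for row in predicted for v in row]
    ref = [v for row in reference for v in row]
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    size = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Two empty masks agree perfectly by the convention assumed here.
    return 1.0 if size == 0 else 2.0 * intersection / size


# Toy 2 x 3 masks: 1 marks a voxel labeled as tumor.
predicted_mask = [[1, 1, 1],
                  [0, 0, 0]]
reference_mask = [[0, 1, 1],
                  [1, 0, 0]]

overlap = dice_score(predicted_mask, reference_mask)
```

In this toy case, 2 voxels are shared and each mask contains 3 voxels, so the score is 2 × 2 / (3 + 3).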

Specific Clinical Applications of Radiomics
Survival Time Prediction. Predicting prognostic performance of glioblastomas using clinical imaging alone is challenging due to the radiographic heterogeneity of tumors. Stratification of clinical groups will directly impact image-guided diagnosis and targeted treatment. 16 In contrast to qualitative assessment (enhancing tumor, edema, and multifocal lesions) 62 made by radiologists, a recent study 5 introduced quantitative spatial imaging biomarkers to predict survival time for patients with glioblastomas. Such work requires a labeled training set with survival information. It suggested that contrast information gained from co-occurring subregions on FLAIR and T2-weighted images was useful to separate long-term (>400 days) and short-term (<400 days) survival groups.
Classification of Glioblastoma Subtypes. An example of the classification of brain tumor histologic groups 25 demonstrated the usefulness of radiomic features to rapidly classify gliomas, metastases, and high-grade (grade III and grade IV) and low-grade (grade II) neoplasms. The quantitative image analysis collected brain tumor identification, raw intensity values, and MR imaging-based texture features to demonstrate the capability of multiparametric MR imaging features to separate histologic classes of brain tumors. In addition, a recent study 45 showed the possibility of classifying molecular subtypes of glioblastomas using radiomic features alone. In this approach, histograms of MR imaging intensity in enhanced tumors were identified to provide added value for predicting molecular subtypes of glioblastoma. 45
Tumor-Tissue Discriminative Analysis. Differentiating radiation necrosis from tumor recurrence is difficult in patients with glioblastoma undergoing chemoradiation because traditional qualitative interpretation of conventional contrast-enhanced MR imaging is unlikely to differentiate chemoradiation-induced necrosis and pseudoprogression. 63 By contrast, quantitative MR imaging features 64 were shown to differentiate radiation necrosis from recurrent tumor. Results revealed that a set of quantitative intensity features obtained from multiple MR imaging arrays (eg, T1, T2, relative CBF, and ADC) was useful for detecting radiation necrosis tissue in patients with resected glioblastoma multiforme undergoing chemoradiation.

Research Opportunities and Challenges
Substantial progress has been made from recent radiomic studies to deepen our understanding of imaging characteristics in cancer. 1,2 The identified quantitative features can be used to alert radiologists to suspicious abnormalities because radiomic models can even capture subtle variations in tumor environment not easily perceived by human experts. We highlighted the convergence of quantitative image feature extraction and machine-learning techniques for supporting diagnosis, prognosis, and treatment predictions. Next, we discuss challenges and opportunities that are not only related to brain tumor MR imaging but also applicable to other types of cancer studies in radiomics.

Development of Treatment-Specific Radiomics
There is a growing need to develop radiomic signatures that can be used to directly inform specific treatment options. With diffusion-weighted MR imaging, emerging evidence revealed that ADC maps were useful to differentiate treatment outcomes in glioblastoma with radiation therapy concurrent with temozolomide. 65 Also, early changes of ADC maps were identified for predicting glioblastoma recurrence. 66 In addition, a decrease in the volume of FLAIR signal and contrast enhancement 67 was shown to predict the response to bevacizumab treatment for patients with glioblastomas. To better understand the biologic implications of these imaging findings, the collection of temporal imaging at multiple diagnostic periods will offer an opportunity to consistently characterize biologic tumor evolution before and after treatment. 68 Currently, distinguishing tumor response between pseudoprogression and pseudoresponse continues to be challenging in tumor MR imaging 69 ; thus, development of radiomic features to contribute to treatment outcome analysis will be appealing. Although the type of radiomic features that will eventually be validated as true predictors responsive to different treatments is still to be determined, the vast quantity of radiomic findings, in conjunction with the growth of molecular data, will increase the possibility of a redefinition of tumor subtypes and novel biomarker discovery to inform treatment decisions in the coming years.

Machine Learning
Extracting large-scale radiomic features from a variety of imaging arrays creates a rich database containing clinically relevant information. Both computational and biologically inspired feature descriptors are important and useful for machine learning. Thus, development of scalable machine-learning techniques is vital to search for and identify useful image features associated with outcome variables and clinical records. We discuss 2 representative techniques to address this need. First, a suggested approach is to investigate sparse-learning models, also known as lasso regularization, 70 for finding a useful set of reduced features from a high-dimensional feature vector. Emerging evidence suggests that sparse learning is useful for identifying multiparametric prognostic imaging biomarkers in non-small cell lung cancer. 71 Second, recent breakthroughs in deep learning with applications in radiology, such as lung nodule malignancy classification 58,59 and lymph node detection, 72 have been encouraging in finding disease-specific imaging biomarkers. However, the availability of labeled medical data poses a challenge for developing efficient deep-learning models. For example, cancer images with pathologically proved labels are costly to collect at scale; thus, data integration with different types of clinical labels may present an alternative means to overcome such obstacles for deep-learning models, which need large datasets as input. 73 Although how we gain clinical insight from these deep-learning outcomes and how we optimize network architectures for the better use of multiscale medical data (eg, serial MR imaging, genomics, and clinical data) are uncertain, 74 the extraction of compact patterns via hierarchical networks presents enormous opportunities for large-scale radiomic applications (Fig 3).
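To show how lasso regularization reduces a feature vector, the sketch below minimizes (1/2n)·||y − Xw||² + α·||w||₁ by cyclic coordinate descent with soft thresholding, so that weakly associated features receive a weight of exactly zero. The solver, the omission of an intercept (features and outcome are assumed centered), the toy design matrix, and the value of α are illustrative assumptions.

```python
def soft_threshold(value, alpha):
    """Shrink a value toward zero by alpha, clipping small values to zero."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, sweeps=100):
    """Lasso weights for centered data via cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for row, target in zip(X, y):
                others = sum(w[k] * row[k] for k in range(d) if k != j)
                rho += row[j] * (target - others)
                z += row[j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z else 0.0
    return w


# Toy data: the outcome depends strongly on feature 0 and weakly on feature 1.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.05, 1.95, -1.95, -2.05]  # equals 2.0 * x0 + 0.05 * x1

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, value in enumerate(weights) if value != 0.0]
```

Larger values of α discard more features; in practice α is typically chosen by cross-validation rather than fixed in advance.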

The Role of Radiomics in Big Data
Radiomic studies follow the theme of emerging big data in health care that concerns mining vast quantities of biomedical data. 75,76 A typical concern of radiomics in big data relates to the management of growing image data, gene expression profiles, and the associated clinical records. Multiple data sources (eg, medical institutions) and various data types (eg, multitechnique imaging data) make data sharing and collective administration an especially complex problem. Enabling data standardization across different image protocols and parameters becomes a prerequisite for collective study. The pilot project of The Cancer Genome Atlas 77 is an initiative to provide a large portion of clinical data and genomic information. As a parallel effort, The Cancer Imaging Archive 4 is growing rapidly for sharing radiologic images. Also, the Quantitative Imaging Network 78 is designed to promote efficient data integration that helps validate imaging biomarkers that are impactful in treating cancer. More recently, MR fingerprinting 79 was introduced to advance the role of quantitative MR imaging using pseudorandomized acquisition parameters. However, large-scale, high-quality benchmark datasets, including complete clinical labels, standard radiomic features, and molecular profiles, are not widely available for data sharing, experimental evaluation, and reproducibility of radiomics toward precision medicine. 80,81

CONCLUSIONS
The rapid discovery rate of novel imaging biomarkers in radiomics allows integrating information from interdisciplinary approaches in radiology, computer vision, and machine learning. With the growth of clinical imaging data, novel image-based computational models are playing an increasingly important role for precise diagnosis and treatment guidance in neuro-oncology. Only when these models align well with tumor biology will radiomic findings maximize their likelihood for clinical utility. Research challenges remain for exploring cancer heterogeneity, scalable computational models, and the clinical significance of radiomic findings. We believe that the newly emerging diagnostic hypotheses and scalable machine-learning algorithms have the potential for enhancing the current performance of predictive cancer diagnosis and accelerating quantitative cancer imaging findings to reach their true clinical potential.