Prediction of Human Papillomavirus Status and Overall Survival in Patients with Untreated Oropharyngeal Squamous Cell Carcinoma: Development and Validation of CT-Based Radiomics


CONCLUSIONS: CT-based radiomics may be useful in predicting human papillomavirus status and overall survival in patients with oropharyngeal squamous cell carcinoma.
ABBREVIATIONS: AUC = area under the curve; HPV = human papillomavirus; OPSCC = oropharyngeal squamous cell carcinoma; OS = overall survival; rad-scores = radiomics scores; TCIA = The Cancer Imaging Archive

Oropharyngeal squamous cell carcinoma (OPSCC) is one of the most rapidly growing subtypes of head and neck cancer, primarily due to the increasing incidence of human papillomavirus (HPV)-related OPSCC. 1 HPV status is a well-established prognostic factor for OPSCC, with HPV-positive disease carrying a better prognosis. [2][3][4] This distinction contributed to the recent change in OPSCC staging in the 8th edition of the American Joint Committee on Cancer Staging Manual. [5][6][7] Because HPV status has substantial clinical implications, tissue biopsy from the primary or nodal site with p16/HPV testing is indicated. Although biopsy is generally straightforward in the head and neck, image-based differentiation of OPSCC by HPV status could become clinically relevant, particularly if it offers cost savings.
Imaging of head and neck cancer is often difficult because of anatomic complexity, overlapping tissue densities, and artifacts (eg, motion and dental amalgam). Thus, describing imaging features can be challenging. Recently, the advent of radiomics in image-based research has allowed such complex imaging features to be quantified as standardized high-throughput data, the subsequent analysis of which could aid clinical decision-making. 8 Before radiomics, diffusion-weighted MR imaging showed that HPV-positive OPSCC had lower apparent diffusion coefficients than HPV-negative OPSCC, 9 a finding presumed to be due to the low stromal volume 10 and abundant lymphoid cells 11 associated with HPV-positive OPSCC. Previous CT-based radiomics studies reported promise in prognostication, 12 prediction of HPV status, 9 and both prediction and prognostication in head and neck cancers. 13,14 We hypothesized that CT-based radiomics could serve both purposes: prediction of HPV status and of overall survival (OS). Therefore, the aim of this study was to determine whether pretreatment CT-based radiomics of primary OPSCC could predict HPV status and OS in patients initially diagnosed with OPSCC.

MATERIALS AND METHODS

Patients
This single-center retrospective cohort study was approved by the institutional review board of Seoul St. Mary's Hospital, and the requirement for informed consent was waived. One hundred twenty-five patients initially diagnosed with OPSCC between January 2009 and September 2019 were retrospectively reviewed. The inclusion criteria were as follows: 1) pathologically confirmed OPSCC, 2) known HPV status, and 3) available pretreatment contrast-enhanced neck CT images. Thirty-nine patients were excluded for the following reasons: 1) primary tumor not visible on CT (n = 8), 2) beam-hardening artifacts hampering image analysis (n = 26), and 3) another underlying malignancy (n = 3) or distant metastases (n = 2) at the time of OPSCC diagnosis. Finally, a total of 86 eligible patients were included in the analysis. Clinical information, including age at diagnosis, sex, smoking history, cancer stage (American Joint Committee on Cancer Staging Manual, 8th edition), and HPV status, was retrieved from the electronic medical records of our institution. OS was defined as the interval from the date of initial diagnosis to the date of death or the last documented clinical visit. The cohort was divided into training and test cohorts (7:3 ratio) by random stratified sampling so that survival status (dead or alive) was evenly distributed between the 2 cohorts. The review of patients' medical records was finalized on October 2, 2019.
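A 7:3 split stratified by survival status can be sketched as follows. The study performed this step with the R package "caret"; the function below is only a standard-library analogue under stated assumptions: `stratified_split` is a hypothetical helper, the seed is arbitrary, and per-stratum counts are rounded with Python's `round`, which may not match caret's rounding exactly.

```python
import random


def stratified_split(status, train_fraction=0.7, seed=2019):
    """Split patient indices into training and test sets, stratified by status.

    status: one entry per patient, 1 = dead, 0 = alive (hypothetical coding).
    Returns (train_indices, test_indices), each sorted.
    """
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    train, test = [], []
    for stratum in sorted(set(status)):
        idx = [i for i, s in enumerate(status) if s == stratum]
        rng.shuffle(idx)
        # Apply the same 7:3 ratio inside each stratum so that the proportion
        # of deaths is similar in the training and test cohorts.
        n_train = round(train_fraction * len(idx))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


# Hypothetical example: 86 patients, 20 of whom died.
status = [1] * 20 + [0] * 66
train, test = stratified_split(status)
print(len(train), len(test))  # 60 26
```

With 20 deaths, 14 go to training and 6 to testing, so both cohorts keep a death rate close to 23%.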

Independent External Validation Dataset
An independent external cohort (n = 137) was retrieved from The Cancer Imaging Archive (TCIA), an open repository of oncologic images. 15 Specifically, we selected the head and neck cancer dataset previously published by Aerts et al, 16 who publicly shared CT segmentations of primary OPSCC for reproducibility purposes. These patients were subjected to the same selection process as described above. After screening for eligibility, 59 patients were excluded for the following reasons: 1) no OPSCC (n = 49), 2) unknown HPV status (n = 8), and 3) errors in importing segmentation (n = 2). Finally, 78 eligible patients were included in the external validation dataset.

HPV Status Assessment
Formalin-fixed, paraffin-embedded tissues were cut into 4-μm-thick sections and mounted onto 3-aminopropylmethoxysilane-coated slides. In situ hybridization was performed on an automated BenchMark system (Ventana Medical Systems) with the INFORM HPV III Family 16 Probe (cocktail of HPV subtypes 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, and 66; Ventana Medical Systems) as recommended by the manufacturer. After deparaffinization, the tissue underwent protease digestion and hybridization with the probe. The probe-target complex was detected by the action of alkaline phosphatase on 5-bromo-4-chloro-3-indolyl phosphate, which produced a dark blue signal, whereas HPV-negative cells appeared pink owing to the nuclear fast red counterstain. The nuclear hybridization signal was evaluated by a head and neck pathologist (Y.-S.L.).

CT Image Acquisition
CT imaging of the neck was performed using 2 different 128-channel multidetector CT scanners (Somatom Definition AS+ and Somatom Definition; Siemens). The CT protocol included precontrast images and contrast-enhanced images acquired with a 60-second delay after intravenous injection of 100 mL of iodinated contrast agent (370 mg of iodine/mL, Ultravist, iopromide; Bayer HealthCare) at 2.5 mL/s via an automatic power injector (MEDRAD® Stellant, Bayer, Leverkusen, Germany). The scan parameters for both scanners were as follows: x-ray voltage, 120 kV(peak); automatic tube current modulation (CARE Dose, syngo CT; Siemens); rotation time, 1 second; pitch, 0.8; detector collimation, 0.6 mm; matrix size, 512 × 512 pixels; FOV, 25 cm; and section thickness, 3 mm. The z-axis coverage extended from the skull base to the aortic arch.

Image Analysis and Radiomics Feature Extraction
All image analyses were performed using syngo.via Frontier software (Siemens). Primary OPSCC lesions were semiautomatically segmented using the "lesion segmentation" function within the software, 17 which yielded a 3D ROI contour; minimal manual adjustment of the ROI was needed to exclude beam-hardening artifacts and soft tissues surrounding the tumor. The 3D ROI included both solid and necrotic portions. The semiautomatic segmentation required approximately 2 seconds of processing time per patient. A radiologist (Y.C.) with 7 years of experience in head and neck imaging segmented all ROIs; another radiologist (K.-J.A.) with 20 years of experience in head and neck imaging reviewed and confirmed the ROIs, and any discrepancy was resolved by consensus. Both radiologists were blinded to the patients' clinical information during image analysis. Representative images of ROI segmentations are provided in Fig 1. The ROIs were first resampled to an isotropic voxel size of 1 mm. Radiomic features were extracted using PyRadiomics, a publicly maintained platform for radiomic feature extraction 18 that is embedded in syngo.via Frontier. Six different classes of features were automatically extracted, yielding a total of 854 features per patient. Detailed information regarding these features is publicly available (https://pyradiomics.readthedocs.io/en/latest/).
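The effect of resampling to 1-mm isotropic voxels on the image grid can be worked out from the acquisition protocol (25-cm FOV on a 512 × 512 matrix, 3-mm sections). The sketch below only computes the new grid size while preserving physical extent; it is not the interpolation itself, the number of sections is a hypothetical value, and real toolkits (including the PyRadiomics build inside syngo.via Frontier) may round grid sizes differently.

```python
def resampled_size(size, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Voxel count along each axis after resampling to new_spacing (mm),
    keeping the physical extent (voxels x spacing) unchanged."""
    return tuple(round(n * s / ns) for n, s, ns in zip(size, spacing, new_spacing))


in_plane = 250.0 / 512  # 25-cm FOV over 512 pixels: about 0.488 mm per pixel
n_sections = 60         # hypothetical number of 3-mm axial sections
print(resampled_size((512, 512, n_sections), (in_plane, in_plane, 3.0)))
# → (250, 250, 180)
```

In-plane, the grid shrinks (about 0.49 mm to 1 mm), whereas along z it is upsampled threefold (3 mm to 1 mm), so through-plane detail is interpolated rather than measured.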

Radiomics Feature Selection
For HPV-status prediction, radiomics features were selected with Boruta, a random forest-based wrapper algorithm for all-relevant feature selection. 19 Random forest provides an importance measure for each feature and requires minimal parameter tuning. Boruta compares the importance of each original feature with that of "shadow" features, which are randomly permuted copies of the original features; features that are consistently less important than the best shadow feature are deemed irrelevant. We applied the Boruta algorithm iteratively, consecutively excluding irrelevant features, to obtain an all-relevant subset of features.
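The core idea of Boruta, testing each real feature against "shadow" features (copies with their values randomly permuted so that any link to the label is destroyed), can be sketched in a few lines. This is a simplified illustration, not the Boruta package used in the study: as a loudly stated assumption, an absolute standardized mean difference between HPV-positive and HPV-negative groups stands in for random-forest importance, the procedure runs a single pass without removing rejected features between rounds, and no multiple-testing correction is applied. All function names are hypothetical.

```python
import random
import statistics
from math import comb


def importance(values, labels):
    """Stand-in importance (NOT random-forest importance): absolute difference
    in class means divided by the overall population standard deviation."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / sd


def binom_tail(k, n, upper=True):
    """P(X >= k) if upper else P(X <= k), for X ~ Binomial(n, 0.5)."""
    ks = range(k, n + 1) if upper else range(0, k + 1)
    return sum(comb(n, i) for i in ks) / 2 ** n


def shadow_feature_selection(features, labels, n_iter=50, alpha=0.01, seed=0):
    rng = random.Random(seed)
    real = {name: importance(col, labels) for name, col in features.items()}
    hits = dict.fromkeys(features, 0)
    for _ in range(n_iter):
        # Best importance achieved by any permuted (shadow) copy in this round.
        shadow_max = max(importance(rng.sample(col, len(col)), labels)
                         for col in features.values())
        for name in features:
            if real[name] > shadow_max:
                hits[name] += 1  # the real feature beat every shadow feature
    decisions = {}
    for name, h in hits.items():
        # Under irrelevance, a hit is a coin flip; test the hit count against 0.5.
        if binom_tail(h, n_iter, upper=True) < alpha:
            decisions[name] = "confirmed"
        elif binom_tail(h, n_iter, upper=False) < alpha:
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions


# Hypothetical toy data: 10 HPV-negative (0) and 10 HPV-positive (1) patients.
labels = [0] * 10 + [1] * 10
features = {
    "informative": [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)],
    "noise": list(range(1, 11)) + list(range(10, 0, -1)),
}
print(shadow_feature_selection(features, labels))
# → {'informative': 'confirmed', 'noise': 'rejected'}
```

Comparing against the best shadow feature, rather than a fixed threshold, calibrates the importance scale to what pure chance produces on the same data.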
For OS prediction, least absolute shrinkage and selection operator-Cox regression was used for dimensionality reduction of radiomics features. 20 The selected features were weighted by their respective coefficients to calculate radiomics scores (rad-scores) for individual patients, and the median rad-score was used to dichotomize patients into high-risk and low-risk groups. All feature selection was performed in the training cohort only, to prevent information from the test and external cohorts from leaking into model development and to reduce the risk of overfitting.
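The least absolute shrinkage and selection operator-Cox step described above maximizes the Cox partial log-likelihood minus an L1 penalty, which drives the coefficients of uninformative features to exactly zero. The study used the R package "glmnet"; the sketch below swaps in a plain proximal-gradient (ISTA) solver so the mechanics are visible. Assumptions: Breslow handling of tied times, a fixed penalty `lam` (no cross-validated choice), a fixed step size, no feature standardization, and hypothetical function names and toy data.

```python
import math


def cox_gradient(X, times, events, beta):
    """Gradient of the Cox partial log-likelihood (Breslow ties), divided by n."""
    n, p = len(X), len(beta)
    w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    grad = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        risk_set = [j for j in range(n) if times[j] >= times[i]]
        denom = sum(w[j] for j in risk_set)
        for k in range(p):
            weighted_mean = sum(w[j] * X[j][k] for j in risk_set) / denom
            grad[k] += X[i][k] - weighted_mean
    return [g / n for g in grad]


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink toward 0, exactly 0 when |z| <= t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cox(X, times, events, lam, step=0.1, n_iter=500):
    """Maximize loglik/n - lam * ||beta||_1 by proximal gradient ascent."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = cox_gradient(X, times, events, beta)
        beta = [soft_threshold(b + step * gk, step * lam) for b, gk in zip(beta, g)]
    return beta


# Hypothetical toy data: feature 1 rises with earlier death, feature 2 is noise.
X = [[2.5, 1], [1.5, -1], [0.5, 1], [-0.5, -1], [-1.5, 1], [-2.5, -1]]
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
print(lasso_cox(X, times, events, lam=0.1))    # first coefficient is positive
print(lasso_cox(X, times, events, lam=100.0))  # → [0.0, 0.0]
```

A large penalty zeroes every coefficient; as the penalty decreases, features enter the model one by one, which is how the selection reduces 854 candidates to a handful.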

Statistical Analysis
For HPV-status prediction, the Boruta-selected radiomic features from the training cohort were fitted to a generalized linear model, and receiver operating characteristic curves were plotted with calculation of the area under the curve (AUC). The same features were applied to the test and TCIA cohorts for internal and external validation, respectively, and their AUCs were calculated. The optimal cutoff threshold was determined with the Youden index, which maximizes the sum of sensitivity and specificity. AUCs and their 95% CIs were obtained with 2000 stratified bootstrap replicates. AUCs were compared using the DeLong test.
AJNR Am J Neuroradiol : 2020 www.ajnr.org
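The AUC and the Youden-index cutoff can be illustrated with a short sketch. The study used the R package "pROC"; this standard-library version is only an illustration with hypothetical names and data. The AUC is computed in its Mann-Whitney form (the probability that a randomly chosen HPV-positive case scores higher than a randomly chosen HPV-negative case), and, as an assumption, candidate cutoffs are the observed scores themselves, whereas pROC places thresholds between observed values.

```python
def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative);
    ties count one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (J, cutoff, sensitivity, specificity) maximizing
    J = sensitivity + specificity - 1; a case is positive if score >= cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:  # keep the first cutoff on ties
            best = (j, t, sens, spec)
    return best


# Hypothetical predicted probabilities of HPV positivity (1 = HPV-positive).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))            # 0.75
print(youden_cutoff(scores, labels))  # (0.5, 0.35, 1.0, 0.5)
```

The cutoff does not change the AUC, which summarizes all thresholds at once; the Youden index only picks the single operating point at which sensitivity and specificity are reported.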
For OS prediction, univariate Cox proportional hazards analysis was performed for the following variables: HPV status, smoking history, age (≥65 years versus <65 years), sex, T-stage (I/II versus III/IV), N-stage (0/I versus II/III), and rad-score (high-risk versus low-risk). Kaplan-Meier survival curves were plotted for each variable and compared using log-rank tests. In addition, a nomogram for 2- and 5-year OS probabilities was constructed using HPV status, T-stage, N-stage, and rad-score.
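The Kaplan-Meier estimator and the 2-group log-rank test can be sketched as below. The study used the R package "survival"; this standard-library version is an illustration only, with hypothetical function names and toy data. At each distinct event time, the Kaplan-Meier curve multiplies the running survival by (1 - deaths / patients at risk); the log-rank statistic compares observed with expected deaths in one group, and its P value uses the chi-square distribution with 1 degree of freedom, P = erfc(sqrt(chi2 / 2)).

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival probability)."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for tt in times if tt >= t)
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank(times, events, groups):
    """Two-group log-rank test (groups coded 0/1); returns (chi-square, P)."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for tt in times if tt >= t)
        n1 = sum(1 for tt, g in zip(times, groups) if tt >= t and g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:  # hypergeometric variance of deaths in group 1 at time t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square, 1 df


# Hypothetical data: event = 1 means death, 0 means censored at last visit.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
# survival is about 0.8, 0.6, and 0.3 at times 1, 2, and 3

# Hypothetical high-risk (1) versus low-risk (0) groups.
chi2, p = logrank(list(range(1, 9)), [1] * 8, [1] * 4 + [0] * 4)
print(round(chi2, 1), round(p, 3))  # chi-square ≈ 7.3, P ≈ 0.007
```

Censored patients still count in the risk sets up to their last visit, which is why the Kaplan-Meier estimate differs from the simple proportion of survivors.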
Multivariable Cox proportional hazards analyses were also performed to build 3 Cox regression models based on cancer stage (ie, T- and N-stage) alone (Cox model 1), cancer stage combined with HPV status (Cox model 2), and cancer stage combined with both HPV status and rad-score (Cox model 3). Harrell concordance indices were calculated for each model, and the models were compared with likelihood ratio tests. Cox model 3 was validated internally in the test cohort and externally in the TCIA cohort, and the respective concordance indices were calculated. Finally, prediction error curves over survival time were plotted for the institutional and TCIA cohorts, and the integrated Brier score was calculated; it ranges from 0 (perfect model) to 0.25 (noninformative model). All statistical analyses were performed using R statistical software (Version 3.5.1; http://www.r-project.org/) with the "glmnet," "survival," "pROC," "pec," "caret" (for random stratified sampling), and "rms" (for nomogram) packages. Statistical significance was set at P < .05.
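Harrell's concordance index can be sketched as follows. It is the fraction of usable patient pairs in which the patient who died first also had the higher predicted risk (for a Cox model, the linear predictor). This standard-library version is illustrative only, with a hypothetical function name and data; as a simplification, pairs tied in survival time are skipped, whereas R implementations handle time ties with specific conventions.

```python
def harrell_c(times, events, risk):
    """Harrell's C: among usable pairs (the patient with the shorter time had an
    event), the fraction in which that patient also has the higher predicted
    risk. Ties in predicted risk count one half; pairs tied in time are skipped."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Hypothetical cohort: time to death or censoring, event flag, model risk score.
print(harrell_c([1, 2, 3, 4], [1, 0, 1, 1], [0.9, 0.2, 0.1, 0.5]))  # 0.75
```

A value of 0.5 means the risk ranking is no better than chance and 1.0 means perfect ranking, so comparing Cox models 1 to 3 by this index shows whether adding HPV status and the rad-score improves ranking of patients by risk.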

RESULTS

Patients
A flow diagram of the patient selection process is provided in Fig 2. The clinical characteristics of the study population are summarized in Table 1.

Relevant Radiomics Features
For each patient in the training cohort, 854 features from 6 different classes (first-order statistics, shape-based, gray-level co-occurrence matrix, gray-level run-length matrix, gray-level size-zone matrix, neighboring gray-tone difference matrix, and gray-level dependence matrix) were extracted. Among these, the Boruta algorithm selected 9 relevant features (2 shape-based, 2 first-order, 4 gray-level co-occurrence matrix, and 1 gray-level size-zone matrix features) for the prediction of HPV status (Table 2). For OS prediction, least absolute shrinkage and selection operator-Cox regression yielded 3 radiomic features: 1 shape-based and 2 first-order features. From these, the rad-score was calculated with the following formula, using the Cox proportional hazards coefficient of each feature as its weight, as previously described: 12

$$\text{rad-score} = 0.32 \times \text{Original\_Shape\_SphericalDisproportion} + (-2.363 \times 10^{-3}) \times \text{Original\_Firstorder\_Minimum} + (-1.753 \times 10^{-5}) \times \text{Original\_Firstorder\_10Percentile}$$
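The rad-score is a weighted sum of the 3 selected features, followed by a median split into high-risk and low-risk groups. The sketch below uses the reported coefficients; the patient feature values are hypothetical, and the rule that scores strictly above the median are labeled high-risk is an assumption, because the handling of scores exactly at the median is not stated.

```python
import statistics

# Reported weights for the 3 features selected by LASSO-Cox regression.
RAD_SCORE_WEIGHTS = {
    "Original_Shape_SphericalDisproportion": 0.32,
    "Original_Firstorder_Minimum": -2.363e-3,
    "Original_Firstorder_10Percentile": -1.753e-5,
}


def rad_score(features):
    """Weighted sum of the selected radiomic features for one patient."""
    return sum(w * features[name] for name, w in RAD_SCORE_WEIGHTS.items())


def dichotomize(scores):
    """Median split; assumption: scores strictly above the median are high-risk."""
    cut = statistics.median(scores)
    return ["high-risk" if s > cut else "low-risk" for s in scores]


# Hypothetical feature values for one patient.
patient = {
    "Original_Shape_SphericalDisproportion": 1.5,
    "Original_Firstorder_Minimum": -100.0,
    "Original_Firstorder_10Percentile": 20.0,
}
print(round(rad_score(patient), 7))            # 0.7159494
print(dichotomize([0.5, 0.7, 0.9, 1.1]))
# → ['low-risk', 'low-risk', 'high-risk', 'high-risk']
```

Because the shape weight is positive while the intensity weights are negative, a less spherical tumor with lower minimum and 10th-percentile attenuation values yields a higher rad-score, that is, a higher estimated risk.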

Performances of Radiomics Features in HPV Status Prediction
The generalized linear model fitted with the 9 radiomics features in the training cohort yielded an AUC of 0.865 (95% CI, 0.777-0.953), with a sensitivity and specificity of 76.3% and 91.3%, respectively (Table 3). Internal validation in the separate test set (30% of the initial cohort) yielded an AUC of 0.747 (95% CI, 0.533-0.961), with a sensitivity and specificity of 100% and 60%, respectively. External validation yielded an AUC of 0.834 (95% CI, 0.738-0.930), with a sensitivity and specificity of 82.6% and 80%, respectively. The AUCs did not differ significantly (training versus test, P = .324; training versus TCIA, P = .643; test versus TCIA, P = .471) (Fig 3).

Performance of Radiomics Features in OS Prediction
In the univariate Cox analysis, HPV status (positive: hazard ratio, 0.27; 95% CI, 0.09-0.7; P = .008), T-stage (≥III: hazard ratio, 3.66; 95% CI, 1.34-9.99; P = .01), and rad-score (high-risk: hazard ratio, 3.72; 95% CI, 1.21-11.46; P = .02) were significantly associated with OS (Table 4). […] The external validation of Cox model 3 in the TCIA cohort resulted in lower prediction error (ie, the TCIA cohort curve remained below the training cohort curve [Fig 5]). Similarly, the integrated Brier score was lower in the TCIA cohort than in the training cohort for all OS periods (Table 5).

DISCUSSION
In the current study, quantitative pretreatment CT-based radiomics features were extracted from primary OPSCC, and the subsequent analysis of these features yielded good performance in the prediction of HPV status and OS. The results were reproduced in both the internal and external validation sets. These findings provide preliminary evidence that pretreatment CT-based radiomics could aid both HPV-status prediction and prognostication in patients with OPSCC. Several studies have considered the implications of radiomics in head and neck cancers. The present data are consistent with similar prior studies in which radiomics demonstrated additive prognostic 13,14 and predictive value for HPV status. 13,21 The studies by Ou et al 13 and Bogowicz et al 14 showed that CT-based radiomics provided prognostic value in locally advanced head and neck cancers. However, the current study differs from theirs in that our target population comprised untreated patients with OPSCC of various stages, which allowed a more comprehensive risk assessment. Furthermore, our focus on OPSCC minimized the potential heterogeneity associated with the various subtypes of head and neck cancer.
In the multivariable Cox proportional hazards analysis, adding HPV status to cancer stage improved the prognostic performance of the survival model. Likewise, adding the rad-score to the model including HPV status and cancer stage further improved prognostic performance. This finding suggests that the rad-score, when combined with HPV status and cancer stage, may provide incremental prognostic value.
An interesting additional finding was the good performance in the external validation set, which exceeded that in the internal validation set. Compared with the institutional cohort, the external validation set represented a regionally different cohort with different clinical characteristics. Nevertheless, the CT-based radiomics models trained on the institutional cohort yielded a comparable AUC for HPV-status prediction and better OS prediction in the external validation set. One possible interpretation is that CT-based radiomics models may be reproducible across institutions.
Regarding individual radiomics features, the most significant feature in OS prediction was a shape feature (spherical disproportion). This finding is consistent with a previous study by Leijenaar et al, 12 who also identified a shape feature as the most significant prognostic marker of survival in patients with OPSCC. Additionally, a previous study found that primary head and neck cancers displaying asphericity of pretherapeutic [18F]fluorodeoxyglucose uptake appeared to be associated with a poorer prognosis. 22 Taken together, these findings suggest that the spatial heterogeneity of tumors might be associated with patient prognosis. Future research should investigate the radiologic-pathologic correlation of tumor spatial heterogeneity.
The strength of the current study lies in its validation using both internal and external datasets. Whereas most previous radiomics studies of head and neck cancers used only internal validation 14 or lacked validation, 23,24 we used both internal and external validation to minimize potential overfitting of the radiomics models. In addition, primary OPSCCs were segmented in 3D rather than with circular ROIs on single axial slices. These volumetric segmentations encompassed the entire tumor and may therefore better reflect the intratumoral environment.
This study has several limitations. First, selection bias might have been present owing to the retrospective nature of the study; however, we attempted to minimize it by strictly adhering to predefined selection criteria. Second, reproducibility remains an open issue in most radiomics studies. 25 We therefore adopted the open-platform PyRadiomics as our source of radiomics features. Furthermore, the external validation set was obtained from TCIA, a public repository of oncologic medical images, which makes our results easier to reproduce. Some variables (eg, OS, treatment received, HPV status) differed between our cohort and the TCIA cohort; however, these differences are unlikely to have influenced model development because the Cox models were built from the training cohort and the TCIA cohort was used solely for external validation.

CONCLUSIONS
The current study demonstrated the feasibility of CT-based radiomics for HPV-status and OS prediction in patients with OPSCC. The models performed well and were validated in both internal and external datasets. Our findings add preliminary evidence for the potential value of CT-based radiomics in risk stratification and treatment allocation for patients with OPSCC.