Iodine Maps from Dual-Energy CT to Predict Extrathyroidal Extension and Recurrence in Papillary Thyroid Cancer Based on a Radiomics Approach

BACKGROUND AND PURPOSE: Accurate prediction of extrathyroidal extension and subsequent recurrence is crucial in papillary thyroid cancer clinical management. Our aim was to conduct iodine map-based radiomics to predict extrathyroidal extension and to explore its prognostic value for recurrence-free survival in papillary thyroid cancer.
MATERIALS AND METHODS: A total of 452 patients with papillary thyroid cancer were retrospectively recruited between June 2017 and June 2020. Radiomics features were extracted from noncontrast images, dual-phase mixed images, and iodine maps, respectively. Random forest and least absolute shrinkage and selection operator (LASSO) were applied to build 6 radiomics scores (noncontrast radiomics score_random forest; noncontrast radiomics score_LASSO; mixed radiomics score_random forest; mixed radiomics score_LASSO; iodine radiomics score_random forest; iodine radiomics score_LASSO), respectively. Logistic regression was used to construct 6 radiomics models incorporating the 6 radiomics scores with clinical risk factors and to compare them with the clinical model. The radiomics model that achieved the highest performance was presented as a nomogram and assessed by discrimination, calibration, clinical usefulness, and prognosis evaluation.
RESULTS: Iodine radiomics scores performed significantly better than mixed radiomics scores. Both of them outperformed noncontrast radiomics scores. Iodine map-based radiomics models significantly surpassed the clinical model. A radiomics nomogram incorporating size, capsule contact, and iodine radiomics score_random forest was built with the highest performance (training set, area under the curve = 0.78; validation set, area under the curve = 0.84). Stratified analysis confirmed the nomogram stability, especially in the group negative for CT-reported extrathyroidal extension (area under the curve = 0.69). Nomogram-predicted extrathyroidal extension risk was an independent predictor of recurrence-free survival. A high risk for extrathyroidal extension portended significantly lower recurrence-free survival than low risk (P < .001).
CONCLUSIONS: Iodine map-based radiomics might be a supporting tool for predicting extrathyroidal extension and subsequent recurrence risk in patients with papillary thyroid cancer, thus facilitating clinical decision-making.

Although papillary thyroid cancer (PTC) has a favorable prognosis, it is prone to local-regional recurrence. 1 A dynamic risk-stratification system can predict postoperative recurrence and determine the clinical treatment plan in patients with PTC. 1 Extrathyroidal extension (ETE) is considered an adverse prognostic factor of PTC and plays an important role in the risk-stratification system. [1][2][3] Patients with PTC with ETE are recommended for a more aggressive initial therapy, likely total thyroidectomy and intensive follow-up. [1][2][3] Therefore, a preoperative diagnosis of ETE is essential in the clinical management of PTC.
Some aggressive signs on conventional ultrasound and CT are used to assess ETE in clinical practice; however, these signs are subjective, and their sensitivity is relatively low because microscopic ETE can be determined only by histopathologic examination of the tumor. [4][5][6] Because of the difficulty in diagnosing ETE clinically, some previous studies have attempted to use clinical risk factors and tumor morphologic features to predict ETE. However, the results of these studies were inconsistent, and morphologic features are subjective. 7,8
Radiomics is a rapidly evolving field that quantifies high-throughput features from medical images and is useful in cancer scanning, diagnosis, and prognosis evaluation. 9,10 Concerning radiomics applications in PTC, some studies have verified their value in evaluating tumor aggressiveness, predicting BRAF gene status, and diagnosing or predicting cervical lymph node metastasis. [11][12][13][14] For preoperative prediction of ETE, Chen et al 15 established a CT-based radiomics nomogram and confirmed its predictive efficiency.
Compared with conventional CT, dual-energy CT (DECT) allows material decomposition and offers iodine maps that directly quantify the content and portray the distribution of iodine. 16,17 As is well known, the thyroid gland is the primary site of iodine storage in the human body, and PTC destroys thyroid follicles, leading to a decrease in iodine concentration to various degrees. 1 Therefore, iodine uptake of PTC in dual-phase iodine maps is affected by both the thyroidal parenchyma and external iodine (contrast media), which correlates with tumor perfusion. In addition, some previous studies have confirmed that quantitative iodine concentration is effective in differentiating thyroid malignancy and diagnosing metastatic lymph nodes from PTC. [18][19][20] Radiomics of iodine maps can be used to further assess the heterogeneity of iodine uptake. 21,22 In previous studies, iodine map-based radiomics has proved useful in diagnosing and predicting lymph node metastasis from PTC, 22,23 indicating that iodine map-based radiomics is promising in PTC investigation. However, no previous study has used iodine map-based radiomics to predict ETE and further correlate it with prognosis in PTC, leaving a gap in the current knowledge.
Therefore, in the current study, we evaluated the value of dual-phase iodine map-based radiomics for predicting ETE and recurrence-free survival (RFS) in patients with PTC.

Study Population
Ethics approval was obtained for this retrospective study from The First Affiliated Hospital of Nanjing Medical University institutional review board, and the requirement for written informed consent was waived. A total of 858 consecutive patients with suspected PTC who underwent DECT for preoperative assessment from June 2017 to June 2020 were retrospectively recruited. The inclusion and exclusion criteria are listed in the Online Supplemental Data. The final study cohort included 452 patients, divided into a training set (317 patients) and a validation set (135 patients) according to the time of the operation. Detailed demographic characteristics are summarized in the Online Supplemental Data. A total of 84 and 272 patients with PTC were included in our previous studies, which conducted radiomics analysis for diagnosing and predicting lymph node metastasis from PTC, respectively. 22,23 Taking the final histopathologic reports of tumor specimens as the criterion standard, we divided the patients into groups without ETE (training cohort, n = 195; validation cohort, n = 73) and with ETE (training cohort, n = 122; validation cohort, n = 62).

Postoperative Follow-up and Recurrence
A total of 245 patients who underwent an operation from June 2017 to June 2019 were routinely followed up every 3-6 months in the first year and annually thereafter. The follow-up protocol is summarized in the Online Supplemental Data.
Recurrence was defined as clinical evidence (biochemical, structural, or functional) of new disease after the initial operation. Indicators of biochemical recurrence were suppressed thyroglobulin ≥1 ng/mL, stimulated thyroglobulin ≥2 ng/mL, and a detectable thyroglobulin antibody. Pathology-proved disease or morphologic evidence of disease on cross-sectional imaging was considered a structural recurrence. Suspicious findings on whole-body scintigraphy or PET/CT were considered to indicate functional recurrence. The end point of our study was RFS, which was defined as the period from the date of the initial operation to the date of the first recurrence (biochemical, structural, or functional) or the last follow-up visit. 1,24

DECT Imaging Technique and Postprocessing
The DECT examination was performed with a third-generation DECT scanner (Somatom Force; Siemens) equipped with 2 x-ray tubes at different voltages (tube A, 80 kVp; tube B, Sn150 kVp). 25 Postprocessing was performed using commercially available software (Syngo Dual Energy, Version VB10B; Siemens). The DECT image-acquisition parameters and postprocessing are detailed in the Online Supplemental Data.

Clinicopathologic Information
Clinical information, including age, sex, body mass index (BMI), nodular goiter, Hashimoto thyroiditis, and serology examination results, was acquired from medical records. Age was categorized by 45 and 55 years separately in accordance with the seventh and eighth American Joint Committee on Cancer staging systems. 1 BMI (kg/m²) was calculated as weight (kg)/[height (m) × height (m)]. BRAF V600E gene status was obtained from genetic reports of preoperative fine-needle aspiration biopsy and confirmed in the final tumor surgical specimens.
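As a minimal sketch of the BMI formula and the two age cut-offs described above, the calculation could look as follows. The function names and the example patient are illustrative only, and whether a patient exactly at a cut-off belongs to the older group is an assumption not stated in the text.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight / (height * height)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / (height_m * height_m)


def age_groups(age_years: int) -> dict:
    """Dichotomize age at 45 and at 55 years, the two cut-offs named in the text.

    Assumption: the cut-off value itself is assigned to the older group.
    """
    return {"age_ge_45": age_years >= 45, "age_ge_55": age_years >= 55}


# Hypothetical patient: 70 kg, 1.75 m, 50 years old.
example_bmi = bmi(70.0, 1.75)
example_age = age_groups(50)
```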

Tumor Morphologic CT Features and CT-Reported ETE Status
Qualitative CT image analysis was performed by readers 1 and 2 (with 6 and 5 years' experience in head and neck oncologic imaging, respectively) on mixed images. If discrepant interpretations occurred, reader 3 (with 8 years' experience in head and neck oncologic imaging) performed a further assessment and made the final decision. All the observers were blinded to clinical data and permanent pathologic results.
The morphologic CT features of the tumors evaluated in the study are described in the Online Supplemental Data. The degree of tumor capsule contact assessed on the CT image is presented in the Online Supplemental Data. CT-reported ETE-positive criteria are summarized in the Online Supplemental Data. Examples of CT-reported ETE-positive PTC patients using corresponding criteria are shown in the Online Supplemental Data.

Tumor Segmentation and Radiomics Feature Extraction
The workflow of our study is shown in Fig 1. Tumor segmentation was performed semiautomatically by reader 2 using syngo.via Frontier radiomics (Siemens) on noncontrast images, dual-phase mixed images, and iodine maps. 26,27 The process of semiautomatic tumor segmentation is summarized in the Online Supplemental Data. All tumor segmentation was confirmed again by reader 3.
After tumor segmentation, radiomics features from VOIs were automatically computed using syngo.via Frontier radiomics interfaces with the PyRadiomics library (https://github.com/AIM-Harvard/pyradiomics). 28 Detailed information on radiomics feature extraction is described in the Online Supplemental Data.

Radiomics Features: Mining and Signature Building
A 2-step procedure was devised for the high-dimensional radiomics feature mining. First, 183 patients were randomly selected for test and retest, and the intraclass correlation coefficient was calculated to assess the reproducibility of the features. Features with an intraclass correlation coefficient of <0.8 were excluded from the subsequent analysis. Second, random forest (RF) and the least absolute shrinkage and selection operator (LASSO) logistic regression were each implemented for radiomics feature selection and signature building (Online Supplemental Data).
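The reproducibility filter in the first step can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the text does not say which intraclass correlation coefficient variant was used, so the two-way, single-measurement, consistency form, ICC(3,1), is assumed here, and the feature names and toy values are hypothetical.

```python
def icc_consistency(ratings):
    """ICC(3,1): two-way mixed, single measurement, consistency.

    `ratings` holds one row per subject, each with k repeated measurements
    of the same feature. The ICC variant is an assumption.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def keep_stable(features, threshold=0.8):
    """Drop features whose ICC is below `threshold`; keep the rest."""
    return [name for name, ratings in features.items()
            if icc_consistency(ratings) >= threshold]


# Toy test-retest data: 3 subjects, 2 measurements, 2 hypothetical features.
toy = {
    "feature_a": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],  # perfectly repeatable
    "feature_b": [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]],  # poorly repeatable
}
stable = keep_stable(toy)
```

In this toy example only `feature_a` survives the filter; a feature that is constant across all subjects would make the denominator zero and would need separate handling.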
Through the above dimensionality-reduction procedure, 6 radiomics scores (rad-scores) were built on the basis of noncontrast images, mixed images, or iodine maps using RF or LASSO, respectively (noncontrast rad-score_RF; noncontrast rad-score_LASSO; mixed rad-score_RF; mixed rad-score_LASSO; iodine rad-score_RF; iodine rad-score_LASSO). The predictive performance of the 6 rad-scores was assessed by receiver operating characteristic analysis, and the areas under the curve (AUCs) were compared using the DeLong test.
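To make the AUC comparison concrete, the following is a minimal pure-Python sketch of the DeLong test for two scores measured on the same patients. It is an illustration only: the labels and scores are made up, and the study's own software for this step is not described in the text.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 if tied, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    """AUC and the DeLong structural components for one score."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs on the same cases."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    parts = []
    for s in (scores_a, scores_b):
        parts.append(_components([s[i] for i in pos_idx],
                                 [s[i] for i in neg_idx]))
    (auc_a, v10_a, v01_a), (auc_b, v10_b, v01_b) = parts
    m, n = len(pos_idx), len(neg_idx)
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b)
            - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b)
              - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:  # identical rankings: the statistic is undefined
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return auc_a, auc_b, z, p


# Toy data: 1 = ETE, 0 = no ETE; both score vectors are made up.
y = [0, 0, 1, 1, 0, 1]
score_1 = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
score_2 = [0.3, 0.6, 0.5, 0.4, 0.2, 0.7]
auc_1, auc_2, z_stat, p_value = delong_test(y, score_1, score_2)
```

The sketch needs at least 2 patients in each class, because the covariance terms divide by the class size minus 1.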

Establishment, Performance, and Validation of Radiomics Models
Six radiomics models were built through multivariable logistic regression analysis by incorporating each of the 6 constructed rad-scores with other independent risk factors (Fig 1). Backward stepwise variable selection was implemented with the Bayesian information criterion. The variance inflation factor was checked for each factor contained in the final radiomics models. A clinical model containing the pertinent risk factors other than the rad-scores was also built for comparison.
Predictive efficacy was assessed by receiver operating characteristic analysis, with the AUC as the performance indicator. Internal validation was performed in an independent validation cohort. The predictive efficacy of the 6 radiomics models and the clinical model was compared on the basis of the AUC using the DeLong test.

Radiomics Nomogram Construction
To provide clinicians with an easy-to-use tool, we developed a radiomics nomogram based on the radiomics model that achieved the highest predictive performance. Agreement between the nomogram-predicted probability of ETE and the actual results was assessed using calibration curves generated by bootstrapping with 1000 resamples, accompanied by the Hosmer-Lemeshow goodness-of-fit test. 29
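The goodness-of-fit check named above can be sketched as follows. This is a simplified illustration with made-up probabilities: it computes only the Hosmer-Lemeshow chi-square statistic over equal-sized, risk-ordered groups, and it does not reproduce the bootstrapped calibration curves.

```python
def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups.

    Returns (statistic, degrees_of_freedom). The statistic is compared with
    a chi-square distribution on groups - 2 degrees of freedom; a small
    statistic (large P value) suggests adequate calibration.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # predicted number of events
        observed = sum(y for _, y in chunk)   # actual number of events
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat, groups - 2


# Made-up predictions: 5 low-risk and 5 high-risk patients.
probs = [0.2] * 5 + [0.8] * 5
well_calibrated = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # 1 of 5 and 4 of 5 events
poorly_calibrated = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # 1 of 5 and 5 of 5 events
stat_good, _ = hosmer_lemeshow(well_calibrated, probs, groups=2)
stat_poor, _ = hosmer_lemeshow(poorly_calibrated, probs, groups=2)
```

Two groups are used here only to keep the toy data small; 10 groups is the conventional default.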

Stratified and Subgroup Analysis
To confirm the robustness of the established nomogram, we performed stratified analysis for ETE prediction according to age, sex, BMI, and BRAF V600E gene status. A subgroup analysis of the CT-reported ETE-negative patients was also performed.

Clinical Utility
The clinical usefulness of the constructed nomogram was assessed by decision curve analysis by quantifying the net benefits at different threshold probabilities for the entire set. 30
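The net benefit that a decision curve plots at each threshold probability can be sketched as below. The formula is the standard one (true positives minus false positives weighted by the odds of the threshold, both per patient); the labels and predicted risks are made up for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Made-up example: 1 = ETE, 0 = no ETE.
y = [1, 1, 0, 0, 0]
p = [0.9, 0.4, 0.6, 0.2, 0.1]
nb_model = net_benefit(y, p, 0.5)
nb_all = net_benefit_treat_all(y, 0.5)
```

A model adds value at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none reference).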

RFS Analysis
Univariable and multivariable analyses with Cox proportional hazards regression were used to determine risk factors for RFS. Variables with P < .050 in the univariable analysis were incorporated into the multivariable Cox regression. The RFS curve was generated using the Kaplan-Meier method, and the nomogram-predicted ETE low- and high-risk groups were compared using log-rank tests.
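As an illustration of the Kaplan-Meier method mentioned above, a minimal estimator is sketched here. Patients without recurrence at the last follow-up visit are treated as censored, in line with the RFS definition; the follow-up times are made up, and the log-rank comparison and Cox regression are not shown.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate; returns [(time, survival_probability), ...].

    durations: follow-up time per patient (for example, months from operation).
    events: 1 if recurrence was observed at that time, 0 if censored
            (no recurrence by the last follow-up visit).
    """
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        recurred = sum(1 for d, e in zip(durations, events)
                       if d == t and e == 1)
        survival *= 1.0 - recurred / at_risk
        curve.append((t, survival))
    return curve


# Made-up follow-up data in months.
months = [6, 12, 12, 20, 30]
recurrence = [1, 0, 1, 0, 0]
rfs_curve = kaplan_meier(months, recurrence)
```

In this toy cohort the estimated recurrence-free probability drops to 0.8 at 6 months and to 0.6 at 12 months, and the censored patients contribute to the at-risk counts until their last visit.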

Clinical Characteristics
No significant differences were found between the training and validation sets in any of the clinicopathologic and radiographic characteristics (all P > .050) (Online Supplemental Data). The ETE-positive rates were 38.5% (122 of 317) in the training cohort and 45.9% (62 of 135) in the validation cohort. CT-reported ETE had a high specificity (97.4%) but a poor sensitivity (40.2%) in the entire cohort. Univariate analyses indicated that BMI, size, capsule contact, and CT-reported ETE were associated with ETE (P < .050) (Online Supplemental Data). A clinical model was built containing size and capsule contact, which were identified as independent risk factors for ETE through multivariable logistic regression analysis (Table 1).

Feature Selection and Radiomics Score Calculation
In total, 573 stable features extracted from noncontrast images, 1565 stable features (707 arterial and 858 venous phase) extracted from mixed images, and 1461 stable features (738 arterial and 723 venous phase) extracted from iodine maps with an intraclass correlation coefficient of >0.8 were retained. The reproducibility of the extracted features is shown in the Online Supplemental Data. The RF classifier selected 6 features of noncontrast images, 14 features (8 arterial phase and 6 venous phase) of mixed images, and 15 features (7 arterial phase and 8 venous phase) of iodine maps, respectively. RF feature-importance ranking is presented in the Online Supplemental Data. LASSO logistic regression shrank these stable features, resulting in 9 features of noncontrast images, 22 features (8 arterial phase and 14 venous phase) of mixed images, and 11 features (5 arterial phase and 6 venous phase) of iodine maps with nonzero coefficients, respectively. The results of LASSO logistic shrinking are depicted in the Online Supplemental Data. Ultimately, 6 rad-scores of noncontrast images, mixed images, and iodine maps using RF or LASSO logistic regression were established (noncontrast rad-score_RF; noncontrast rad-score_LASSO; mixed rad-score_RF; mixed rad-score_LASSO; iodine rad-score_RF; iodine rad-score_LASSO).

Mixed or Iodine Rad-Scores versus Noncontrast Rad-Scores
Significant differences were found between patients with and without ETE in all 6 established rad-scores (all P < .050). There were no significant differences between rad-scores built using the RF and LASSO methods (all P > .050). Mixed and iodine rad-scores both performed better than noncontrast rad-scores, irrespective of whether RF (training set, P = .171 and P < .001; validation set, P = .198 and P = .001) or LASSO (training set, P = .172 and P < .001; validation set, P = .318 and P = .004) was used, with significant differences between iodine and noncontrast rad-scores (P < .005). Detailed predictive ability comparisons of the 6 rad-scores are summarized in the Online Supplemental Data. Receiver operating characteristic analyses are also shown in the Online Supplemental Data.

Iodine versus Mixed Rad-Scores
Iodine rad-scores significantly outperformed mixed rad-scores in both the training and validation cohorts, irrespective of whether RF or LASSO was used (P < .050). Iodine rad-score_RF obtained the optimal performance, with an AUC of 0.74 in the training cohort and 0.74 in the validation cohort. The radiomics feature composition of the iodine rad-score_RF is summarized in the Online Supplemental Data, and its distribution is shown as a violin plot in the Online Supplemental Data.

Development, Performance, and Validation of Radiomics Models
Six radiomics models were constructed using the 6 rad-scores incorporating other clinical risk factors through multivariable logistic regression analysis. The results of logistic regression are presented in Table 1. The variance inflation factor of all variables included in the 6 radiomics models ranged from 1.00 to 1.83, indicating that there was no multicollinearity. The AUCs of all 6 radiomics models were >0.73 in both the training and validation cohorts. The predictive performance of the 6 radiomics models is detailed in the Online Supplemental Data. The predictive performance of the radiomics models improved when the clinical risk factors (tumor size and capsule contact) were added to the different rad-scores. Among the 6 radiomics models, radiomics model 5, which incorporated size, capsule contact, and iodine rad-score_RF, achieved the highest performance, with an AUC of 0.78 in the training cohort and 0.84 in the validation cohort.

Radiomics versus Clinical Models
All 6 radiomics models yielded higher discrimination than the clinical model in both the training and validation cohorts. In addition, the improvement in performance of radiomics models 5 and 6 (radiomics of iodine maps using RF and LASSO) over the clinical model was significant (P < .050). Predictive performance comparisons of the radiomics models and the clinical model are summarized in Table 2.

Stratified and Subgroup Analyses within the CT-Reported ETE-Negative Group
Stratified analysis demonstrated that performance of the radiomics nomogram was good and stable in different subgroups. In the CT-reported ETE-negative group, the nomogram still achieved an AUC of 0.69, a sensitivity of 0.68, a specificity of 0.66, and an accuracy of 0.64. The results of the receiver operating characteristic curve analysis are summarized and presented in the Online Supplemental Data. In our study, 110 of 184 (59.8%) patients with PTC with ETE were misdiagnosed using conventional CT signs. Of these misdiagnosed patients, 77 were correctly reclassified using the radiomics nomogram. The rate of misdiagnosis was dramatically decreased from 59.8% (110 of 184) to 17.9% (33 of 184).
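The misdiagnosis rates quoted above follow directly from the reported counts, as the short arithmetic check below shows. All 3 counts are taken from the text; the variable names are illustrative.

```python
ete_total = 184      # patients with pathologically confirmed ETE
missed_by_ct = 110   # of these, misdiagnosed using conventional CT signs
reclassified = 77    # misdiagnosed cases correctly reclassified by the nomogram

still_missed = missed_by_ct - reclassified          # 33 patients
rate_before = missed_by_ct / ete_total              # misdiagnosis rate, CT alone
rate_after = still_missed / ete_total               # rate after reclassification
ct_sensitivity = (ete_total - missed_by_ct) / ete_total

print(f"{rate_before:.1%} -> {rate_after:.1%}; CT sensitivity {ct_sensitivity:.1%}")
```

The implied CT sensitivity of 74 of 184 matches the 40.2% sensitivity reported for CT-reported ETE in the entire cohort.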

Clinical Practice
Decision curve analysis revealed that if the threshold probability was > .09, the radiomics nomogram added more benefit than the clinical model (Online Supplemental Data), thus indicating the potential value of the nomogram in clinical practice.

RFS Analysis
The median follow-up was 26 months (interquartile range = 23-30) for 245 patients with PTC, and 46 of 245 (18.8%) patients had recurrence (local-regional in 43 patients and distant metastasis in 3 patients).
Size, capsule contact, and nomogram-predicted ETE showed a significant association with RFS in univariate Cox regression analysis. Only nomogram-predicted ETE was identified as an independent preoperative factor for PTC recurrence in multivariable Cox analysis (P = .011) (Table 3). The median RFS was 23 months (95% CI, 20-26) in the nomogram-predicted ETE high-risk group and 31 months (95% CI, 29-34) in the low-risk group. The Kaplan-Meier cumulative event curve for recurrence stratified by the nomogram-predicted ETE risk classification is shown in Fig 2.

DISCUSSION
ETE is a significant prognostic risk factor for patients with PTC. Patients with ETE are recommended for a more aggressive initial therapy, likely total thyroidectomy and intensive follow-up. [1][2][3] However, ETE varies to some degree: Some ETE involves only the perithyroid soft tissues and is identified only under a microscope, so it can be detected only through postoperative pathologic evaluation. 1,5,15 On the basis of conventional imaging techniques (ultrasound and CT) and their morphologic features, it is difficult to detect ETE that can be determined only by tumor histopathologic examination. [4][5][6] In our study, 110 of 184 (59.8%) patients with PTC with ETE were misdiagnosed using conventional CT signs. Therefore, a more effective and reliable approach was needed.
Radiomics analysis can extract massive quantitative image features, which enable mineable high-dimensional data to be applied to support clinical decision-making. 9,10 Chen et al 15 conducted radiomics analysis of conventional structural CT to predict ETE and verified its potential value. We found that all 6 radiomics models had favorable performance, with AUCs > 0.73, and they surpassed the clinical model, similar to findings in a previous study. These findings indicate that radiomics approaches might be promising in predicting ETE. 
Notably, the incremental performance of radiomics models 5 and 6 (radiomics of iodine maps using RF and LASSO) added to the clinical model indicated a significant difference. To make the results of radiomics analysis more stable, we applied 2 machine learning algorithms, RF and LASSO, and no significant differences were found between the 2 methods.
The thyroid gland is the primary storage organ for iodine, so radiomics analysis of PTC in contrast-enhanced images is affected by both the thyroidal parenchyma and external iodine. To eliminate iodine effects on the thyroidal parenchyma and truly reflect contrast media uptake in PTC, we compared radiomics analysis of unenhanced CT images with enhanced images (mixed images and iodine maps). We found that the mixed and iodine rad-scores performed better than noncontrast rad-scores, irrespective of whether RF or LASSO was used. Compared with noncontrast images, dual-phase mixed images and iodine maps may provide additional information on tumor perfusion that correlates with tumor vascularity. 16,17 Our results are consistent with those in previous studies, which indicated that intratumor vascular heterogeneity was highly associated with migration, invasion, and metastasis of PTC. 1,31
Iodine rad-scores significantly outperformed mixed rad-scores, irrespective of whether RF or LASSO was used. In addition, iodine rad-score_RF obtained optimal performance in both the training and validation cohorts. Previous studies have confirmed an advantage of DECT-derived quantitative iodine concentration over conventional Hounsfield unit values in assessing enhancement, suggesting that iodine maps represent a more reliable detection method for lesion-enhancement differences than conventional CT images. 32 In addition, iodine maps can effectively suppress the background CT value and neck root artifacts, further affecting subsequent radiomics analysis. 16,32 Previously, Zhou et al 22,23 reported that radiomics analysis of iodine maps can realize effective diagnosis and prediction for lymph node metastasis in patients with PTC. Radiomics of iodine maps can truly reflect tissue iodine heterogeneity, which aids in predicting ETE for PTC, similar to findings in previous studies.
On the basis of the above results, we finally established a radiomics nomogram incorporating the iodine rad-score_RF with clinical risk factors (size and capsule contact) to predict ETE. Size, capsule contact, and the iodine rad-score_RF complement each other. The efficiency of the nomogram improved when combining clinical risk factors and radiomics features. It is understandable that tumors with larger size and greater exposure to the capsule are more likely to disseminate into the glandular lobes and interrupt the thyroid edge. 5,8 Fifteen radiomics features, including 7 arterial phase and 8 venous phase, were used to construct the iodine rad-score_RF, most of which (13 of 15) described the distribution of voxel intensities that were relevant to tumor heterogeneity. In addition, more than half of the selected features (10 of 15) were wavelet-filtered features. The wavelet transformation decomposes the original image into 3 different frequency directions, which can further explore the spatial heterogeneity of target tumors at multiple scales. 28
To confirm the stability and robustness of the radiomics nomogram, we performed stratified analysis in different subgroups. The radiomics nomogram performed well, with AUCs of >0.80 in all subgroups with different age, sex, BMI, and BRAF gene statuses. Encouragingly, the nomogram still achieved an AUC of 0.69 in the CT-reported ETE-negative group. Moreover, the rate of misdiagnosis decreased dramatically from 59.8% to 17.9% when using the radiomics nomogram.
The prognostic value of the nomogram-predicted ETE risk was also analyzed in our study. Most promising, nomogram-predicted ETE was identified as an independent preoperative predictor of RFS in patients with PTC through multivariable Cox analysis. High-risk patients had a significantly increased risk of recurrence compared with low-risk patients. Given that the incidence of recurrence is not low in patients with PTC, 1 nomogram-predicted ETE as a potential operative recurrence predictor may help to decide the extent of the initial operation, postoperative adjuvant therapy, and the intensity of postoperative follow-up.
The current study has several limitations. First, this retrospective study conducted radiomics analysis in a single center, and prospective multicenter studies with larger data sets are needed to validate our results. Second, the use of iodinated contrast agents may delay radioactive iodine therapy in patients with PTC, though radioactive iodine therapy was routinely performed at least 2 months after the operation in our institution. A previous study reported that the uptake of iodine 131 can return to normal at this interval. 1,20 Third, increased radiation exposure using contrast-enhanced DECT should not be ignored. A further effective technique is required to reduce the radiation dose.

CONCLUSIONS
Radiomics of iodine maps performed significantly better than that of conventional CT images for predicting ETE in PTC. An iodine map-based radiomics nomogram demonstrated higher performance than the clinical model in predicting ETE and subsequent recurrence risk, thus serving as a supporting tool in PTC clinical decision-making.
Disclosure forms provided by the authors are available with the full text and PDF of this article at www.ajnr.org.