Incidental Thyroid Nodules on CT: Evaluation of 2 Risk-Categorization Methods for Work-Up of Nodules

BACKGROUND AND PURPOSE: Thyroid nodules are common incidental findings on CT, but there are no clear guidelines regarding their further diagnostic work-up. This study compares the performance of 2 risk-categorization methods of selecting CT-detected incidental thyroid nodules (ITNs) for work-up. MATERIALS AND METHODS: The 2 categorization methods were method A, based on nodule size ≥10 mm, and method B, a 3-tiered system based on aggressive imaging features, patient age younger than 35 years, or nodule size of ≥15 mm. In part 1, the 2 categorization methods were applied to thyroid cancers in the SEER data base of the National Cancer Institute to compare the cancer capture rates and survival. In part 2, 755 CT neck scans at our institution were retrospectively reviewed for the presence of ITNs of ≥5 mm, and the same 2 categorization methods were applied to the CT cases to compare the number of patients who would theoretically meet the criteria for work-up. Comparisons of proportions of subjects captured under methods A and B were made by using the McNemar test. RESULTS: For 84,720 subjects in the SEER data base, methods A and B each captured 74% (62,708/84,720 and 62,586/84,720, respectively) of malignancies. SEER subjects who would not have met the criteria for further work-up by both methods had equally excellent 10-year cause-specific and relative survival of >99%. For part 2, the prevalence of ITNs of ≥5 mm at our institution was 133/755 (18%). The number of ITNs that would be recommended for work-up by method A was 57/133 (43%) compared with 31/133 (23%) for method B (P < .0005). CONCLUSIONS: Compared with using a 10-mm cutoff, the 3-tiered risk-stratification method identified fewer ITNs for work-up but captured the same proportion of cancers in a national data base and showed no difference in missing high-mortality cancers.

An incidental thyroid nodule (ITN) is defined as a thyroid nodule detected in studies performed for indications not specific to the thyroid in a patient without known thyroid disease. An ITN is a common finding, seen in 1 in 6 contrast-enhanced neck CT scans and also frequently seen on cervical spine CT and chest CT scans. 1 ITNs pose a difficult problem for radiologists and clinicians because, despite the low malignancy rate in ITNs of 0.5%-9% 2,3 and excellent survival for most thyroid cancers, many radiologists feel compelled to recommend work-up of ITNs seen on CT for fear of missing a malignancy. The decision to further evaluate such incidental nodules can start a cascade of sonographic imaging, biopsy, and even surgery for nodules that are commonly discovered to be benign. The cost of routinely pursuing work-up of ITNs includes unneeded patient anxiety and a substantial health care economic burden. 1 Additionally, the radiologist's approach to reporting ITNs on CT can vary widely because of a lack of clear guidelines. 4,5 In practice, the most common method for selecting a CT-detected ITN for sonography is to use a 10-mm size cutoff. 5 This size has been arbitrarily chosen on the basis of extrapolation from sonographic recommendations, but unlike sonography, CT has no signs that help to further differentiate malignant from benign nodules. 4,6 If ≥10 mm is the only criterion for selection, then up to 78% of ITNs on CT could require sonography. 3 Thus, there is a need for a more effective method of managing CT-detected ITNs.
A recently proposed strategy for reporting ITNs seen on cross-sectional imaging is based on prioritizing subsets of patients who are more likely to have malignant nodules. 7 Each of the risk categories in this 3-tiered system is intended to help the radiologist communicate the risk of malignancy in a CT-detected ITN and the need for work-up with sonography. We modified the 3-tiered system for this study: Risk category 1 is a nodule of any size with aggressive imaging features of invasive or metastatic disease, risk category 2 is a nodule of any size in a patient younger than 35 years of age, and risk category 3 is a nodule at or above a cutoff of 15 mm and not meeting the criteria for category 1 or 2. The ideal categorization method for reporting nodules should increase the detection of malignancy while also reducing the number of nodules requiring work-up. Although this tiered risk system could potentially perform better than a method based solely on size, it requires validation.
This study aimed to compare the performance of 2 risk-categorization methods of selecting CT-detected ITNs for work-up. These aims were evaluated by an analysis of the Surveillance, Epidemiology, and End Results data base of the US National Cancer Institute (http://seer.cancer.gov) to determine the malignancy capture rate and a retrospective review of ITNs on neck CT studies at our institution to determine the number of ITNs that would potentially require work-up. Our hypothesis was that the 3-tiered system, compared with size cutoff alone, would capture clinically important cancers that confer the greatest risk of mortality while reducing the number of nodules that would be referred to sonography for work-up.

MATERIALS AND METHODS
In part 1, we evaluated the performance of these 2 categorization methods by using the SEER cancer registry to compare capture of thyroid cancer and survival. In part 2, we applied the categorization methods to a retrospective cohort of ITNs seen on CT at our institution to compare the proportion of patients that would be recommended for sonographic work-up. Part 2 also served as an additional check to ensure that the methods worked in a real-world population of patients.

Categorization Methods
Method A was based simply on nodule size. When applied to tumors in part 1, nodule size refers to the recorded maximum tumor diameter in the SEER data base, obtained from pathology or imaging reports. When applied to nodules on CT in part 2, nodule size refers to the largest diameter on axial CT images in the region of focal attenuation abnormality. In addition to the cutoff of ≥10 mm, we also evaluated sizes of 15 and 20 mm to consider alternate size cutoffs that other groups or practices may adopt.
Method B used a 3-tiered system based on the aggressiveness of imaging features, patient age, and nodule size. 8 In this system, 3 subcategories were created on the basis of the risk of malignancy as described below.
Risk category 1 (highest risk) denotes patients with concerning CT findings such as local invasion, suspicious lymphadenopathy, or systemic metastatic disease. Risk category 2 is patients younger than 35 years of age and not meeting the criteria for risk category 1. This group was selected because of their higher ratio of malignant-to-benign nodules. 6,9-13 Risk category 3 is nodule size ≥15 mm and not meeting the criteria for risk category 1 or 2. The 15-mm cutoff for risk category 3 is higher than that in method A and is intended to reflect a higher size threshold for work-up of nodules that lack aggressive imaging findings or demographic risk factors. This cutoff has also been used in sonographic evaluation of thyroid nodules by several groups. 2,14 The purpose of having 3 risk categories is to help the radiologist communicate the risk of malignancy in a CT-detected ITN and the need for work-up with sonography. For example, for risk category 1 the radiologist would strongly recommend work-up, while for risk category 3, the radiologist could report the finding in the impressions without specific mention of work-up, leaving more flexibility for the clinician's input.
For purposes of this study, patients meeting the criteria for any of the 3 risk categories were considered to be receiving work-up under method B. We evaluated the performance of method B overall and the performance of each risk category.
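The two rule sets described above can be sketched in a few lines of Python. This is an illustrative sketch only: the `Nodule` record, its field names, and the function names are assumptions introduced here, and the single boolean `aggressive_features` stands in for the reader's judgment of local invasion, suspicious lymphadenopathy, or metastatic disease.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nodule:
    """Hypothetical record for one incidental thyroid nodule."""
    size_mm: float             # largest axial diameter
    age_years: int             # patient age
    aggressive_features: bool  # invasion, suspicious nodes, or metastases


def method_a(nodule: Nodule, cutoff_mm: float = 10.0) -> bool:
    """Method A: work-up if the nodule is at or above the size cutoff."""
    return nodule.size_mm >= cutoff_mm


def method_b_category(nodule: Nodule) -> Optional[int]:
    """Method B: return risk category 1, 2, or 3, or None if uncategorized.

    Categories are checked in order, so a nodule is assigned to the
    highest-risk category whose criteria it meets.
    """
    if nodule.aggressive_features:
        return 1
    if nodule.age_years < 35:
        return 2
    if nodule.size_mm >= 15.0:
        return 3
    return None


def method_b(nodule: Nodule) -> bool:
    """Method B: work-up if the nodule falls in any risk category."""
    return method_b_category(nodule) is not None


if __name__ == "__main__":
    example = Nodule(size_mm=12.0, age_years=60, aggressive_features=False)
    # A 12-mm nodule in a 60-year-old without aggressive features meets
    # method A (>=10 mm) but is uncategorized under method B.
    print(method_a(example), method_b_category(example))
```

Checking the categories in a fixed order mirrors the "not meeting the criteria for risk category 1 or 2" wording, so each nodule receives at most one category.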
Of note, clinical risk factors such as family history, childhood radiation exposure, and endocrine syndromes were not included in our risk assessment because this information is not available in the SEER data base and radiologists' access to this information may be limited in practice. In the original description of the 3-tier system, patients with these risk factors were assigned to risk category 2. 8 Uncategorized subjects not meeting the size criteria for method A or B would compose the subgroup for which work-up with sonography and biopsy would have theoretically not been pursued if the nodule was seen on CT.

Part 1: SEER Data Base of Thyroid Malignancy
Subjects. The SEER Program collects cancer data from 18 population-based registries representing 28% of the US population. 15 Cases were selected on the basis of a diagnosis of thyroid carcinoma. Although this data base only contained cancers and not benign ITNs, this part of the study served as a model to compare the number of cancers that would potentially be captured or missed with the categorization methods. The purpose of using these data was to compare the capture rate and not the diagnostic ability (sensitivity, specificity, and accuracy) of the 2 methods because calculating these latter statistics would require inclusion of benign cases.
Subjects were excluded if they were coded as any of the following: not the first malignant primary, not actively followed, alive with no survival time, missing or unknown cause of death (for the cause-specific survival analysis only), not in the research data base, or missing data regarding age or nodule size. The SEER data base provided the sex and age of each patient at diagnosis, follow-up time, vital status at follow-up, tumor size, and staging information. Recorded tumor sizes are intended to reflect sizes at initial staging. Because tumor size was not recorded in the SEER data base before 1983, the dataset used in this study included only cases diagnosed from January 1983 through December 2009. 15
Application of Categorization Methods. The SEER data on tumor size and stage were based on a combination of pathology and imaging, but for the purposes of this study, we assumed that this information could be obtained on CT. SEER subjects were stratified by methods A and B by using data on age; size of the cancer; and initial staging designation of localized, regional, or distant metastatic disease.
Outcome Measure. We calculated the proportion of patients in the SEER data base that would have been captured with method A compared with B. This indicated the number of cancers that would have met the criteria for undergoing work-up and thus would have been potentially diagnosed if they had presented initially as ITNs on CT. We compared the relative survival and cause-specific survival for patients that were uncategorized by methods A and B. This was a way of determining whether the cancers missed (uncategorized) by method A differed from those in method B.

Part 2: CT Cohort of Incidental Thyroid Nodules
Subjects. We retrospectively reviewed all consecutive subjects with contrast-enhanced neck CT and CTA examinations performed at a single institution in a 12-month period from July 1, 2002, to June 30, 2003. We chose the first year in which an electronic PACS was available at our institution to maximize the duration of clinical follow-up. Although thyroid nodules may also be seen on cervical spine and chest CT and MR images of the neck and chest, we did not examine these studies because the protocols are less likely to consistently include the entire thyroid gland (smaller FOV, saturation bands, or artifacts). Of 1127 contrast-enhanced neck CT or CTA examinations performed in the target time interval, 266 were excluded because they were not the patient's first neck cross-sectional study during the target time, 13 were not available for review due to technical reasons, 22 were excluded due to incomplete visualization of the thyroid, and 71 were excluded due to a thyroid-specific indication or known history of thyroid cancer or thyroid surgery. The final study group consisted of 755 patients with 549 soft-tissue neck CT scans (section thickness, 3 mm; interval, 3 mm) and 206 neck CTA examinations (section thickness, 1.25 mm; interval, 0.5 mm). During the period, imaging was performed with 120-150 mL of iopamidol (Isovue-300; Bristol-Myers Squibb, Princeton, New Jersey) with an injection rate of 3 mL/s on 16-row multidetector CT scanners (LightSpeed; GE Healthcare, Milwaukee, Wisconsin).
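As a quick consistency check, the cohort numbers reported above can be tallied in Python. The dictionary keys are shorthand labels introduced here; the counts are those stated in the text.

```python
# Examinations performed in the target interval.
screened = 1127

# Exclusions as listed in the text (labels are shorthand, not source terms).
exclusions = {
    "not first neck cross-sectional study": 266,
    "unavailable for technical reasons": 13,
    "incomplete visualization of thyroid": 22,
    "thyroid-specific indication or history": 71,
}

included = screened - sum(exclusions.values())

# Breakdown of the final study group by examination type.
by_exam_type = {"soft-tissue neck CT": 549, "neck CTA": 206}

print(included, sum(by_exam_type.values()))
```

Both totals come to 755, matching the stated size of the final study group.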
The study was approved by the institutional review board at our institution, with waivers of informed consent and authorization of the Health Insurance Portability and Accountability Act.
Application of Categorization Methods. Two fellowship-trained neuroradiologists with 5 years' experience in interpreting neck CT examinations retrospectively reviewed axial CT images for thyroid nodules measuring at least 5 mm in diameter by using electronic calipers. A nodule is defined as a focal region of attenuation distinct from the remainder of the thyroid parenchyma. They recorded the following findings: number of nodules in the thyroid, maximal dimension of the largest nodule, calcification or simple cystic composition, and aggressive features. Aggressive features fulfilling the criteria for risk category 1 for method B were extrathyroid invasion and distant or nodal metastases. Data were recorded independently, with observers blinded to any follow-up information. Nodules meeting the criteria for category 1 (invasive or lymphadenopathy) were reviewed by a third neuroradiology fellowship-trained radiologist with 10 years' experience reading neck CT scans. Any potentially ambiguous cases were also flagged for consensus review. Medical records of subjects with ITN were reviewed to determine age, sex, clinical follow-up, and pathology if available.
Outcome Measure. We compared the proportion of nodules that would meet the criteria for methods A and B. This was an indication of the work-up rate for ITNs detected on CT.

Statistical Analysis
For part 1, relative survival and cause-specific survival analyses were performed by using SEER*Stat 7.0.9 (National Cancer Institute; http://seer.cancer.gov/seerstat/releasenotes.html) and R (www.r-project.org). Relative survival was calculated on the basis of observed survival of the cohort relative to expected survival, where expected survival is determined from census population survival data, adjusting for age, sex, race, and year. The mean survival and 95% confidence intervals for the 3 size-cutoff values of method A and the 3-tiered system of method B were calculated and compared. Comparison of survival was performed by using Cox regression to yield hazard odds ratios for the variables in the categorization methods.
For part 2, comparisons of proportions of patients captured under methods A and B were made by using the McNemar test. Statistical significance was determined by P < .05.
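The definition of relative survival given above can be written compactly as a ratio. The symbols are notation introduced here for illustration and do not appear in the source: S_obs(t) is the observed survival of the cohort at time t, and S_exp(t) is the survival expected for a population matched on age, sex, race, and year.

```latex
\[
  RS(t) \;=\; \frac{S_{\mathrm{obs}}(t)}{S_{\mathrm{exp}}(t)}
\]
```

A value near 1 (100%) indicates that the cohort survives about as well as the matched general population, that is, no evidence of excess mortality.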
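For readers unfamiliar with the McNemar test, the following is a minimal sketch of its exact (binomial) form for paired yes/no classifications. The text does not state which variant of the test was used, and the discordant counts in the example are hypothetical placeholders, not study data.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the discordant pair counts.

    b: subjects flagged by the first method only
    c: subjects flagged by the second method only
    Concordant pairs (flagged by both or by neither) do not enter the test.
    Under the null hypothesis, each discordant pair is equally likely to
    fall in either cell, so min(b, c) follows a Binomial(b + c, 0.5) tail.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical discordant counts, for illustration only.
    p_value = mcnemar_exact(b=30, c=10)
    print(f"P = {p_value:.4f}")
```

The test uses only the patients on whom the two methods disagree, which is why it suits a comparison of two rules applied to the same 133 patients.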

RESULTS
Part 1: SEER Data Base of Thyroid Malignancy
Overall, there were 2697 deaths attributed to thyroid cancer that occurred within 10 years of diagnosis, representing 3.2% of the cohort of 84,439 subjects and yielding a 10-year cause-specific survival of 95.7% (80,840/84,439). Overall relative survival in the cohort at 10 years was 97.4% (89.4/91.7). Relative and thyroid cancer-specific survival by age, sex, and tumor characteristics is presented in Table 1.
Categorization Methods: Effect on Cancer Capture. Table 2 shows the cancers captured with each method for the thyroid cancer population. The size cutoff of ≥10 mm of method A captured 74% of cancers, compared with 59% with a 15-mm threshold and 46% with a 20-mm threshold. The 3-tiered system of method B captured 74% of the thyroid cancer population.
Categorization Methods: Effect on Survival. Table 2 shows the survival of subsets of the thyroid cancer population classified by the categorization methods. Regarding method A, relative survival estimates show no evidence of excess mortality for thyroid cancer subgroups with tumor sizes <10 mm, <15 mm, or <20 mm. The thyroid cancer-specific 10-year survival rates for tumor sizes <10 mm, <15 mm, and <20 mm were greater than 99% and were significantly different from those for tumors above the size cutoffs (P < .0001). The cause-specific survival curves as a function of tumor size are shown in Fig 1. Under method B, the 26% of subjects who were unassigned had a 99.9% relative survival and 99.5% cause-specific survival. Cause-specific survival among cancers not captured under method B was not significantly different from that among cancers not captured under method A (P = .26). Compared with the uncategorized subset, the 39% of cancer subjects identified as risk category 1 had a markedly worse 10-year cause-specific survival with an OR of 22.07 (95% CI, 17.11-28.47; P < .0001). The OR for risk category 2 relative to the uncategorized group was 0.43 (95% CI, 0.25-0.75; P = .003), indicating that it had a significantly better survival rate than the uncategorized group. Risk category 3 had a moderately worse 10-year cause-specific survival rate than the unassigned group (OR, 4.90; 95% CI, 3.73-6.47; P < .0001). The thyroid cancer-specific survival curves stratified by method B are shown in Fig 2.
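The capture rates can be reproduced from the counts given in the abstract (62,708 and 62,586 of 84,720 subjects for methods A and B, respectively). The helper name `capture_rate` is an assumption introduced for this sketch.

```python
def capture_rate(captured: int, total: int) -> float:
    """Fraction of cancers that meet a method's criteria for work-up."""
    if total <= 0:
        raise ValueError("total must be positive")
    return captured / total


SEER_TOTAL = 84_720          # SEER subjects, as given in the abstract
CAPTURED_METHOD_A = 62_708   # tumors >= 10 mm
CAPTURED_METHOD_B = 62_586   # any of the 3 risk categories

rate_a = capture_rate(CAPTURED_METHOD_A, SEER_TOTAL)
rate_b = capture_rate(CAPTURED_METHOD_B, SEER_TOTAL)

# Both rates round to 74%; the absolute difference is 122 cancers.
print(round(100 * rate_a), round(100 * rate_b),
      CAPTURED_METHOD_A - CAPTURED_METHOD_B)
```

The two methods differ by 122 cancers out of 84,720, which is why both are reported as capturing 74%.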

Part 2: CT Cohort of Incidental Thyroid Nodules
Subjects and Prevalence of Incidental Thyroid Nodules. Among 755 subjects (mean age, 50.2 ± 20.8 years; 52% male), there were 133 patients with thyroid nodules of ≥5 mm (mean age, 60.7 ± 16.5 years; 32% male), resulting in an ITN prevalence of 18%. Mean nodule size was 11 ± 6.9 mm. The characteristics of these patients and their CT findings are shown in the On-line Table.
Categorization Methods: Effect on Work-Up Rate. One hundred thirty-three patients with ITNs were included in the cohort for evaluation of categorization methods (Table 3). Method A captured 57 patients (43%) with the ≥10-mm threshold. Alternative size thresholds of ≥15 and ≥20 mm captured 22 (17%) and 12 (9%) patients, respectively.
The 3-tiered system of method B captured 31 patients (23%) (Table 3), significantly fewer than those captured with method A (P < .0005). All nodules in category 2 were <10 mm. Review of medical records found that a small number of nodules underwent further evaluation. Biopsy performed in 14 patients revealed 12 benign nodules and 2 cases of thyroid lymphoma. There were no cases of papillary thyroid cancer. Both cases of lymphoma met the criteria for category 1; one measured 1.5 cm and the other measured 2 cm in the maximal axial dimension on CT. Patients without pathology had a median follow-up time of 24 months (interquartile range, 6-105 months). No additional thyroid cancers were diagnosed during that time.
Note to Table 3: Nodules smaller than the size cutoff and in the "not categorized" group for method B represent nodules that would not receive work-up if the methods were applied.
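The part 2 percentages follow directly from the reported counts. The sketch below uses the method A count given in the abstract (57/133); the variable names and dictionary labels are assumptions introduced here.

```python
SCANS_REVIEWED = 755   # patients in the CT cohort
ITN_PATIENTS = 133     # patients with an ITN of >= 5 mm

# Patients who would be recommended for work-up under each rule.
workup_counts = {
    "method A (>=10 mm)": 57,
    "size >=15 mm": 22,
    "size >=20 mm": 12,
    "method B (3-tiered)": 31,
}

prevalence = ITN_PATIENTS / SCANS_REVIEWED
workup_rates = {name: n / ITN_PATIENTS for name, n in workup_counts.items()}

# Relative reduction in work-ups when moving from method A to method B.
relative_reduction = (
    workup_counts["method A (>=10 mm)"] - workup_counts["method B (3-tiered)"]
) / workup_counts["method A (>=10 mm)"]

print(f"prevalence: {prevalence:.0%}")
for name, rate in workup_rates.items():
    print(f"{name}: {rate:.0%}")
print(f"relative reduction, A to B: {relative_reduction:.0%}")
```

Moving from 57 to 31 work-ups is a relative reduction of about 46%.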

DISCUSSION
ITNs are very common on CT, but malignancy in an ITN is uncommon. While it would be ideal for a categorization method to capture all incidental thyroid malignancies, it may be more rational and cost-effective to capture all malignancies associated with poor outcome. Our 3-tiered system improved on a size-only method by capturing the same proportion of cancers in the SEER data base while almost halving the number of nodules targeted for work-up. Although interpretation of these findings is subject to the limitations discussed below, the clinical implication is that a larger proportion of ITNs detected on CT can be managed more conservatively without missing high-mortality cancers.
To date, a size of 10 mm is the most common method of triaging CT-detected nodules for work-up, 4,5 despite there being no guidelines and no evidence to support this cutoff size. While increasing the cutoff size for work-up would obviously reduce the number of patients having work-ups, there should be a balance so as not to increase the proportion of missed malignancies. Our retrospective review of ITNs on CT in 1 year at our institution shows that if we considered size alone, using a 15-mm cutoff compared with a 10-mm cutoff could more than halve the number of patients requiring work-up, but the disadvantage is that with SEER data, nearly half the number of cancers would be missed.
The rationale for developing method B was to improve on a size-based categorization by taking into consideration additional features that either increase the risk for malignancy in an ITN or are associated with higher mortality. The performance of method B in capturing cancers with poorer survival in the SEER data base is largely the result of including aggressive imaging features or suspicious lymphadenopathy in risk category 1. The other subgroup highlighted by method B was young subjects because this group has a higher malignant-to-benign ratio in a given thyroid nodule. 6,9-13 We found that the age group younger than 35 years had a discordant proportion of ITNs versus thyroid cancer: Patients younger than 35 years represented 7% of ITNs by CT review, but 23% of thyroid cancers recorded in SEER. Thus, the additional burden associated with working up young patients may be justified by the greater yield of detecting malignancy.
There are additional clinical risk factors such as family history of malignancy, childhood radiation, and endocrine syndromes that increase the malignancy risk. 4 In the original description of the 3-tiered system, patients with these risk factors were assigned to risk category 2. 8 In our modified 3-tiered system (method B), we did not include clinical risk factors because this clinical information was not available from the SEER data base and, in practice, the radiologist's access to this information may be limited. Radiologists reporting ITNs should be aware of the clinical risk factors and modify their recommendations when these factors are present.
There are several limitations to this study. In part 1, a major limitation was the assumption that the data in SEER correspond to information seen on imaging. In practice, early tumors with microscopic local invasion and nodal metastases may have been missed on CT. This study may, therefore, overestimate the relative performance of method B in capturing malignancies on CT. It is also possible that some cancers between 10 and 15 mm were not seen on CT and we have overestimated the performance of method A. The second limitation of part 1 is that survival for most cases in the SEER data base was based on diagnosed and treated tumors. The patients uncategorized by our 2 methods could have had worse survival had they not been diagnosed and treated. However, the main purpose of evaluating survival was to compare the biology of the tumors between the 2 categorization methods. The data show that the tumors uncategorized by both methods are equally less aggressive than the tumors that met the criteria for work-up. Furthermore, survival in small tumors is likely to be excellent because epidemiologic trends show the absence of a survival improvement despite increased diagnosis of small thyroid cancers, and a Japanese study showed no deaths during 10 years in nonaggressive small thyroid carcinomas that did not receive treatment. 16,17
There are also limitations for our retrospective CT review. It would have been ideal to have histology of the thyroid nodules to determine the number of missed cancers with each categorization method. However, the purpose of using the cohort was to determine the potential work-up rate and not the accuracy of the methods. The work-up rate in a future study could be improved with uniform radiology-report recommendations based on our proposed stratification system. In the present study, we did not address other risk factors that are known to affect the risk of thyroid malignancy, such as radiation exposure and family history.

CONCLUSIONS
A stratification approach to ITNs that incorporates aggressive imaging findings, age younger than 35 years, and a 15-mm cutoff for triaging work-up has several advantages. Compared with the common practice of a 10-mm size cutoff, the 3-tiered system reduces excess work-up of benign ITNs while capturing the same proportion of thyroid malignancies and is no more likely to miss high-mortality malignancies.