Initial Performance of NI-RADS to Predict Residual or Recurrent Head and Neck Squamous Cell Carcinoma

BACKGROUND AND PURPOSE: The Head and Neck Imaging Reporting and Data System (NI-RADS) surveillance template for head and neck cancer includes a numeric assessment of suspicion for recurrence (1–4) for the primary site and neck. Category 1 indicates no evidence of recurrence; category 2, low suspicion of recurrence; category 3, high suspicion of recurrence; and category 4, known recurrence. Our purpose was to evaluate the performance of the NI-RADS scoring system to predict local and regional disease recurrence or persistence. MATERIALS AND METHODS: This study was classified as a quality-improvement project by the institutional review board. A retrospective database search yielded 500 consecutive cases interpreted using the NI-RADS template. Cases without a numeric score, non-squamous cell carcinoma primary tumors, and primary squamous cell carcinoma outside the head and neck were excluded. The electronic medical record was reviewed to determine the subsequent management, pathology results, and outcome of clinical and radiologic follow-up. RESULTS: A total of 318 scans and 618 targets (314 primary targets and 304 nodal targets) met the inclusion criteria. Among the 618 targets, 85.4% were scored NI-RADS 1; 9.4% were scored NI-RADS 2; and 5.2% were scored NI-RADS 3. The rates of positive disease were 3.79%, 17.2%, and 59.4% for each NI-RADS category, respectively. Univariate association analysis demonstrated a strong association between the NI-RADS score and ultimate disease recurrence, with P < .001 for primary and regional sites. CONCLUSIONS: The baseline performance of NI-RADS was good, demonstrating significant discrimination among the categories 1–3 for predicting disease.

Radiologists are major stakeholders in the shift toward value-based performance. The American College of Radiology is leading the effort to re-engineer the radiology enterprise to be "patient centric, data-driven, and outcomes-based." Standardized reporting systems, dictation templates, and linked management recommendations have been identified as key contributions to value. 1 Much of this shift toward data-driven and outcomes-based reporting stems from the success of the BI-RADS system for standardizing mammography reports. Similar templates have been developed for hepatocellular carcinoma, 2 prostate cancer, 3 and thyroid nodules. 4 The Head and Neck Imaging Reporting and Data System (NI-RADS) was recently developed for surveillance contrast-enhanced CT (CECT) with and without positron-emission tomography in patients with treated head and neck (H&N) cancer. 5 Both the primary tumor site and the neck are assessed for recurrence and assigned a category of 1-4 based on imaging suspicion, with linked management recommendations:
• Category 1: No evidence of recurrence
  Management: Routine surveillance (typically 6 months; see "Materials and Methods")
• Category 2: Low suspicion of recurrence
  Imaging: Ill-defined abnormality with only mild enhancement and/or FDG uptake
  Management: Direct inspection for mucosal findings or short (3-month) follow-up with CECT or PET for deep findings
• Category 3: High suspicion of recurrence
  Imaging: Discrete, new, or enlarging lesions with intense enhancement and/or focal FDG uptake
  Management: Biopsy
• Category 4: Known recurrence, pathologically proved or definite radiologic or clinical progression
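The category list above works as a lookup from a site-level score to a linked recommendation, applied independently to the primary site and the neck. The following is a minimal Python sketch, assuming each report carries one integer score per site. The class and function names are hypothetical, and the recommendation strings paraphrase the list rather than quote an official NI-RADS lexicon. Because no management is listed for category 4, the sketch returns None for it.

```python
from enum import IntEnum
from typing import Dict, Optional


class NIRADS(IntEnum):
    """Site-level NI-RADS suspicion categories (hypothetical class name)."""
    NO_EVIDENCE = 1  # no evidence of recurrence
    LOW = 2          # low suspicion of recurrence
    HIGH = 3         # high suspicion of recurrence
    KNOWN = 4        # known recurrence


# Paraphrased linked recommendations; category 4 has none listed in the text.
LINKED_MANAGEMENT: Dict[NIRADS, Optional[str]] = {
    NIRADS.NO_EVIDENCE: "Routine surveillance (typically 6 months)",
    NIRADS.LOW: ("Direct inspection for mucosal findings, or short (3-month) "
                 "follow-up with CECT or PET for deep findings"),
    NIRADS.HIGH: "Biopsy",
    NIRADS.KNOWN: None,
}


def report_recommendations(primary: int, neck: int) -> Dict[str, Optional[str]]:
    """Map the two independently assigned site scores to recommendations.

    NIRADS(score) raises ValueError for any score outside 1-4.
    """
    return {
        "primary": LINKED_MANAGEMENT[NIRADS(primary)],
        "neck": LINKED_MANAGEMENT[NIRADS(neck)],
    }


print(report_recommendations(1, 3))
# {'primary': 'Routine surveillance (typically 6 months)', 'neck': 'Biopsy'}
```

Using an IntEnum keeps the numeric score as the key while rejecting out-of-range values at the point of entry.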
NI-RADS categories 1-4 are the same for CECT and CECT/PET, but the linked management recommendations and lexicon differ slightly because FDG avidity is included in the latter. Furthermore, the first version of NI-RADS category 2 contains subcategories that also address lesion location (superficial or deep) and size: 2a, superficial/mucosal abnormality; 2b, deep abnormality of <1 cm; and 2c, deep abnormality of >1 cm. 5 These subcategories are useful to direct management; for example, superficial mucosal abnormalities are amenable to direct inspection. Size criteria were not added to predict disease but rather to avoid biopsy in this indeterminate category unless immediate management depended on the biopsy. 5 This template-driven approach provides a common language that promotes collaboration between radiologists and referring providers, data-driven optimization of H&N cancer imaging, and greater direct engagement with patients.
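The first-version subcategory rules reduce to a location check followed by a size check. The sketch below relies on one stated assumption. The text defines deep abnormalities only as <1 cm (2b) or >1 cm (2c), so placing a lesion of exactly 1 cm in 2c is our choice, not part of the template. The function name is hypothetical.

```python
from typing import Optional


def category2_subcategory(superficial: bool, size_cm: Optional[float] = None) -> str:
    """Return the first-version NI-RADS 2 subcategory ("2a", "2b", or "2c").

    Assumption: a deep abnormality of exactly 1 cm is assigned to 2c.
    """
    if superficial:
        return "2a"  # superficial/mucosal: amenable to direct inspection
    if size_cm is None:
        raise ValueError("a deep abnormality needs a size to choose 2b or 2c")
    return "2b" if size_cm < 1.0 else "2c"


print(category2_subcategory(True))        # 2a
print(category2_subcategory(False, 0.7))  # 2b
print(category2_subcategory(False, 1.4))  # 2c
```

In this study, all subcategories were recorded simply as category 2. The subcategory therefore steers management but does not enter the statistical analysis.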
An obstacle to improving value-based performance and direct patient reporting is the lack of a data-driven standard surveillance imaging algorithm. PET/CECT at 12 weeks is often the first posttreatment study, though a recent study suggests that it can be performed at 8 weeks. 6 At our institution, patients with advanced H&N cancer are scanned with CECT/PET at 12 weeks as a baseline. If the findings are negative, they undergo CECT alone 6 months later, and if these findings are negative, they undergo CECT alone 12 months later. Although studies have investigated PET/CT for surveillance, 7-9 ordering practices among treating physicians remain variable. The 2015 National Comprehensive Cancer Network recommendations advocate imaging within 6 months for T3/4 primary tumors or N2/3 nodal disease and then additional imaging only for new signs/symptoms, smoking, or areas inaccessible to clinical inspection (the latter being arbitrary and difficult to apply). 10 Yet, 79% of H&N cancer surgeons self-reported using PET/CT for asymptomatic patients. 11 Given this variation in practice, it is critical to have measurable categories that can be correlated with outcomes to develop a data-driven universal surveillance algorithm.
NI-RADS allows H&N radiologists to perform structured radiologic-pathologic correlation and to determine accuracy, prognostic value, and interobserver agreement in contrast to subjective interpretations that provide no data for retrospective analysis. Our objective was to determine the initial performance of the NI-RADS scoring system to predict tumor recurrence or persistence in patients treated for squamous cell carcinoma of the H&N undergoing imaging surveillance.

MATERIALS AND METHODS
This study was designated a quality-improvement project by our institutional review board at the Emory University School of Medicine. An electronic medical record search from June 12, 2014, to January 28, 2015, yielded 500 consecutive neck CECT examinations interpreted with the NI-RADS template, including patients with a variety of tumor types and primary locations. The electronic medical record was reviewed for the following inclusion criteria: 1) primary squamous cell carcinoma of the head and neck, 2) CECT and/or CECT/PET for surveillance, and 3) NI-RADS template used for interpretation. A total of 402 scans met the inclusion criteria. Criteria for tumor recurrence or persistence included the following: 1) biopsy positive for squamous cell carcinoma, 2) evidence of disease progression on subsequent imaging (per Response Evaluation Criteria In Solid Tumors criteria; http://www.radiologytutor.com/index.php/cases/oncol/139-recist), or 3) obvious tumor on physical examination. To confirm the absence of tumor recurrence, we assessed the following: 1) follow-up imaging at least 90 days after the index scan, 2) clinical follow-up for at least 6 months without evidence of recurrence, or 3) biopsy of an abnormality detected on the index scan with pathology results negative for tumor.
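The outcome criteria above can be read as a three-way classifier for each target. The following is a sketch only, and every field name is hypothetical. It rests on three assumptions. First, positive evidence takes precedence over negative evidence. Second, the 90-day imaging criterion means imaging without evidence of recurrence. Third, "6 months" is approximated as 182 days. Targets that meet neither set of criteria fall under the first exclusion criterion below.

```python
from dataclasses import dataclass
from typing import Optional

SIX_MONTHS_DAYS = 182  # assumption: "6 months" approximated as 182 days


@dataclass
class TargetFollowUp:
    """Hypothetical per-target follow-up record; field names are illustrative."""
    biopsy_result: Optional[str] = None           # "scc", "negative", or None (no biopsy)
    recist_progression: bool = False              # progression on subsequent imaging
    obvious_tumor_on_exam: bool = False
    negative_imaging_days: Optional[int] = None   # index scan to last imaging without recurrence
    negative_clinical_days: Optional[int] = None  # clinical follow-up without recurrence


def classify_outcome(t: TargetFollowUp) -> str:
    """Return "positive", "negative", or "indeterminate" (excluded)."""
    # Assumption: any positive criterion takes precedence over negative evidence.
    if t.biopsy_result == "scc" or t.recist_progression or t.obvious_tumor_on_exam:
        return "positive"
    if t.biopsy_result == "negative":
        return "negative"
    if t.negative_imaging_days is not None and t.negative_imaging_days >= 90:
        return "negative"
    if t.negative_clinical_days is not None and t.negative_clinical_days >= SIX_MONTHS_DAYS:
        return "negative"
    return "indeterminate"  # insufficient outcomes data


print(classify_outcome(TargetFollowUp(negative_imaging_days=120)))  # negative
print(classify_outcome(TargetFollowUp(negative_imaging_days=60)))   # indeterminate
```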

Exclusion Criteria
1) Insufficient outcomes data to determine positive or negative disease.
2) Score of "4, known recurrence," because recurrence had already been proved before the scan. However, a scan could be scored 4 for the primary site and 1, 2, or 3 for the lymph nodes (or vice versa). In that case, an outcome could still be determined for the site not scored 4, so each scan had 2 possible sites for target abnormalities (primary and neck).
3) Multiple scans in the same patient if there were back-to-back scores of 1 for both primary and neck. In this case, subsequent index scans were excluded because the final outcome of "recurrence or not" for the primary or neck would be the same for these 2 data points.
These criteria yielded 287 patients, 318 scans, and 618 total targets (314 primary targets and 304 nodal targets) for which outcomes could be determined.
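Taken together, the exclusion rules act on targets (site-level scores) rather than on whole scans. The sketch below processes one patient's scans in chronological order. The record layout is hypothetical, and outcomes are assumed to have already been classified as positive, negative, or indeterminate. We read criterion 3 as dropping a 1/1 scan whenever the immediately preceding scan was also scored 1/1; that reading is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (scan_id, primary_score, neck_score, outcome per site),
# where each outcome is "positive", "negative", or "indeterminate".
Scan = Tuple[str, int, int, Dict[str, str]]
Target = Tuple[str, str, int, str]


def eligible_targets(scans: List[Scan]) -> List[Target]:
    """Return (scan_id, site, score, outcome) for one patient's scans in date order."""
    targets: List[Target] = []
    prev_all_ones = False
    for scan_id, primary, neck, outcomes in scans:
        all_ones = primary == 1 and neck == 1
        if all_ones and prev_all_ones:
            continue  # criterion 3: back-to-back 1/1 scans share the same final outcome
        prev_all_ones = all_ones
        for site, score in (("primary", primary), ("neck", neck)):
            if score == 4:
                continue  # criterion 2: recurrence already known at this site
            if outcomes[site] == "indeterminate":
                continue  # criterion 1: insufficient outcomes data
            targets.append((scan_id, site, score, outcomes[site]))
    return targets


scans = [
    ("s1", 1, 1, {"primary": "negative", "neck": "negative"}),
    ("s2", 1, 1, {"primary": "negative", "neck": "negative"}),
    ("s3", 4, 2, {"primary": "positive", "neck": "negative"}),
]
print(eligible_targets(scans))
# [('s1', 'primary', 1, 'negative'), ('s1', 'neck', 1, 'negative'), ('s3', 'neck', 2, 'negative')]
```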

Surveillance Algorithm and Image Interpretation
At our institution, all patients with advanced H&N cancer (almost all patients except those with T1 N0 disease) are scanned with CECT/PET at a 12-week baseline. If the findings are negative, they undergo CECT alone 6 months later, and if these findings are negative, they undergo CECT 12 months later. All NI-RADS surveillance scans were interpreted prospectively using the template by 1 of 4 dedicated H&N neuroradiologists (30, 15, 10, and 9 years of experience). Both the primary site and neck were assigned a NI-RADS category of 1-4. For this study, all category 2 subcategories were recorded as general category 2. For scores of 2-4, the target abnormality was described briefly in the impression after the numeric score. The NI-RADS template, created by a multidisciplinary team and implemented in 2014, has been subject to ongoing peer review through weekly tumor boards and the American College of Radiology RADPEER program. Interpreting radiologists reviewed the prior clinical history and endoscopic notes, and comparison was made with baseline imaging, including pretreatment FDG avidity when available. The subjective interpretation of PET/CECT included evaluation of disease on both fused PET and CECT images. As noted in the NI-RADS template, factors incorporated into lesion assessment included size, FDG avidity, morphology, and enhancement pattern. Because previous studies have established that standardized uptake value data do not improve diagnostic accuracy for disease after treatment for H&N cancer, 6,12,13 a strict standardized uptake value threshold was not used. Instead, FDG uptake was assessed subjectively as a dichotomous judgment of whether intense uptake was present.
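The institutional surveillance algorithm is a short conditional schedule: it advances only while studies are negative. Below is a minimal sketch that represents the 12-week baseline as 3 months, matching the "baseline at 3 months" wording in the Discussion. The names are hypothetical. The text does not say what follows the third study, so the sketch stops there. A non-negative study hands off to the linked NI-RADS management, which is not modeled here.

```python
from typing import List, Tuple

# (months after the previous study, modality); the 12-week baseline is written as 3 months.
SCHEDULE: List[Tuple[int, str]] = [(3, "CECT/PET"), (6, "CECT"), (12, "CECT")]


def surveillance_plan(negative_results: List[bool]) -> List[Tuple[int, str]]:
    """Return (months after treatment, modality) for each study reached so far.

    negative_results[i] is True when study i was negative. If the results run out
    before the schedule does, the last entry is the next pending study.
    """
    plan: List[Tuple[int, str]] = []
    month = 0
    for step, (interval, modality) in enumerate(SCHEDULE):
        month += interval
        plan.append((month, modality))
        if step >= len(negative_results) or not negative_results[step]:
            break  # result pending, or a non-negative study halts the routine schedule
    return plan


print(surveillance_plan([True, True]))  # [(3, 'CECT/PET'), (9, 'CECT'), (21, 'CECT')]
print(surveillance_plan([False]))       # [(3, 'CECT/PET')]
```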

Image Acquisition
All PET/CT imaging followed a standard protocol and was performed on GE Discovery 600 and 690 PET/CT scanners (GE Healthcare, Milwaukee, Wisconsin). Patients fasted for 6 hours before the scan, and serum glucose concentration was measured immediately before FDG administration. The examination was deferred if glucose was >200 mg/dL. Combined PET/CT from the skull vertex through the midthigh was obtained 1 hour after intravenous administration of 10–14 mCi of FDG. Helical noncontrast CT from the vertex through the midthigh was performed before PET for attenuation correction and anatomic localization. A CECT of the neck with the arms down was performed following PET. Our split-bolus technique used 110 mL of intravenous iopamidol (Isovue-370; Bracco, Princeton, New Jersey), with 55 mL injected first at 2.5 mL/s, a 40-second delay, and then another 55 mL at the same rate, with a total scan delay of 90 seconds. Axial images were acquired from the frontal sinuses through the mediastinum with the following parameters: section thickness, 1.25 mm; pitch, 0.984:1; gantry rotation, 0.7 seconds; FOV, 25 cm; 120 kV(peak); and Smart milliampere with a noise index of 13.78. Axial reformations at 2.5-mm thickness and sagittal and coronal reformations at 3-mm thickness were sent to the PACS.
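The split-bolus timing is simple arithmetic. Each 55-mL bolus at 2.5 mL/s lasts 22 seconds. The protocol text does not say where the 40-second delay and the 90-second scan delay are measured from. The sketch below assumes the pause starts when the first bolus ends and the scan delay is counted from the start of the first bolus. Under other readings, the event times shift.

```python
from typing import Dict


def split_bolus_timeline(total_ml: float = 110.0, rate_ml_s: float = 2.5,
                         pause_s: float = 40.0, scan_delay_s: float = 90.0) -> Dict[str, float]:
    """Event times in seconds from the start of the first injection.

    Assumptions: the 40-second delay runs from the end of the first bolus, and
    the 90-second scan delay is counted from the start of the first bolus.
    """
    bolus_ml = total_ml / 2              # two equal boluses (55 mL each)
    inject_s = bolus_ml / rate_ml_s      # duration of each bolus
    second_start = inject_s + pause_s
    second_end = second_start + inject_s
    if second_end > scan_delay_s:
        raise ValueError("second bolus would still be running when scanning starts")
    return {
        "first_bolus_end": inject_s,
        "second_bolus_start": second_start,
        "second_bolus_end": second_end,
        "scan_start": scan_delay_s,
    }


print(split_bolus_timeline())
# {'first_bolus_end': 22.0, 'second_bolus_start': 62.0, 'second_bolus_end': 84.0, 'scan_start': 90.0}
```

Under these assumptions, acquisition begins about 6 seconds after the second bolus finishes.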

Statistical Methods
Univariate association between recurrence and scan score (1-3) was assessed by the χ2 test and the nonparametric Fisher exact test. The same analysis was repeated separately for the primary site, lymph nodes, and their combination. The overall discriminative performance of the scan score for recurrence status (yes versus no) was measured as the area under the curve (AUC) by receiver operating characteristic (ROC) analysis with 95% confidence intervals. Sensitivity and specificity were reported at each cut-point of the scan score: score 1 versus 2-3, and scores 1-2 versus 3. Additionally, the same ROC analyses were explored in subgroups: CECT alone versus CECT + PET, and the first posttreatment examination versus subsequent surveillance examinations. Interobserver agreement was measured by κ statistics for the primary and neck sites on 40 scans scored by 2 graders. The statistical significance level was set at P < .05, and analyses were conducted in SAS 9.4 (SAS Institute, Cary, North Carolina).
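The χ2 association test and the cut-point sensitivity/specificity can be sketched for the combined targets. This sketch does not use the authors' dataset. Assumption: the counts are back-calculated from the reported Results (20/528 for NI-RADS 1, 10/58 for NI-RADS 2, 19/32 for NI-RADS 3). With 3 scores the test has 2 degrees of freedom, and for df = 2 the χ2 upper-tail probability is exactly exp(−x/2). Two expected cells (about 4.6 and 2.5 recurrences for scores 2 and 3) fall below 5, a common rule-of-thumb threshold, which is one reason to also report the Fisher exact test.

```python
import math
from typing import Dict, Tuple

# score -> (targets with recurrence, targets without), primary and nodal combined.
# Assumption: back-calculated from the reported Results, not the authors' data.
COUNTS: Dict[int, Tuple[int, int]] = {1: (20, 508), 2: (10, 48), 3: (19, 13)}


def chi_square_2x3(counts: Dict[int, Tuple[int, int]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value for a 3-score x 2-outcome table."""
    if len(counts) != 3:
        raise ValueError("closed-form p-value assumes exactly 3 scores (df = 2)")
    pos_total = sum(pos for pos, _ in counts.values())
    neg_total = sum(neg for _, neg in counts.values())
    grand = pos_total + neg_total
    stat = 0.0
    for pos, neg in counts.values():
        row = pos + neg
        for observed, col_total in ((pos, pos_total), (neg, neg_total)):
            expected = row * col_total / grand
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)  # exact upper tail for df = 2


def sens_spec(counts: Dict[int, Tuple[int, int]], cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff counts as test-positive."""
    tp = sum(pos for s, (pos, _) in counts.items() if s >= cutoff)
    fn = sum(pos for s, (pos, _) in counts.items() if s < cutoff)
    fp = sum(neg for s, (_, neg) in counts.items() if s >= cutoff)
    tn = sum(neg for s, (_, neg) in counts.items() if s < cutoff)
    return tp / (tp + fn), tn / (tn + fp)


stat, p = chi_square_2x3(COUNTS)
print(p < 0.001)  # True
for cutoff in (2, 3):  # "1 versus 2-3" and "1-2 versus 3"
    sens, spec = sens_spec(COUNTS, cutoff)
    print(cutoff, round(sens, 3), round(spec, 3))
# 2 0.592 0.893
# 3 0.388 0.977
```

With these reconstructed counts, specificity is high and sensitivity lower at both cut-points. This is illustrative, not a reported result.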
Median imaging follow-up after the index scan was 51 weeks; median clinical follow-up was 54 weeks. The distribution of tumor site and initial stage (when known) is outlined in Table 1. Primary tumors of the oropharynx were the largest group (43.2%), followed by tumors of the oral cavity (25.4%) and larynx (22.3%). At the primary site, almost one-third had moderately advanced (T4a) disease (32.8%). More than half had at least N2 nodal disease (54.7%). Distant metastatic disease at initial staging was rare (2.1%).

Incidence of Disease Recurrence/Persistence Based on NI-RADS Score
The incidence of recurrence for each NI-RADS category is detailed in Table 2. Overall, the incidence of tumor persistence/recurrence was 7.9%, with an 8.9% (28/314) recurrence rate at the primary site and a 6.9% (21/304) regional nodal recurrence rate.
NI-RADS 1. Five hundred twenty-eight of 618 targets (85.4%) were scored "NI-RADS 1, no evidence of recurrence," and only 3.8% of these had recurrent disease during follow-up. When considered separately, the recurrence rates for primary and nodal NI-RADS 1 scores were similar (3.5% and 4.0%, respectively).
NI-RADS 3. Thirty-two of 618 targets (5.2%) were scored "NI-RADS 3, highly suspicious for recurrence" and had the highest overall recurrence rate of 59.4%, with a 54.6% recurrence rate at the primary site and a 70.0% rate at nodes. Of the 32 category 3 targets, 22 were primary site lesions and 10 were neck lesions. Twenty-two of the 32 category 3 targets had pathologic confirmation of disease presence or absence. The remaining 10 did not have pathologic confirmation because it would not have affected management (n = 7) or because the ultrasound or CT correlate of the suspected lesion could not be found when biopsy was attempted (n = 3). Eight of these 10 (80.0%) had clinical or radiologic evidence of recurrence (7 primary site lesions and 1 nodal site), defined as progression at the target site on imaging or clinically obvious tumor.

NI-RADS Performance
Univariate association analysis demonstrated a strong association between the NI-RADS score and ultimate disease persistence/recurrence, with P < .001 for primary site, lymph node, and combined scores. ROC curves for NI-RADS performance at the primary site (Fig 1), lymph nodes (Fig 2), and combined sites (Fig 3) reflect good overall performance. For the primary site (Fig 1), the AUC was 0.787 (95% CI, 0.691-0.881; P < .001), indicating good performance of the NI-RADS score in discriminating primary site recurrence from no recurrence (an AUC of 1 indicates perfect discrimination, and an AUC of 0.5 indicates discrimination no better than chance). The AUC was 0.712 for lymph nodes and 0.756 for combined primary and nodal sites, also indicating good overall performance of this rating scale.
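For an ordinal score, the empirical ROC AUC equals the probability that a randomly chosen target with recurrence scores higher than one without recurrence, with ties counted as one-half (the Mann-Whitney form). The sketch applies this to counts back-calculated from the Results (an assumption: 20/528, 10/58, and 19/32 for NI-RADS 1-3) and obtains about 0.756, matching the reported combined AUC. That agreement suggests, but does not prove, that the reported AUC is the nonparametric one. The sketch does not compute confidence intervals.

```python
from typing import Dict, Tuple

# score -> (targets with recurrence, targets without); back-calculated, not raw data.
COUNTS: Dict[int, Tuple[int, int]] = {1: (20, 508), 2: (10, 48), 3: (19, 13)}


def ordinal_auc(counts: Dict[int, Tuple[int, int]]) -> float:
    """Empirical AUC: P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    total_pos = sum(pos for pos, _ in counts.values())
    total_neg = sum(neg for _, neg in counts.values())
    wins = 0.0
    for s_pos, (pos, _) in counts.items():
        for s_neg, (_, neg) in counts.items():
            if s_pos > s_neg:
                wins += pos * neg          # concordant pairs
            elif s_pos == s_neg:
                wins += 0.5 * pos * neg    # tied pairs count half
    return wins / (total_pos * total_neg)


print(round(ordinal_auc(COUNTS), 3))  # 0.756
```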

Subgroup Analysis of CECT Alone versus PET/CECT
A subgroup analysis was undertaken comparing the performance of CECT alone versus CECT + PET/CT (Table 3). The overall recurrence rate in these 2 groups was similar (7.0% versus 10.1%).
Although there was no statistical difference in overall performance of NI-RADS for CECT (AUC = 0.779) versus CECT/PET (AUC = 0.709), a NI-RADS 3 on CECT alone was more likely to correctly identify recurrence (primary or nodal) compared with a NI-RADS 3 on CECT + PET (91.7% versus 40.0%).

Subgroup Analysis of Initial Posttreatment Study versus Subsequent Studies
An additional subgroup analysis compared performance on initial posttreatment studies with performance on subsequent follow-up studies (Table 4). While there was no statistical difference in the overall performance of NI-RADS in initial posttreatment surveillance (AUC = 0.729) versus subsequent scans (AUC = 0.760), the recurrence rate for NI-RADS 1 was greater for the initial baseline scan group (5.7%, 5/88) than for the subsequent follow-up examination group (3.4%, 15/440). This difference was even more pronounced for the primary site alone (9.3% versus 2.4%). As expected, the incidence of positive disease was also greater in the initial posttreatment group than in the surveillance studies (11.2% versus 7.2%).
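The NI-RADS 1 comparison above (5/88 versus 15/440) is a 2 × 2 table, and small tables like this are commonly tested with the two-sided Fisher exact test named in the Statistical Methods. This excerpt does not report a P value for this particular comparison. The sketch shows how one could be computed, and its output is not a reported result. The function name is hypothetical; scipy.stats.fisher_exact offers a library alternative. The primary-site comparison (P = .047) cannot be reproduced here because its counts are not given.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    whose probability is no larger than that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A tiny relative tolerance keeps tables tied with the observed one in the sum.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# NI-RADS 1 targets: recurrence / no recurrence, initial versus subsequent scans.
p = fisher_exact_two_sided(5, 88 - 5, 15, 440 - 15)
print(f"two-sided P = {p:.3f}")
```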

DISCUSSION
The baseline performance of NI-RADS demonstrated significant discrimination between groups, with disease recurrence/persistence rates of 3.8% for NI-RADS 1, 17.2% for NI-RADS 2, and 59.4% for NI-RADS 3. A strong association between score and positive disease was found for the primary site, lymph nodes, and all targets combined, and ROC analysis also demonstrated good discriminative performance in these categories. While adding more NI-RADS categories may improve ROC performance, the simplicity of the current scale is appropriate for the limited management options: routine surveillance, shorter interval follow-up, additional PET/CT imaging, mucosal inspection, or biopsy. Because all these patients are part of our institutional surveillance program with routine follow-up, it is reasonable for specificity to be high and sensitivity lower. In fact, size cutoffs were set for the "ill-defined" or "questionable" NI-RADS 2 lesions to avoid low-yield, difficult, dangerous, or likely nondiagnostic biopsies.
Figure: A, CECT showed only subtle/questionable asymmetric enhancement in the right vallecula (arrow), identified retrospectively after review of PET. B, Fused PET image shows asymmetric uptake in the right vallecula (arrow). Direct visualization did show ulcerated mucosa, but the biopsy was negative for tumor. Clinically, this was deemed a radiation-related injury.
Our NI-RADS template has been useful in daily clinical practice. For the primary site, the 2a category is used for low-suspicion superficial mucosal lesions, with a linked recommendation of direct inspection. Focal asymmetric enhancement and FDG uptake are not uncommon findings on posttreatment imaging and could represent benign mucositis or early recurrence/persistence. Although many mucosal abnormalities are false-positives (Fig 4), we are able to identify mucosal recurrences, especially in the postradiated larynx, where abnormalities may be subtle on CECT (Fig 5A).
In this clinical scenario, the fused PET images (Fig 5B) help direct inspection and biopsy. For the primary site, the 2b category is used for deep, ill-defined, nondiscrete, low-suspicion lesions with only mild FDG uptake (if combined with PET) (Fig 6). In practice, most category 2 lesions are managed with short-term follow-up rather than biopsy because clinicians and patients are comfortable waiting. Short-term follow-up has become our official recommendation in this category because the size criteria were removed.
Finally, NI-RADS 3 is reserved for a discrete, nodular, robustly enhancing lesion (Fig 7A) with marked FDG uptake if PET was also performed (Fig 7B), and the recommendation is for biopsy. In the neck, NI-RADS 3 is a new or enlarging lymph node (Fig 8A) with marked FDG uptake if PET is combined (Fig 8B). The positive predictive value for NI-RADS 3 primary site lesions was lower (54.6%) than for the neck (70%); this finding likely reflects the more complex posttreatment imaging appearance at the primary site. Overall, we believe that the NI-RADS template yielded a reasonable rate of recommending biopsy. Only 32 of 618 possible targets (5.2%) were scored category 3 with biopsy recommendation, balanced against a relatively high positive predictive value (54.6% for the primary site, 70% for the neck).
Our subgroup analyses highlight areas for future study. Although the numbers are small, our data suggest that CECT alone may be more specific, because the rate of true persistence/recurrence for a NI-RADS 3 score was much higher for CECT alone (91.7%) than for CECT/PET (40%). We also separated our scans into the first posttreatment baseline examination at 3 months and subsequent surveillance studies to understand the variation of NI-RADS performance at different time points. As expected, a NI-RADS 1 on a subsequent follow-up examination had a higher negative predictive value than on the initial posttreatment examination. This is valuable in providing guidance to patients regarding their risk of disease at different time points.
Figure: T4a larynx squamous cell carcinoma, status post total laryngectomy, bilateral neck dissection, and chemoradiotherapy. A, CECT shows a 1-cm discrete rounded hyperenhancing nodule along the lateral border of the neopharynx, deep to the flap (arrow). B, Fused PET image shows focal high FDG uptake (arrow). This was given a category 3 score, and endoscopic biopsy demonstrated recurrence.
Finally, NI-RADS provides a meaningful framework for discussing results with patients. For example, a patient with a NI-RADS 2 on surveillance imaging has a chance of recurrence of roughly 17%. We can also reassure patients with NI-RADS 1 that their overall recurrence rate is low (3.8%). There is an opportunity to understand the negative and positive predictive values of NI-RADS scores at different time points in further subpopulation studies. For example, our subgroup comparison of NI-RADS 1 primary site recurrence rates for initial posttreatment examinations versus subsequent examinations found a difference (9.3% versus 2.4%, P = .047), but the overall numbers were small because the incidence of recurrence in this group was so low. This analysis suggests that a NI-RADS 1 on the initial posttreatment baseline examination is not as reassuring as a NI-RADS 1 on subsequent surveillance examinations.

CONCLUSIONS
The performance of NI-RADS was good, demonstrating significant discrimination between groups, with positive disease rates of 3.8% for NI-RADS 1, 17.2% for NI-RADS 2, and 59.4% for NI-RADS 3. Standardization of linked management recommendations and correlation with patient outcomes should validate performance and highlight the added value of radiologists in patient care.