Using FDG-PET to Measure Early Treatment Response in Head and Neck Squamous Cell Carcinoma: Quantifying Intrinsic Variability in Order to Understand Treatment-Induced Change

BACKGROUND AND PURPOSE: Quantification of both baseline variability and intratreatment change is necessary to optimally incorporate functional imaging into adaptive therapy strategies for HNSCC. Our aim was to define the baseline variability of SUV on FDG-PET scans in patients with head and neck squamous cell carcinoma and to compare it with early treatment-induced SUV change. MATERIALS AND METHODS: Patients with American Joint Committee on Cancer stages III-IV HNSCC were imaged with 2 baseline PET/CT scans and a third scan after 1–2 weeks of curative-intent chemoradiation. SUVmax and SUVmean were measured in the primary tumor and most metabolically active nodal metastasis. Repeatability was assessed with Bland-Altman plots. Mean percentage differences (%ΔSUV) in baseline SUVs were compared with intratreatment %ΔSUV. The repeatability coefficient for baseline %ΔSUV was compared with intratreatment %ΔSUV. RESULTS: Seventeen patients had double-baseline imaging, and 15 of these patients also had intratreatment scans. Bland-Altman plots showed excellent baseline agreement for nodal metastases SUVmax and SUVmean, but not primary tumor SUVs. The mean baseline %ΔSUV was lowest for SUVmax in nodes (7.6% ± 5.2%) and highest for SUVmax in primary tumor (12.6% ± 9.2%). Corresponding mean intratreatment %ΔSUVmax was 14.5% ± 21.6% for nodes and 15.2% ± 22.4% for primary tumor. The calculated RC for baseline nodal SUVmax and SUVmean were 10% and 16%, respectively. The only patient with intratreatment %ΔSUV above these RCs was 1 of 2 patients with residual disease after CRT. CONCLUSIONS: Baseline SUV variability for HNSCC is less than intratreatment change for SUV in nodal disease. Evaluation of early treatment response should be measured quantitatively in nodal disease rather than the primary tumor, and assessment of response should consider intrinsic baseline variability.

FDG-PET is the most widely used functional imaging technique in head and neck squamous cell carcinoma (HNSCC). Pretreatment imaging has a significant role in initial staging, prognosis assessment, and target delineation. 1 Posttreatment FDG-PET has become an important tool for the assessment of residual disease in cervical lymph nodes. 2,3 Another area of active investigation is the use of PET to monitor therapy response during treatment. PET performed early in treatment (intratreatment PET) could detect favorable or unfavorable metabolic changes before anatomic changes are evident and could help determine whether a particular therapeutic strategy should be maintained or changed. This approach could enhance the choice of initial treatment and facilitate the use of adaptive radiation therapy strategies, including dose escalation, selection of nonresponding patients for new molecularly targeted therapies, or discontinuation in favor of primary surgery, among other options. 4 Early response assessment with FDG-PET has been evaluated in lymphoma, soft-tissue sarcoma, and esophageal and lung cancers. [5][6][7][8] Findings that early treatment changes in glucose metabolism can predict histopathologic response or survival have led to proposals of using standardized uptake value (SUV) cutoff values to stratify patients by outcome. [5][6][7] One of the largest studies of neoadjuvant chemotherapy for esophageal cancer identified responders with high sensitivity by using 0% SUV decrease as a cutoff (ie, any decrease in SUV), 5 and the authors concluded that a decrease in SUV of any magnitude would indicate an early treatment response. In practice, using such small changes to signify treatment response should be viewed with caution because PET scans repeated days or even hours apart without intervening treatment can vary considerably in terms of SUV. [9][10][11][12][13]
The significance of this phenomenon is that change observed during the course of treatment must be greater than inherent baseline variability to correctly attribute the observed change to the treatment itself. Intrinsic variability of SUV in the absence of treatment reflects biologic, technical, and observer variation. This fluctuation was observed recently in HNSCC in a study that evaluated change in SUVmax on pretreatment PET/CT scans that were performed on different scanners. 13 The authors warned about the need to account for variability in PET biomarkers in clinical protocols. Wahl et al, 14 who proposed criteria for the Positron Emission Response Criteria in Solid Tumor (PERCIST) trial, also stated that more studies were needed to address questions concerning the reproducibility of baseline quantitative readings and PET response during the initial phases of treatment. Data are sparse, but baseline tumor PET metabolic activity for tumors outside the head and neck can vary by 10%-16% in single-center studies 9-12 and up to 39% in multicenter studies. 11 Quantification of both baseline variability and intratreatment change is necessary to optimally incorporate functional imaging into adaptive therapy strategies for HNSCC. The aim of this prospective study was to define the intrinsic (pretreatment) variability of tumor SUV and compare it with early treatment-induced (intratreatment) change in patients with HNSCC. We hypothesized that intratreatment changes in HNSCC would be larger than the intrinsic variability in metabolic activity in patients responding favorably to treatment. A secondary aim was to determine whether the magnitude of intrinsic variability differed between primary tumor and nodal metastases or according to the parameter used to describe SUV, namely SUV maximum (SUVmax) or SUVmean.

Patient Selection and Imaging Protocol
Patients with newly diagnosed American Joint Committee on Cancer stages III-IV head and neck squamous cell carcinoma scheduled to undergo curative-intent chemoradiotherapy (CRT) were prospectively enrolled between September 2009 and August 2011. Exclusion criteria were age younger than 18 years, the presence of a synchronous second malignancy, and diabetes mellitus. To avoid data contamination by image noise, we excluded patients with both tumor and node SUVmax of ≤4. The Cancer Center Protocol Review Committee and the institutional review board of our institution approved the trial. Written informed consent was obtained from all patients before enrollment.
The imaging protocol specified the performance of 2 baseline pretreatment FDG-PET/CT scans (PET1 and PET2) separated by 1 week. The second scan was to be obtained just before the initiation of therapy. The third PET/CT was to be obtained after completion of the first week of CRT (PET3) to assess early treatment-induced change. All patients were scheduled to receive a standard institutional regimen of CRT, consisting of intensity-modulated radiation therapy, 2 Gy once daily to 70 Gy. Chemotherapy consisted of 2 cycles of cisplatin during weeks 1 and 5 of intensity-modulated radiation therapy (20 mg/m²/day × 5 days per cycle).

PET/CT Scanning Technique
All acquisitions were performed by using 1 of 2 integrated PET/CT scanners, the Discovery STE with 16-section CT (GE Healthcare, Milwaukee, Wisconsin) and the Biograph mCT PET/CT System with 128-detector CT (Siemens Medical Solutions, Erlangen, Germany). All scans for any given patient were obtained on the same scanner. Patients fasted for at least 4 hours before intravenous administration of FDG (5.92 MBq/kg of body weight, with a minimum of 296 MBq and maximum of 555 MBq). Serum glucose concentrations were measured in all patients and were <200 mg/dL (11.1 mmol/L) in every case (normal range, 70-115 mg/dL). After an uptake phase of 60 minutes, patients were positioned in a head and neck immobilizer device, and unenhanced CT from the midcranium to the thoracic inlet was performed with the arms down (3.75-mm-thick contiguous images with 30-cm FOV).
The CT scan was followed by dedicated PET/CT neck images obtained during 1 or 2 bed positions (positron-emission scan, 68 minutes/bed), with the patient's arms down. Both PET scanners had a resolution of 5-mm full width at half maximum and yielded PET sections with 3.27-mm center-to-center spacing. PET images were reconstructed with corrections for attenuation, scatter, random events, and dead time by using ordered subsets expectation maximization, resulting in a 128 × 128 matrix. The FOV was 30 cm with 3 iterations of ordered subsets expectation maximization. Contrast-enhanced CT scans were obtained separately if they had not been obtained with the baseline PET/CT, to optimize target delineation for treatment planning.
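The weight-based FDG dosing rule stated above can be sketched as a clamped linear function. This is only an illustration: the function name `fdg_dose_mbq` is hypothetical, and only the per-kilogram activity and the minimum and maximum limits come from the text.

```python
def fdg_dose_mbq(weight_kg: float) -> float:
    """Administered FDG activity in MBq for a given body weight.

    Values from the protocol text: 5.92 MBq/kg, minimum 296 MBq, maximum 555 MBq.
    The function name and interface are illustrative assumptions.
    """
    per_kg = 5.92                 # MBq per kg of body weight
    minimum, maximum = 296.0, 555.0
    # Scale by weight, then clamp to the stated limits.
    return min(max(per_kg * weight_kg, minimum), maximum)


if __name__ == "__main__":
    for weight in (40, 75, 120):  # hypothetical body weights in kg
        print(weight, "kg ->", round(fdg_dose_mbq(weight), 1), "MBq")
```

Under this rule, the lower limit applies below 50 kg (296 / 5.92) and the upper limit above roughly 93.75 kg (555 / 5.92).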

Image Analysis
PET/CT images were analyzed by a fellowship-trained, board-certified neuroradiologist (10 years' experience reading CT, without a Certificate of Added Qualification, 50% practice in head and neck imaging) who knew the site of the primary tumor but was blinded to the tumor and nodal staging and clinical treatment response. All PET studies were analyzed quantitatively with a software platform capable of deformable registration of multimodality images (VelocityAI; Velocity Medical Solutions, Atlanta, Georgia) in axial, coronal, and sagittal planes. On the PET scans, metabolic volumes were manually delineated in 2 tumor sites: the primary tumor and the most metabolically active nodal metastasis. Correlation to CT images was made to ensure accurate delineation.
The SUV was calculated by using the following formula:
$$\mathrm{SUV} = \frac{c_{dc}}{d_i / w}$$
where $c_{dc}$ is the decay-corrected tracer tissue concentration (in becquerels per gram), $d_i$ is the injected dose (in becquerels), and $w$ is the patient's body weight (in grams). SUV was measured as SUVmax and SUVmean. These parameters were obtained from a VOI that was generated by contouring a region of interest onto all axial images covering the metabolically active tumor. Large photopenic areas in the center of the nodal disease were excluded. SUVmax was defined as the highest pixel value in the VOI. SUVmean was defined as the average pixel value for the VOI. This approach was used for the 2 baseline scans and the single intratreatment scan.
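As a minimal sketch of the body-weight-normalized SUV described above (tissue concentration divided by injected dose per gram of body weight), the following Python functions compute a single SUV and then summarize a VOI as SUVmax and SUVmean. The function names and the sample numbers are hypothetical; this is not the VelocityAI implementation.

```python
def suv(conc_bq_per_g: float, injected_bq: float, weight_g: float) -> float:
    """SUV = decay-corrected tissue concentration / (injected dose / body weight)."""
    return conc_bq_per_g / (injected_bq / weight_g)


def suv_max_mean(voxel_suvs: list[float]) -> tuple[float, float]:
    """Summarize the SUVs of all voxels inside a VOI as (SUVmax, SUVmean)."""
    return max(voxel_suvs), sum(voxel_suvs) / len(voxel_suvs)


if __name__ == "__main__":
    # Hypothetical patient: 444 MBq injected, 75 kg body weight.
    injected_bq = 444e6
    weight_g = 75_000
    # Hypothetical decay-corrected voxel concentrations (Bq/g) inside one VOI.
    concentrations = [29_600, 41_440, 23_680]
    voxel_suvs = [suv(c, injected_bq, weight_g) for c in concentrations]
    print(suv_max_mean(voxel_suvs))
```

Because SUVmax depends on a single voxel while SUVmean depends on every voxel inside the contour, the two parameters can respond differently to noise and to contouring choices.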

Statistical Analysis
Differences in metabolic activity between scans were described as SUV unit differences and SUV percentage differences. If SUV1, SUV2, and SUV3 are the respective measurements of SUV on PET1, PET2, and PET3, then SUV unit differences (Δ) and SUV percentage differences (%Δ) were derived from these measurements.
Repeatability of baseline measurements was examined graphically for each patient by using a Bland-Altman plot, which displays the mean of SUV1 and SUV2 against the difference between SUV1 and SUV2 (SUV1 − SUV2). 15 Baseline repeatability was quantified with the intraclass correlation coefficient (ICC). 10 Intratreatment change in the SUV was compared with baseline differences in SUV. The paired t test was used to determine whether there were statistically significant differences between these 2 measurements. A P value of <.05 was considered statistically significant. The ICC for SUV1 and SUV2 was also compared with the ICC for SUV2 and SUV3.
A repeatability coefficient (RC) was also derived from the baseline %ΔSUV and was compared with the intratreatment %ΔSUV. The RC has been applied to measure intrinsic baseline variability 10 and is defined as the SD of baseline change multiplied by 1.96. This implies that the difference between repeated test results can be expected to be greater than the RC only 5% of the time. Thus, intratreatment %ΔSUV would have to be greater than the RC to be confident that the change represented more than baseline variability.
Finally, the SUVs of the 3 PET scans were examined for significant treatment changes by using a 2-factor ANOVA with subject and treatment status (pre- or post-) being the 2 factors.
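The repeatability coefficient described above (1.96 multiplied by the SD of the baseline percentage differences) can be sketched in a few lines of Python. One point is an assumption rather than something stated in this text: the percentage difference is taken here as the absolute difference relative to the mean of the two scans. The function names and sample SUVs are hypothetical.

```python
from statistics import stdev


def percent_diff(suv_a: float, suv_b: float) -> float:
    """Absolute difference relative to the pairwise mean, in percent.

    ASSUMPTION: this normalization is illustrative; the exact formula used
    in the study is not reproduced in this text.
    """
    return abs(suv_a - suv_b) / ((suv_a + suv_b) / 2) * 100


def repeatability_coefficient(baseline_pct_diffs: list[float]) -> float:
    """RC = 1.96 x sample SD of the baseline percentage differences."""
    return 1.96 * stdev(baseline_pct_diffs)


if __name__ == "__main__":
    # Hypothetical paired baseline SUVs (PET1, PET2) for four lesions.
    pairs = [(10.0, 10.8), (7.5, 7.1), (12.2, 11.0), (6.0, 6.3)]
    diffs = [percent_diff(a, b) for a, b in pairs]
    print("RC (%):", round(repeatability_coefficient(diffs), 1))
```

If the ± value reported for nodal SUVmax (5.2%) is read as an SD, then 1.96 × 5.2 ≈ 10.2, which agrees with the reported nodal SUVmax RC of about 10%.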

Patients and Treatment Outcome
Nineteen patients were enrolled in the study. Two patients were excluded because the SUVmax in both the primary tumor and node at baseline was ≤4. The 17 remaining patients were all men with a mean age of 51 ± 8.5 years. Table 1 summarizes patient demographics and disease characteristics. All patients had double-baseline PET scans. Two of 17 patients did not have intratreatment scans because they declined the third scan. The median interval between the 2 baseline scans was 9 days (interquartile range [IQR], 7-13 days). The median interval between the intratreatment and second baseline scan was 13 days (IQR, 12.5-17 days). The median radiation dose to the clinical target volume at the time of the intratreatment scan was 12 Gy (IQR, 10-14 Gy). The timing in days between PET2 and PET3 is shown in Table 1. Three patients were scanned in the third intratreatment week because of conflicts in scheduling scanning and treatment or technical issues with the scanner.
Fifteen patients had no residual disease after CRT by clinical examination and/or on a PET/CT performed 12 weeks posttreatment. The posttreatment PET scan was obtained per clinical routine but not explicitly as part of this study. Two patients (7 and 14) had pathologically confirmed residual disease in the cervical lymph nodes.

Baseline Repeatability
The baseline repeatability for primary tumor and nodes is displayed as Bland-Altman plots in Fig 1. There was excellent agreement between the 2 baseline studies for nodes measured as SUVmax and SUVmean. The mean difference between baseline nodal SUV was 0.58 ± 0.90 for SUVmax and 0.33 ± 0.71 for SUVmean. Figure 1A shows that all points for nodal SUVmax were within the mean ± 1.96 × SD. The solid black line in Fig 1 is the line of least-squares best fit for the association of the difference with the average. The slope of the line is close to zero, suggesting that the size of the difference in SUV does not depend on the magnitude of the SUV. The Bland-Altman plots for primary tumor revealed poorer agreement compared with nodal disease (Fig 1C, -D). In particular, at higher SUVs for primary tumor, there were larger differences between the 2 baseline SUV measurements.
The ICC for SUV1 and SUV2 (baseline SUVs) was high for SUVmax and SUVmean in both primary tumor and nodes (Table 1); it was lowest for primary tumor SUVmean (0.91) and highest for nodal SUVmax (0.95) (Table 2).
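The quantities read off a Bland-Altman plot (the mean difference, the limits at mean ± 1.96 × SD, and the least-squares slope of the difference against the average) can be computed as in the sketch below. The function name, the returned keys, and the sample SUVs are hypothetical and do not reproduce the study data.

```python
from statistics import mean, stdev


def bland_altman(first: list[float], second: list[float]) -> dict[str, float]:
    """Bias, 95% limits of agreement, and slope of difference vs. average."""
    diffs = [a - b for a, b in zip(first, second)]        # SUV1 - SUV2
    avgs = [(a + b) / 2 for a, b in zip(first, second)]   # mean of SUV1 and SUV2
    bias = mean(diffs)
    sd = stdev(diffs)
    # Ordinary least-squares slope of the differences regressed on the averages.
    avg_mean = mean(avgs)
    sxx = sum((x - avg_mean) ** 2 for x in avgs)
    sxy = sum((x - avg_mean) * (d - bias) for x, d in zip(avgs, diffs))
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "slope": sxy / sxx,
    }


if __name__ == "__main__":
    # Hypothetical baseline SUVmax pairs (PET1, PET2).
    pet1 = [10.0, 8.0, 6.0, 12.5]
    pet2 = [9.0, 8.0, 7.0, 12.0]
    print(bland_altman(pet1, pet2))
```

A slope near zero indicates that the size of the disagreement does not grow with the SUV itself, which is the pattern described for nodal disease; a clearly nonzero slope would correspond to the larger differences seen at higher primary tumor SUVs.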

Intratreatment Change
The difference between mean baseline SUV variability and mean intratreatment change was larger for nodes than for primary tumor, as seen in Table 2. This was due to poorer repeatability in primary tumor compared with nodes. The differences between baseline and intratreatment %ΔSUV did not reach statistical significance by the paired t test, but we did find a nonoverlapping 95% confidence interval for the ICC of SUV1 and SUV2 (baseline-baseline) and the ICC of SUV2 and SUV3 (baseline-intratreatment) for both nodal SUVmax and nodal SUVmean. This suggested that the ICC of SUV2 and SUV3 was significantly different from the ICC of SUV1 and SUV2. In contrast, primary tumor SUV had an overlapping 95% confidence interval for the 2 sets of ICCs (not statistically significant). The calculated repeatability coefficients for baseline nodal SUVmax and SUVmean were 10% and 16%, respectively. Any intratreatment %ΔSUV smaller than the RC could be due to intrinsic baseline variability rather than true intratreatment change. Only 1 patient had an intratreatment increase in both nodal SUVmax and SUVmean above the respective RCs. This was patient 7, who was one of the patients with residual disease after completion of CRT. The other nonresponder (patient 14) did not have a rise in SUVmax or SUVmean, but his intratreatment PET was performed later (after 22 Gy of radiation) compared with after 12 Gy for patient 7 (group median, 12 Gy).
ANOVA showed a significant effect of treatment on nodal SUVmax and SUVmean (P < .002). For the primary tumor, the effect of treatment was significant for SUVmax (P = .006) and marginal for SUVmean (P = .06).
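The comparison of an intratreatment change against the baseline repeatability coefficient amounts to a simple threshold check. The sketch below uses the nodal RCs reported in this study (10% for SUVmax, 16% for SUVmean); the function name and the example changes are hypothetical, and the sign convention (positive for an increase) is an assumption.

```python
# Nodal repeatability coefficients reported in this study, in percent.
NODAL_RC = {"SUVmax": 10.0, "SUVmean": 16.0}


def exceeds_baseline_variability(pct_change: float, parameter: str) -> bool:
    """True if the magnitude of the percentage change is larger than the RC.

    ASSUMPTION: pct_change is signed (positive = increase); only its
    magnitude is compared with the RC, so increases and decreases both count.
    """
    return abs(pct_change) > NODAL_RC[parameter]


if __name__ == "__main__":
    # Hypothetical intratreatment changes in nodal SUV for three patients.
    for change in (-25.0, 6.0, 18.0):
        print(
            change,
            exceeds_baseline_variability(change, "SUVmax"),
            exceeds_baseline_variability(change, "SUVmean"),
        )
```

A change that stays inside the RC is not evidence of no response; it only means the change cannot be separated from test-retest fluctuation.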

DISCUSSION
The proper interpretation of treatment-induced changes in tumor glucose use with PET requires a quantitative understanding of the inherent variation of this metabolic parameter in the absence of treatment. Metabolic processes are dynamic and fluctuate as opposed to anatomic parameters such as tumor volume, which are relatively stable and static. The current study is the first to address this problem for HNSCC via the performance of double-baseline scans and an intratreatment scan in the same patient on the same scanner. Changes in glucose metabolism during the early phases of CRT were greater than the intrinsic variation in nodal metastases but not the primary tumor.
Temporal variability in pretreatment metabolic activity for HNSCC was recently reported by Chu et al 13 in a retrospective study using diagnostic PET/CT and planning PET/CT. They reported serial change as mean composite SUV velocity (SUVmax change divided by time in weeks between scans). Factors that contributed to a mean composite SUV velocity of −0.1/week and a wide SD of 2.0 were a longer interval between scans (median interval of 3 weeks) and use of different PET/CT scanners.
The current study attempted to control for these additional factors affecting repeatability. The magnitude of pretreatment SUV variability in nodes and primary tumor ranged from 8% to 13%, which is consistent with the literature for lung and gastrointestinal malignancies with similar short time intervals between baseline scans. These tumors have mean baseline %ΔSUV ranging from 3% to 16%, with repeatability coefficients of 15%-20%. [9][10][11][12] The implication of this finding is that changes observed on PET scans obtained during the early portions of treatment that fall within these ranges should be interpreted with caution because they may only represent intrinsic fluctuation in metabolic activity. This concept is fundamental and must also be considered in the use of other non-FDG-PET isotopes and other functional imaging modalities, including CTP, dynamic contrast-enhanced MR imaging, and DWI.
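The SUV velocity described above (SUVmax change divided by the time in weeks between scans) can be illustrated with a short sketch. The function name, the sign convention (later scan minus earlier scan), and the sample values are assumptions for illustration and are not taken from Chu et al.

```python
def suv_velocity(suv_first: float, suv_second: float, interval_days: float) -> float:
    """Change in SUVmax per week between two scans.

    ASSUMPTION: the change is taken as second scan minus first scan, so a
    negative velocity means SUVmax fell between the scans.
    """
    weeks = interval_days / 7
    return (suv_second - suv_first) / weeks


if __name__ == "__main__":
    # Hypothetical lesion: SUVmax 10.0, then 9.7 on a scan 21 days later.
    print(round(suv_velocity(10.0, 9.7, 21), 2))
```

Because the interval appears in the denominator, the same absolute SUV difference yields a smaller velocity when scans are farther apart, so velocities are not directly comparable to percentage differences between scans a week apart.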
Greater baseline variability in the primary tumor than in lymph nodes was an unexpected finding. The exact causes are unknown, but 2 explanations are possible: First, the primary tumor had overall higher SUV compared with nodes, and there may be poorer repeatability as the metabolic activity increases beyond an SUV threshold. This pattern was best appreciated on the Bland-Altman plots for primary tumor (Fig 1). A second factor could be tumor morphology. The infiltrative nature of primary tumors causes their boundaries with normal tissue to be more poorly defined than for lymph nodes, which are often surrounded by fat. Moreover, contouring the margins of metabolically active primary tumor without including surrounding inflamed or reactive mucosa is challenging. Smaller baseline variability in nodal disease compared with the primary head and neck tumor was also a finding in the study by Chu et al. 13 The repeatability was not quantified, but it was graphically displayed in plots showing the absolute change in SUV max with time in primary tumor and nodes for serial PET/CT scans.
The purpose of defining baseline variability was to better interpret intratreatment change. Early treatment change in SUV for HNSCC has not been reported, but Geets et al 16 did study FDG-PET intratreatment changes in metabolic gross tumor volume in HNSCC. This group reported a mean metabolic gross tumor volume reduction of 34% after 14 Gy (range, 2%-100%). We chose to evaluate change in SUV rather than gross tumor volume, and we separated changes in nodal SUV from those in the primary tumor. The rationale for using SUV is that it reflects the magnitude of tumor metabolic activity rather than size and adheres to PERCIST criteria. 14 PERCIST advocates viewing PET tumor response as a continuous SUV variable, and it defines a metabolic partial response as >30% decrease in SUV after completion of therapy. PERCIST does not offer criteria for intratreatment change, however. The data from our study suggest that in the early stages of treatment, a relative change of >10% in nodal SUVmax and >16% in nodal SUVmean in any direction could be indicative of a true increase or decrease in tumor glucose metabolism.
There are limitations to this study. First, it was a small single-institution pilot study, which limits the generalizability of the results. A larger trial will be required to validate the percentage cutoff criteria for true treatment-induced change for nodal SUVmax and SUVmean. The results do, however, provide insight into the need to interpret small changes observed during the early phases of treatment with caution, given baseline variability. It is also necessary to follow our study patients to determine the correlations between intratreatment changes and disease recurrence. However, few patients in our group may have recurrence because most had human papillomavirus-positive oropharyngeal cancer.
Second, SUVmax and SUVmean are summary parameters of tumor FDG uptake, while a malignant mass is heterogeneous in composition and metabolic activity. They represent "low-hanging fruit" and may not be the most robust parameters for characterizing metabolic activity of a tumor. A segmented or voxelwise approach to evaluating functional imaging of tumor should also be considered, and such analyses will be a focus of our future effort. These alternative image-analysis methods may provide additional information about the 2 patients with low-metabolic-activity tumors at baseline who were excluded from this study. Additionally, there may be unmeasured interobserver and intraobserver variability because SUV measurements were made by 1 radiologist. However, this is less likely to affect SUVmax and is minimized by deriving SUV from the VOI rather than from the region of interest.
Finally, the optimal time for assessing intratreatment metabolic response is unknown. This matter should be investigated in future studies. If imaging is performed too early, the effects of therapy may be small and partially masked by acute inflammatory changes. If imaging is performed during the latter phases of treatment, then the opportunity to modify therapy on the basis of an unfavorable response may be lost. This study suggests that the best timing would be when relative change is at least greater than the baseline variability.