The Diagnostic Value of Diffusion-Weighted Imaging in Differentiating Metastatic Lymph Nodes of Head and Neck Squamous Cell Carcinoma: A Systematic Review and Meta-Analysis

BACKGROUND: Accurate lymph node staging is crucial for proper treatment planning for metastasis in patients with head and neck squamous cell carcinoma. PURPOSE: Our aim was to evaluate the diagnostic performance of DWI for differentiating metastatic cervical lymph nodes from benign cervical lymph nodes in patients with head and neck squamous cell carcinoma and to identify optimal cutoff values for ADC. DATA SOURCES: A computerized literature search was performed to identify relevant original articles in Ovid MEDLINE and EMBASE. STUDY SELECTION: Studies evaluating the diagnostic performance of DWI for differentiating metastatic cervical lymph nodes from benign cervical lymph nodes were selected. DATA ANALYSIS: Diagnostic meta-analysis was conducted with a bivariate random-effects model, and a hierarchical summary receiver operating characteristic curve was obtained. Meta-regression was also performed. DATA SYNTHESIS: Nine studies with 337 patients were included. In all studies, ADC values derived from metastatic lymph nodes were significantly lower than ADC values derived from benign lymph nodes. The median ADC cutoff value was 0.965 × 10−3 mm2/s. The pooled sensitivity and specificity for the diagnostic performance of DWI in differentiating metastatic lymph nodes from benign lymph nodes were 90% (95% CI, 84%–94%) and 88% (95% CI, 80%–93%), respectively. In the meta-regression, sensitivity was significantly higher in the studies using a 3-mm slice thickness (93% [95% CI, 88%–98%]) than in studies using a slice thickness of >3 mm (86% [95% CI, 77%–95%], P < .01). LIMITATIONS: A small number of studies were included in our meta-analysis. CONCLUSIONS: DWI demonstrated high diagnostic performance for differentiating metastatic lymph nodes from benign lymph nodes in patients with head and neck squamous cell carcinoma, and the median ADC cutoff value was 0.965 × 10−3 mm2/s. A 3-mm DWI slice thickness can provide a slight improvement in sensitivity.

Lymph node (LN) metastasis is an adverse prognostic factor in patients with head and neck squamous cell carcinoma (HNSCC), and accurate LN staging is crucial for proper treatment planning. The National Comprehensive Cancer Network (NCCN) guidelines recommend CT and/or MR imaging with contrast for the initial work-up in patients with HNSCC. 1 CT and MR imaging are useful for determining morphologic criteria, including shape, size, internal architecture, extracapsular extension, and vascular features associated with LN metastasis; however, diagnostic performance is limited, especially in normal-sized non-necrotic LNs. 2,3 During the past decade, diffusion-weighted imaging (DWI) has been used for differentiating metastatic cervical LNs from benign cervical LNs in patients with HNSCC. Some studies have reported a high diagnostic performance for DWI, 4-12 whereas other studies have demonstrated disappointing results. 13,14 In addition, various cutoff values have been proposed for the apparent diffusion coefficient (ADC).
We considered it timely to review the DWI protocols, parameters, and reported diagnostic performances because there are no published systematic reviews or meta-analyses on the topic, to our knowledge. Therefore, we aimed to evaluate the diagnostic performance of DWI for differentiating metastatic cervical LNs from benign cervical LNs in patients with HNSCC and to identify the optimal ADC cutoff value.

MATERIALS AND METHODS
This study adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. 15

Search Strategy
A preliminary literature search demonstrated that various terminology was used to indicate cervical LN metastasis, including "cervical nodal metastasis," "malignant cervical lymph nodes," "metastatic cervical lymph node," and "cervical lymphadenopathy." These synonyms for cervical LN metastasis were used in the search terms for Ovid MEDLINE and EMBASE: ((cervical lymph node metastasis) OR (cervical nodal metastasis) OR (malignant cervical lymph nodes) OR (metastatic cervical lymph node) OR (cervical lymphadenopathy)) AND ((diffusion-weighted) OR (DWI) OR (apparent diffusion coefficient) OR (ADC)). The literature search was not limited to a publication date or study setting but was limited to English-language publications. Any additional relevant studies identified were investigated, and the literature search was updated until January 3, 2018.
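The Boolean structure of the search string above can be sketched in a few lines of Python. This is only an illustration of how the two synonym groups combine; the helper names (`or_group`, `build_query`) are hypothetical, and the actual Ovid MEDLINE and EMBASE interfaces may require additional field tags or syntax that are not described here.

```python
# Hypothetical helper names; only the search terms come from the text above.
LN_TERMS = [
    "cervical lymph node metastasis",
    "cervical nodal metastasis",
    "malignant cervical lymph nodes",
    "metastatic cervical lymph node",
    "cervical lymphadenopathy",
]
DWI_TERMS = [
    "diffusion-weighted",
    "DWI",
    "apparent diffusion coefficient",
    "ADC",
]


def or_group(terms):
    """Wrap each term in parentheses and join the terms with OR."""
    return "(" + " OR ".join(f"({term})" for term in terms) + ")"


def build_query(group_a, group_b):
    """Require at least one term from each synonym group."""
    return f"{or_group(group_a)} AND {or_group(group_b)}"


QUERY = build_query(LN_TERMS, DWI_TERMS)
print(QUERY)
```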

Study Selection
We used the following eligibility criteria: 1) patients with biopsy-proved HNSCC who underwent preoperative MR imaging including DWI, 2) histopathology as a reference standard, 3) provision of the diagnostic performance and corresponding ADC cutoff value for differentiating metastatic cervical LNs from benign cervical LNs, and 4) published original articles. Case reports/series (including <10 patients), reviews, conference abstracts, studies including other types of tumor (including nasopharyngeal carcinoma or lymphoma), and studies with a population overlapping that of other studies were excluded. Authors of the studies were contacted for further information when 2 × 2 tables could not be acquired.

Data Extraction and Quality Assessment
The following information was extracted from the selected studies using a standardized form: 1) study characteristics: authors, publication years, institution, study period, study design, and data analysis (per LN versus per patient); 2) demographic characteristics: sample size, mean age, age range, sex, and proportion of metastatic LNs; 3) MR imaging characteristics: magnetic field strength, MR imaging vendor, MR imaging scanner, coil, DWI sequence, b-values (seconds/square millimeter), TR, TE, slice thickness, interslice gap, matrix, FOV, number of signal acquisitions, scan time, number of readers, experience of readers; and 4) outcomes: a 2 × 2 contingency table (number of true-positive, false-positive, false-negative, and true-negative results) demonstrating the presence of metastatic LNs according to the ADC values and optimal ADC cutoff values for differentiating metastatic from benign LNs.
The risk of bias was assessed for each selected study, according to the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) criteria. 16 Two reviewers (C.H.S. and Y.J.C.) independently performed study selection, data extraction, and quality assessment.
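As an illustration of the extracted outcome data, a 2 × 2 contingency table can be represented as follows. This is a minimal sketch: the class name `ContingencyTable` and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2 x 2 table for one study (hypothetical structure for illustration)."""
    tp: int  # metastatic LNs with ADC indicating metastasis
    fp: int  # benign LNs with ADC indicating metastasis
    fn: int  # metastatic LNs with ADC indicating benign
    tn: int  # benign LNs with ADC indicating benign

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Illustrative counts only, not data from the included studies.
example = ContingencyTable(tp=45, fp=6, fn=5, tn=44)
print(example.sensitivity, example.specificity)
```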

Statistical Analysis
A diagnostic meta-analysis of DWI was conducted with a bivariate random-effects model. 17-19 Individual study sensitivity/specificity and pooled sensitivity/specificity were plotted using a coupled forest plot. The pooled positive likelihood ratio, negative likelihood ratio, and diagnostic odds ratio were calculated. "Positive likelihood ratio" was defined as the ratio of the probability of a positive DWI result in patients with metastatic LNs to the probability of a positive DWI result in patients without metastatic LNs. "Negative likelihood ratio" was defined as the ratio of the probability of a negative DWI result in patients with metastatic LNs to the probability of a negative DWI result in patients without metastatic LNs. The "diagnostic odds ratio" was defined as the odds of having a positive DWI result in patients with metastatic LNs compared with the odds of having a positive DWI result in patients without metastatic LNs. A hierarchical summary receiver operating characteristic curve with 95% confidence and prediction regions was obtained, and the area under the hierarchical summary receiver operating characteristic curve was calculated.
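Under the standard definitions, the likelihood ratios and the diagnostic odds ratio follow directly from sensitivity and specificity. The sketch below uses the pooled sensitivity (90%) and specificity (88%) as inputs purely for illustration; the function name is hypothetical, and the resulting values are not the pooled likelihood ratios estimated by the bivariate model, which are obtained differently.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-, DOR) from sensitivity and specificity.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    DOR = LR+ / LR-
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg, lr_pos / lr_neg


# Illustrative inputs: pooled sensitivity and specificity.
lr_pos, lr_neg, dor = likelihood_ratios(0.90, 0.88)
print(round(lr_pos, 2), round(lr_neg, 2), round(dor, 1))
```

With these inputs, the positive likelihood ratio is about 7.5, the negative likelihood ratio about 0.11, and the diagnostic odds ratio about 66.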
Heterogeneity across the studies was explored using the inconsistency index (I2) and Cochran Q-statistics. 20 I2 values of >50% indicated the presence of heterogeneity across the studies. 21 Visual assessment of a coupled forest plot (inverse correlation indicating the presence of a threshold effect) and a Spearman correlation (a coefficient of >0.6 indicating the presence of a threshold effect) were performed to evaluate any threshold effect (positive correlation between sensitivity and the false-positive rate). 22 Visual assessment of the difference between the 95% confidence and prediction regions in the hierarchical summary receiver operating characteristic curve (a large difference indicating heterogeneity) was also performed. The presence of publication bias was assessed by a Deeks funnel plot asymmetry test, 23 and a slope coefficient with P < .1 was considered significant small-study bias.
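The two heterogeneity checks can be sketched as follows. This is a minimal pure-Python illustration with hypothetical function names: it uses the common inverse-variance form of Cochran Q, the usual relation I2 = max(0, (Q − df)/Q) × 100%, and a rank-based Spearman coefficient for sensitivity versus false-positive rate. It is not the STATA or R routine used in the analysis, and the inputs in the test are illustrative.

```python
def cochran_q(effects, variances):
    """Cochran Q with inverse-variance weights around the fixed-effect mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))


def i_squared(q, n_studies):
    """Inconsistency index in percent; values above 50 suggest heterogeneity."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def _ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def threshold_effect(sensitivities, false_positive_rates, limit=0.6):
    """Flag a threshold effect when the coefficient exceeds the stated limit."""
    return spearman(sensitivities, false_positive_rates) > limit
```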
All statistical analyses were conducted by one of the reviewers (C.H.S., with 5 years of experience in conducting systematic reviews and meta-analyses) using commercially available software (STATA 15.0, StataCorp, College Station, Texas; and R statistical and computing software, Version 3.4.1; http://www.r-project.org/). P < .05 indicated statistical significance.

RESULTS
Figure 1 provides an overview of the search strategy and study-selection procedure. After 15 non-English studies were excluded, our search yielded 214 records, of which 35 articles remained after screening of the titles and abstracts. The full text of these studies was reviewed, and 26 studies were excluded as follows (On-line Appendix): studies evaluating patients with enlarged cervical LNs (not all patients had HNSCC [n = 9]), studies including patients with non-HNSCC malignancy (nasopharyngeal carcinoma or lymphoma [n = 5]), a study population partially overlapping other studies (n = 4), studies that did not allow a 2 × 2 contingency table to be obtained (n = 4), and studies not in the field of interest (n = 4). There were no studies reporting the diagnostic performance of DWI without mentioning the corresponding ADC cutoff value. Ultimately, 9 studies with 337 patients were included in this meta-analysis and were considered for further analyses. 4-12

Study Characteristics and Quality Assessment
The relevant study characteristics are summarized in On-line Table 1. Eight of 9 studies analyzed the diagnostic performance of DWI per LN, 4,5,7-12 whereas 1 study performed analysis on a per-patient basis. 6 The number of included patients ranged from 16 to 80, and the number of LNs ranged from 34 to 651. There were 7 prospective studies 4-8,10,11 and 1 retrospective one, 9 with the study design not being explicit in a further study. 12 Informed consent was obtained in 8 studies, 4,6-12 as was approval by an ethics committee or institutional review board. 4-8,10-12 The results of the methodologic quality assessment according to QUADAS-2 are presented in Fig 2. Most studies were considered to have a low risk of bias and minimal concerns regarding applicability. Common weaknesses involved uncertainties in blinding to the reference standard when analyzing the MR imaging results and a poorly documented time interval between MR imaging and the reference standard. In the patient-selection domain, 2 studies had a high risk of bias due to a case-control design 9 or inappropriate exclusion criteria. 8 In the index test domain, 3 studies had an unclear risk of bias because no information was provided on blinding to the reference standard. 6,7,12 In the reference standard and flow/timing domain, 1 study had a high risk of bias and a high concern regarding applicability because both histopathology and follow-up imaging results were used as a reference standard. 9 No studies were excluded from the meta-analysis on the basis of the quality assessment.

Data Analysis
In all studies, ADC values derived from metastatic LNs were significantly lower than ADC values derived from benign LNs. The optimal ADC cutoff values varied slightly among individual studies, ranging from 0.851 × 10−3 mm2/s to 1.038 × 10−3 mm2/s. The median ADC cutoff value was 0.965 × 10−3 mm2/s. The individual sensitivities ranged from 80% to 97%, and the individual specificities ranged from 65% to 96%.
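To make the role of the cutoff concrete, the sketch below computes an ADC from two b-value signals with the standard monoexponential relation ADC = ln(S_low / S_high) / (b_high − b_low) and compares it with the median cutoff. Several points are assumptions rather than methods reported by the included studies: the two-point calculation, the strict "less than" comparison, the function names, and the signal values, which are illustrative only.

```python
import math

# Median cutoff across the included studies, in mm^2/s.
MEDIAN_ADC_CUTOFF = 0.965e-3


def adc_two_point(s_low, s_high, b_low, b_high):
    """Monoexponential ADC (mm^2/s) from signals at two b-values (s/mm^2).

    Assumption: a two-point fit; studies using more b-values would fit a slope.
    """
    return math.log(s_low / s_high) / (b_high - b_low)


def below_cutoff(adc, cutoff=MEDIAN_ADC_CUTOFF):
    """True when the ADC falls below the cutoff (suggesting metastasis)."""
    return adc < cutoff


# Illustrative signal intensities, not patient data.
adc = adc_two_point(s_low=1000.0, s_high=400.0, b_low=0.0, b_high=1000.0)
print(f"{adc:.3e}", below_cutoff(adc))
```

Because the cutoff is a median of study-specific thresholds obtained with different protocols, a comparison of this kind is illustrative rather than a validated decision rule.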
Heterogeneity was present, with I2 values exceeding 50% for both sensitivity and specificity. Visual assessment of the coupled forest plots revealed no threshold effect, and the Spearman correlation coefficient was −0.471 (95% CI, −0.865 to 0.281), also indicating no threshold effect. The slope coefficient for the Deeks funnel plot for differentiating metastatic LNs from benign LNs is presented in Fig 5 and suggests slight asymmetry in the data (P = .03) and possible publication bias.
The Table shows the results of the meta-regression to explore the influence of 10 covariates on pooled sensitivity and specificity. Slice thickness was revealed to be a significant factor affecting study heterogeneity. Sensitivity was significantly higher in studies using a 3-mm slice thickness (93% [95% CI, 88%-98%]) than in studies using a slice thickness of >3 mm (86% [95% CI, 77%-95%], P < .01). Otherwise, the analysis method, percentage of metastatic LNs, underlying disease, study design, consecutive enrollment, number of readers, magnetic field strength, maximum b-value, and ROI used for ADC measurement were not significant factors affecting heterogeneity. MR imaging using a 3T scanner, a maximum b-value of 1000 s/mm2, and an ADC measurement of the whole node all showed slightly higher sensitivity; however, the differences did not reach statistical significance.

DISCUSSION
The present systematic review and meta-analysis demonstrated that in patients with HNSCC, the ADC derived from metastatic LNs was significantly lower than the ADC derived from benign LNs. The median ADC cutoff value was 0.965 × 10−3 mm2/s. In addition, our study demonstrated a high pooled sensitivity and specificity for the diagnostic performance of DWI for differentiating metastatic from benign LNs in patients with HNSCC. The meta-regression revealed that studies using a 3-mm slice thickness had higher sensitivity than studies using a slice thickness of >3 mm. Therefore, a 3-mm slice thickness should be considered when DWI is used to differentiate metastatic from benign cervical LNs.
A dedicated sequence optimization is essential to obtain optimized DWI. In the meta-regression, sensitivity was significantly higher in studies using a 3-mm slice thickness (93% [95% CI, 88%-98%]) than in studies using a slice thickness of >3 mm (86% [95% CI, 77%-95%], P < .01). A 3-mm slice thickness may help to detect smaller LNs. In addition, 3T MR imaging (92% [95% CI, 86%-98%]), the use of a maximum b-value of 1000 s/mm2 (91% [95% CI, 86%-97%]), and ADC measurement of the whole node (93% [95% CI, 89%-97%]) showed slightly higher sensitivity, though the differences did not reach statistical significance. A previous report also mentioned that a high gradient strength substantially increases the signal-to-noise ratio and that applying a larger number of b-values not only reduces the influence of noise propagation in ADC calculations but also decreases the risk of motion-related artifacts. 10 When one uses DWI to differentiate metastatic from benign LNs in patients with HNSCC, use of a 3-mm slice thickness, a 3T scanner, a maximum b-value of 1000 s/mm2, and ADC measurement of the whole node should all be considered to obtain a high diagnostic performance. Considerable effort is required to achieve standardization, and further studies are needed.
Among the included studies, the ADCs derived from metastatic LNs were consistently lower than the ADCs derived from benign LNs. The lower ADC values are probably due to the tumor microstructure in metastatic LNs, which typically show a larger number of cells, cellular polymorphism, and increased mitosis in comparison with benign LNs; these characteristics may reduce the extracellular extravascular space and decrease the ADC value. 24 In our study, the optimal ADC cutoff values ranged from 0.851 × 10−3 mm2/s to 1.038 × 10−3 mm2/s, with 7 of 9 studies reporting optimal cutoff values between 0.94 × 10−3 mm2/s and 1.038 × 10−3 mm2/s, which is a relatively small variation. In addition, the median ADC cutoff value was 0.965 × 10−3 mm2/s.
We recognize that our study has several limitations. First, a small number of studies were included in our meta-analysis; therefore, we cannot evaluate all potential causes of heterogeneity. Although we found that slice thickness was a significant factor affecting study heterogeneity, other technical aspects, including different TRs/TEs and different sets of b-values, may account for some portion of the heterogeneity. In addition, the low number of included studies may limit the power to achieve statistical significance. Second, the Deeks funnel plot suggested possible publication bias. One possible reason is that 2 studies showing negative results were excluded because of the nonavailability of 2 × 2 contingency tables. 13,14 Therefore, our results should be interpreted cautiously, and the high diagnostic performance of DWI may have been overestimated. To mitigate these limitations, we included a relatively homogeneous study population (ie, biopsy-proved HNSCC) and performed an extensive meta-regression using 10 covariates. Moreover, we applied recent robust methodology (hierarchical logistic regression modeling 17-19) and reported our results according to established guidelines (the Preferred Reporting Items for Systematic Reviews and Meta-Analyses 15 and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration 25). Nevertheless, caution should be used when applying our results to daily clinical practice.

CONCLUSIONS
DWI demonstrated a high diagnostic performance for differentiating metastatic from benign cervical LNs in patients with HNSCC, and the median ADC cutoff value was 0.965 × 10−3 mm2/s. A 3-mm slice thickness for DWI can slightly improve sensitivity. Further large prospective multicenter studies are required to confirm these findings.