Diffusion-Weighted Imaging of the Head and Neck in Healthy Subjects: Reproducibility of ADC Values in Different MRI Systems and Repeat Sessions

BACKGROUND AND PURPOSE: DWI is typically performed with EPI sequences in single-center studies. The purpose of this study was to determine the reproducibility of ADC values in the head and neck region in healthy subjects. In addition, the reproducibility of ADC values in different tissues was assessed to identify the most suitable reference tissue. MATERIALS AND METHODS: We prospectively studied 7 healthy subjects, with EPI and TSE sequences, on 5 MR imaging systems at 3 time points in 2 institutions. ADC maps of EPI (with 2 b-values and 6 b-values) and TSE sequences were compared. Mean ADC values for different tissues (submandibular gland, sternocleidomastoid muscle, spinal cord, subdigastric lymph node, and tonsil) were used to evaluate intra- and intersubject, intersystem, and intersequence variability by using a linear mixed model. RESULTS: On 97% of images, a region of interest could be placed on the spinal cord, compared with 87% in the tonsil. ADC values derived from EPI-DWI with 2 b-values and calculated EPI-DWI with 2 b-values extracted from EPI-DWI with 6 b-values did not differ significantly. The standard error of ADC measurement was the smallest for the tonsil and spinal cord (standard error of measurement = 151.2 × 10⁻⁶ mm²/s and 190.1 × 10⁻⁶ mm²/s, respectively). The intersystem difference for mean ADC values and the influence of the MR imaging system on ADC values among the subjects were statistically significant (P < .001). The mean difference among examinations was negligible (ie, <10 × 10⁻⁶ mm²/s). CONCLUSIONS: In this study, the spinal cord was the most appropriate reference tissue and EPI-DWI with 6 b-values was the most reproducible sequence. ADC values were more precise if subjects were measured on the same MR imaging system and with the same sequence. ADC values differed significantly between MR imaging systems and sequences.

Almost 3% of all malignancies are head and neck cancer, 95% of which are squamous cell carcinomas. 1 MR imaging is one of the modalities used in the work-up of patients with head and neck cancer. 2 DWI is an MR imaging technique by which diffusion properties of water can be quantified as an ADC value. 3 Changes in ADC are inversely correlated with changes in cellularity. 4 In tissues with high cellularity, diffusion of extracellular water in particular is limited by cell membranes, resulting in low ADC values. In tissues with low cellularity, in which diffusion is facilitated (eg, in edematous or necrotic tissue), ADC values are high.
Indications for DWI in head and neck cancer include tissue characterization of primary tumors and nodal metastases, prediction and monitoring of treatment response after chemotherapy or radiation therapy, and differentiation of radiation changes and residual or recurrent disease. 5 Neither the optimal DWI sequence for assessment of the head and neck region nor its reproducibility has been clearly established, to our knowledge. DWI can be performed with either EPI or TSE sequences, of which the EPI sequence is most commonly used in the head and neck area. 6,7 On EPI-DWI, more malignant lesions can be detected and lesion delineation is facilitated. However, the interobserver agreement of ADC values is reported to be higher on TSE-DWI, probably due to the frequent occurrence of artifacts and geometric distortions in EPI-DWI. 8 Currently the use of DWI in head and neck imaging is mostly confined to research protocols and advanced academic centers.
Before DWI can be used in multicenter studies, its reproducibility across different centers and MR imaging systems should be validated. 9 ADC values may be affected by the selected technique and MR imaging system (eg, due to differences in gradient systems, coils, pulse-sequence designs, imaging parameters, and artifacts related to susceptibility effects or eddy currents). 10 Information on variance is needed. 11 Furthermore, the use of reference tissues might help ascertain the variability among different MR imaging systems and could potentially help correct for differences in ADC values among MR imaging systems.
The purpose of this prospective study was to determine the reproducibility of ADC values in the head and neck region obtained from DWI on the basis of both EPI and TSE sequences in repeated measurement on different MR imaging systems in healthy subjects. In addition, we assessed which tissue shows the highest reproducibility in ADC values, so that it could function as a reference tissue in future studies.

Subjects
The study population consisted of 7 healthy subjects, 5 men and 2 women (age range, 27-54 years; median age, 30 years). The subjects were examined in 2 institutions: VU University Medical Center and University Hospitals Leuven. All examinations were performed in 2011, after obtaining approval from the relevant institutional review boards and written informed consent from all subjects. We used the following MR imaging systems: I) Avanto (Siemens, Erlangen, Germany), II) Sonata (Siemens), III) Signa HDxt (GE Healthcare, Milwaukee, Wisconsin), and IV) Aera (Siemens), all at 1.5T, and V) Achieva (Philips Healthcare, Best, the Netherlands) at 3T. All examinations were performed with a dedicated head and neck radiofrequency coil in combination with a spine-array coil.
All subjects were examined on all MR imaging systems at 3 time points per MR imaging system, yielding a total of 15 sessions per subject. Two examinations were performed on the same day (between examinations, the subject was removed from the MR imaging system), and 1 examination, at least 1 month later.

Imaging Protocol
Each session included an anatomic T2-weighted sequence through the neck and up to 3 DWI sequences, with acquisition parameters as similar as possible among the MR imaging systems. Due to technical limitations, no EPI-DWI with 6 b-values (6b) was performed on 1 MR imaging system (Signa HDxt), and on 2 MR imaging systems (Aera and Achieva), no separate EPI-DWI with 2 b-values (2b) was performed. The sequences used per MR imaging system are shown in Table 1.

Data Analysis
All ADC maps were calculated on-line or off-line by using the MR imaging system software of the respective vendor. EPI-DWI-6b was analyzed assuming a monoexponential ADC. ADC values for EPI-DWI-2b on the 2 MR imaging systems without EPI-DWI-2b were derived from EPI-DWI-6b by selecting only the images acquired by using b=0 s/mm² and b=1000 s/mm². 12 These "generated" EPI-DWI-2b data were compared with the other EPI-DWI-2b data. Data were transferred to a DICOM viewer.
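The monoexponential assumption mentioned above corresponds to the signal model S(b) = S0 · exp(−b · ADC). The following Python sketch illustrates, for a single voxel, how a 2-b-value ADC and a multi-b-value log-linear ADC can be computed under that model. It is only an illustration: the vendor software used in the study may fit differently, and the b-value list, the signal values, and the function names `adc_two_point` and `adc_loglinear` are hypothetical and not taken from the study.

```python
import math

def adc_two_point(s_low, s_high, b_low=0.0, b_high=1000.0):
    """ADC in mm^2/s from two signals, assuming S(b) = S0 * exp(-b * ADC)."""
    return math.log(s_low / s_high) / (b_high - b_low)

def adc_loglinear(b_values, signals):
    """ADC in mm^2/s as the negated least-squares slope of ln(S) against b."""
    n = len(b_values)
    logs = [math.log(s) for s in signals]
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_values, logs))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    return -sxy / sxx

# Hypothetical b-values (s/mm^2) and a noise-free synthetic voxel;
# neither is taken from the study protocol.
B_VALUES = [0, 50, 100, 500, 750, 1000]
TRUE_ADC = 1.0e-3  # mm^2/s, illustrative only
SIGNALS = [1000.0 * math.exp(-b * TRUE_ADC) for b in B_VALUES]

# ADC from all 6 b-values, and from only the lowest and highest b-value
# (the analogue of "generating" a 2b data set from a 6b acquisition).
adc_6b = adc_loglinear(B_VALUES, SIGNALS)
adc_2b = adc_two_point(SIGNALS[0], SIGNALS[-1], B_VALUES[0], B_VALUES[-1])
```

With noise-free monoexponential data, both estimates recover the same ADC. In measured data they can differ because of noise and of perfusion effects at low b-values, which is why the equivalence was tested statistically rather than assumed.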
For each examination, 1 elliptic region of interest per tissue was manually drawn on the section that contained the bulk of the tissue of interest by 1 observer (R.L.) with 7 years of experience in head and neck imaging. ADC values were determined for each of the following 5 tissues in the head and neck: 1) submandibular gland, 2) sternocleidomastoid muscle, 3) spinal cord, 4) subdigastric lymph node, and 5) tonsil. For the selection of a subdigastric lymph node, either the left or right one was selected consistently within each subject. The size (range, 20-50 mm²) and the position of the region of interest were identified on T2-weighted images. ROIs were drawn on the corresponding b=0 s/mm² images by visual comparison with the anatomic T2WI. ROIs drawn on b=0 s/mm² images were copied to the corresponding ADC maps.
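As an illustration of the measurement described above, the sketch below computes the mean ADC inside an axis-aligned elliptic region of interest on a small synthetic ADC map and checks an ellipse area against the 20-50 mm² range. The map, the ROI geometry, and the function names are hypothetical. In the study, ROIs were drawn manually in a DICOM viewer, so this is not the authors' tooling.

```python
import math

def ellipse_area_mm2(semi_axis_a_mm, semi_axis_b_mm):
    """Area of an ellipse from its two semi-axes (in mm)."""
    return math.pi * semi_axis_a_mm * semi_axis_b_mm

def mean_in_ellipse(image, center_row, center_col, semi_rows, semi_cols):
    """Mean of the pixels whose centers fall inside an axis-aligned ellipse."""
    values = []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            dr = (r - center_row) / semi_rows
            dc = (c - center_col) / semi_cols
            if dr * dr + dc * dc <= 1.0:
                values.append(value)
    if not values:
        raise ValueError("ROI contains no pixels")
    return sum(values) / len(values)

# Synthetic 7 x 7 ADC map (units of 10^-6 mm^2/s): background 900,
# with a 3 x 3 block of 1100 in the center. Values are illustrative only.
adc_map = [[900.0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        adc_map[r][c] = 1100.0

# ROI centered on the block with semi-axes of 1 pixel in each direction.
roi_mean = mean_in_ellipse(adc_map, 3, 3, 1.0, 1.0)

# Hypothetical ROI with semi-axes of 4 mm and 3 mm.
roi_area = ellipse_area_mm2(4.0, 3.0)
```

An ROI that stays inside the tissue, as here, avoids partial-volume contamination from neighboring structures. That is harder to guarantee for small structures such as benign lymph nodes.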

Statistical Analysis
First, it was determined whether ADC values of the EPI-DWI-2b sequences can be replaced with ADC maps obtained by selecting only the b=0 and b=1000 s/mm² images from the EPI-DWI-6b (for MR imaging systems IV and V), because they are theoretically equivalent. We used a linear mixed model, with fixed effects for subjects, MR imaging systems, sequences, and an MR imaging system × sequences interaction. 13,14 Random effects were all possible interactions with the subjects (On-line Appendix). This possibility was tested by using data from MR imaging systems I and II, these being the only MR imaging systems on which both sequences had been performed.
For the main variance analysis, 5 MR imaging systems and 3 sequences were compared by using the same statistical modeling approach and reasoning as those used for the linear mixed model and by incorporating tissues as fixed effects (On-line Appendix). All 3 examinations of each subject were assumed pure replications and were nested within subject × MR imaging system combinations. Models with sequence-specific error variances were compared by using the Akaike Information Criterion. 15 The standard error of measurement (SEM) for ADC values per tissue was expressed as the square root of the sum of residual variance ($\sigma^2_E$) and the variance expressing the interaction between replication and subjects at different MR imaging systems ($\sigma^2_{R:IM}$), sequences ($\sigma^2_{SR:IM}$), and tissues ($\sigma^2_{TR:IM}$) (On-line Appendix):

$$\mathrm{SEM} = \sqrt{\sigma^2_E + \sigma^2_{R:IM} + \sigma^2_{SR:IM} + \sigma^2_{TR:IM}}$$

Differences in mean ADC values for all systems and the between-subjects effects were tested by using a Levene Test of Equality of Error Variances; an α-level of .05 was used for statistical significance. 16 All missing data or images with poor quality of DWI were specifically labeled for statistical analysis. Boxplots were created by using SPSS (Version 20.0; IBM, Armonk, New York). All other analyses were performed with Proc NLMIXED of SAS (Version 9.2; SAS Institute, Cary, North Carolina).
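The SEM definition above (the square root of the summed variance components) can be sketched in Python as follows. The numeric variance components are hypothetical placeholders chosen for a round result. They are not estimates from this study, whose fitted components are reported in the On-line Appendix, and the function name is likewise an assumption.

```python
import math

def standard_error_of_measurement(var_residual, var_r_im, var_sr_im, var_tr_im):
    """SEM as the square root of the summed variance components.

    var_residual: residual variance
    var_r_im:     replication-within-(subject x system) variance
    var_sr_im:    sequence-specific part of that replication variance
    var_tr_im:    tissue-specific part of that replication variance
    All inputs share the same squared unit, eg, (10^-6 mm^2/s)^2.
    """
    components = (var_residual, var_r_im, var_sr_im, var_tr_im)
    if any(v < 0 for v in components):
        raise ValueError("variance components must be non-negative")
    return math.sqrt(sum(components))

# Hypothetical variance components in (10^-6 mm^2/s)^2, not study results.
sem = standard_error_of_measurement(12500.0, 5000.0, 3000.0, 2000.0)
```

Because the components add before the square root is taken, the largest component dominates the SEM. Reducing a small component therefore changes the SEM only slightly.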

DWI
All subjects underwent multiple DWI sessions with multiple sequences on all MR imaging systems. For MR imaging system III, EPI-DWI-6b was unavailable; for MR imaging systems IV and V, ADC maps for EPI-DWI-2b were constructed by using only the b=0 and b=1000 s/mm² images from the EPI-DWI-6b, yielding a total of 12 DWI sequences per subject (Table 1). Two subjects underwent 2 instead of 3 replications. One subject had prior bilateral tonsillectomies. Therefore, the maximum number of possible ROIs was 1104. For a detailed overview of the number of possible ROIs, see On-line Figure. Further elimination was due to technically failed images and image-specific poor quality, and in 37 cases, it was impossible to place a region of interest. Region-of-interest placement was possible in 95% of tissues on TSE-DWI-2b, in 96% on EPI-DWI-2b, and in 97% on EPI-DWI-6b (Table 2). Examples of ADC maps on different MR imaging systems and sequences are shown in Fig 1. An example of drawn ROIs is shown in Fig 2. When combining the results of the 3 DWI sequences, region-of-interest placement was possible in 96% of tissues (Table 2). However, in only 87% (range, 83%-90%) of images could a region of interest be placed on the tonsil. In the other regions, ROIs could be placed in 97%-98% of cases. These data indicate that the tonsil is probably not a good reference tissue for future evaluations.
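The maximum of 1104 possible ROIs can be reproduced with the short calculation below. It rests on two assumptions that the text does not state explicitly: that each of the 2 subjects with only 2 replications missed one complete replication on every sequence, and that the subject with prior tonsillectomies completed all 3 replications. The variable names are illustrative.

```python
SUBJECTS = 7
SEQUENCES_PER_SUBJECT = 12   # DWI sequences per subject across all systems
REPLICATIONS = 3             # examinations per sequence
TISSUES = 5                  # ROIs per examination

# Full design if every subject completed everything.
full_design = SUBJECTS * SEQUENCES_PER_SUBJECT * REPLICATIONS * TISSUES

# Assumption: 2 subjects each missed 1 replication on all 12 sequences.
missed_replications = 2 * 1 * SEQUENCES_PER_SUBJECT * TISSUES

# Assumption: the subject without tonsils completed all 3 replications,
# so 1 tonsil ROI is missing per sequence and replication.
missing_tonsil = 1 * SEQUENCES_PER_SUBJECT * REPLICATIONS

max_rois = full_design - missed_replications - missing_tonsil
print(full_design, missed_replications, missing_tonsil, max_rois)
# prints: 1260 120 36 1104
```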
A variance component analysis was performed for MR imaging systems I and II to test potential differences between ADC values derived from the EPI-DWI-2b sequence and the calculated EPI-DWI-2b extracted from EPI-DWI-6b (Table 3). The lowest bias was found in the subdigastric lymph node (0.7 × 10⁻⁶ mm²/s), and the highest bias was found in the tonsil (−23.2 × 10⁻⁶ mm²/s). Furthermore, this analysis showed a small range of limits of agreement (LoA) (range, −307.0 × 10⁻⁶ mm²/s to 302.4 × 10⁻⁶ mm²/s) for all tissues combined. This finding implies that the 2 ADC values do not differ significantly. Therefore, for further analysis, we used calculated EPI-DWI-2b ADC values extracted from EPI-DWI-6b on systems on which EPI-DWI-2b was not available.
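For readers less familiar with bias and limits of agreement, the sketch below uses the classical Bland-Altman form, in which the LoA are the bias ± 1.96 × the SD of the paired differences. This is a simplification swapped in for illustration: the LoA reported above came from a variance component analysis, which need not coincide with this formula. The example differences and the function names are hypothetical.

```python
import statistics

def limits_of_agreement(differences, z=1.96):
    """Classical Bland-Altman bias and limits: bias -/+ z * SD of differences."""
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)
    return bias, bias - z * sd, bias + z * sd

def bias_and_sd_from_limits(lower, upper, z=1.96):
    """Invert symmetric limits back to the implied bias and SD."""
    return (lower + upper) / 2.0, (upper - lower) / (2.0 * z)

# Hypothetical paired differences (acquired minus calculated 2b ADC),
# in units of 10^-6 mm^2/s; not study data.
example_differences = [-150.0, 120.0, -30.0, 60.0, 0.0]
bias, lower, upper = limits_of_agreement(example_differences)

# Applied to the reported overall limits (-307.0 to 302.4), and ONLY under
# the assumption that they follow the bias -/+ 1.96 * SD form, the implied
# bias is about -2.3 and the implied SD about 155 (same units).
implied_bias, implied_sd = bias_and_sd_from_limits(-307.0, 302.4)
```

The near-zero midpoint of the reported limits is consistent with the small per-tissue biases given in the text.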
The intersystem difference between the MR imaging systems, with mean ADC values as a dependent variable, was statistically significant (P < .001). The influences of the sequence, the MR imaging system, and the interaction between these 2 parameters were significant (P = .011). The influence of the MR imaging system on the ADC values among the subjects (P < .001) was also significant.
Variance caused by time was limited (Fig 5). The mean difference in ADC values of the second examination compared with the first, which were on the same day, was 6 × 10⁻⁶ mm²/s (SD = 310 × 10⁻⁶ mm²/s). For the third examination, 1 month after the first, the mean difference in ADC values was −5 × 10⁻⁶ mm²/s (SD = 310 × 10⁻⁶ mm²/s) compared with the first measurement.

DISCUSSION
Before quantitative DWI can be applied in a multicenter study, knowledge is required about the reproducibility of ADC values within a subject, among different MR imaging systems, and among sequences. 10 This study is a first step to obtaining that knowledge.
In this study, we assessed the reproducibility of ADC values for different DWI sequences, MR imaging systems, and tissues in the head and neck. As expected, the variance in ADC values per subject per tissue is the smallest if the subject is measured on the same MR imaging system with the same sequence. The EPI-DWI-6b sequence showed the best reproducibility for all compared tissues, though this sequence was not available on all MR imaging systems. The EPI-DWI-2b sequence had a slightly lower reproducibility than the EPI-DWI-6b. Advantages of EPI-DWI-2b are a shorter acquisition time and wider clinical availability. ADC measurements in the spinal cord and tonsil were the most precise and reproducible. Because the spinal cord is almost always present in the FOV during a head and neck study, this tissue can potentially be used as a reference. It also has the advantage of being rarely affected by malignancy; this advantage is in contrast to the tonsils, which are absent in case of tonsillectomy and frequently prove to be the location of an initially unknown primary tumor. 17 Therefore, the spinal cord seems to be the most suitable to serve as reference tissue.
DWI is frequently used in oncologic imaging. 18,19 Previous studies have shown the potential of DWI in diagnosing malignancies in the head and neck area, response prediction, and differentiation between treatment-induced tissue changes and residual or recurrent disease. 6,20,21 However, these studies were conducted in a single institution, without variance in MR imaging systems and protocol. Quantitative MR imaging parameters (eg, ADC) can differ substantially among MR imaging systems and imaging protocols. 22 This difference was also confirmed in the present study. We performed 3 examinations on 5 MR imaging systems on healthy subjects. This study confirms that differences in ADC values are statistically significant among sequences, among MR imaging systems, and for the interaction between MR imaging systems and sequences.
Verhappen et al 8 found TSE-DWI to be more reproducible among observers than EPI-DWI in a single-center, single-system study on primary tumors and lymph nodes of 12 patients with head and neck cancer. In the current multicenter, multisystem study, ADC values derived with the EPI-DWI-6b sequence were the most reproducible in healthy subjects with time, followed by EPI-DWI-2b. TSE-DWI-2b was the least reproducible sequence. These different findings may be attributed to the included subjects: healthy volunteers in the current study versus patients with head and neck malignancies showing diffusion restriction in the study by Verhappen et al. 8 TSE-DWI-2b has inherently lower SNR, 23 which limits the reproducibility in healthy tissue, whereas it does not have geometric distortion and is apparently sensitive enough to detect diffusion restriction. In the current study, ROIs were drawn on b=0 s/mm² images in visual correlation with anatomic T2 images. Because EPI-DWI has a higher SNR, small structures (eg, benign lymph nodes) are more easily visualized. EPI-DWI may therefore be more appropriate for the evaluation of small structures. In a study by Vandecaveye et al, 20 57% of malignant lymph nodes had a diameter of <1 cm; therefore, appropriate evaluation of small (apparently benign) structures is vital. Verhappen et al 8 drew ROIs on ADC maps of malignant tissue that showed diffusion restriction. In particular, DWI of primary tumors in the head and neck area may have geometric distortion due to the tumor location at the air-tissue interface. In that case, geometric distortion of EPI techniques may reduce reproducibility among observers. 8
There is also a difference in reproducibility among various tissues in the head and neck area. On all MR imaging systems and sequences, ADC values of the submandibular gland were the least precise (Table 4). An explanation for the relatively poor reproducibility might be the intrinsic physiologic changes in salivary glands over the course of the day. ADC values in subdigastric lymph nodes have a relatively poor reproducibility (Table 4). Subdigastric lymph nodes are often too small for drawing reliable ROIs, particularly in healthy subjects. Moreover, lymph nodes are prone to changes with time (eg, due to frequently occurring inflammation in the head and neck area). In contrast, ADC values of the spinal cord and the tonsil are the most reproducible within subjects. In 87% of the images, a region of interest could be drawn on the tonsils; this percentage was lower than for the other tissues (range, 97%-98%) (Table 2). In healthy subjects, the tonsils are sometimes too small for reliably drawing a region of interest on DWI. However, if the tonsils are large enough to allow the assessment of ADC values, these values appear to be relatively stable with time within a subject; this stability results in the relatively high precision and reproducibility of ADC measurements. The sternocleidomastoid muscle has intermediate reproducibility. Small changes in ADC values of muscle tissue may be explained by small differences in muscle tone with time.
Sasaki et al 10 previously assessed the reproducibility of ADC measurements in the brain among MR imaging systems, imaging protocols, time points, and institutions. It was concluded that there was significant variability in ADC values depending on the coil systems, imagers, vendors, and field strengths. However, only 3 of 10 patients were imaged more than once on the same MR imaging system. In our study, all subjects were imaged multiple times on the same MR imaging system, in different institutions, and with a time interval of at least a month between imaging. We found significant differences between MR imaging systems and sequences.
The present study shows that though the physiology of healthy subjects may change with time, ADC values obtained within 1 person and with the same MR imaging system, protocol, and sequences immediately after the first scan and with an interval of at least 1 month have a low variance (ie, the intrasubject variance is small) (Fig 5). This finding indicates that ADC measurements are reproducible and independent of time. The spinal cord and tonsil are the tissues with the lowest ADC variability when different MR imaging systems, protocols, and sequences are used.
This study had some limitations. We included only healthy subjects with a broad age range for whom a stable physiologic status with time for all normal tissues can only be assumed. On the basis of Fig 5, the influence of time appears to be limited, with mean ADC differences being less than 10 × 10⁻⁶ mm²/s among measurements. The stability of the MR imaging systems and sequences used also needs to be assumed. Furthermore, the study population was too small to calculate a conversion factor for different MR imaging systems. A group size of ≥50 subjects is needed to calculate such a conversion factor. 13

CONCLUSIONS
The smallest range of ADC values can be obtained by imaging a subject on the same MR imaging system with an EPI-DWI with 6 b-values. Of the investigated tissues, the spinal cord shows the least variance and therefore should serve as reference tissue in the head and neck region.