Diffusion-Weighted Imaging of Orbital Masses: Multi-Institutional Data Support a 2-ADC Threshold Model to Categorize Lesions as Benign, Malignant, or Indeterminate

BACKGROUND AND PURPOSE: DWI has been increasingly used to characterize orbital masses and provides quantitative information in the form of the ADC, but studies of DWI of orbital masses have shown a range of reported sensitivities, specificities, and optimal threshold ADC values for distinguishing benign from malignant lesions. Our goal was to determine the optimal use of DWI for imaging orbital masses through aggregation of data from multiple centers.
MATERIALS AND METHODS: Source data from 3 previous studies of orbital mass DWI were aggregated, and additional published data points were gathered. Receiver operating characteristic analysis was performed to determine the sensitivity, specificity, and optimal ADC thresholds for distinguishing benign from malignant masses.
RESULTS: There was no single ADC threshold that characterized orbital masses as benign or malignant with high sensitivity and specificity. An ADC of less than 0.93 × 10−3 mm2/s was more than 90% specific for malignancy, and an ADC of less than 1.35 × 10−3 mm2/s was more than 90% sensitive for malignancy. With these 2 thresholds, 33% of this cohort could be characterized as “likely malignant,” 29% as “likely benign,” and 38% as “indeterminate.”
CONCLUSIONS: No single ADC threshold is highly sensitive and specific for characterizing orbital masses as benign or malignant. If 2 thresholds are used to divide these lesions into 3 categories, however, a majority of orbital masses can be characterized with >90% confidence.

Orbital space-occupying lesions represent a heterogeneous group that includes benign tumors, malignant tumors, inflammatory lesions, vascular lesions, and infections. 1 Frequent nonclassic clinical presentations, challenging pathologic evaluation, and risks associated with biopsy are strong reasons to develop better noninvasive diagnostic and imaging tools for orbital disease.
Imaging with CT and MR can be helpful in establishing a diagnosis through demonstration of characteristic patterns of anatomic involvement and through features such as CT attenuation, MR imaging signal intensity, and pattern of contrast enhancement. [2][3][4][5][6][7][8][9] Nevertheless, imaging is frequently nonspecific, and significant room for improvement remains in imaging diagnosis.
DWI has been increasingly used to characterize solid masses in the head and neck, aiding in the distinction of benign and malignant lesions. 10 Several retrospective studies have characterized orbital masses with DWI, and some have attempted to determine optimal quantitative ADC thresholds and their sensitivity and specificity in distinguishing malignant from benign lesions. [11][12][13][14][15][16][17][18] These studies have shown somewhat conflicting results and have been limited by single-institution designs and potential selection bias inherent to their patient populations. Specifically, some studies have suggested that a single ADC threshold can be both highly sensitive and specific for predicting malignancy, 16 whereas other results have contradicted this statement. 11,14 To resolve these outstanding conflicts, we performed an analysis of aggregated data by using all available published data points on the DWI of orbital masses, including aggregated source data from the 3 largest published case series on this topic. 14,16,17 The purpose of this study was to better determine what ADC thresholds can be used to predict either benign or malignant histology with high confidence.

Review of Published Literature
To conduct an initial meta-analysis, the lead author (A.R.S.) performed a MEDLINE search to identify published data on the DWI of orbital masses. The search strings included "orbit" OR "orbital" AND "DWI" OR "diffusion-weighted," as well as "head and neck" AND "DWI" OR "diffusion-weighted." One hundred forty-three results were obtained, as of February 2013. These were reviewed, and studies that did not describe the DWI of orbital space-occupying lesions were excluded, leaving 11 studies. Studies of exclusively intraocular tumors and of demyelinating optic neuritis were excluded in this process. Of these 11, one was excluded because its data were wholly duplicated in a more expansive follow-up study from the same authors. The remaining 10 studies, which described 260 orbital masses, were further analyzed. 11,12,14,16,17,[19][20][21][22][23] The lesions described in each of these studies were characterized as either lymphoma, metastasis, nonlymphomatous primary malignancy, benign mass, inflammatory disease, vascular malformation, or infection. The distribution of lesions across the studies is summarized in the On-line Table.
The review of the literature revealed only 2 studies that reported sufficient quantitative metrics (sensitivity, specificity, positive predictive value, negative predictive value) to permit meta-analysis, both from this study's authors. 14,16 It was not possible to reconstruct these data from the published results of the other studies, either because of a small sample size or the way the data were summarized. In place of a meta-analysis, we attempted to aggregate as many raw data points as could be obtained on the basis of source data from the study authors' previous works and ADC values of individual tumors that could be obtained from the literature. All tumors with reported ADC values were included. In some cases, multiple lesions with the same diagnosis were reported as an average and SD of the group. These data were excluded from further analysis because it was impossible to incorporate them into the receiver operating characteristic (ROC) analysis.
To assess for a systematic bias in lesion distribution, we compared the distribution of lesions from the published data and from the final analysis group against historical data from the largest published series of orbital masses by Shields et al. 1 These data are summarized in Table 1.

Data Collection and Analysis
The de-identified data used in this study comprised source data from 3 previously published case series of orbital mass DWI by the authors of this study, consisting of ADC and corresponding clinical/pathologic diagnosis for 189 cases. 14,16,17 These data were collected with the approval of the respective local institutional review boards/ethics committees, with technical methods as previously described. 14,16,17 Thirteen additional orbital mass ADC values were obtained through review of the literature. In total, 98 benign lesions and 104 malignant masses were studied. The remaining 58 cases were excluded either because quantitative ADC analysis was not performed by the original authors or because the data were reported in a summarized fashion that did not allow the extraction of individual data points.
The data included DWI studies performed on MR imaging machines from different vendors, with different field strengths and different technical parameters. To determine the equivalence of the DWI techniques across institutions, we compared the most commonly occurring lesions across the authors' source datasets with each other by using Kruskal-Wallis analysis. Lymphomas from the 3 source datasets (6, 32, and 6 tumors) and inflammatory lesions from the 3 source datasets (20, 13, and 6 lesions) were compared.
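As an illustration of the cross-site comparison described above, the following is a minimal pure-Python sketch of a Kruskal-Wallis test for exactly 3 groups. It is not the authors' code, and the ADC values in it are hypothetical placeholders, not study data. The function name `kruskal_wallis_3` is an assumption made for this example. With 3 groups the test has 2 degrees of freedom, for which the chi-square tail probability is exactly exp(−H/2), so no statistics library is needed.

```python
from math import exp


def kruskal_wallis_3(groups):
    """Return (H, p) for exactly three groups, with tie correction.

    p uses the chi-square approximation; for df = 2 the upper-tail
    probability of the chi-square distribution is exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        i = j

    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)  # tie correction (equals 1 when no ties)
    return h, exp(-h / 2)


# Hypothetical lymphoma ADC values (x 10^-3 mm^2/s) from three sites.
site_a = [0.60, 0.66, 0.71]
site_b = [0.62, 0.68, 0.73]
site_c = [0.64, 0.65, 0.70]
h_stat, p_value = kruskal_wallis_3([site_a, site_b, site_c])
```

A large P value from such a test, as with these placeholder values, gives no evidence that ADC differs by site. It does not by itself prove equivalence, particularly with small groups.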
The data were then grouped into benign and malignant categories. For each of these categories, descriptive statistics, Student t tests, and ROC were performed. These analyses were performed for all lesions in aggregate and for the authors' source datasets separately. Sensitivity and specificity of various ADC thresholds for distinguishing benign from malignant masses were determined.
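The threshold analysis described above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function names and the ADC lists are hypothetical, and the rule "ADC below the threshold is called malignant" is taken from the way thresholds are reported in this article. The area under the ROC curve is computed through its rank interpretation, that is, the probability that a randomly chosen malignant lesion has a lower ADC than a randomly chosen benign lesion, with ties counted as one-half.

```python
def sens_spec(benign, malignant, threshold):
    """Sensitivity and specificity for malignancy when ADC < threshold is 'positive'."""
    true_pos = sum(adc < threshold for adc in malignant)
    true_neg = sum(adc >= threshold for adc in benign)
    return true_pos / len(malignant), true_neg / len(benign)


def roc_auc(benign, malignant):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    wins = sum(
        (m < b) + 0.5 * (m == b)  # lower ADC in the malignant lesion counts as a win
        for m in malignant
        for b in benign
    )
    return wins / (len(malignant) * len(benign))


# Hypothetical ADC values (x 10^-3 mm^2/s), for illustration only.
benign_adc = [0.95, 1.20, 1.40, 1.60, 1.90]
malignant_adc = [0.60, 0.70, 0.85, 1.10, 1.25]

strict = sens_spec(benign_adc, malignant_adc, 0.93)
lenient = sens_spec(benign_adc, malignant_adc, 1.35)
auc = roc_auc(benign_adc, malignant_adc)
```

Sweeping the threshold over all observed ADC values and recording each (sensitivity, specificity) pair traces out the ROC curve. Raising the threshold increases sensitivity at the expense of specificity.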
Lymphoma and inflammatory lesions were also compared with each other separately because there is considerable clinical and radiologic overlap in these conditions. As mentioned above, descriptive statistics, Student t tests, and ROC were performed.
In consideration of the disproportionately large number of lymphoma lesions in our dataset, which may skew the results through characteristically low ADC, ROC was also performed, comparing benign lesions and malignant tumors, after excluding lymphomas.

Lesions Analyzed
The final analysis group consisted of 202 patients with 98 benign lesions and 104 malignant lesions. The most common benign lesions were inflammatory masses (n = 39), vascular lesions (n = 24), and optic nerve sheath meningiomas (n = 11). The most common malignant lesions were lymphoma (n = 46) and metastases (n = 20). The data are summarized in Table 2. The composition of the final analysis group of 202 subjects was similar to the composition of the 260 subjects imaged with DWI before exclusion of unavailable data points (Table 1 and Fig 1), though with a modest reduction in the proportion of benign primary lesions. Both groups contained a larger proportion of lymphoma lesions than would be expected on the basis of available epidemiologic data. 1 When lymphomas were excluded, the composition of the pre-exclusion group, final analysis group, and the epidemiologic data was similar (Fig 1).

Validation of ADC across Techniques
There was no significant difference in the ADC of lymphoma across the authors' source datasets (P = .98). Likewise, there was no significant difference in ADC of inflammatory lesions across these datasets (P = .42). These data are summarized in Table 2.

Descriptive Characteristics
Benign lesions showed an ADC of 1.43 ± 0.41 × 10−3 mm2/s (mean). Malignant lesions showed an ADC of 0.90 ± 0.36 × 10−3 mm2/s (Table 3). Figure 2 shows a scatterplot of lesion categories with corresponding ADCs. There were significant differences between benign and malignant lesions with respect to ADC (P < .0001), and these differences were visually apparent (Fig 3). Significant differences remained (P < .0001), even after exclusion of lymphomas.
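As a rough plausibility check on the reported group difference, the summary statistics above can be fed into a pooled-variance (Student) t statistic. This sketch assumes that the ± values are standard deviations and that the group sizes are the 98 benign and 104 malignant lesions given elsewhere in this article. The function name `pooled_t` is an assumption for illustration, and the calculation is not claimed to reproduce the authors' exact analysis.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student t statistic and degrees of freedom from group summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


# Benign: 1.43 +/- 0.41 (n = 98); malignant: 0.90 +/- 0.36 (n = 104).
# Units are x 10^-3 mm^2/s; the +/- values are ASSUMED to be SDs.
t_stat, dof = pooled_t(1.43, 0.41, 98, 0.90, 0.36, 104)
```

Under these assumptions the t statistic is close to 9.8 on 200 degrees of freedom, which lies far in the tail of the t distribution and is consistent with the very small P value reported.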

ADC Performance in Distinguishing Benign from Malignant Lesions
The area under the ROC curve for aggregated data was 0.84. An ADC threshold of less than 0.93 × 10−3 mm2/s resulted in a 60% sensitivity and 96% specificity for malignancy. A more lenient threshold of ADC less than 1.35 × 10−3 mm2/s resulted in 90% sensitivity for malignancy, but only 49% specificity. When lymphomas were excluded, the area under the ROC curve dropped to 0.73. The 0.93 × 10−3 mm2/s threshold resulted in only a 28% sensitivity for malignancy, with a 96% specificity. Figure 4 shows the ROC curve for distinguishing benign from malignant lesions. Table 4 shows the sensitivities and specificities of various ADC values for distinguishing benign from malignant lesions.
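The sensitivities and specificities above, combined with the group sizes of 104 malignant and 98 benign lesions, imply what share of the whole cohort falls below each threshold. The short sketch below works through that arithmetic. It uses the rounded percentages as reported, so the results are approximate, and the function name `fraction_below` is an assumption for illustration.

```python
N_MALIGNANT, N_BENIGN = 104, 98  # group sizes reported in this article


def fraction_below(sensitivity, specificity):
    """Share of the whole cohort with ADC below a threshold.

    Lesions below the threshold are the true positives (malignant)
    plus the false positives (benign).
    """
    true_pos = sensitivity * N_MALIGNANT
    false_pos = (1 - specificity) * N_BENIGN
    return (true_pos + false_pos) / (N_MALIGNANT + N_BENIGN)


below_093 = fraction_below(0.60, 0.96)  # ADC < 0.93 x 10^-3 mm^2/s
below_135 = fraction_below(0.90, 0.49)  # ADC < 1.35 x 10^-3 mm^2/s

share_low = below_093               # below the lower threshold
share_mid = below_135 - below_093   # between the two thresholds
share_high = 1 - below_135          # above the upper threshold
```

Rounded to whole percentages, these shares come to about 33%, 38%, and 29%, matching the cohort proportions given in the abstract for the low, intermediate, and high ADC ranges.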

DISCUSSION
This analysis showed that DWI produces equivalent quantitative ADC values across a variety of MR imaging scanners and techniques, a finding that is in concert with expectations based on previous investigation. 24 There were significant differences between benign and malignant lesions, though with notable overlap. ADC was highly accurate in distinguishing lymphoma from inflammatory disease. Previous studies of orbital mass DWI have demonstrated its technical feasibility and potential clinical uses. These studies have conflicted somewhat in their results, however, and each has been limited by a retrospective, single-institution design. Therefore, the role of DWI in evaluating orbital masses remains unclear. Aggregating data from multiple institutions removes some of the selection bias inherent in the individual studies. Furthermore, this analysis verifies that quantitative ADC values are generalizable across a range of MR imaging scanners and techniques.
Previous studies have conflicted in their reports of the overall sensitivity and specificity of DWI for differentiating benign from malignant lesions and have conflicted slightly in their optimal ADC thresholds. Sepahdari et al 14 reported an optimal threshold value of 1.0 × 10−3 mm2/s for differentiating benign from malignant lesions, with an associated 63% sensitivity and 86% specificity. Razek et al 16 reported an optimal threshold value of 1.15 × 10−3 mm2/s, with a sensitivity of 95% and specificity of 91%.
Politi et al 17 did not specifically address the question of differentiating benign from malignant lesions, but they reported an ADC threshold of 0.775 × 10−3 mm2/s for distinguishing lymphoma from nonlymphoma lesions with a 96% sensitivity and 93% specificity. Roshdy et al 11 did not attempt to calculate a threshold ADC value and associated sensitivity and specificity, but they did note overlap between benign and malignant lesions.
The results of this multi-institutional analysis indicate that there is no single ADC threshold that is both sensitive and specific for distinguishing benign from malignant lesions. On the basis of these results, we propose a 2-threshold model for characterizing orbital masses with DWI: 1) "likely malignant," for lesions with a >90% probability of malignancy, based on an ADC less than 0.93 × 10−3 mm2/s (33% of this cohort); 2) "likely benign," for lesions with a >90% probability of benignity, based on an ADC greater than 1.35 × 10−3 mm2/s (29% of this cohort); and 3) "indeterminate," for lesions with ADCs between 0.93 and 1.35 × 10−3 mm2/s (38% of this cohort).
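The 2-threshold model can be written as a simple rule. The sketch below is illustrative only: the function and constant names are hypothetical, and assigning values exactly equal to a threshold to the "indeterminate" category is an assumption, because the text specifies only "less than," "greater than," and "between."

```python
# Thresholds in units of 10^-3 mm^2/s, as proposed in this article.
LIKELY_MALIGNANT_BELOW = 0.93
LIKELY_BENIGN_ABOVE = 1.35


def categorize_adc(adc):
    """Map a lesion ADC (x 10^-3 mm^2/s) to one of the 3 proposed categories."""
    if adc < LIKELY_MALIGNANT_BELOW:
        return "likely malignant"
    if adc > LIKELY_BENIGN_ABOVE:
        return "likely benign"
    # ASSUMPTION: boundary values (exactly 0.93 or 1.35) are treated as indeterminate.
    return "indeterminate"


# ADC values taken from this article: the lymphoma and inflammatory lesion
# shown in the figure, and the mean ADC of the benign group.
examples = {adc: categorize_adc(adc) for adc in (0.65, 1.09, 1.43)}
```

Applied to values quoted in this article, the lymphoma with an ADC of 0.65 falls in the "likely malignant" range, the inflammatory lesion with an ADC of 1.09 is "indeterminate," and the mean benign ADC of 1.43 falls in the "likely benign" range. As the Discussion emphasizes, such a category is meant to refine a differential diagnosis built from clinical and conventional imaging data, not to replace it.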
In general, the optimal clinical use of DWI for evaluating an orbital mass will depend on the differential diagnosis dictated by other clinical and imaging data. For example, differentiation between lymphoma and atypical lymphocytic infiltrate or other orbital inflammatory diseases is a common diagnostic dilemma, 8 for which DWI proves quite useful. On occasion, it can be difficult to distinguish an infantile hemangioma (capillary hemangioma) or a vascular malformation from a rhabdomyosarcoma in a pediatric patient, 25 another task for which DWI would seem well suited. 21 There are, however, tasks for which DWI may be limited. There was overlap in the ADC of optic nerve sheath meningioma and lymphoma, and DWI may also fail to distinguish these lesions in cases in which the imaging appearance and clinical findings overlap. The clinical and conventional imaging data should always be weighted appropriately when evaluating any single case, to ensure that the DWI information contributes to the analysis rather than detracting from it. In a practical setting, we believe that DWI is best used as a tool to further refine a short differential provided by the clinical presentation and the other imaging data.
There were 4 major limitations to this study. The first is that the data were acquired on multiple scanner types, with slight differences in acquisition technique and methods of measuring ADC. This feature is both a strength and a weakness of the study design. Although less technical standardization weakens the internal validity of the data, equivalent ADC values of similar lesions across multiple datasets suggest that quantitative ADC measurements are robust across multiple platforms. The second limitation is that some of the results may not be generalizable across all practices. The sensitivity, specificity, and accuracy of DWI in distinguishing benign from malignant lesions depend on the study population because orbital lesions are heterogeneous. This study includes a larger number of patients with lymphoma and a larger proportion of lymphomas compared with other malignancies than would be expected in an unselected group of patients based on available epidemiologic data. 1 This fact likely improves the observed sensitivity and specificity of DWI for distinguishing benign from malignant lesions, due to the characteristic very low ADC of lymphoma. When lymphoma lesions were removed from the analysis, the ability of DWI to differentiate benign from malignant lesions dropped. Nevertheless, there were still significant differences between benign and malignant lesions.
The results of this ad hoc subgroup analysis should be interpreted cautiously because the removal of lymphoma lesions reduces the sample size and introduces other biases. A third limitation is the inability to quantify variability in ADC measurements. A degree of intraobserver and interobserver variability in the measurement of lesion ADC is expected, and a degree of scan-to-scan variation in ADC is also expected. Without the ability to quantify these factors, it can be difficult to interpret the results of a single ADC measurement in a single patient. Overall, the effect of measurement error between or within observers on a single scan has been shown to be small, 26 and the variability in ADC between different scanners is similarly small. 27 Finally, there are multiple areas of potential bias that could not be addressed by the study design. There may be a degree of selection bias within studies, with selective exclusion of some data points or publication bias related to exclusion of results that do not support a role for DWI. The exclusion of published studies for which ADC values could not be obtained could also affect the results. Note that 2 lymphomas and 3 rhabdomyosarcomas reported by Roshdy et al 11 showed higher average ADC than was observed in our group but were excluded because the ADC data were reported as an average. Specifically, Roshdy et al reported 2 lacrimal gland lymphomas whose average ADC overlapped that in the inflammatory lesions we observed. Our study may therefore overestimate the true performance of DWI in distinguishing lymphoma from inflammatory disease.

CONCLUSIONS
This analysis of multi-institutional data confirms that benign and malignant orbital tumors have significant differences in ADC. There was no single ADC threshold that was both highly sensitive and specific for predicting malignancy. On the basis of these results, we propose a "likely malignant" threshold ADC of <0.93 × 10−3 mm2/s, a "likely benign" threshold ADC of >1.35 × 10−3 mm2/s, and an "indeterminate range" ADC between 0.93 and 1.35 × 10−3 mm2/s when evaluating orbital masses with DWI. Knowledge of the expected ADC values of common lesions may also be helpful in organizing the differential diagnosis determined by other clinical and imaging data. DWI may be particularly helpful in distinguishing inflammatory disease from lymphoma.

Figure caption (fragment): A, Axial DWI shows a right orbital mass with marked hyperintensity. B, Corresponding axial ADC map shows dark signal, indicating low ADC and hypercellularity typical of lymphoma. ADC of this lesion was 0.65 × 10−3 mm2/s. C, Axial DWI in a different patient shows a lacrimal gland mass with less intense signal compared with A. D, Corresponding axial ADC map shows intermediate signal, brighter than adjacent brain parenchyma, reflective of the lower cellularity seen in orbital inflammatory lesions. The ADC of this lesion was 1.09 × 10−3 mm2/s.