Improved Assessment of Middle Ear Recurrent Cholesteatomas Using a Fusion of Conventional CT and Non-EPI-DWI MRI

BACKGROUND AND PURPOSE: Recurrent middle ear cholesteatomas are commonly preoperatively assessed using MR imaging (non-EPI-DWI) and CT. Both modalities are used with the aim of distinguishing scar tissue from cholesteatoma and determining the extent of bone erosions. Inflammation and scar tissue associated with the lesions might hamper a proper delineation of the corresponding extensions on CT images. Using surgical findings as the criterion standard, we assessed the recurrent middle ear cholesteatoma extent using either uncoregistered or fused CT–MR imaging datasets and determined the corresponding accuracy and repeatability.
MATERIALS AND METHODS: Twenty consecutive patients with suspected recurrent middle ear cholesteatoma and preoperative CT–MR imaging datasets were prospectively included. A double-blind assessment and coregistration of the recurrent middle ear cholesteatoma extent and manual delineation of 18 presumed recurrent middle ear cholesteatomas were performed by 2 radiologists and compared with the criterion standard. “Reliability score” was defined to qualify radiologists' confidence. For each volume, segmentation repeatability was assessed on the basis of intraclass correlation coefficient and overlap indices.
RESULTS: For the whole set of patients, recurrent middle ear cholesteatoma was further supported by surgical results. Two lesions were excluded from the analysis, given that MR imaging did not show a restricted diffusion. Lesions were accurately localized using the fused datasets, whereas significantly fewer lesions (85%) were correctly localized using uncoregistered images. Reliability scores were larger for fused datasets. Segmentation repeatability showed an almost perfect intraclass correlation coefficient regarding volumes, while overlaps were significantly lower in uncoregistered (52%) compared with fused (60%, P < .001) datasets.
CONCLUSIONS: The use of coregistered CT–MR images significantly improved the assessment of recurrent middle ear cholesteatoma with a greater accuracy and better reliability and repeatability.

Recurrent middle ear cholesteatoma (rMEC) is a destructive and expanding lesion 1 that can recur after a seemingly complete surgical resection. The frequency of rMEC ranges from 5% to 15% and can reach up to 61% 2 after an initial operation, particularly with canal wall up techniques. Potential clinical consequences are similar to those resulting from a primary lesion, that is, hearing loss, meningitis, brain abscess, and labyrinthitis. However, the clinical presentation of rMEC differs from that of middle ear cholesteatoma (MEC) regarding otoscopic assessments. A high rate of false-negative results has been reported, so an additional surgical procedure has often been performed as a diagnostic confirmation. 3,4 CT is commonly used to assess rMEC, plan revision surgery, and choose middle ear repair strategies. 5 Endoscopic (transmastoid or transcanal) or microscopic (canal wall up/down) procedures have been used. 6 In the past decade, MR imaging and, more particularly, DWI have been proposed as an alternative to the additional surgical procedure. 7 Restricted diffusion in lesions of the middle ear cavity has been reported as a sensitive index of rMEC, and false-negative results (that is, missed lesions) have been related to small volume or mural cholesteatomas and susceptibility artifacts. Assessment of rMEC using non-EPI-DWI would be more accurate with the ability to detect 3-mm rMEC lesions. 7,8 High-resolution CT has been considered so far as the imaging technique of choice for the evaluation of bone tissue changes occurring in rMEC. Erosions of ossicles and the facial nerve canal, labyrinthine system, or tympanic tegmen have been described. 9 However, high-resolution CT assessment of rMEC can be challenged by multiple factors such as postoperative scarring and fluid and cholesterol granulomas around the lesion.
Most interesting, it has been recently suggested that surgical planning of MEC could be improved when using information from coregistered MR and CT images. 10 The corresponding usefulness has not been assessed for rMEC. The main aim of the present study was to assess the added value of DWI-CT fusion for the local assessment of rMEC in a clinical workflow. Volumes of lesions were quantified on fused and unfused datasets and compared using intraclass correlation coefficients and similarity indices.

Patients
Between August 2015 and August 2016, twenty consecutive patients (7 women and 13 men; mean age, 41.2 years) were prospectively enrolled before an rMEC operation. Each patient had a history of histologically proved MEC and had an initial operation at least a year before the inclusion. Each patient had a presumed rMEC based on the combined analysis of otoscopy, CT, and MR imaging. Patients were included once they provided written informed consent. Exclusion criteria were middle ear infection, gadolinium allergy, pregnancy, non-MR imaging-compatible implants or devices, surgical contraindications, or refusal to participate.

Image Analysis
Both MR and CT images were analyzed using OsiriX Imaging Software (http://www.osirix-viewer.com) by 2 experts (A.V. with 16 years' experience and F.F. with 4 years' experience) who were blinded to the patient's clinical status. Observers performed the image assessment and coregistration independently. rMEC lesions were assessed in 22 middle ear areas using a 5-point Likert scale as follows: 1 = no invasion, 2 = unlikely invasion, 3 = unclear invasion, 4 = highly probable invasion, and 5 = obvious invasion; the 22 anatomic locations (Table 2) were predefined by the surgical team.
To assess the reproducibility of the manual delineation of the lesions, each expert manually segmented each lesion twice at different times on CT images. The corresponding volumes (vol 1 and vol 2) were quantified using the OsiriX "ROI volume" tool. Lesion segmentation was initially performed on the unfused datasets (uncoregistered dataset [UD]). Then, a rigid coregistration was performed between the 3D-VIBE-T1WI and CT datasets using the OsiriX plugin "Fusion tool." The ipsi- and contralateral geniculate ganglion and ipsilateral stylomastoid foramen were used as specific landmarks. Fusion quality was defined on the basis of a 4-point quality scale using the distance between the corresponding landmarks with 1, 2, 3, and 4 referring to a distance of 0, <1 mm, between 1 and 2 mm, and >2 mm, respectively. The b=1000 non-EPI-DWI was resliced to the coregistered 3D-VIBE-T1WI and then fused to the CT dataset.
The fused dataset (FD) was analyzed 6 weeks later using the same paradigm and the same predefined anatomic locations. The corresponding volumes were referred to as vol 3 and vol 4 for observers 1 and 2, respectively.
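The 4-point fusion quality scale described above can be illustrated with a minimal sketch. The function names are hypothetical, and two points are assumptions not stated in the text: the grade is driven by the worst (largest) landmark distance, and a distance of exactly 1 mm or 2 mm falls in the "between 1 and 2 mm" category.

```python
import math

def landmark_distance_mm(p_ct, p_mr):
    """Euclidean distance (mm) between one landmark on CT and on MR imaging."""
    return math.dist(p_ct, p_mr)

def fusion_quality(distance_mm):
    """Map a landmark distance to the 4-point scale:
    1 = 0 mm, 2 = <1 mm, 3 = 1 to 2 mm, 4 = >2 mm."""
    if distance_mm < 0:
        raise ValueError("distance must be non-negative")
    if distance_mm == 0:
        return 1
    if distance_mm < 1:
        return 2
    if distance_mm <= 2:  # assumption: boundaries 1 mm and 2 mm are graded 3
        return 3
    return 4

def grade_fusion(ct_landmarks, mr_landmarks):
    """Grade a fusion from paired landmarks (assumption: worst distance decides)."""
    worst = max(landmark_distance_mm(c, m)
                for c, m in zip(ct_landmarks, mr_landmarks))
    return fusion_quality(worst)

# Hypothetical coordinates (mm) for the 3 landmarks on each dataset.
ct = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
mr = [(0.0, 0.0, 0.0), (10.0, 0.6, 0.0), (0.0, 10.0, 0.0)]
grade = grade_fusion(ct, mr)
```

In the study the distances were judged by the observers in the viewer; the sketch only makes the category boundaries explicit.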

Surgical Criterion Standard
Surgical findings were considered as the criterion standard for lesion assessment. Using a Likert scale, we defined findings as negative for scores 1 and 2 and as positive for scores 4 and 5 (Table 2, "UD Exact," "FD Exact"). In addition, a reliability score was computed as follows: 100% for Likert scores 1 and 5; 50% for Likert scores 2 and 4; 0% for Likert score 3.
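The scoring rules above can be written down as a short sketch. The function names are hypothetical, and the handling of Likert score 3 when counting exact findings is an assumption (it is treated here as indeterminate and never counted as exact), because the text defines only the negative and positive ranges.

```python
# Reliability score per Likert rating, as defined in the text.
RELIABILITY = {1: 100, 2: 50, 3: 0, 4: 50, 5: 100}

def reliability_score(likert):
    """Reliability (%) attached to a single Likert rating."""
    return RELIABILITY[likert]

def imaging_finding(likert):
    """Negative for scores 1-2, positive for scores 4-5.
    Score 3 is labeled 'indeterminate' (assumption, not stated in the text)."""
    if likert in (1, 2):
        return "negative"
    if likert in (4, 5):
        return "positive"
    return "indeterminate"

def is_exact(likert, surgery_positive):
    """True when the imaging finding agrees with the surgical criterion standard."""
    finding = imaging_finding(likert)
    if finding == "indeterminate":
        return False  # assumption: an unclear rating is never an exact finding
    return (finding == "positive") == surgery_positive

def mean_reliability(likert_scores):
    """Mean reliability (%) over a set of rated locations."""
    return sum(reliability_score(s) for s in likert_scores) / len(likert_scores)

# Hypothetical ratings for 4 locations in one lesion.
example = mean_reliability([5, 4, 3, 1])
```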

Test-Retest Repeatability
To assess the coregistration process and segmentation repeatability, we performed test-retest for volume measurement. The corresponding intraclass coefficients (ICC) for paired measurements (vol 1 to vol 4) and the overlap between manually segmented volumes of interest were computed. Overlap was quantitatively assessed on a voxel basis using the Jaccard index (JI) 11 :
$$\mathrm{JI} = \frac{|\mathrm{VOI}_a \cap \mathrm{VOI}_b|}{|\mathrm{VOI}_a \cup \mathrm{VOI}_b|}$$
with VOI a and VOI b being volumes manually segmented in fused datasets. ICC values were qualified according to Landis and Koch. 15 The ICC precision of estimates was defined as previously described using the relative 95% confidence interval. 16
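As an illustration of the voxel-wise overlap measure, the following minimal sketch computes the Jaccard index (intersection over union) for two segmentations. Representing each volume of interest as a set of voxel index tuples and the function name `jaccard_index` are assumptions made for the example; the study used volumes segmented in OsiriX.

```python
def jaccard_index(voi_a, voi_b):
    """Jaccard index between two volumes of interest.

    Each VOI is an iterable of (x, y, z) voxel indices; the index is the
    number of shared voxels divided by the number of voxels in either VOI.
    """
    a, b = set(voi_a), set(voi_b)
    union = a | b
    if not union:
        raise ValueError("both volumes of interest are empty")
    return len(a & b) / len(union)

# Two hypothetical 3-voxel segmentations sharing 2 voxels.
seg_1 = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
seg_2 = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
overlap = jaccard_index(seg_1, seg_2)  # 2 shared voxels / 4 voxels in the union
```

Because the index penalizes every non-shared voxel, it is sensitive both to delineation differences and to any residual misregistration between datasets.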

Statistical Analyses
The whole set of tests was performed with R statistical and computing software, Version 3.2.0 (http://www.r-project.org/), and a P value < .05 was considered statistically significant.
Quantitative variables are presented as means ± SDs, while categoric variables are presented as frequencies. Paired Mann-Whitney U tests were used to compare exact finding rates, reliability scores, volumes, and segmentation overlap indices computed in UD and FD.
Areas under the receiver operating characteristic (ROC) curve measurements were used to quantify the ability to locate rMEC. The DeLong test 12 was used to compare the areas under the curve of the paired ROC curves with the package pROC version 1.14.0 for R. For quantitative variables, the Youden method 13 was used to determine the thresholds of the ROC curves.
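For readers unfamiliar with these quantities, the sketch below shows how an area under the ROC curve and a Youden threshold can be obtained from rated scores. It is a plain-Python illustration under stated assumptions, not the pROC implementation used in the study, and it does not reproduce the DeLong test. The function names and the example ratings are hypothetical; a case is called positive when its score is greater than or equal to the threshold.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the probability that a positive case
    scores higher than a negative case (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def youden_threshold(pos_scores, neg_scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.
    The first threshold reaching the maximum is kept."""
    best_t, best_j = None, -1.0
    for t in sorted(set(pos_scores) | set(neg_scores)):
        sens = sum(s >= t for s in pos_scores) / len(pos_scores)
        spec = sum(s < t for s in neg_scores) / len(neg_scores)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical Likert ratings: locations invaded at surgery vs not invaded.
invaded = [4, 5, 5]
not_invaded = [1, 2, 3]
area = auc(invaded, not_invaded)
threshold, j_stat = youden_threshold(invaded, not_invaded)
```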
Single-measure ICCs were calculated using the 2-way random ANOVA on average measures (ICC ranges, 0.00-1.00, with values closer to 1.00 representing better reproducibility). 14 Interpretation of the ICC was categorized according to Landis and Koch. 15 The precision of ICC estimates was defined as previously described using the 95% CI. 16
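A single-measure ICC from a 2-way random-effects ANOVA can be sketched as follows. The text does not state which ICC variant was computed, so the absolute-agreement form ICC(2,1) of Shrout and Fleiss is an assumption here, as is the function name `icc_2_1`; the study itself used R.

```python
def icc_2_1(ratings):
    """ICC(2,1): 2-way random effects, absolute agreement, single measure.

    `ratings` is a list of rows, one per subject (for example, one lesion
    volume), each row holding the k measurements of that subject.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the 2-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-measurement mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical volumes (mm^3) measured twice with perfect agreement.
perfect = icc_2_1([[100.0, 100.0], [250.0, 250.0], [400.0, 400.0]])
```

Perfect test-retest agreement gives an ICC of 1; systematic offsets between the two measurements lower this variant because it measures absolute agreement rather than consistency.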

rMEC Assessment
Among the 20 rMECs, in the 20 patients included in the study, 2 rMEC lesions (11 and 12 mm, respectively) were excluded (Table 3). The corresponding absence of a b=1000 signal on DWI was considered a false-negative finding. A total of 18 lesions were analyzed (Table 2) with respect to 22 anatomic locations (n = 396). Fusion was qualified as "perfect" in 9 lesions (50%) and as "good" (<1 mm) for the remainder. The anatomic distribution of lesions is detailed in Table 2. The epitympanic recess was the usual rMEC site. The frequency of exact findings amounted to 84.8% ± 11.2% using UD and was significantly larger (99.7% ± 11.8%, P < .01) using FD. Similarly, the reliability score significantly rose from 84.5% ± 10.3% to 96.2% ± 4.3% (P < .01) using UD and FD, respectively (Table 2). The area under the curve value was 0.93 using UD and significantly increased to 0.99 (P < .01) using FD (Table 4).

DISCUSSION
The present results suggest that a better rMEC assessment could be achieved when CT and MR imaging datasets are coregistered and fused. On that basis, both imaging modalities should be considered before an operation.
These results further support and extend those from previous reports that demonstrated the usefulness of FD in the local evaluation of MEC. 10,17-19 Accordingly, from a comparative analysis between CT alone and FD, a few studies have suggested that MEC would be better assessed using FD. 10,17,19 This result has been further confirmed from a comparative analysis between DWI alone and FD. 17-19 The superiority of the FD-based rMEC assessment that we describe in the present study could partly be due to a careful patient selection based on the DWI results, the number of patients, and the number of locations. DWI is an imaging technique that can clearly distinguish cholesteatoma from postsurgical abnormalities such as scars. In our study, only patients with rMEC were included, and untreated patients were excluded so that the postoperative scar tissue could be addressed. On the contrary, in studies in which untreated patients have been included, this issue could not be addressed, 10,17,18 and the advantage of the FD-based assessment could not have been properly investigated. The number of patients with rMEC included in the present study (n = 20) was larger compared with other studies in which 2-16 patients were included. 10,17-19 The number of locations could have also been a central methodologic factor. While 4-6 locations have been assessed in previous studies, the 22 locations assessed in the present study likely led to a reduced α risk. In addition, given that each location has been analyzed in each patient, no missing data had to be considered, and cluster analysis was not considered. The increase of exact location findings was most striking in the tegmen antri, anterior epitympanic recess, tympanic segment of the facial nerve, and mesotympanum. These locations have been reported as frequently occurring. 19
As previously described, the 5-point Likert scale was used to count exact findings, calculate the reliability score, and compare ROC curves. 20 Such a scale has been used in a large number of radiologic studies. 21 As an example of the clinical added value of the FD-based rMEC assessment, the carotid canal was affected in one of the patients, and this was missed using the UD, whereas it was identified using FD (Figure). In this particular case, the proper location had an impact on the surgical procedure and the middle ear repair strategy.
The improved segmentation reproducibility using FD is further supportive of this added value. Scarce data are available in previous studies regarding this particular issue. ICC values regarding computed volumes illustrated an almost perfect agreement for each segmentation. Although no criterion standard measurements were available for volumes, our analysis disclosed reproducible volume measurements based on CT and b=1000 findings. The superior reproducibility of the FD-based assessment was statistically demonstrated using the JI, which has been largely used for volume delineation. 11 The JI ranged between 31% and 74%, thereby indicating a poor-to-high overlap between volumes of interest. One has to keep in mind that the JI assessed both segmentation and coregistration processes, given that each observer performed his own fusion process.

FIGURE.
Example of rMEC leading to a carotid canal erosion. Recurrent middle ear cholesteatoma (axial b=1000 non-EPI MR imaging, B, arrow) probably located (rated Likert 4) in the hypotympanum, on uncoregistered dataset (A, B, C, E, arrows). The hypotympanum location is clear (rated Likert 5), considering that the fused dataset (b=1000/CT in axial plane, D, arrow) and the vertical carotid canal lysis (A and D, arrowheads) can be seen. The perioperative surgical findings (F) confirmed the hypotympanum rMEC location (arrows) and the carotid canal lysis (arrowhead). In that case, the presurgical image fusion allowed the surgeon to adapt his approach, thus lowering the surgical risks.

During the past decade, non-EPI-DWI has significantly improved the handling of MEC both in terms of diagnosis and postoperative survey strategy. 17,22 As recently reported, non-EPI-DWI is less sensitive to susceptibility artifacts and can provide a high signal-to-noise ratio so that high sensitivity (90%) and specificity (94%) can be achieved. 7 In the present study, 2 lesions of 20 were freely diffusive, leading to a sensitivity of 90%, in agreement with a recent meta-analysis report. 7 The 100% specificity reported in the present study is likely due to particular attention paid to excluding patients with middle ear infection. False-positive findings can be found due to water restriction in infected areas as described in mastoid abscess. 23 The choice of the non-EPI-DWI technique in the field of rMEC investigations is still a matter of debate. We chose to use TSE-DWI, considering the recognized low artifacts on the skull base and signal homogeneity. 24 HASTE-DWI has been noted for the shorter acquisition time, 7 but information regarding artifacts and contrast-to-noise ratio is missing. The usual slice thickness used for non-EPI-DWI sequences ranges from 2 to 4 mm. The recently developed isotropic 1.5-mm slice thickness non-EPI-DWI at 3T can be considered a major advantage offering 3.4-mm³ voxel volumes. 10 Such a turbo-field echo with diffusion-sensitized driven-equilibrium recall should be considered in the future for high-resolution MR imaging-CT fusion. 10
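The two figures quoted above follow from simple arithmetic, shown here as a minimal sketch with hypothetical function names: 18 detected lesions of 20 give a sensitivity of 90%, and an isotropic 1.5-mm voxel has a volume of 1.5³ = 3.375 mm³, reported as 3.4 mm³.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of surgically confirmed lesions detected on imaging."""
    return true_positives / (true_positives + false_negatives)

def isotropic_voxel_volume_mm3(edge_mm):
    """Volume (mm^3) of a cubic voxel with the given edge length (mm)."""
    return edge_mm ** 3

# 20 confirmed lesions, 2 of which showed no restricted diffusion.
dwi_sensitivity = sensitivity(true_positives=18, false_negatives=2)
voxel_volume = isotropic_voxel_volume_mm3(1.5)
```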

Registration Process
Registration and fusion processes were performed in a 10-minute clinical workflow using freeware as previously suggested. 10,17-19 Because the skull base is a nondeformable structure, a rigid coregistration was used for the merging process of high-resolution cross-sectional imaging datasets. To limit observer subjectivity, we did not directly merge DWI and CT scans using a manual fine-tuning. We chose a multistep process involving the selection of 3 invariable landmarks on the 3D-T1 VIBE MR imaging and CT datasets and a reslicing of the b=1000 stack on the 3D-T1 VIBE MR imaging dataset. On that basis, 50% of the fusions were qualified as perfect and the remainder as good (<1 mm), which can still be considered clinically valuable. Imperfect fusion quality could be accounted for by patient motion between the 2 acquisitions. Another accounting factor could be related to the variation between the CT scan isomorphism and the MR imaging matrix diffeomorphism, which depends on magnetic susceptibility. Diffeomorphic artifacts are expected to be larger using ultrafast gradient-echo sequences such as VIBE compared with spin-echo (b=1000) sequences. These artifacts could lead to minor voxel mismatch between MR imaging and CT datasets. The use of the recently developed spin-echo 3D sequence (sampling perfection with application-optimized contrasts using different flip angle evolution, SPACE; Siemens) might be an interesting alternative to reduce these artifacts. 25 In the present cohort, although 2 patients exhibited rMEC lesions as round pearls in an aerated mastoidectomy cavity, the fusion process was qualified as perfect. Other postprocessing techniques could be used in the future with the aim of a fully automatic fusion, which should reduce processing time and manual registration bias. 26

Clinical Implications for Patients
The present results disclose and further support the idea that fusion of coregistered images can improve rMEC assessment and help in guiding the surgical procedure. Preoperative surgical approaches and specific surgical risks can be expected to be better evaluated, so new developments in endoscopic ear surgery could be expected as part of a safe surgical approach. 27 The exact location of rMEC extensions is of high interest because it should allow minimally invasive surgical procedures without any transmastoid surgery conversion during MEC removal. 6

CONCLUSIONS
Image fusion between coregistered conventional non-EPI-DWI and temporal bone CT significantly improved the local assessment of recurrent cholesteatoma. Further supportive of this additional value, the repeatability of lesion segmentation between and within observers was improved. The impact on operation duration and postoperative complications should be evaluated in future studies.