Automated Color-Coding of Lesion Changes in Contrast-Enhanced 3D T1-Weighted Sequences for MRI Follow-up of Brain Metastases

BACKGROUND AND PURPOSE: MR imaging is the technique of choice for follow-up of patients with brain metastases, yet the radiologic assessment is often tedious and error-prone, especially in examinations with multiple metastases or subtle changes. This study aimed to determine whether using automated color-coding improves the radiologic assessment of brain metastases compared with conventional reading. MATERIALS AND METHODS: One hundred twenty-one pairs of follow-up examinations of patients with brain metastases were assessed. Two radiologists determined the presence of progression, regression, mixed changes, or stable disease between the follow-up examinations and indicated subjective diagnostic certainty regarding their decisions in a conventional reading and a second reading using automated color-coding after an interval of 8 weeks. RESULTS: The rate of correctly classified diagnoses was higher (91.3%, 221/242, versus 74.0%, 179/242, P < .01) when using automated color-coding, and the median Likert score for diagnostic certainty improved from 2 (interquartile range, 2–3) to 4 (interquartile range, 3–5) (P < .05) compared with the conventional reading. Interrater agreement was excellent (κ = 0.80; 95% CI, 0.71–0.89) with automated color-coding compared with a moderate agreement (κ = 0.46; 95% CI, 0.34–0.58) with the conventional reading approach. When considering the time required for image preprocessing, the overall average time for reading an examination was longer in the automated color-coding approach (91.5 [SD, 23.1] seconds versus 79.4 [SD, 34.7] seconds, P < .001). CONCLUSIONS: Compared with the conventional reading, automated color-coding of lesion changes in follow-up examinations of patients with brain metastases significantly increased the rate of correct diagnoses and resulted in higher diagnostic certainty.

as the technique of choice for screening and follow-up of patients with increased risk of brain metastases. 5,6 Its excellent soft-tissue contrast and spatial resolution were found to be crucial for improving imaging-based assessment and follow-up of almost any neuro-oncologic disease. 5,7,8 Because most brain metastases tend to show strong gadolinium contrast enhancement, contrast-enhanced T1WI sequences are a cornerstone of imaging brain metastases. 5,8 While the capabilities of MR imaging for the depiction of small brain metastases are unquestionable, accurate assessment of multiple sequences and planes can be tedious and error-prone in daily radiologic routine. In particular, the interval appearance of new small lesions or a subtle increase or decrease in lesion size may be easily missed in follow-up examinations, though these findings might have important therapeutic implications. 9,10 Furthermore, in patients with multiple metastatic lesions that show a mixed response to treatment on follow-up examinations, the radiologist might be prone to a satisfaction-of-search bias, leading to an inaccurate diagnosis. 11,12 Recently, automated color-coding (ACC) of longitudinal MR imaging follow-up examinations has been reported to be beneficial for the assessment of brain lesions. This finding was reported for ACC of FLAIR sequences on follow-up examinations in multiple sclerosis, using different software approaches. 13-15 Other studies applied similar techniques to the neuro-oncologic follow-up of patients with astrocytomas and high-grade gliomas. 16-18 On the basis of these previous results, our hypothesis was that applying ACC to the follow-up assessment of brain metastases would yield comparable diagnostic benefits.
Therefore, the purpose of this study was to compare the assessment of gadolinium-enhanced, 3D T1-weighted MR imaging follow-up examinations of patients with brain metastases between a conventional reading approach and an approach using ACC.

MATERIALS AND METHODS

Patients
The institutional review board (University Hospital Cologne) reviewed and approved the study plan and waived the need for informed patient consent due to the retrospective character of the study. The PACS and the radiologic information system (ORBIS; Dedalus HealthCare) were retrospectively screened for patients 18 years of age or older who were diagnosed with metastatic disease in the brain. Inclusion criteria comprised 2 consecutive MR imaging examinations of the brain for follow-up that included contrast-enhanced 3D T1WI sequences between January 2013 and June 2019. The examinations were either performed in our institution or provided by referring institutions. This initial screening yielded 123 patients. After exclusion of 13 patients without metastases in either of the 2 scans, 5 patients with interim surgery of brain metastases between follow-up examinations, 8 patients without contrast-enhanced 3D T1WI acquisitions, and 2 patients with severe motion artifacts, 95 patients with 121 follow-up pairs remained for study inclusion: 82 patients with 1 follow-up pair, 7 patients with 2 follow-up pairs, 4 patients with 3 follow-up pairs, 2 patients with 4 follow-up pairs, and 1 patient with 5 follow-up pairs. Figure 1 depicts inclusion and exclusion of study subjects.

Image Acquisition
Gadolinium-enhanced (0.5 mmol/mL, gadoterate meglumine, Dotarem; Guerbet) T1-weighted sequences with fat suppression were acquired with a standard head coil as part of the regular brain metastasis follow-up in our institution. In addition, 3D T1-weighted sequences from referring institutions, acquired with heterogeneous contrast media protocols on scanners from different vendors and scanner generations, were included. The data set comprised different field strengths of 1T (n = 3), 1.5T (n = 20), and 3T (n = 219). Tables 1 and 2 give an overview of detailed MR imaging acquisition parameters.

Ground Truth Annotation
To establish a reference standard for the follow-up diagnosis, a radiologist with 4 years of experience and a senior neuroradiologist with 11 years of experience double-checked the radiologic reports for all included follow-up pairs in a consensus reading. These radiologists were not involved in the comparative assessment, which is outlined below. We refrained from using fixed size thresholds as proposed in the Response Assessment in Neuro-Oncology criteria for brain metastases (RANO-BM) 19 because our study was directed toward investigating the use of ACC for clinical evaluation outside of clinical trials. For each follow-up pair, the reference standard was determined as one of the following:
1. Stable disease (no change regarding number or size of metastases)
2. Progressive disease (any number of new interval metastases and/or any unequivocal increase in lesion size)
3. Disease regression (any number of metastases that disappeared in the interval and/or any unequivocal decrease in lesion size)
4. Mixed changes (the presence of components of both 2 and 3 above).
To avoid inclusion of pseudoprogressions or erroneous inclusion of lesions, both radiologists had full access to all imaging and correlative clinical data.
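The four reference categories amount to a simple decision rule over two findings. The following minimal Python sketch only illustrates these definitions; the function name and its two boolean inputs are hypothetical, and the actual reference standard was established by consensus reading with access to all imaging and clinical data, not by an automated rule.

```python
def classify_follow_up(any_new_or_larger: bool, any_resolved_or_smaller: bool) -> str:
    """Map two findings of a follow-up pair to one of the four reference categories.

    any_new_or_larger: at least one new metastasis or unequivocal size increase.
    any_resolved_or_smaller: at least one disappeared metastasis or unequivocal
    size decrease. Both inputs are illustrative assumptions.
    """
    if any_new_or_larger and any_resolved_or_smaller:
        return "mixed changes"
    if any_new_or_larger:
        return "progressive disease"
    if any_resolved_or_smaller:
        return "disease regression"
    return "stable disease"


if __name__ == "__main__":
    # One new lesion plus one shrinking lesion falls into the mixed category.
    print(classify_follow_up(True, True))
```

Because no size thresholds are applied, any unequivocal change in either direction is enough to leave the stable category.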

Follow-up Assessment of Brain Metastasis
To compare the conventional follow-up assessment with the ACC approach, 2 readers with 3 and 4 years of experience in neuroradiologic imaging independently reviewed all MR imaging follow-up pairs in 2 dedicated reading sessions. Both readers were blinded to the above-mentioned ground truth diagnoses as well as to any clinical data that could hint at them. Readings were performed at the same workstation under standardized reading conditions with a time interval of 8 weeks between the conventional reading and the reading using ACC to reduce recall bias. The patient order was randomized before each reading session.
In the first session, the conventional readout was performed as in clinical routine using a side-by-side setup within the PACS. Manual linking and coregistration of follow-up pairs were allowed.
In the second session, the ACC approach was conducted independently by both readers, each of whom reviewed the same follow-up pairs as in the conventional approach, using a CE-certified and FDA-approved software that facilitates ACC of follow-up examinations of the same patient (MR Longitudinal Brain Imaging [LoBI]; Philips Healthcare). The software is integrated into the vendor's image viewer (IntelliSpace Portal, Version 11; Philips Healthcare); however, it is not restricted to MR images generated by Philips Healthcare scanners and is generally applicable. The software automatically performs a rigid coregistration. Then, the application performs an intensity normalization and subsequent subtraction of the selected sequences. After coregistration, normalization, and subtraction, both sequences are linked at the same anatomic level, allowing manual correction if necessary. Additionally, the software creates an overlay map, which highlights a focal increase in signal intensity in red and a focal decrease in signal intensity in blue (Fig 2). The intensity of the color-coding can be adjusted continuously. Both linked sequences and the overlay map are displayed side by side and can be viewed simultaneously on the same anatomic level.
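The processing chain described above (normalization, subtraction, and a red/blue overlay) can be illustrated with a small Python sketch. The vendor's implementation is not specified here, so everything below is an assumption made for illustration: the z-score normalization, the fixed difference threshold, and the function names are hypothetical, the two images are assumed to be already rigidly coregistered, and nested lists stand in for image arrays.

```python
from statistics import mean, pstdev


def zscore_normalize(image):
    """Rescale a 2D image (list of rows) to zero mean and unit standard deviation."""
    flat = [value for row in image for value in row]
    mu, sigma = mean(flat), pstdev(flat)
    if sigma == 0:
        # A constant image carries no contrast; map it to all zeros.
        return [[0.0 for _ in row] for row in image]
    return [[(value - mu) / sigma for value in row] for row in image]


def color_code_changes(baseline, follow_up, threshold=1.0):
    """Return an overlay with 'red' (signal increase), 'blue' (decrease), or None.

    Assumes both images are coregistered and have identical dimensions.
    The threshold is an arbitrary illustrative value, not a vendor setting.
    """
    base_n = zscore_normalize(baseline)
    follow_n = zscore_normalize(follow_up)
    overlay = []
    for base_row, follow_row in zip(base_n, follow_n):
        overlay_row = []
        for base_value, follow_value in zip(base_row, follow_row):
            difference = follow_value - base_value  # subtraction step
            if difference > threshold:
                overlay_row.append("red")
            elif difference < -threshold:
                overlay_row.append("blue")
            else:
                overlay_row.append(None)
        overlay.append(overlay_row)
    return overlay


if __name__ == "__main__":
    baseline = [[0, 0, 0], [0, 10, 0], [0, 0, 0]]   # enhancing lesion at center
    follow_up = [[0, 0, 0], [0, 0, 0], [0, 0, 10]]  # center resolved, new lesion
    for row in color_code_changes(baseline, follow_up):
        print(row)
```

In this toy pair, the resolved central lesion is flagged as a decrease and the new lesion as an increase, which corresponds to the mixed-changes pattern discussed in this study.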
In both reading sessions, readers were asked to pick one of the following diagnoses determined in the reference standard annotation (stable disease, disease progression, disease regression, or mixed changes). Additionally, they indicated diagnostic certainty regarding their decision on a 5-point Likert scale.
The average time required for loading the images within the PACS was recorded in the conventional reading session. The average time required for loading the application and processing the images was recorded in the second reading when using the ACC software. Furthermore, the time from being presented with the images to making the diagnosis was recorded for each follow-up examination pair in both reading sessions. All time measurements were performed by a radiologist not involved in the readouts.

Statistical Analysis
The rate of correctly classified diagnoses was calculated for each type of diagnosis (ie, disease progression, disease regression, mixed changes, stable disease) and as an overall rate including all follow-up pairs. The rates of correctly classified diagnoses attained by the readers were compared between the 2 reading approaches using the McNemar test. We refrained from calculating the diagnostic accuracy and specificity due to the study focus on patients with known metastatic brain disease and the corresponding lack of healthy study subjects. Likert scales were compared using the Wilcoxon signed-rank test. Interreader agreement was evaluated using the Cohen κ and interpreted as follows: excellent agreement (κ ≥ 0.8), good agreement (κ ≥ 0.6), moderate agreement (κ ≥ 0.4), and poor agreement (κ < 0.4). A P value < .05 was considered statistically significant. Rates of correctly classified diagnoses are indicated by percentages, and Likert scales, as median and interquartile range. Continuous variables are indicated by mean (SD).
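As an illustration of two of the measures named above, the following Python sketch computes the Cohen κ for two readers, applies the agreement cutoffs stated in this section, and evaluates a McNemar test from the two discordant-pair counts. The function names are hypothetical, and the exact binomial form of the McNemar test is an assumption, because the variant used in the analysis is not stated.

```python
from collections import Counter
from math import comb


def cohen_kappa(ratings_a, ratings_b):
    """Cohen kappa for two readers rating the same cases (equal-length sequences)."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the marginal category frequencies of each reader.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both readers used a single identical category throughout
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Apply the agreement cutoffs given in the Statistical Analysis section."""
    if kappa >= 0.8:
        return "excellent"
    if kappa >= 0.6:
        return "good"
    if kappa >= 0.4:
        return "moderate"
    return "poor"


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar P value from the discordant pair counts b and c.

    Assumption: exact binomial variant; a chi-square variant would differ slightly.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    print(interpret_kappa(0.80), interpret_kappa(0.46))
    print(mcnemar_exact_p(0, 10))
```

Only the discordant pairs (cases classified correctly in one reading but not in the other) enter the McNemar test, which is why identical overall rates in both readings do not by themselves imply identical case-level results.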

RESULTS

Patients
Of the 95 included patients, 55 were women and 40 were men. The mean patient age was 61 (SD, 14) years (range, 27-86 years). The underlying primary tumor was malignant melanoma in 41 patients, lung cancer in 31 patients, breast cancer in 13 patients, rectal cancer in 4 patients, sarcoma in 2 patients, esophageal cancer in 2 patients, and renal cancer as well as pancreatic cancer in 1 patient each. Seventy-two of 121 follow-up pairs were acquired with the same scanner type and similar protocols, whereas 49 of 121 follow-up pairs differed in scanner type and/or image acquisition.

Comparison of Follow-up Assessment with and without ACC of Lesion Changes
In the reading with ACC, the rate of correctly classified diagnoses was 91.3% (221/242) compared with 74.0% (179/242) in the conventional reading (P < .05). Regarding the individual diagnoses, the rate of correctly classified disease progressions was higher in the reading with coregistration and color-coding (93.3%, 56/60) compared with the conventional reading (81.7%, 49/60), yet without attaining statistical significance (P = .07). In contrast, all follow-up pairs showing disease regression were correctly identified in the ACC approach (100.0%, 60/60) compared with a rate of 81.7% (49/60) in the conventional reading (P < .05). The lowest rate of correct diagnoses in the conventional reading approach was found for mixed changes (50.0%, 31/62). By using automated coregistration and color-coding, the rate of correctly classified mixed changes was significantly increased to 88.7% (55/62; P < .05). The rate of correctly classified stable disease, however, was the same in both reading approaches (83.3%, 50/60; P = .77). There were 2 follow-up pairs for reader 1 and 3 for reader 2 in which stable disease was correctly diagnosed in the conventional reading but misclassified in the reading with ACC. For 1 of these follow-up pairs, the incorrect diagnosis was mixed changes; for the other 4, it was disease progression.
For the subgroup of follow-up pairs with divergent image acquisitions or scanner types, the overall proportion of correctly identified diagnoses was even slightly higher than in the overall cohort, both in the conventional reading (75.5%, 37/49, versus 74.0%, 179/242) and in the ACC approach (96.9%, 95/98, versus 91.3%, 221/242). Table 3 gives an overview of the correctly classified diagnoses. Table 4 indicates lesion numbers and lesion sizes.
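The per-category counts reported above can be cross-checked against the overall rates with a few lines of Python. The sketch below uses only the counts given in the text (readings of both readers pooled); the variable and function names are illustrative.

```python
# Per reference diagnosis: (total readings, correct in conventional reading,
# correct with ACC), pooled over both readers as reported in the text.
counts = {
    "disease progression": (60, 49, 56),
    "disease regression": (60, 49, 60),
    "mixed changes": (62, 31, 55),
    "stable disease": (60, 50, 50),
}


def percent(correct, total):
    """Rate of correctly classified diagnoses in percent, one decimal place."""
    return round(100 * correct / total, 1)


total = sum(n for n, _, _ in counts.values())
conventional_correct = sum(conv for _, conv, _ in counts.values())
acc_correct = sum(acc for _, _, acc in counts.values())

if __name__ == "__main__":
    for diagnosis, (n, conv, acc) in counts.items():
        print(f"{diagnosis}: {percent(conv, n)}% conventional, {percent(acc, n)}% ACC")
    print(f"overall: {percent(conventional_correct, total)}% conventional, "
          f"{percent(acc_correct, total)}% ACC")
```

The category counts sum to 242 readings, with 179 correct conventional and 221 correct ACC readings, matching the overall rates of 74.0% and 91.3%.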
The mean reading time for both readers was 74.2 (SD, 34.7) seconds in the conventional reading approach and was reduced to 51.8 (SD, 23.1) seconds in the ACC approach (P < .001, Fig 4). The mean time required for loading the images within the PACS was 5.2 (SD, 0.6) seconds in the conventional reading, and the mean time required for loading the application and processing the images was 39.7 (SD, 1.4) seconds for the ACC software. Thus, the overall mean assessment time was longer in the ACC approach (91.5 [SD, 23.1] seconds) compared with the conventional reading using the PACS software (79.4 [SD, 34.7] seconds, P < .001).
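The overall assessment times follow from adding the mean loading time to the mean reading time of each approach. A short Python sketch of this arithmetic, using only the mean values reported above (the variable names are illustrative):

```python
# Mean times in seconds, as reported in the text.
reading = {"conventional": 74.2, "acc": 51.8}
loading = {"conventional": 5.2, "acc": 39.7}

# Overall assessment time = loading/processing time + reading time.
overall = {key: round(reading[key] + loading[key], 1) for key in reading}

# Reading time saved with ACC versus the additional loading and processing time.
reading_time_saved = round(reading["conventional"] - reading["acc"], 1)
extra_loading_time = round(loading["acc"] - loading["conventional"], 1)
net_difference = round(overall["acc"] - overall["conventional"], 1)

if __name__ == "__main__":
    print(overall)
    print(reading_time_saved, extra_loading_time, net_difference)
```

The additional loading and processing time of the ACC software exceeds the reading time it saves, which is why the overall mean assessment time is longer despite the faster readout.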

DISCUSSION
In this study, we evaluated whether ACC of follow-up MR imaging examinations can improve the assessment of patients with brain metastases compared with a conventional reading approach regularly performed in daily radiologic routine. We found a significant improvement in the rate of correctly classified diagnoses for disease regression and mixed changes, whereas the improvement for disease progression did not reach statistical significance and the determination of stable disease did not benefit from using the ACC software. Consistent with these findings, the interreader agreement and the diagnostic certainty were higher in the reading with ACC. While the reading time itself was significantly lower with the ACC software, this advantage was nullified by the higher application loading and image-processing time, leading to a slightly higher overall assessment time in the ACC approach.
To our knowledge, our study is the first to evaluate ACC for longitudinal metastasis assessment based on gadolinium-enhanced T1-weighted sequences. Our results are in line with previous studies investigating different software using similar concepts of automated coregistration and highlighting of lesion changes as well as studies using the same software for longitudinal evaluation of lesion changes in FLAIR images. 13,15,16,18 Galletto Pregliasco et al 15 found a higher sensitivity and subjective confidence in the detection of new lesions on MR imaging of patients with multiple sclerosis using automated coregistration and a colored overlay map to depict lesion development. Lennartz et al 16 reported a higher sensitivity and diagnostic accuracy for the longitudinal assessment of astrocytomas in FLAIR sequences, in particular for subtle changes. Compared with astrocytomas, the assessment of brain metastases may be much more challenging and burdensome, particularly when there is a high number of lesions. This challenge is reflected by the relatively low rate of correct diagnoses for mixed changes we found in the conventional reading approach. The overall workload resulting from these follow-up examinations is further increased because follow-up of brain metastases is a frequently requested examination in daily clinical routine and the number of ordered examinations is increasing. 20 The high clinical relevance of brain metastasis assessment underlines the need for remedies that reduce the workload while, at the same time, providing comparable or higher accuracy. Diagnostic assessment of MR imaging follow-up studies with mixed or subtle changes might specifically benefit from ACC of changes in consideration of a satisfaction-of-search bias and fatigue of radiologists. 11,21-23 Another important aspect of diagnostic tools is the ease of use and simplicity of implementation.
In light of the recently growing use of artificial intelligence and deep learning-based applications, fully automated detection and assessment of brain metastases and cerebral lesions seems to be a promising concept. 24,25 However, besides ethical concerns, a straightforward integration into the daily routine is often problematic because these approaches usually involve experimental and complex networks. 26-28 In contrast, the software evaluated in our study is launched within the PACS and may, therefore, be suitable for a routine workflow implementation. Of note, readers attained an even higher overall proportion of correct diagnoses in the subgroup of patients with divergent scanner types or protocols between the 2 examinations compared with the overall data set, which suggests that the proposed approach can cope with interscanner and protocol differences in clinical routine.
In our study, we found a higher overall reading time including application loading times when using the ACC software, a finding that contrasts with previous studies reporting comparable or shorter reading times with this software. 15,16 While the shorter time required for the readout itself is in line with previous studies, the overall higher assessment time we found is most likely because we used 3D image acquisitions, which require higher processing power. The time required for loading and processing the images for ACC is certainly, in part, dependent on the computing power of the workstation and servers used. Therefore, it can be assumed that loading times might be improved with optimized hardware configurations, but they still may represent a potential hurdle for clinical implementation when using 3D data sets as input data.
While the overall rate of correctly diagnosed follow-up pairs with stable disease remained constant between the 2 reading approaches, taking into account both readers, the assessment with the ACC software led to a false determination of mixed changes in 1 case and progressive disease in 4 cases, all of which had been correctly diagnosed as stable disease using the conventional reading approach. Although our study does not show a tendency of the investigated software to significantly promote false-positive diagnoses, the potential pitfall of overemphasizing lesion changes should be considered when using the software clinically, and a potentially lower specificity should be assessed in a larger-scale clinical investigation.
Most important, we did not assess the response of metastases as per the RANO-BM criteria. 19 The underlying reason for that was that application of these criteria is mostly limited to clinical trials. On the contrary, we aimed to assess the application of the ACC software in the more common scenario of clinical brain metastases assessment outside of such trials. The results that we found might, therefore, not be generalizable to a RANO-BM response assessment. We encourage further investigations to evaluate the applicability and diagnostic value of the ACC software for assessing brain metastasis in clinical trials. Furthermore, the mixed changes category that we included is not included in the RANO-BM criteria. The reason to define this category is that we aimed to evaluate the use of the ACC software for this atypical response pattern because it is considered more prevalent in the increasingly used immune checkpoint inhibitor therapies and is, therefore, of clinical relevance.
There are further limitations to this study that need to be addressed. The results of this study are based on a retrospective and monocentric study design. It was focused on contrast-enhanced 3D T1-weighted sequences, while the full readout of MR imaging follow-up in patients with cerebral metastases comprises >1 sequence. Another limitation is that most examinations used in this study were performed on 3T systems, which might limit generalizability to systems with lower field strengths. Moreover, we did not include patients with movement artifacts. In clinical routine, the presence of such artifacts can be expected to negatively impact the assessment using the proposed approach or even lead to technical failure of the application in the worst case. Last, a recall bias cannot be excluded, though we chose to set an 8-week latency period between the sessions and to randomize the order of patients before each reading session.

CONCLUSIONS
This study demonstrated an improved assessment of brain metastases when using a reading approach with ACC, particularly in regard to the detection of mixed-lesion changes. Therefore, we suggest considering such tools in clinical environments with a high throughput of follow-up MR imaging for the longitudinal assessment of brain metastasis.