Deep Learning Enables 60% Accelerated Volumetric Brain MRI While Preserving Quantitative Performance: A Prospective, Multicenter, Multireader Trial


Deep learning (DL) is a subset of machine learning that uses convolutional neural networks to process large volumes of data. 1-6 While traditional reconstruction techniques can be limited by long scan times, SNR constraints, and motion artifacts, the recent application of DL to image reconstruction can enable faster image acquisitions with equal or enhanced image quality. 2,3 While DL can boost SNR among other advantages over conventional methods, 2-6 concerns exist over whether postprocessing can mask or alter pathology and whether the quantitative values derived are consistent with those obtained from routine standard of care (SOC) scans across the range of scanner vendors and field strengths.
MR imaging depicts brain anatomy with high spatial and contrast resolution, qualities crucial when applying anatomic segmentation and quantitative volumetric analysis. Quantitative volumetric analysis requires 3D radiofrequency-spoiled gradient-echo T1-weighted scans, and k-space acceleration opportunities have limits. Reduced excitations and undersampling techniques like compressed sensing accelerate scans at a cost of increased image noise (reduced SNR). Acceleration via decreases in the imaging matrix can improve both SNR and contrast resolution but reduces image sharpness. For this study, both undersampling and reduced imaging matrices were used for the accelerated scans (FAST), which were then processed with a commercially available DL tool (SubtleMR; Subtle Medical) that provides both denoising and sharpness enhancement (FAST-DL). Our goal was to match or exceed SOC image SNR and spatial resolution while maintaining clinical and quantitative integrity. 7 In this prospective, multireader, multicenter study, we explored the impact of DL-based enhancement of 60% accelerated 3D T1-weighted brain MR image acquisitions. We found that the DL-processed images demonstrated high volumetric quantification accuracy, matched the SOC examinations in clinical disease status predictability, and provided what readers perceived as superior image quality compared with the longer SOC examinations, suggesting good generalizability, accuracy, and potential utility of DL enhancement in routine clinical settings.

Participants
With Western Institutional Review Board approval and patient consent, 40 consecutive subjects (mean age, 69 [SD, 17] years; 21 men, 19 women) undergoing clinically indicated brain MR imaging examinations for subjective memory loss were prospectively recruited during an 8-month period.

Image Processing
FAST-DL was performed off-line using an FDA-cleared, vendor-agnostic, DICOM-based, convolutional neural network-based deep learning image-enhancement software product, SubtleMR (Version 1.2). The training set included hundreds of thousands of MR imaging datasets from a variety of vendors (GE Healthcare, Philips Healthcare, Siemens, Hitachi, and so forth), scanner models, field strengths, and clinical sites, as well as a variety of disease states/clinical indications, thus covering a range of tissue contrasts, acquisition parameters, patient anatomies, and variable image quality.
The DL network was trained on paired low-/high-resolution images to impart structure-preserving noise reduction and sharpness enhancement to newly acquired images. 7 Processing operates on DICOM images rather than proprietary raw k-space data and is, thus, vendor-agnostic. For the study, DL processing required <1 minute per series on a scanner-connected GPU server and finished before the next sequence acquisition was completed, thus not impacting overall examination time. Images were gathered from different sites and presented to the reviewers on a commercial DICOM viewer.
The SOC, FAST, and FAST-DL image sets were processed with a machine learning-based FDA-cleared quantitative volumetric software product, NeuroQuant (Cortechs.ai). The hippocampal occupancy score (HOC), a biomarker to predict the progression of neurodegenerative diseases, as well as the volumes of the hippocampi (HV), superior lateral ventricles (SLV), and inferior lateral ventricles (ILV) were analyzed for this study.
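The text does not spell out how the HOC is derived. As a minimal sketch, assuming the commonly used definition of the score as the hippocampal volume divided by the sum of the hippocampal and inferior lateral ventricle volumes, the calculation could look like the following. The function name and the input values are hypothetical, and the exact computation inside NeuroQuant (for example, per-hemisphere handling) may differ.

```python
def hippocampal_occupancy(hv_ml: float, ilv_ml: float) -> float:
    """Return HOC under the ASSUMED definition HV / (HV + ILV).

    hv_ml  -- hippocampal volume in mL
    ilv_ml -- inferior lateral ventricle volume in mL
    The result is a dimensionless ratio between 0 and 1.
    """
    if hv_ml < 0 or ilv_ml < 0:
        raise ValueError("volumes must be non-negative")
    total = hv_ml + ilv_ml
    if total == 0:
        raise ValueError("at least one volume must be positive")
    return hv_ml / total


# Hypothetical volumes, chosen only for illustration.
example_hoc = hippocampal_occupancy(hv_ml=6.8, ilv_ml=3.2)
```

Under this assumed definition, a shrinking hippocampus and an enlarging inferior lateral ventricle both push the score downward, which is why a single ratio can summarize medial temporal atrophy.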

Radiologic Assessment
For the image-quality assessment, 2 experienced board-certified neuroradiologists (>17 years' experience each) were presented with 40 paired side-by-side multiplanar 3D T1-weighted series datasets (360 series). The blinded datasets (SOC versus FAST, SOC versus FAST-DL, FAST-DL versus FAST) were randomized in disease classification, image plane, and left-right display order. The readers evaluated 2 images side-by-side and provided a single Likert scale ranking between 1 and 5 that described whether the left or right image was superior (3 = both images were preferred equally; 2 or 4 = right/left mildly preferred; 1 or 5 = right/left strongly preferred) for the following: 1) perceived SNR; 2) perceived spatial resolution (sharpness); 3) imaging artifacts; 4) anatomic/lesion conspicuity; 5) image contrast; and 6) gray-white matter differentiation. Anatomic conspicuity of brain structures such as the deep gray nuclei was used in cases in which a lesion was not conspicuous on the 3D T1-weighted images. Sample lesions in our datasets included infarcts and prominent white matter ischemic disease.
To assess clinical classification performance, we categorized the quantitative biomarkers obtained from 80 datasets (40 SOC and 40 FAST-DL) in a blinded, randomized fashion. Each dataset was rated using a binary predictive classification system (healthy/mild cognitive impairment [MCI] versus dementia) with ground truth established according to the statistical significance of 3 biomarkers falling >2 SDs from the mean: HOC (<5%), HV (<5%), and ILV (>95%) based on an age- and sex-matched normative database (n > 4000), with 0/3 and 1/3 statistically significant biomarkers categorized as healthy/MCI, and 2/3 and 3/3 categorized as likely dementia.
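The counting rule described above can be sketched in a few lines. This is an illustration only: the class and function names are hypothetical, and the inputs are assumed to be normative percentiles (0 to 100) as reported by the volumetric software.

```python
from dataclasses import dataclass


@dataclass
class BiomarkerPercentiles:
    """Normative percentiles (0-100) for the 3 biomarkers; hypothetical container."""
    hoc: float  # hippocampal occupancy score percentile
    hv: float   # hippocampal volume percentile
    ilv: float  # inferior lateral ventricle volume percentile


def count_abnormal(p: BiomarkerPercentiles) -> int:
    """Count biomarkers outside the thresholds in the text:
    HOC < 5th, HV < 5th, ILV > 95th percentile."""
    return sum([p.hoc < 5, p.hv < 5, p.ilv > 95])


def classify(p: BiomarkerPercentiles) -> str:
    """0/3 or 1/3 abnormal -> healthy/MCI; 2/3 or 3/3 -> likely dementia."""
    return "likely dementia" if count_abnormal(p) >= 2 else "healthy/MCI"
```

For example, a hypothetical dataset with percentiles of 3 (HOC), 4 (HV), and 97 (ILV) has 3/3 abnormal biomarkers and falls in the likely dementia category, whereas percentiles of 3, 40, and 60 give 1/3 and fall in the healthy/MCI category.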
Following qualitative feature ranking and quantitative analysis, both readers were presented the SOC and FAST-DL datasets in a side-by-side fashion, randomized in right-left orientation to qualitatively assess the overall diagnostic quality of the 3D T1-weighted images before postprocessing with NeuroQuant and again after postprocessing with NeuroQuant, with the goal of visually assessing the quality of matched color-coded segmentation of the latter. The FAST scans were excluded from this analysis because they would not be typically used as input to quantitative segmentation software given their lower spatial resolution.

Statistical Analysis
Wilcoxon rank sum tests were performed to assess the equivalence or superiority of the image quality for each feature (Table 1). Statistically significant superiority for a feature was determined by a P value < .05.
Adjustment of the significance tests for multiple comparisons was made using the Bonferroni correction, which adjusts the significance level to P < .05/6 (0.00833), reflecting the 6 image-quality features assessed.
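The Bonferroni threshold is simple arithmetic: the nominal significance level divided by the number of comparisons, here taken to be the 6 image-quality features rated by the readers. A minimal sketch (the function name is illustrative):

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level after Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Nominal alpha of .05 across the 6 rated image-quality features.
threshold = bonferroni_threshold(0.05, 6)
print(round(threshold, 5))  # 0.00833
```

A feature is then considered statistically significant only if its P value falls below this adjusted level.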
Paired t test analysis was performed to test the equivalence of quantitative data on both the SOC versus FAST-DL images (Table 2) and SOC versus FAST images (Table 3). Linear regression graphs (Figs 1 and 2) and Bland-Altman analysis (Figs 3 and 4) were performed to assess quantitative volumetric biomarker equivalence of the datasets. The Spearman rank correlation test was applied to assess interreader agreement between the 2 neuroradiologists on image-quality ratings. Additionally, interrater reliability analysis was performed using an equal-spacing weighted Cohen κ statistic to measure the consistency of the 2 readers' evaluation of image quality.
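For readers less familiar with Bland-Altman analysis, the core computation is the mean of the paired differences (the bias) and the 95% limits of agreement at the bias plus or minus 1.96 SDs of those differences. The sketch below uses only the Python standard library; the function name and the sample volumes are hypothetical and are not study data.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   -- mean of the paired differences a[i] - b[i]
    limits -- bias -/+ 1.96 * sample SD of the differences
    """
    if len(a) != len(b):
        raise ValueError("inputs must be paired (same length)")
    if len(a) < 2:
        raise ValueError("at least 2 pairs are required")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical hippocampal volumes in mL, for illustration only.
soc_hv = [7.0, 6.5, 8.1, 5.9]
fast_dl_hv = [6.9, 6.6, 8.0, 6.0]
bias, lower, upper = bland_altman(soc_hv, fast_dl_hv)
```

A bias near zero with narrow limits of agreement indicates that the two acquisitions yield interchangeable measurements across the measured range.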

Qualitative and Quantitative Performance
FAST-DL was statistically superior to SOC in subjective image quality for perceived SNR, sharpness, artifact reduction, anatomic/lesion conspicuity, and image contrast (all P values < .008), despite a 60% reduction in sequence scan time. Both FAST-DL and SOC were statistically superior to FAST for all analyzed features (all P values < .001). Wilcoxon rank sum statistical results are collectively summarized in Table 1.
Paired t test analysis demonstrated excellent agreement of quantitative data on both the SOC and FAST-DL images (Table 2). As expected, there was less agreement between the SOC and FAST datasets (Table 3) due to the lower spatial resolution of the FAST images. There was no statistically significant difference between mean HOC values in the SOC (0.68 [SD, 0.16]) and FAST-DL (0.68 [SD, 0.16]) datasets. The differences in HV, SLV volume, and ILV volume were also negligible (<2%) between the SOC and FAST-DL datasets. The linear regression graphs (Fig 1) and Bland-Altman plot analysis (Fig 3) further demonstrated strong agreement between quantitative values in each dataset across the range of conditions (normal, MCI, Alzheimer disease) with the HOC ranging from 0.32 to 0.95 mL. There was 100% agreement in clinical disease classification of both the SOC and FAST-DL datasets (n = 29 healthy/MCI and n = 11 dementia). The cross-correlation factor and degree of scatter were consistently worse for the SOC and FAST images compared with the SOC and FAST-DL images, as demonstrated on the linear regression graphs (Fig 2) and Bland-Altman plot analysis (Fig 4).
There was excellent interreader agreement between the 2 neuroradiologists on the Spearman rank correlation test applied to the Likert image-quality ratings, with a Spearman ρ value of 0.725 (P < .01). The κ value of 0.62 (P < .001) also confirms substantial interrater agreement on the Likert scale rankings.
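An equal-spacing (linear) weighted Cohen statistic penalizes disagreements in proportion to how many Likert steps apart the 2 ratings are. The following is a minimal standard-library sketch of that calculation, not the software used in the study; the function name is hypothetical, and the 5 categories correspond to the 1-5 Likert scale.

```python
from typing import Sequence


def linear_weighted_kappa(r1: Sequence[int], r2: Sequence[int],
                          categories: Sequence[int] = (1, 2, 3, 4, 5)) -> float:
    """Cohen kappa with equal-spacing (linear) disagreement weights."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be paired and non-empty")
    k, n = len(categories), len(r1)
    idx = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions of (reader 1, reader 2) ratings.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)          # 0 on the diagonal, 1 at the extremes
            observed += w * obs[i][j]         # weighted observed disagreement
            expected += w * row[i] * col[j]   # weighted chance disagreement
    if expected == 0:
        raise ValueError("kappa is undefined when both readers use one category")
    return 1.0 - observed / expected
```

With this weighting, 2 readers who give identical ratings across a spread of categories obtain a value of 1, and near-misses (for example, 4 versus 5) reduce the statistic less than opposite-end disagreements (1 versus 5).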

Overall Diagnostic Quality
All SOC and FAST-DL datasets were rated of diagnostic quality by both interpreting neuroradiologists. Both readers determined that there was similar quality of segmentation for both the SOC and FAST-DL datasets. Representative imaging examples for image-

DISCUSSION
DL enhancement of MR images is known to provide multiple benefits, including increased SNR, 2,3 but questions remain about reliability in general clinical use. 8 The approach used in our multicenter, multivendor study explored the impact on the image quality and consistency of quantitative volumetric analysis results obtained with FAST-DL compared with that obtained with SOC scans. Quantitative volumetric MR imaging analytical tools are in widespread clinical use for the evaluation of patients with dementia, seizures, multiple sclerosis, traumatic brain injury, and pediatric brain disorders. The software segments, labels, and calculates the volumes of substructures (including lesions) in the brain. The derived quantitative values are compared with a large age- and sex-matched normative database, aiding in the diagnosis and longitudinal follow-up of clinical conditions such as Alzheimer's disease. Quantitative assessment reduces reader subjectivity. 9-12 MR imaging provides excellent anatomic detail and superb contrast resolution but involves trade-offs in SNR, spatial resolution, and scan duration. 7 While DL-based augmentation is a recognized solution for accelerated MR imaging, it is important to validate the reliability and generalizability of its enhancement capabilities with quantitative biomarker accuracy. 7 The MR imaging experience is uncomfortable and associated with frank anxiety reactions in up to 30% of patients. 13,14 Faster MR image acquisition can thus increase patient satisfaction and may reduce motion artifacts. Motion is a significant challenge in MR imaging, occurring in 29% of inpatient/emergency department examinations and 7% of outpatient studies, and can lead to repeating portions of or even complete examinations. 15 Andre et al 16 found that 19.8% of all MR imaging sequences needed to be repeated because of motion artifacts, corresponding to a revenue loss of $592 per hour and $115,000 annually per scanner.
DL-based reconstruction solutions promise to enable shorter examinations with decreased patient motion and improved patient comfort. 14 In our study, we achieved a scan-time reduction of 60% while exceeding perceived routine 3D T1-weighted image quality. If DL-enhanced fast protocols were used on all pulse sequences across every study, one could anticipate a proportional increase in examination-based workflow efficiency for an imaging facility. One recent trial explored DL enhancement across all pulse sequences in clinical spine MR imaging, with preservation of quantitative features using a structural similarity index measure as well as gains in perceived SNR and artifact reduction, despite a 40% scan-time reduction. 2 Future research could explore whether scan-time reduction of this scale results in a truly positive impact on workflow, eg, the ability to scan more patients per day.
In this trial, the SOC images serve as the standard for image preference. Our randomized blinded assessment of the imaging features is meant to reflect human subjective perception of comparative image quality. A radiologist's qualitative assessment of noninferiority is critical before a DL-enhanced alternative would be considered acceptable for clinical use. On the other hand, processed images should satisfy both qualitative and quantitative measures to ensure that diagnostically relevant features are not altered and the integrity of the processed image information is maintained.
Concerns exist about DL postprocessing introducing instabilities in an image, in which tiny perturbations in the sampling domain have been shown to be capable of translating into noticeable artifacts on the reconstructed image. 8 This issue has been shown for highly contrived noise additions to k-space data, but it is unclear whether such effects occur under normal operating conditions. The current method starts from image-based DICOM data rather than from k-space data and is likely less susceptible to this effect.
To our knowledge, this is the first prospective, randomized, multicenter study of DL reconstruction capabilities assessing the impact on the integrity of quantitative volumetric analysis of clinical brain MR imaging examinations. The DL tool applied in this study shifts the usual MR imaging trade-off equation by imprinting a boost in spatial resolution on the target FAST series, which, due to inherently larger native voxel sizes, can have a higher SNR and contrast-to-noise ratio than even the basis SOC series. 7 Along with sharpness enhancement, DL offers structure-preserving denoising, contributing to statistically significant gains in perceived SNR compared with SOC.
Blinded subjective assessment by the neuroradiologists found that the 60% accelerated, DL-enhanced 3D T1-weighted brain MR images delivered consistent clinical classification and were superior to standard of care MR imaging across essentially all analyzed quality features (perceived SNR, perceived spatial resolution, artifact reduction, anatomic/lesion conspicuity, and image contrast). These findings offer confidence that DL processing can add value and efficiency to clinical MR imaging brain examinations.
The quantitative biomarkers of HV, HOC, SLV volume, and ILV volume were statistically equivalent for the FAST-DL sequences and the SOC, supporting the absence of corruption by DL processing and demonstrating the robustness of the DL tool in maintaining quantitative integrity and enhancing image quality despite significant scan-time acceleration. Not unexpectedly, the cross-correlation factor was inferior for the SOC versus the lower resolution FAST dataset.
The strengths of this study include the use of a prospective, randomized, multicenter, multireader study design with images obtained on magnets from multiple vendors and of variable ages and field strengths, with preserved accuracy of quantitative volumetric measures and clinical predictive categories. The results of this trial support the use of DL enhancement to shorten clinical MR imaging brain examinations, even when additional quantitative tools such as volumetric analysis are applied.
Weaknesses include the small number of imaging subjects and the use of only a single DL and quantitative brain-analysis tool. The DL-enhancement tool used in this study is a vendor-agnostic, DICOM-based, commercially available solution. It is possible that alternative vendor-specific, k-space-based, commercially available DL-enhancement tools may perform differently, though preliminary data suggest observed gains in acceleration and perceived quality with other DL-enhancement tools as well. 17,18 However, this is the first study that the authors are aware of that specifically confirms the quantitative volumetric accuracy and consistent clinical disease categorization of the DL-enhanced dataset.
Future investigations might explore different methods and tools. Another area of future research might include a similar methodology applied to different clinical scenarios that demand accurate segmentation but where scan time acceleration would be desirable, such as in patients with multiple sclerosis, intracranial metastases, epilepsy, and traumatic brain injury. Follow-up studies could also assess whether the difference between the FAST and FAST-DL datasets was significant enough to impact the correct clinical diagnosis or alter the reader's ability to detect a lesion.

CONCLUSIONS
DL can enable 60% faster brain MR image acquisitions with matched clinical disease status predictability and statistically superior perceived image quality, while maintaining high quantitative accuracy compared with the longer SOC examinations. This trial supports the reliability, efficiency, and utility of DL-based enhancement for quantitative imaging. Shorter scan times may boost the use of volumetric quantitative MR imaging in routine clinical settings.