Correction of Motion Artifacts Using a Multiscale Fully Convolutional Neural Network

The authors implement and validate an MRI motion-artifact correction method using a multiscale fully convolutional neural network. Application of the network resulted in notably improved image quality without the loss of morphologic information. For synthetic test data, the average reduction in mean squared error was 41.84%. The blinded reader study on the real-world test data resulted in significant reduction in mean artifact scores across all cases.
BACKGROUND AND PURPOSE: Motion artifacts are a frequent source of image degradation in the clinical application of MR imaging (MRI). Here we implement and validate an MRI motion-artifact correction method using a multiscale fully convolutional neural network.
MATERIALS AND METHODS: The network was trained to identify motion artifacts in axial T2-weighted spin-echo images of the brain. Using an extensive data augmentation scheme and a motion artifact simulation pipeline, we created a synthetic training dataset of 93,600 images based on only 16 artifact-free clinical MRI cases. A blinded reader study using a unique test dataset of 28 additional clinical MRI cases with real patient motion was conducted to evaluate the performance of the network.
RESULTS: Application of the network resulted in notably improved image quality without the loss of morphologic information. For synthetic test data, the average reduction in mean squared error was 41.84%. The blinded reader study on the real-world test data resulted in significant reduction in mean artifact scores across all cases (P < .03).
CONCLUSIONS: Retrospective correction of motion artifacts using a multiscale fully convolutional network is promising and may mitigate the substantial motion-related problems in the clinical MRI workflow.

Patient motion during MRI examinations results in artifacts that are a frequent source of image degradation in clinical practice, reportedly impacting image quality in 10%-42% of examinations of the brain. 1,2 Motion artifacts that substantially affect the diagnostic value of an MRI examination may be recognized at the time of image acquisition, resulting in repeat sequences in nearly 20% of all MRI examinations. 1,3 These repeat sequences incur substantial temporal and financial costs to the radiology department. 1 Because there is no guarantee that a patient will be better able to lie motionless during the repeat sequence, the diagnostic value of the images is often impaired.
The problem of motion has been addressed extensively by the MRI research community, leading to a large number of proposed techniques to reduce or eliminate motion artifacts in MRI. 4 Among the most widely used methods are prospective 5,6 and retrospective 7,8 navigator-based approaches, in which position information is extracted from data acquired using the MRI scanner itself. Most of these methods, however, are limited to a particular imaging situation and/or require additional scan time, which is usually undesirable. One of the most popular techniques for motion correction is the PROPELLER technique, in which rotating strips of several parallel k-space lines are acquired, leading to a strong oversampling of the k-space center. 9 Although widely used in clinics, it involves an increased acquisition time and can fail to correct for artifacts due to through-plane motion. Finally, iterative "autocorrection" methods that retrospectively suppress motion artifacts without additional information have been presented. 10,11 Most of these approaches, however, usually produce images with residual artifacts after correction.
On the other hand, deep neural network techniques have recently received much attention due to impressive results in many computer vision tasks. 12 In particular, fully convolutional neural networks (FCNs) have been successfully applied to complex image-to-image translation tasks such as semantic segmentation 13 or denoising. 14 In an early feasibility study, it was demonstrated that FCNs can also be applied to retrospectively correct motion artifacts in MRI. 15 Recently, this approach was used in a work by Duffy et al, 16 which relied on a large dataset and also explored the applicability of other network architectures to address this problem. Here we present an alternative artifact-correction method that relies on a multiscale FCN and includes both translational and rotational motion as well as a variety of complex patient motion profiles throughout the scan. The multiscale network architecture was trained in a residual learning setup, which allowed efficient capture of both high- and low-level artifact features in the input images. To validate our hypothesis that the presented method can significantly reduce the level of motion artifacts in MR brain images, a blinded reader study was conducted in which 2 experienced neuroradiologists visually assessed the degree of motion artifacts in a real-world test dataset of clinical MR brain images before and after correction by the trained network.

Data Acquisition and Analysis
An institutional review board-approved retrospective Health Insurance Portability and Accountability Act-compliant study was performed, and patient consent was waived. Training of the FCN was accomplished using a dataset with simulated artifacts introduced into in vivo clinical brain image data. To create a dataset representative of images obtained in the clinical routine, we selected 46 scans from consecutive patients undergoing clinically indicated MRI brain examinations from the image archive of the department. Common indications included known or suspected intracranial tumor (primary or metastatic, including follow-up), known or suspected acute ischemic stroke (including follow-up), known or suspected demyelinating disease, dementia, suspected transient ischemic attack, and headache. Findings included acute and subacute ischemic infarcts, metastatic lesions, and microvascular angiopathy, among others. Eight scans (17.4%) did not show any detectable pathology. Patient age ranged from 28 to 89 years (mean age, 58.7 years), with a male/female ratio of 1:1.05.
All image volumes were manually reviewed on a slice-by-slice basis by an experienced neuroradiologist. Annotation of rigid body motion was performed using a previously defined 5-point Likert scale with a range of S = 0-4, in which the scores correspond to no (S = 0), minimal (S = 1), mild (S = 2), moderate (S = 3), and severe (S = 4) artifacts. 1 Of note, scans with S ≥ 3 are considered marginal in diagnostic quality and should be repeated. 1 In this initial review, a single motion-artifact score was given to each scan/volume (ie, sequence). All scans had been performed at a single institution on 1 of two 3T MRI scanners (both Ingenia; Philips, Best, the Netherlands) and consisted of T2-weighted multislice 2D turbo spin-echo whole-brain sequences. Scan parameters were in the following ranges: TE = 80-100 ms, TR = 3000-5700 ms, flip angle = 90°, echo-train length = 12-21, acquisition matrix size = 400-480 × 280-400, reconstructed image matrix size = 512-560 × 512-560, number of slices = 30-50, slice thickness = 3-5 mm, number of signal averages = 1-2, pixel bandwidth = 142-204 Hz/px, phase FOV = 74%-88%. Only magnitude data were used because the images were retrieved from the DICOM archive of the imaging department. All retrieved DICOM data were anonymized.

Dataset Generation
All volumes that were deemed artifact-free by the radiologist (18 of 46) were then used to generate the synthetic training and test dataset (16 and 2 volumes, respectively). In each volume, the lower 8 and top 5 slices were discarded to restrict the analysis to clinically relevant parts of the scan, resulting in 312 and 39 slices that were used for generation of the synthetic training and the test dataset, respectively. Artifacts simulating rigid translational and rotational in-plane motion were then introduced into the Fourier-transformed data, in which the parameter ranges were selected to generate a large range of realistic artifact appearances. For each input image, the assumed echo-train length of the turbo spin-echo readout was chosen randomly in the range of 8-32. Similarly, the assumed extent of zero-padding in k-space was chosen randomly in the range of 0-100. Motion trajectories (ie, translation/rotation vectors as a function of scan time) were generated randomly to simulate the artifacts. Examples of the different types of motion trajectories used in this study are shown in Fig 1. In the "sudden motion" trajectory (top and middle profiles in Fig 1), the subject is assumed to lie still for a large part of the examination, until a swift translation or rotation of the head occurs. The time point of the sudden motion was taken randomly as a fraction of the total scan time in the range of one-third to seven-eighths. In addition, a large range of random motion trajectories was simulated (bottom profile in Fig 1) using a random colored noise generator. 17 The exponent of the power spectral density of the generator was randomly chosen in the range of 1-100 to create both high- and low-frequency motion profiles.
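The two trajectory types described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names, the FFT-based spectral shaping, and the normalization to a peak amplitude are all hypothetical choices, and the colored noise generator of reference 17 may work differently.

```python
import numpy as np


def sudden_motion_trajectory(n_steps, max_amplitude, rng):
    """Subject lies still, then moves once (assumed step-shaped profile)."""
    # Onset drawn as a fraction of the scan time between 1/3 and 7/8.
    onset = int(rng.uniform(1.0 / 3.0, 7.0 / 8.0) * n_steps)
    trajectory = np.zeros(n_steps)
    trajectory[onset:] = max_amplitude
    return trajectory


def colored_noise_trajectory(n_steps, exponent, max_amplitude, rng):
    """Random profile with power spectral density ~ 1 / f**exponent."""
    white = rng.standard_normal(n_steps)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_steps)
    scale = np.zeros_like(freqs)          # DC term is dropped (scale[0] = 0)
    # Amplitude scales with f**(-exponent / 2) so that power ~ f**(-exponent).
    scale[1:] = freqs[1:] ** (-exponent / 2.0)
    shaped = np.fft.irfft(spectrum * scale, n=n_steps)
    shaped -= shaped[0]                   # start at the rest position
    peak = np.max(np.abs(shaped))
    if peak > 0:
        shaped *= max_amplitude / peak    # assumed: rescale to the chosen maximum
    return shaped
```

A small exponent keeps high-frequency jitter in the profile, while a large exponent leaves essentially a slow drift, which matches the stated aim of covering both high- and low-frequency motion.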
To account for possible motion during the "waiting time" in a multislice acquisition sequence (ie, the time that is spent to acquire data from other slices), we added small random shifts to the motion profiles after each assumed acquisition of an interleaf (defined by the echo-train length) in the k-space, as shown in the bottom profile in Fig 1. The maximum magnitude of the motion was chosen randomly in the range of 1-4 px and 0.5°-4.0° for translation and rotation artifacts, respectively. For artifacts due to rotation, the center of rotation was also varied randomly in the range of 0-100 px in each direction.
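For the translational case, inserting motion into the Fourier-transformed data amounts to applying the Fourier shift theorem line by line. The sketch below is a simplified illustration, not the authors' simulation: it assumes that each k-space row is acquired at one time step in array order (ignoring the echo-train interleaving and zero-padding described above), handles translation only, and uses hypothetical function names.

```python
import numpy as np


def simulate_translation_artifact(image, shifts_px):
    """Corrupt a magnitude image with in-plane translation during acquisition.

    shifts_px has shape (n_rows, 2) and holds the assumed (dy, dx) object
    displacement in pixels at the time each k-space row was acquired.
    """
    ny, nx = image.shape
    kspace = np.fft.fft2(image)
    ky = np.fft.fftfreq(ny)[:, None]      # cycles per pixel along rows
    kx = np.fft.fftfreq(nx)[None, :]      # cycles per pixel along columns
    dy = shifts_px[:, 0][:, None]
    dx = shifts_px[:, 1][:, None]
    # Shift theorem: a displacement d multiplies the spectrum by exp(-2*pi*i*k*d).
    phase = np.exp(-2j * np.pi * (ky * dy + kx * dx))
    return np.abs(np.fft.ifft2(kspace * phase))


def artifact_only_image(image, shifts_px):
    """Difference between the corrupted image and the artifact-free reference."""
    return simulate_translation_artifact(image, shifts_px) - image
```

With all shifts set to zero the input is returned unchanged, and with the same integer shift on every row the result is simply the shifted image; ghosting appears only when the shift differs between rows.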
The synthetic training dataset was generated using 16 of the 18 volumes. For each input image, data augmentation was realized using random translation (0-10 px), random rotation (0°-10°), and random deformation of each input image before insertion of the artifacts. For the latter, 2 random second-order 2D polynomials were used as pixel-shift maps for x- and y-deformation. An artifact-only image was calculated by subtracting the artifact-free reference image from the artifact-corrupted image. For the synthetic training dataset, 300 image pairs (artifact-corrupted and artifact-only images) were created for each input image, resulting in 93,600 image pairs in total. The synthetic test dataset was derived from the 2 remaining volumes and consisted of 11,700 image pairs.
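The polynomial deformation step and the dataset sizes can be illustrated as follows. This is a minimal sketch with several assumptions that are not taken from the text: the coefficient range, the maximum shift, the normalized coordinate grid, and the nearest-neighbor resampling are hypothetical choices made to keep the example dependent on NumPy only.

```python
import numpy as np

PAIRS_PER_IMAGE = 300  # image pairs created for each input slice


def polynomial_shift_map(shape, max_shift_px, rng):
    """Random second-order 2D polynomial used as a pixel-shift map."""
    ny, nx = shape
    y, x = np.meshgrid(np.linspace(-1, 1, ny), np.linspace(-1, 1, nx), indexing="ij")
    c = rng.uniform(-1, 1, size=6)        # assumed coefficient range
    shift = c[0] + c[1] * x + c[2] * y + c[3] * x * x + c[4] * x * y + c[5] * y * y
    peak = np.max(np.abs(shift))
    return shift * (max_shift_px / peak) if peak > 0 else shift


def deform(image, max_shift_px, rng):
    """Warp an image with independent x- and y-shift maps (nearest neighbor)."""
    ny, nx = image.shape
    rows, cols = np.indices(image.shape)
    shift_y = polynomial_shift_map(image.shape, max_shift_px, rng)
    shift_x = polynomial_shift_map(image.shape, max_shift_px, rng)
    src_r = np.clip(np.rint(rows + shift_y), 0, ny - 1).astype(int)
    src_c = np.clip(np.rint(cols + shift_x), 0, nx - 1).astype(int)
    return image[src_r, src_c]


# Dataset sizes stated in the text: 312 training and 39 test slices.
n_training_pairs = 312 * PAIRS_PER_IMAGE   # 93,600
n_test_pairs = 39 * PAIRS_PER_IMAGE        # 11,700
```

The count of 300 pairs per slice reproduces both reported totals, which suggests that the same number of augmented pairs per slice was used for the training and test sets.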
To evaluate the performance of the network on data with artifacts due to actual patient motion, we used the 28 clinical volumes that were initially rated with the score S = 1-4 (ie, containing motion artifacts) as a real-world test dataset.

Network Structure, Training, and Evaluation
The network structure used in this study, depicted in Fig 2, was based on a recently proposed architecture called Foveal FCN. 18 In contrast to standard FCN architectures, it involved the processing of the input image at 3 different scales, thereby realizing an efficient extraction of both high- and low-level features with minimal memory requirement. Even for the relatively large input images used in this study (512-560 × 512-560 pixels), a minibatch size of 32 could be used. The FCN allowed a patch-based processing of the image, which did not affect the overall outcome of the correction. Therefore, the image was divided into nonoverlapping patches, which were processed in conjunction with larger, but down-sampled patches at the same position (depicted by the different colored boxes in Fig 2). Feature extraction for each patch was performed using 2 layers, each consisting of a convolutional layer, followed by batch normalization, and a rectified linear unit. Larger kernel sizes were used for the convolutional layers in the higher scales to allow larger effective receptive fields. Feature integration was realized using average unpooling and convolutional layers. This patch-based processing allowed processing of input images with variable size.
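The multiscale input described above, a patch together with larger, down-sampled patches centered at the same position, can be sketched as follows. The down-sampling factors of 2 and 4, the zero-padding at the image border, and the use of average pooling for down-sampling are assumptions made for illustration; the sketch also ignores the extra margin needed to compensate for border pixels lost in the convolutional layers.

```python
import numpy as np


def average_pool(region, factor):
    """Down-sample a 2D array by averaging non-overlapping factor x factor blocks."""
    h, w = region.shape
    return region.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def foveal_patches(image, top, left, size, factors=(1, 2, 4)):
    """Return one patch per scale, all centered on the same image location.

    Each scale covers factor * size pixels per side and is pooled back to
    size x size, so all scales share the same array shape. `size` should be even.
    """
    pad = (max(factors) - 1) * size // 2
    padded = np.pad(image, pad, mode="constant")  # assumed zero-padding
    patches = []
    for f in factors:
        offset = (f - 1) * size // 2
        r0 = top + pad - offset
        c0 = left + pad - offset
        region = padded[r0:r0 + f * size, c0:c0 + f * size]
        patches.append(average_pool(region, f))
    return patches
```

Because every scale is reduced to the same small array, the memory cost per patch grows with the number of scales and not with the field of view, which is consistent with the low memory requirement noted in the text.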
The network was trained for 50 epochs using stochastic gradient descent, in which the Adam algorithm 19 was used to update the learning rate for each parameter. The mean squared error (MSE) between the artifact estimate of the network and the […]
After training, the network was applied to the synthetic test dataset (with simulated artifacts), which allowed direct visual comparison with the artifact-free reference image. In addition, quantitative analysis was performed using the MSE and the structural similarity index 20 between artifact-corrected and reference images as metrics. Two-sample t tests were performed to test whether the network application yielded a significant alteration of these metrics. To evaluate the impact on artifact-free images, the network was also applied to all slices of the 2 test volumes without simulated artifacts.
Fig 2. Architecture of the Foveal fully convolutional neural network used in this study. The input image was split into different patches (indicated by red box), and each patch was processed in conjunction with larger, down-sampled patches at the same location (blue and green boxes). The size of the patches was chosen to account for the loss of border pixels in every convolutional layer. Each feature-extraction path consisted of 2 layers, each comprising a convolutional layer (C), batch normalization (B), and a rectified linear unit (R) activation. Feature integration was realized using average unpooling (U) and convolutional layers. Kernel sizes and the number of channels are denoted as k and n, respectively. The output was the estimate of the motion artifacts by the network for the selected image patch.
In addition, the trained network was also evaluated on the real-world test dataset. For each section, the artifact estimate of the network was subtracted from the original input image. Because artifact-free reference images were not available for these 28 clinical motion cases, all artifact-corrupted input and artifact-corrected output images (962 in total) were rated on a section-by-section level in a blinded reader study using the 0-4 qualitative scale described previously. The images were shown to 2 board-certified neuroradiologists in random order and did not contain any information regarding origin (ie, before/after correction). A 1-sample t test was performed for each artifact score class using a significance level of .05, corresponding to critical values in the range of 1.65-1.75 for the different score classes.
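The 1-sample t test on the reader scores can be sketched with the Python standard library. The example assumes that the test was applied to the per-image score differences (before minus after correction) within one score class and compared against zero; this pairing, the function name, and the scores in the usage note are hypothetical and not taken from the study.

```python
import math
import statistics


def one_sample_t(differences, mu0=0.0):
    """t statistic for the mean of `differences` against the reference value mu0."""
    n = len(differences)
    mean = statistics.mean(differences)
    sd = statistics.stdev(differences)    # sample standard deviation (n - 1)
    return (mean - mu0) / (sd / math.sqrt(n))


def score_differences(before, after):
    """Per-image reduction in artifact score (positive means improvement)."""
    return [b - a for b, a in zip(before, after)]
```

For illustrative scores such as `before = [3, 3, 4, 3, 3, 4]` and `after = [1, 2, 2, 1, 2, 3]`, the resulting statistic would be compared with the critical value for the corresponding degrees of freedom. The stated critical values of 1.65-1.75 are in the range expected for a one-sided test at the .05 level with roughly 15 or more degrees of freedom.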

RESULTS
For the synthetic test data, the network-based correction resulted in a reduction of motion artifacts and yielded image data with improved image quality. Figure 3 shows the performance of the network for 2 sample slices from the synthetic test dataset. Only minor residual artifacts can be identified in the corrected images (marked by blue arrows). In some parts of the corrected images, a mild blurring can be observed compared with the ground truth images (marked by red arrows). This is confirmed by inspection of the difference between corrected and ground truth images (rightmost column in Fig 3), where the contours of certain anatomic structures can be discerned. Most important, these structures cannot be identified in the artifact-estimate image (ie, the network output).
The results of the quantitative analysis on the synthetic test dataset are visualized in Fig 4, which reveals a substantial reduction of the MSE due to application of the network. On average, the network-based artifact correction resulted in a reduction of the MSE of 41.84%. Similarly, an average increase of the structural similarity index from 0.863 to 0.924 was observed (ie, the network increased similarity to the ground truth). Both the reduction of MSE and increase of the structural similarity index were confirmed to be statistically significant (P < .001 in both cases). For the images without motion artifacts, the structural similarity index was only slightly reduced from 1.0 to 0.99, suggesting that the modifications by the network were negligible in these cases. The network performance on 2 sample cases that were rated as artifact-free by the radiologists is shown in the Online Figure.
Similar results were obtained for the test dataset with real motion cases, as illustrated in Fig 5. Substantial improvement of image quality was observed for nearly all cases, often with no, or only minor, residual artifacts (marked by blue arrows). Similar to the synthetic test data, minor blurring was observed in some parts of the images (marked by red arrows), typically in regions with severe artifacts in the input images. Most interesting, the algorithm showed robust performance even for radiologically detectable brain pathologies that were not present in the training dataset, such as the lesions in the third and sixth rows of Fig 5 (marked by white arrows), suggesting that the FCN appropriately targeted the motion artifacts only and left the underlying image data relatively unaltered.
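The MSE reduction reported for the synthetic test data can be made concrete with a short sketch. It assumes that the corrected image is obtained by subtracting the network's artifact estimate from the input and that the reduction is computed per image relative to the MSE before correction; whether the reported 41.84% is an average of per-image reductions or a reduction of the average MSE is not stated, and the function names are hypothetical. Images are represented as flat lists of pixel values to keep the example free of dependencies.

```python
def apply_correction(corrupted, artifact_estimate):
    """Subtract the estimated artifact-only image from the corrupted input."""
    return [c - a for c, a in zip(corrupted, artifact_estimate)]


def mse(image, reference):
    """Mean squared error between two images given as flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(image, reference)) / len(reference)


def mse_reduction_percent(reference, corrupted, corrected):
    """Relative MSE reduction achieved by the correction, in percent."""
    before = mse(corrupted, reference)
    after = mse(corrected, reference)
    return 100.0 * (before - after) / before
```

A perfect artifact estimate would give a reduction of 100%, and a network output of zero would give 0%, so the metric directly reflects how much of the simulated artifact energy is removed.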
These qualitative findings were confirmed by the results of the blinded reader study. The mean artifact scores for the real-world test dataset before and after the network-based correction are listed in the Table for both readers. Application of the correction resulted in a reduction of the mean artifact score for all artifact levels and both readers. As can be seen from the Table, the total reduction of the mean artifact score was substantially larger for the higher artifact score classes. This reduction was statistically significant for all score classes, as confirmed by the 1-sample t tests for the different artifact score classes: P < .03 and t > 3.0 in all cases. The detailed results of the reader study are shown in Fig 6 in matrix form, where each cell indicates the number of images for a particular score pair (before/after correction). Both matrices are dominated by values on or below the diagonal, again confirming the overall positive impact of the network-based artifact correction. Agreement between both readers was high: Both readers agreed in 74.4% of all images and disagreed by 1/2/3/4 scores in 24.2%/1.2%/0.1%/0.0% of all images. The weighted Cohen κ was κ = 0.82.
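A weighted Cohen κ for two readers on the 0-4 scale can be computed as sketched below. The weighting scheme is not specified in the text; linear disagreement weights are assumed here purely for illustration, and the function name is hypothetical.

```python
def weighted_kappa(reader1, reader2, n_categories=5):
    """Weighted Cohen kappa with linear weights |i - j| (assumed scheme)."""
    n = len(reader1)
    k = n_categories
    # Joint distribution of the two readers' scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(reader1, reader2):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Weighted disagreement actually observed vs expected from the marginals.
    observed_dis = sum(abs(i - j) * observed[i][j] for i in range(k) for j in range(k))
    expected_dis = sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected_dis == 0:
        return 1.0  # both readers used a single identical category
    return 1.0 - observed_dis / expected_dis
```

With weighted κ, a disagreement of one score level is penalized less than a disagreement of several levels, which suits the reported pattern in which nearly all disagreements were by a single score.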

DISCUSSION
The presented network-based motion artifact correction represents a purely retrospective correction technique, requiring only standard-magnitude images from a traditional DICOM repository (PACS for example), which can be performed at any time after image acquisition. Its artifact-detection and removal capabilities rely on the prior knowledge that has been encoded in the network parameters during training. As a consequence, it does not require additional scan time or data input apart from the magnitude-only MRI. It may, hence, be applied retrospectively to all suitable datasets in an image archive, potentially mitigating artifact-related clinical problems such as difficult radiologic interpretations, reduced workflow efficiency, and increased institutional costs. A fundamental requirement for such applications is the ability of the network to generalize to unseen anatomic and pathologic structures. The successful identification of artifacts in clinical motion cases, even in the presence of pathologies that were not seen during training (Fig 5), is very promising in this regard.
The real-world applicability of this technique was preliminarily assessed in the clinical reader portion of this study, which suggested that optimal FCN performance (resulting in the highest final image quality) was generally obtained for cases with motion artifacts that were scored as "mild" before correction (scores S = 1 or S = 2). In many of these cases, the resulting images were considered artifact-free by the blinded radiologists. The small number of cases in which an increased artifact score was given after the correction (8 and 3 of 481 images for readers 1 and 2, respectively) may be attributed to intrareader variability. Direct visual comparison of these images before and after the correction did not reveal an increased artifact level due to the application of the network.
On the other hand, reduction of the mean artifact score was most pronounced for the severe-artifact cases (initial motion scores of 3 and 4), which almost always yielded lower motion scores than those assigned before correction. Of note, however, complete removal of all artifacts was not achieved in all of these cases, as confirmed by visual inspection (Fig 5). It is currently not clear whether these residual artifacts constitute a fundamental limitation of the presented method or whether further improvements are possible. Given that performance on classic computer vision tasks such as image classification has recently been considerably improved by increasing network depth, 21,22 more comprehensive training datasets, as well as various other technical developments, 23,24 it is conceivable that the application of such techniques to the examined problem may yield even lower residual artifact levels. Such developments may be further supported by applying the network to complex input data. Because motion artifacts are typically very prominent in the phase images, separation of artifacts and anatomy may be facilitated. Alternatively, other loss functions such as L1 25 or perceptual loss 26 may be explored in this regard. Recently, several works also suggested an adversarial loss to improve the visual appearance of the corrected images. 16,27 While further research is needed on this topic, we avoided adversarial loss terms in this study due to the potential threat of generating visually pleasing but synthetic structures, which, in the worst case, may be misinterpreted as pathologies.
Compared with the results of the clinical reader study, quantitative analysis of the network performance on synthetic data revealed a somewhat limited reduction of the MSE, in particular in view of the striking visual improvements of image quality as shown in Figs 3 and 5. This discrepancy between visual assessment and MSE may be partially explained by the fact that the correction capability of the network relies mainly on a removal of ghosting artifacts, whereas the corresponding minor reduction in signal intensity of the original anatomic structures is relatively unaffected. While the latter effect has a negligible impact on visual perception, it significantly affects the MSE. This interpretation confirms previous reports of the inadequacy of the MSE for measuring image similarity. 20,28 Another potential limitation may be the residual image blurring that could be identified in select cases following filtering, particularly in images that were initially scored as having severe motion artifacts. Arguably, correction of this blurring may represent a more difficult image translation task than removal of the typical ghosting artifacts due to motion. Recent works, 29,30 however, suggest that neural networks may also enable such image deblurring.
Future works should address these current limitations, as well as extend the presented clinical validation to more diverse datasets. In particular, cases with small lesions (eg, intracranial metastases) or anatomic structures that may have an appearance similar to motion artifacts (eg, small vessels) may prove critical. In addition, the performance of the network in case of other MRI contrasts, other anatomies, or additional artifacts, such as signal voids, wrap-around artifacts, or N/2 ghosting, 31 should be examined in detail.
The presented Foveal FCN offers the potential for retrospective improvement in image quality in examinations with motion during acquisition. Our study suggests that a network-based correction technique is capable of significantly improving image quality in clinical, motion-degraded images. Admittedly, this technique may not yet be capable of completely removing all motion-related artifacts, though implementation of such an FCN may prove a useful asset in the clinical workflow. Radiologists often claim a capacity to "see through" certain types of mild artifacts so that a modest reduction in the degree of artifacts may suffice to enable a reliable interpretation of the images. In addition, the presented method is largely orthogonal to other techniques for motion-artifact reduction such as those based on MRI navigators or external tracking devices. It may be used to correct for residual motion artifacts that often remain even when such techniques are applied.

CONCLUSIONS
This work demonstrates the feasibility of retrospective motion-artifact correction in MRI using a multiscale FCN. The presented method does not require additional input data apart from magnitude-only MRI and appears to effectively correct for motion artifacts, even in case of unseen pathologies.