Removing inter-subject technical variability in magnetic resonance imaging studies
Introduction
In recent years, there has been an increase in the number of multi-site neuroimaging studies, including the Human Connectome Project (HCP), the Alzheimer's Disease Neuroimaging Initiative (ADNI), and the Australian Imaging, Biomarkers and Lifestyle Flagship Study of Aging (AIBL). In structural magnetic resonance imaging (MRI) studies, larger samples of subjects yield more power to detect structural variations in different subgroups, for example, changes in the hippocampal volume associated with Alzheimer's disease (AD) and mild cognitive impairment (MCI). However, because MRI intensities are acquired in arbitrary units, the differences in MRI intensities across scanning parameters and studies have often been found to be larger than the biological differences observed in these images. For instance, Shinohara et al. (2014) show that in the ADNI and AIBL studies, which have highly standardized protocols, striking differences in the raw intensities are observed between imaging sites.
Since the raw image intensities are non-comparable across sites and between subjects, intensity normalization is paramount before performing between-subject intensity comparisons at the voxel level. While intensity normalization is not as important in other applications such as morphometry and brain volumetrics (Ashburner and Friston, 2000, Jovicich et al., 2013), it is essential for analyzing change in intensities within an MRI volume over time (Ghassemi et al., 2015a, Sweeney et al., 2016), developing intensity-based biomarkers (Chong and Lim, 2009, Meier et al., 2007, Vardhan et al., 2014) and for regression analyses at the voxel level (Hartung et al., 2014, Smith et al., 2004). The challenge of intensity normalization has been largely addressed in the literature (Jager et al., 2006, Leung et al., 2010, Madabhushi et al., 2006, Nyúl and Udupa, 1999, Nyúl et al., 2000, Shinohara et al., 2011, Shinohara et al., 2014, Weisenfeld and Warfield, 2004), with several methods reviewed in Shah et al. (2011). Recently, a novel intensity normalization method, called White Stripe (Shinohara et al., 2014), was developed to bring raw image intensities to a biologically interpretable intensity scale. The method applies a z-score transformation to the whole brain using parameters estimated from a latent subdistribution of normal-appearing white matter (NAWM). The use of NAWM for normalization makes the method suitable for many studies of brain abnormalities, as in the case of multiple sclerosis (MS) lesions. While the method has been shown to make the white matter (WM) comparable across subjects, residual across-subject variability was noted to remain in the grey matter (GM).
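To make the z-score idea concrete, the following Python sketch rescales all brain voxels using the mean and standard deviation of a reference region. It is an illustration under stated assumptions, not the published White Stripe algorithm: the function name `zscore_from_reference` is hypothetical, and the reference mask is assumed to be given, whereas White Stripe estimates the NAWM subdistribution from the intensity histogram.

```python
import numpy as np


def zscore_from_reference(image, brain_mask, reference_mask):
    """Z-score brain voxels using statistics of a reference region.

    `reference_mask` stands in for the NAWM voxels; how that region is
    estimated is outside the scope of this sketch (assumption).
    """
    reference = image[reference_mask]
    mu = reference.mean()
    sigma = reference.std(ddof=1)
    normalized = np.zeros_like(image, dtype=float)
    # Apply the same shift and scale to every voxel inside the brain.
    normalized[brain_mask] = (image[brain_mask] - mu) / sigma
    return normalized


# Toy volume in arbitrary units; the "reference region" here is a
# placeholder chosen only so that the example runs.
rng = np.random.default_rng(0)
image = rng.normal(loc=500.0, scale=50.0, size=(4, 4, 4))
brain_mask = np.ones(image.shape, dtype=bool)
reference_mask = image > np.median(image)
normalized = zscore_from_reference(image, brain_mask, reference_mask)
```

Because the shift and scale come from the image itself, the output is unchanged when the raw intensities are multiplied by a positive constant or offset, which is the property that makes arbitrary-unit intensities comparable across scans.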
In this work, we investigate between-scan technical variability that is left uncorrected by intensity normalization. We show that while common intensity normalization methods successfully correct for global intensity shifts associated with scanner site, substantial between-scan technical variation remains. This technical variation can be due to scanning parameters, scanner manufacturers, scanner field strength, and other factors. We refer to any post-normalization inter-scan variation that is not biological in nature as a “scan effect.”
To correct for scan effects, we propose Removal of Artificial Voxel Effect by Linear regression (RAVEL). RAVEL is a tool for removing unwanted variation present after intensity normalization. RAVEL is inspired by the batch effect correction tools SVA (Leek and Storey, 2007, Leek and Storey, 2008) and RUV (Gagnon-Bartsch and Speed, 2012) used broadly in genomics. In the analysis of gene expression and other genomic data, residual noise after intensity normalization is referred to as batch effects, because experiments are often performed in batches run on different dates. If not accounted for, batch effects have been shown to lead to spurious associations (Leek et al., 2010). To make a parallel with brain-imaging studies, batch effects are comparable to scan effects, where a single scan plays the role of a batch.
We use the linear model introduced in Leek and Storey (2007) to decompose the variation of the normalized intensities into a biological component of interest (variation associated with clinical covariates) and an unknown, unwanted variation component to be estimated from the data. The unwanted variation component encapsulates both technical variation and biological variation that is not of interest in the study. We register the different scans to a common template to allow the use of voxel-wise linear models, and estimate the unwanted variation component from regions of the brain that are not expected to be associated with the clinical covariates of interest. This follows the methodology of the RUV batch effect correction tool (Gagnon-Bartsch and Speed, 2012), which was later discussed in Leek (2014) for RNA sequencing. Unlike intensity-normalization methods, RAVEL utilizes all images in the study to leverage information about unwanted variability. Here, we use voxels that are consistently labelled as cerebrospinal fluid (CSF) across subjects as a control region; these voxels are not expected to be associated with disease (Luoma et al., 1993).
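The general strategy described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `remove_unwanted_variation`, the use of a plain singular value decomposition on the control voxels, and the choice of `k` are assumptions made for the example. The input is assumed to be a voxels-by-scans matrix of normalized, template-registered intensities.

```python
import numpy as np


def remove_unwanted_variation(V, control_idx, k=1):
    """Remove k unwanted factors estimated from control voxels.

    V           : array of shape (n_voxels, n_scans), normalized intensities
    control_idx : indices of control voxels (e.g. consistently labelled CSF)
    k           : number of unwanted factors to remove (assumed known here)
    """
    # Estimate the per-scan unwanted factors from the control region only.
    _, _, Wt = np.linalg.svd(V[control_idx, :], full_matrices=False)
    Z = Wt[:k, :].T  # shape (n_scans, k)

    # Voxel-wise least squares of intensity on [intercept, Z].
    design = np.column_stack([np.ones(V.shape[1]), Z])
    coef, *_ = np.linalg.lstsq(design, V.T, rcond=None)
    gamma = coef[1:, :]  # shape (k, n_voxels), loadings on the factors

    # Subtract only the fitted unwanted component; voxel means are kept.
    return V - (Z @ gamma).T


# Toy data: five control voxels driven purely by a scan-level factor z,
# and three other voxels equal to a baseline plus a multiple of z.
rng = np.random.default_rng(0)
z = rng.normal(size=12)
control = np.outer(rng.uniform(1.0, 2.0, size=5), z)
baseline = np.array([10.0, 20.0, 30.0])
loading = np.array([0.5, -1.0, 2.0])
V = np.vstack([control, baseline[:, None] + np.outer(loading, z)])
corrected = remove_unwanted_variation(V, np.arange(5), k=1)
```

In this toy setting the corrected non-control voxels return to their baselines, because the factor recovered from the control voxels spans the same direction as `z`. With real data the clinical covariates would also enter the model, and the number of factors would have to be chosen.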
We evaluate the performance of RAVEL using a large subset of the ADNI database consisting of more than 900 subjects. We demonstrate our method by using the T1-weighted (T1-w) images from subjects with AD and MCI, as well as healthy controls. We follow the work of Fortin et al. (2014) to benchmark RAVEL against two intensity normalization procedures without any scan effect correction: the popular histogram matching algorithm and White Stripe. We focus on showing that RAVEL improves the replicability of the biological findings. Critically, we show that a reduction of technical variation does not result in removing biological variability. Namely, making intensity densities more similar does not necessarily improve sensitivity to biological changes; on the contrary, overmatching of distributions can result in the removal of biologically relevant signal. To show improvement in terms of biological findings, we first demonstrate that the top voxels associated with AD in the RAVEL-corrected dataset are more replicable across independent subsets of subjects. We measure the replicability of the results by randomly splitting the ADNI dataset into discovery and validation cohorts multiple times. Then, we show that the top voxels associated with AD after RAVEL correction are more enriched for brain regions known to undergo structural changes in AD. Finally, we show that the average hippocampal intensity after RAVEL correction performs better than intensity-normalized-only images in discriminating between AD patients and healthy controls, and between MCI patients and healthy controls. This shows that RAVEL-corrected T1-w intensities are more biologically meaningful than intensity-normalized-only images for group comparisons, and also potentially promising for the development of biomarkers.
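Two of the evaluation ideas mentioned above, the overlap of top-ranked voxels between a discovery and a validation cohort and the discrimination of two groups by a single summary intensity, can be written down compactly. The sketch below is illustrative only: the function names, the ranking by absolute test statistic, and the convention that larger values indicate the positive group are assumptions of this example rather than details taken from the study.

```python
import numpy as np


def concordance_at_top(stats_discovery, stats_validation, k):
    """Fraction of the top-k voxels shared by two rankings.

    Voxels are ranked by absolute test statistic (assumption).
    """
    top_d = set(np.argsort(-np.abs(stats_discovery))[:k].tolist())
    top_v = set(np.argsort(-np.abs(stats_validation))[:k].tolist())
    return len(top_d & top_v) / k


def auc(values_pos, values_neg):
    """Area under the ROC curve via pairwise comparisons.

    Equals the probability that a randomly chosen value from the positive
    group exceeds one from the negative group; ties count one half.
    """
    pos = np.asarray(values_pos, dtype=float)[:, None]
    neg = np.asarray(values_neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


# Toy statistics for five voxels in two independent cohorts.
discovery = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
validation = np.array([5.0, 4.0, 1.0, 2.0, 3.0])
cat_curve = [concordance_at_top(discovery, validation, k) for k in range(1, 6)]
```

Plotting `cat_curve` against `k` gives a curve that stays near one when the two cohorts agree on which voxels matter most. If the marker of interest is expected to be lower in patients, the two groups passed to `auc` would simply be swapped.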
Study population
Our dataset consists of a subset of 917 subjects downloaded from the ADNI database (adni.loni.usc.edu). For each subject, we selected a study visit at random. We obtained 506, 184, and 227 subjects from the ADNI, ADNI-2, and ADNI-GO phases, respectively. We present summary statistics of the study population in Table 1. The selected scans were acquired at 83 different imaging sites, with a median number of 10 patients per site. The scans are also well-balanced for disease status across sites.
Results
We compared RAVEL to three normalization strategies: raw image intensities (no normalization), White Stripe (Shinohara et al., 2014), and histogram matching (Shah et al., 2011).
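For readers unfamiliar with histogram matching, the following Python sketch shows a simplified landmark-based variant: intensity percentiles of an image are mapped onto the corresponding percentiles of a reference, with linear interpolation in between. The function name `match_to_reference` and the choice of deciles as landmarks are assumptions of this example; the algorithm reviewed in Shah et al. (2011) differs in how landmarks and the standard scale are defined.

```python
import numpy as np


def match_to_reference(image, reference, percentiles=None):
    """Map image intensities onto a reference scale via percentile landmarks."""
    if percentiles is None:
        # Deciles including the minimum and maximum (assumption).
        percentiles = np.linspace(0.0, 100.0, 11)
    source_landmarks = np.percentile(image, percentiles)
    target_landmarks = np.percentile(reference, percentiles)
    # Piecewise-linear mapping between matched landmarks.
    matched = np.interp(image.ravel(), source_landmarks, target_landmarks)
    return matched.reshape(image.shape)


# Toy example: the "image" is the reference on a different arbitrary scale.
rng = np.random.default_rng(0)
reference = rng.normal(loc=100.0, scale=15.0, size=(8, 8, 8))
image = 2.5 * reference + 40.0
matched = match_to_reference(image, reference)
```

When two scans differ only by a global shift and scale, as in this toy example, the mapping recovers the reference intensities. It also forces the histograms of different subjects to resemble one another, which is why overmatching can remove biologically relevant differences.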
Discussion
In this work, we have presented the scan effect correction tool RAVEL, to correct for inter-scan unwanted variability in MRI studies that is present after intensity normalization. We have shown that RAVEL, applied after normalizing the intensities with White Stripe, substantially improves the replicability of the regions of the brain found to be the most associated with AD. RAVEL, inspired by the batch effect correction tools SVA and RUV, infers the unwanted variation in the images by using
Abbreviations
- AD
Alzheimer's disease
- ADNI
Alzheimer's Disease Neuroimaging Initiative
- ANTs
Advanced normalization tools
- AUC
Area under the curve
- BET
Brain extraction tool
- CAT
Concordance at the top
- CBGM
Cerebellar gray matter
- CSF
Cerebrospinal fluid
- DTI
Diffusion tensor imaging
- FAST
FMRIB's automated segmentation tool
- FMRIB
Oxford Centre for Functional MRI of the Brain
- FSL
FMRIB Software Library
- GM
Grey matter
- MCI
Mild cognitive impairment
- MRI
Magnetic resonance imaging
- NAWM
Normal-appearing white matter
- NIFTI
The Neuroimaging Informatics
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JPF and RTS developed the method. JPF analyzed the data and wrote the software. RTS supervised the study. JPF and RTS wrote the manuscript with comments from EMS, JM, and CMC. All authors read and approved the final manuscript.
Funding
The research of Shinohara and Crainiceanu was supported by Award Number R01NS085211 from the National Institute of Neurological Disorders and Stroke. RTS is partially supported by R01EB017255 from the National Institute of Biomedical Imaging and Bioengineering.
Acknowledgments
We would like to thank Paul Yushkevich and Sandhitsu Das for insightful discussions concerning biomarkers in AD.
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit
References (85)
- et al.
Voxel-based morphometry—the methods
NeuroImage
(2000)
- et al.
Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain
Med. Image Anal.
(2008)
- et al.
Alzheimer's disease: pathogenesis and prevention
Alzheimers Dement.
(2012)
- et al.
Using voxel-based morphometry to map the structural changes associated with rapid conversion in MCI: a longitudinal MRI study
NeuroImage
(2005)
- et al.
Classifying spatial patterns of brain activity with machine learning methods: application to lie detection
NeuroImage
(2005)
- et al.
Combining multivariate voxel selection and support vector machines for mapping and classification of fMRI spatial patterns
NeuroImage
(2008)
- et al.
Brain morphometry reproducibility in multi-center 3 T MRI studies: a comparison of cross-sectional and longitudinal segmentations
NeuroImage
(2013)
- et al.
Diffusion tensor imaging and tract-based spatial statistics in Alzheimer's disease and mild cognitive impairment
Neurobiol. Aging
(2011)
- et al.
Is the signal intensity of cerebrospinal fluid constant? Intensity measurements with high and low field magnetic resonance imagers
Magn. Reson. Imaging
(1993)
- et al.
Time-series modeling of multiple sclerosis disease activity: a promising window on disease progression and repair potential?
Neurotherapeutics
(2007)
Regionally-specific diffusion tensor imaging in mild cognitive impairment and Alzheimer's disease
NeuroImage
Amygdala atrophy is prominent in early Alzheimer's disease and relates to symptom severity
Psychiatry Res.
Australian imaging biomarkers lifestyle flagship study of ageing, and Alzheimer's disease neuroimaging initiative. Statistical normalization techniques for magnetic resonance imaging
NeuroImage Clin.
Advances in functional and structural MR image analysis and implementation as FSL
NeuroImage
OASIS is automated statistical inference for segmentation, with applications to multiple sclerosis lesion segmentation in MRI
NeuroImage Clin.
Relating multi-sequence longitudinal intensity profiles and clinical covariates in incident multiple sclerosis lesions
NeuroImage Clin.
MRI T2 hypointensity of the dentate nucleus is related to ambulatory impairment in multiple sclerosis
J. Neurol. Sci.
Neuron loss and shrinkage in the amygdala in Alzheimer's disease
Neurobiol. Aging
ANTsR: ANTs in R
T2 hypointensity in the deep gray matter of patients with multiple sclerosis: a quantitative magnetic resonance imaging study
Arch. Neurol.
Volumetric MRI measurements can differentiate Alzheimer's disease, mild cognitive impairment, and normal aging
Int. Psychogeriatr.
Chromatin Immunoprecipitation and High-Density Tiling Microarrays: A Generative Model, Methods for Analysis, and Methodology Assessment in the Absence of a “Gold Standard”
Cognitive impairment is associated with subcortical magnetic resonance imaging grey matter T2 hypointensity in multiple sclerosis
Mult. Scler.
Beyond the hippocampus: MRI volumetry confirms widespread limbic atrophy in AD
Neurology
Disease state prediction from resting state functional connectivity
Magn. Reson. Med.
Mapping gray matter loss with voxel-based morphometry in mild cognitive impairment
Neuroreport
Neuroimaging biomarkers in Alzheimer's disease
Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification
Neurobiol. Aging
A comprehensive overview of Infinium HumanMethylation450 data processing
Brief. Bioinform.
Posterior cingulum white matter disruption and its associations with verbal memory and stroke risk in mild cognitive impairment
J. Alzheimers Dis.
Magnetic resonance imaging of the entorhinal cortex and hippocampus in mild cognitive impairment and Alzheimer's disease
J. Neurol. Neurosurg. Psychiatry
Fronto-temporal-lobe atrophy in early-stage Alzheimer's disease identified using an improved detection methodology
Psychiatry Res.
Functional normalization of 450k methylation array data improves replication in large cancer studies
Genome Biol.
Presymptomatic hippocampal atrophy in Alzheimer's disease. A longitudinal MRI study
Brain
Using control genes to correct for unwanted variation in microarray data
Biostatistics
Analytic estimation of statistical significance maps for support vector machine based multi-variate image analysis and classification
NeuroImage
Quantitative measurement of tissue damage and recovery within new T2w lesions in pediatric- and adult-onset multiple sclerosis
Mult. Scler. J.
Normalization of white matter intensity on T1-weighted images of patients with acquired central nervous system demyelination
J. Neuroimaging
Profound loss of layer II entorhinal cortex neurons occurs in very mild Alzheimer's disease
J. Neurosci.
Voxel-based MRI intensitometry reveals extent of cerebral white matter pathology in amyotrophic lateral sclerosis
Amygdalar volume and psychiatric symptoms in Alzheimer's disease: an MRI analysis
Acta Neurol. Scand.
Multiple-laboratory comparison of microarray platforms
Nat. Methods
- 1
Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf