Elsevier

NeuroImage

Volume 132, 15 May 2016, Pages 198-212

Removing inter-subject technical variability in magnetic resonance imaging studies

https://doi.org/10.1016/j.neuroimage.2016.02.036

Highlights

  • Between-scan unwanted variation is still present after intensity normalization.

  • We propose a scan-effect removal tool for removing post-normalization artifacts.

  • We model the unwanted variability between MRI T1-w scans using CSF region.

  • We use a large subset of the ADNI database to showcase our method.

  • We show that our method improves replicability of the voxels associated with AD.

  • MRI intensities corrected by our method improve prediction of AD and MCI.

Abstract

Magnetic resonance imaging (MRI) intensities are acquired in arbitrary units, making scans non-comparable across sites and between subjects. Intensity normalization is a first step toward improving the comparability of images across subjects. However, we show that unwanted inter-scan variability associated with imaging site, scanner effect, and other technical artifacts is still present after standard intensity normalization in large multi-site neuroimaging studies. We propose RAVEL (Removal of Artificial Voxel Effect by Linear regression), a tool to remove residual technical variability after intensity normalization. As proposed by SVA and RUV (Leek and Storey, 2007, 2008; Gagnon-Bartsch and Speed, 2012), two batch effect correction tools largely used in genomics, we decompose the voxel intensities of images registered to a template into a biological component and an unwanted variation component. The unwanted variation component is estimated from a control region obtained from the cerebrospinal fluid (CSF), where intensities are known to be unassociated with disease status and other clinical covariates. We perform a singular value decomposition (SVD) of the control voxels to estimate factors of unwanted variation. We then estimate the contribution of these unwanted factors at every voxel of the brain using linear regression and take the residuals as the RAVEL-corrected intensities. We assess the performance of RAVEL using T1-weighted (T1-w) images from more than 900 subjects with Alzheimer's disease (AD) and mild cognitive impairment (MCI), as well as healthy controls from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We compare RAVEL to two intensity-normalization-only methods: histogram matching and White Stripe.
We show that RAVEL performs best at improving the replicability of the brain regions that are empirically found to be most associated with AD, and that these regions are significantly more present in structures impacted by AD (hippocampus, amygdala, parahippocampal gyrus, entorhinal area, and fornix stria terminalis). In addition, we show that the RAVEL-corrected intensities have the best performance in distinguishing between MCI subjects and healthy subjects using the mean hippocampal intensity (AUC = 67%), a marked improvement compared to results from intensity normalization alone (AUC = 63% and 59% for histogram matching and White Stripe, respectively). RAVEL is promising for many other imaging modalities.

Introduction

In recent years, there has been an increase in the number of multi-site neuroimaging studies, including the Human Connectome Project (HCP), the Alzheimer's Disease Neuroimaging Initiative (ADNI), and the Australian Imaging, Biomarkers and Lifestyle Flagship Study of Aging (AIBL). In structural magnetic resonance imaging (MRI) studies, larger samples of subjects yield more power to detect structural variations in different subgroups, for example, changes in the hippocampal volume associated with Alzheimer's disease (AD) and mild cognitive impairment (MCI). However, because MRI intensities are acquired in arbitrary units, it has often been found that the differences in MRI intensities between scanning parameters and studies are larger than the biological differences observed in these images. For instance, Shinohara et al. (2014) show that in the ADNI and AIBL studies, which have highly standardized protocols, striking differences in the raw intensities are observed between imaging sites.

Since the raw image intensities are non-comparable across sites and between subjects, intensity normalization is paramount before performing between-subject intensity comparisons at the voxel level. While intensity normalization is not as important in other applications such as morphometry and brain volumetrics (Ashburner and Friston, 2000, Jovicich et al., 2013), it is essential for analyzing change in intensities within an MRI volume over time (Ghassemi et al., 2015a, Sweeney et al., 2016), developing intensity-based biomarkers (Chong and Lim, 2009, Meier et al., 2007, Vardhan et al., 2014) and for regression analyses at the voxel level (Hartung et al., 2014, Smith et al., 2004). The challenge of intensity normalization has been largely addressed in the literature (Jager et al., 2006, Leung et al., 2010, Madabhushi et al., 2006, Nyúl and Udupa, 1999, Nyúl et al., 2000, Shinohara et al., 2011, Shinohara et al., 2014, Weisenfeld and Warfield, 2004), with several methods reviewed in Shah et al. (2011). Recently, a novel intensity normalization method, called White Stripe (Shinohara et al., 2014), was developed to bring raw image intensities to a biologically interpretable intensity scale. The method applies a z-score transformation to the whole brain using parameters estimated from a latent subdistribution of normal-appearing white matter (NAWM). The use of NAWM for normalization makes the method suitable for many studies of brain abnormalities, as in the case of multiple sclerosis (MS) lesions. While the method has been shown to make the white matter (WM) comparable across subjects, it was noted that residual across-subject variability was still present in the grey matter (GM).
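The z-score step described above can be illustrated with a minimal Python sketch. This is not the White Stripe implementation: it assumes that a boolean mask of NAWM voxels (`nawm_mask`, a hypothetical input) is already available, whereas the published method derives the NAWM subdistribution from the intensity histogram. The function name and the synthetic data are illustrative only.

```python
import numpy as np

def zscore_by_reference(image, ref_mask):
    """Z-score a whole image using the mean and SD of a reference region.

    image    : array of raw intensities (any shape)
    ref_mask : boolean array of the same shape marking reference voxels
               (assumed here to be NAWM; how the mask is obtained is out of scope)
    """
    ref = image[ref_mask]
    mu = ref.mean()            # location estimated from the reference region only
    sigma = ref.std(ddof=1)    # scale estimated from the reference region only
    return (image - mu) / sigma

# Synthetic example: arbitrary-unit intensities and a made-up reference mask.
rng = np.random.default_rng(0)
raw = 500.0 + 40.0 * rng.normal(size=(8, 8, 8))
nawm_mask = rng.random(raw.shape) < 0.3
normalized = zscore_by_reference(raw, nawm_mask)
```

After this transformation the reference voxels have mean 0 and unit standard deviation by construction, while the rest of the brain is expressed in units of reference-region standard deviations.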

In this work, we investigate between-scan technical variability that is left uncorrected by intensity normalization. We show that while common intensity normalization methods successfully correct for global intensity shifts associated with scanner site, substantial between-scan technical variation remains. This technical variation can be due to scanning parameters, scanner manufacturers, scanner field strength, and other factors. We refer to any post-normalization inter-scan variation that is not biological in nature as a “scan effect.”

To correct for scan effects, we propose Removal of Artificial Voxel Effect by Linear regression (RAVEL). RAVEL is a tool for removing unwanted variation present after intensity normalization. RAVEL is inspired by the batch effect correction tools SVA (Leek and Storey, 2007, Leek and Storey, 2008) and RUV (Gagnon-Bartsch and Speed, 2012) used broadly in genomics. In the analysis of gene expression and other genomic data, residual noise after intensity normalization is referred to as batch effects, because experiments are often performed in batches run on different dates. If not accounted for, batch effects have been shown to lead to spurious associations (Leek et al., 2010). To make a parallel with brain-imaging studies, batch effects are comparable to scan effects, where a single scan plays the role of a batch.

We use the linear model introduced in Leek and Storey (2007) to decompose the variation of the normalized intensities into a biological component of interest (variation associated with clinical covariates) and an unknown, unwanted variation component to be estimated from the data. The unwanted variation component encapsulates both technical variation and biological variation that is not of interest in the study. We register the different scans to a common template to allow the use of voxel-wise linear models, and estimate the unwanted variation component from regions of the brain that are not expected to be associated with the clinical covariates of interest. This follows the methodology of the RUV batch effect correction tool (Gagnon-Bartsch and Speed, 2012) which was later discussed in Leek (2014) for RNA sequencing. Unlike intensity-normalization methods, RAVEL utilizes all images in the study to leverage information about unwanted variability. Here, we use voxels that are consistently labelled as cerebrospinal fluid (CSF) across subjects as a control region; these voxels are not expected to be associated with disease (Luoma et al., 1993).
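The procedure outlined above (SVD of the control voxels, then voxel-wise regression on the estimated factors) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' software: the intensities are assumed to be stored as a voxels-by-scans matrix `V` in template space, the control voxels are row-centered before the SVD, an intercept is kept so that voxel means are preserved, and the number of factors `k` is chosen by the user. All names are hypothetical.

```python
import numpy as np

def ravel_correct(V, control_idx, k=1):
    """Remove k unwanted factors estimated from control voxels.

    V           : (n_voxels, n_scans) normalized intensities in template space
    control_idx : indices of control (e.g. CSF) voxels; rows of V
    k           : number of unwanted factors to remove (user-chosen assumption)
    """
    # Center each control voxel across scans (an assumption of this sketch).
    Vc = V[control_idx, :]
    Vc = Vc - Vc.mean(axis=1, keepdims=True)

    # Right singular vectors live in scan space: one value per scan.
    _, _, Wt = np.linalg.svd(Vc, full_matrices=False)
    Z = Wt[:k, :].T                                   # (n_scans, k)

    # Voxel-wise least squares of intensities on [intercept, Z].
    design = np.column_stack([np.ones(V.shape[1]), Z])
    coef, *_ = np.linalg.lstsq(design, V.T, rcond=None)
    gamma = coef[1:, :]                               # (k, n_voxels)

    # Subtract only the fitted unwanted component; keep the voxel mean.
    return V - (Z @ gamma).T, Z

# Synthetic example: one scan-level nuisance factor shared by all voxels.
rng = np.random.default_rng(0)
n_vox, n_scans = 200, 40
scan_effect = rng.normal(size=n_scans)
loadings = rng.normal(size=n_vox)
V = 10.0 + np.outer(loadings, scan_effect) + 0.01 * rng.normal(size=(n_vox, n_scans))
corrected, Z = ravel_correct(V, control_idx=np.arange(50), k=1)
```

In this toy example the first 50 voxels play the role of the control region. Because they carry no biological signal by construction, the leading singular vector recovers the scan-level nuisance factor, which is then regressed out of every voxel.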

We evaluate the performance of RAVEL using a large subset of the ADNI database consisting of more than 900 subjects. We demonstrate our method by using the T1-weighted (T1-w) images from subjects with AD and MCI, as well as healthy controls. We follow the work of Fortin et al. (2014) to benchmark RAVEL against two intensity normalization procedures without any scan effect correction: the popular histogram matching algorithm and White Stripe. We focus on showing that RAVEL improves the replicability of the biological findings. Critically, we show that a reduction of technical variation does not result in removing biological variability. Namely, making intensity densities more similar does not necessarily improve sensitivity to biological changes; on the contrary, overmatching of distributions can result in the removal of biologically relevant signal. To show improvement in terms of biological findings, we first demonstrate that the top voxels associated with AD in the RAVEL-corrected dataset are more replicable across independent subsets of subjects. We measure the replicability of the results by randomly splitting the ADNI dataset into discovery and validation cohorts multiple times. Then, we show that the top voxels associated with AD after RAVEL correction are more enriched for brain regions known to undergo structural changes in AD. Finally, we show that the average hippocampal intensity after RAVEL correction performs better than intensity-normalized-only images in discriminating between AD patients and healthy controls, and between MCI patients and healthy controls. This shows that RAVEL-corrected T1-w intensities are more biologically meaningful than intensity-normalized-only images for group comparisons, and also potentially promising for the development of biomarkers.
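The two evaluation ideas in this paragraph, replicability of the top voxels across random discovery/validation splits and discrimination using a single summary intensity, can be sketched as follows. This is an illustrative Python sketch with assumptions that are not taken from the paper: voxels are ranked by a Welch t-statistic, replicability is summarized as the fraction of top-k voxels shared between the two halves, splits are not stratified by group, and AUC is computed as the probability that a randomly chosen subject from one group has a larger score than a randomly chosen subject from the other. All function names are hypothetical.

```python
import numpy as np

def welch_t(V, is_case):
    """Per-voxel Welch t-statistic, cases vs. controls. V: (n_voxels, n_subjects)."""
    a, b = V[:, is_case], V[:, ~is_case]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

def top_k_overlap(V, is_case, k, n_splits=20, seed=0):
    """Mean fraction of top-k voxels shared by random discovery/validation halves."""
    rng = np.random.default_rng(seed)
    n = V.shape[1]
    overlaps = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        disc, val = perm[: n // 2], perm[n // 2:]
        t_disc = np.abs(welch_t(V[:, disc], is_case[disc]))
        t_val = np.abs(welch_t(V[:, val], is_case[val]))
        top_disc = set(np.argsort(t_disc)[-k:].tolist())
        top_val = set(np.argsort(t_val)[-k:].tolist())
        overlaps.append(len(top_disc & top_val) / k)
    return float(np.mean(overlaps))

def auc(scores_pos, scores_neg):
    """P(random positive score > random negative score); ties count one half."""
    diff = np.asarray(scores_pos, dtype=float)[:, None] \
         - np.asarray(scores_neg, dtype=float)[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

# Synthetic example: 20 voxels carry a group difference, the rest are noise.
rng = np.random.default_rng(1)
n_vox, n_sub = 300, 80
is_case = np.arange(n_sub) % 2 == 0
V_null = rng.normal(size=(n_vox, n_sub))
V_signal = V_null.copy()
V_signal[:20, is_case] += 3.0
replicability = top_k_overlap(V_signal, is_case, k=20)
```

If the summary measure is expected to be lower in patients than in controls, the two arguments of `auc` should be swapped so that values above 0.5 indicate discrimination in the expected direction.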

Section snippets

Study population

Our dataset consists of a subset of 917 subjects downloaded from the ADNI database (adni.loni.usc.edu). For each subject, we selected a study visit at random. We obtained 506, 184, and 227 subjects from the ADNI, ADNI-2, and ADNI-GO phases, respectively. We present summary statistics of the study population in Table 1. The selected scans were acquired at 83 different imaging sites, with a median number of 10 patients per site. The scans are also well-balanced for disease status across sites.

Results

We compared RAVEL to three normalization strategies: raw image intensities (no normalization), White Stripe (Shinohara et al., 2014), and histogram matching (Shah et al., 2011).
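For readers unfamiliar with histogram matching, the following Python sketch shows a simplified quantile-landmark variant: intensities are mapped by piecewise-linear interpolation so that a set of quantiles of the input image lands on the corresponding quantiles of a reference. It is offered only as an illustration of the general idea; the choice of landmarks, the use of a single reference image, and the function name are assumptions and may differ from the implementation evaluated in the paper.

```python
import numpy as np

def match_quantiles(image, reference, n_landmarks=11):
    """Map image intensities so its quantile landmarks match the reference's.

    image, reference : arrays of intensities (any shape)
    n_landmarks      : number of equally spaced quantiles used as landmarks
    """
    q = np.linspace(0.0, 1.0, n_landmarks)
    src = np.quantile(image, q)        # landmarks in the input image
    ref = np.quantile(reference, q)    # landmarks in the reference
    # Monotone piecewise-linear map sending each src landmark to its ref landmark.
    matched = np.interp(image.ravel(), src, ref)
    return matched.reshape(image.shape)

# Synthetic example: same underlying shape, different arbitrary units.
rng = np.random.default_rng(0)
reference = rng.normal(size=5000)
image = 50.0 + 7.0 * rng.normal(size=5000)
matched = match_quantiles(image, reference)
```

Because the mapping is monotone, the rank order of voxels within an image is preserved; only the intensity scale is changed.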

Discussion

In this work, we have presented the scan effect correction tool RAVEL, to correct for inter-scan unwanted variability in MRI studies that is present after intensity normalization. We have shown that RAVEL, applied after normalizing the intensities with White Stripe, substantially improves the replicability of the regions of the brain found to be the most associated with AD. RAVEL, inspired by the batch effect correction tools SVA and RUV, infers the unwanted variation in the images by using

Abbreviations

    AD: Alzheimer's disease
    ADNI: Alzheimer's Disease Neuroimaging Initiative
    ANTs: Advanced normalization tools
    AUC: Area under the curve
    BET: Brain extraction tool
    CAT: Concordance at the top
    CBGM: Cerebellar gray matter
    CSF: Cerebrospinal fluid
    DTI: Diffusion tensor imaging
    FAST: FMRIB's automated segmentation tool
    FMRIB: Oxford Centre for Functional MRI of the Brain
    FSL: FMRIB Software Library
    GM: Grey matter
    MCI: Mild cognitive impairment
    MRI: Magnetic resonance imaging
    NAWM: Normal-appearing white matter
    NIFTI: The Neuroimaging Informatics

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

JPF and RTS developed the method. JPF analyzed the data and wrote the software. RTS supervised the study. JPF and RTS wrote the manuscript with comments from EMS, JM, and CMC. All authors read and approved the final manuscript.

Funding

The research of Shinohara and Crainiceanu was supported by Award Number R01NS085211 from the National Institute of Neurological Disorders and Stroke. RTS is partially supported by R01EB017255 from the National Institute for Biomedical Imaging and Bioengineering.

Acknowledgments

We would like to thank Paul Yushkevich and Sandhitsu Das for insightful discussions concerning biomarkers in AD.

Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit

References (85)

  • M.M. Mielke et al. Regionally-specific diffusion tensor imaging in mild cognitive impairment and Alzheimer's disease. NeuroImage (2009)
  • Stéphane P. Poulin et al. Amygdala atrophy is prominent in early Alzheimer's disease and relates to symptom severity. Psychiatry Res. (2011)
  • Russell T. Shinohara et al. Australian imaging biomarkers lifestyle flagship study of ageing, and Alzheimer's disease neuroimaging initiative. Statistical normalization techniques for magnetic resonance imaging. NeuroImage Clin. (2014)
  • Stephen M. Smith et al. Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage (2004)
  • Elizabeth M. Sweeney et al. OASIS is automated statistical inference for segmentation, with applications to multiple sclerosis lesion segmentation in MRI. NeuroImage Clin. (2013)
  • Elizabeth M. Sweeney et al. Relating multi-sequence longitudinal intensity profiles and clinical covariates in incident multiple sclerosis lesions. NeuroImage Clin. (2016)
  • C.W. Tjoa et al. MRI T2 hypointensity of the dentate nucleus is related to ambulatory impairment in multiple sclerosis. J. Neurol. Sci. (2005)
  • T.H. Vereecken et al. Neuron loss and shrinkage in the amygdala in Alzheimer's disease. Neurobiol. Aging (1994)
  • Brian B. Avants et al. ANTsR: ANTs in R
  • Rohit Bakshi et al. T2 hypointensity in the deep gray matter of patients with multiple sclerosis: a quantitative magnetic resonance imaging study. Arch. Neurol. (2002)
  • Cássio M.C. Bottino et al. Volumetric MRI measurements can differentiate Alzheimer's disease, mild cognitive impairment, and normal aging. Int. Psychogeriatr. (2002)
  • Richard Walter Bourgon. Chromatin Immunoprecipitation and High-Density Tiling Microarrays: A Generative Model, Methods for Analysis, and Methodology Assessment in the Absence of a “Gold Standard” (2006)
  • S.D. Brass et al. Cognitive impairment is associated with subcortical magnetic resonance imaging grey matter T2 hypointensity in multiple sclerosis. Mult. Scler. (2006)
  • D.J. Callen et al. Beyond the hippocampus: MRI volumetry confirms widespread limbic atrophy in AD. Neurology (2001)
  • R. Cameron Craddock et al. Disease state prediction from resting state functional connectivity. Magn. Reson. Med. (2009)
  • Gaël Chételat et al. Mapping gray matter loss with voxel-based morphometry in mild cognitive impairment. Neuroreport (2002)
  • Mei Sian Chong et al. Neuroimaging biomarkers in Alzheimer's disease
  • Christos Davatzikos et al. Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification. Neurobiol. Aging (2011)
  • Sarah Dedeurwaerder et al. A comprehensive overview of Infinium HumanMethylation450 data processing. Brief. Bioinform. (2014)
  • Lisa Delano-Wood et al. Posterior cingulum white matter disruption and its associations with verbal memory and stroke risk in mild cognitive impairment. J. Alzheimers Dis. (2012)
  • A.T. Du et al. Magnetic resonance imaging of the entorhinal cortex and hippocampus in mild cognitive impairment and Alzheimer's disease. J. Neurol. Neurosurg. Psychiatry (2001)
  • Tom F.D. Farrow et al. Fronto-temporal-lobe atrophy in early-stage Alzheimer's disease identified using an improved detection methodology. Psychiatry Res. (2007)
  • Jean-Philippe Fortin et al. Functional normalization of 450 k methylation array data improves replication in large cancer studies. Genome Biol. (2014)
  • N.C. Fox et al. Presymptomatic hippocampal atrophy in Alzheimer's disease. A longitudinal MRI study. Brain (1996)
  • J.A. Gagnon-Bartsch et al. Using control genes to correct for unwanted variation in microarray data. Biostatistics (2012)
  • Bilwaj Gaonkar et al. Analytic estimation of statistical significance maps for support vector machine based multi-variate image analysis and classification. NeuroImage (2013)
  • Rezwan Ghassemi et al. Quantitative measurement of tissue damage and recovery within new T2w lesions in pediatric- and adult-onset multiple sclerosis. Mult. Scler. J. (2015)
  • Rezwan Ghassemi et al. Normalization of white matter intensity on T1-weighted images of patients with acquired central nervous system demyelination. J. Neuroimaging (2015)
  • T. Gómez-Isla et al. Profound loss of layer II entorhinal cortex neurons occurs in very mild Alzheimer's disease. J. Neurosci. (1996)
  • Viktor Hartung et al. Voxel-based MRI intensitometry reveals extent of cerebral white matter pathology in amyotrophic lateral sclerosis (2014)
  • D. Hornek et al. Amygdalar volume and psychiatric symptoms in Alzheimer's disease: an MRI analysis. Acta Neurol. Scand. (2006)
  • Rafael A. Irizarry et al. Multiple-laboratory comparison of microarray platforms. Nat. Methods (2005)

    1

    Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
