A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer's disease
Introduction
More than 30 million people have a clinical diagnosis of Alzheimer's disease (AD) worldwide, and this number is expected to triple by 2050 (Barnes and Yaffe, 2011). This is due to increased life expectancy and improvements in general health care (Ferri et al., 2005). AD is a form of dementia characterized by β-amyloid peptide deposition and abnormal tau accumulation and phosphorylation which eventually lead to neuronal death and synaptic loss (Murphy and LeVine, 2010). AD-related neurodegeneration follows specific patterns which start from subcortical areas in early disease stages and spread to the cortical mantle in later stages of the disease (Braak and Braak, 1996). The classic clinical hallmark of the most common form of AD (i.e., the amnestic type) is represented by deficits in episodic memory, followed by visuo-spatial impairment, spatio-temporal orientation problems, and eventually frank dementia.
Mild cognitive impairment (MCI) is a broad, ill-defined, and highly heterogeneous phenotypic spectrum characterized by memory deficits that are less pronounced than those seen in AD. Around 10%–15% of MCI patients per year convert to AD over a relatively short time in specialist clinical settings (Braak and Braak, 1995; Mitchell and Shiri-Feshki, 2008), although the annual conversion rate tends to progressively diminish over time; in community-based samples, the mean conversion rate is closer to 4% per year. MCI patients who do not develop AD tend to either remain stable, develop other forms of dementia, or even revert to a ‘healthy’ state, which suggests that MCI is a highly variable and common clinical conundrum likely dependent on different etio-pathogenetic mechanisms.
AD-related neuropathology can be identified several years before frank AD clinical manifestation (Braak and Braak, 1996; Delacourte et al., 1999; Morris et al., 1996; Serrano-Pozo et al., 2011; Mosconi et al., 2007), and this suggests that the development of AD might be predicted before clinical onset via in vivo biomarkers (e.g. PET and MR imaging as well as blood or cerebrospinal fluid (CSF) biomarkers) (Markesbery, 2010; Baldacci et al., 2018; Hampel et al., 2018; Teipel et al., 2018). Magnetic resonance imaging (MRI)-based biomarkers have attracted interest in the diagnosis of AD as well as in predicting MCI-to-AD conversion because they do not involve ionizing radiation, unlike positron emission tomography (PET), are less expensive than PET, and are less invasive than CSF sampling. MRI-based indices can also provide multi-modal information regarding the structure and function of the brain within the same scanning session, which is typically advantageous in many clinical settings.
For these reasons, there has been growing interest in developing computational tools that are able, by using MRI-based measures, to discriminate AD patients from healthy individuals or, most importantly, to discriminate patients with stable MCI (sMCI) from those MCI patients who, in contrast, progress and develop AD (pMCI). To these ends, different clinical data and imaging modalities have been used so far with variable success, including, for example, PET (Choi and Jin, 2018; Mosconi et al., 2004, 2007; Shaffer et al., 2013; Young et al., 2013), MRI (Filipovych and Davatzikos, 2011; Moradi et al., 2015; Mosconi et al., 2007; Tong et al., 2017; Young et al., 2013), cognitive testing (Casanova et al., 2011; Moradi et al., 2015), and CSF biomarkers (Davatzikos et al., 2011; Hansson et al., 2006; Riemenschneider et al., 2002; Sonnen et al., 2010). In this context, Moradi et al. (2015) and Tong et al. (2017) were amongst the first to: 1) perform feature selection to extract informative voxels from MRI volumes via regularized logistic regression, and 2) use the extracted voxels, along with cognitive measures, to produce support vector machine (SVM)-based predictions, achieving an area under the Receiver Operating Characteristic (ROC) curve (AUC) between 0.90 and 0.92. Similarly, Hojjati et al. (2017) employed baseline resting-state functional MRI data to achieve an AUC of 0.95. In their study, features were engineered by constructing a brain connectivity matrix, which was treated as a graph; the graph measures extracted from it served as input to the SVM.
Most of these earlier studies employ a classification pipeline which relies on two independent steps. First, independent component analysis (ICA) (Shaffer et al., 2013), L1 regularization (Moradi et al., 2015; Tong et al., 2017), or morphometry (Davatzikos et al., 2011; Fan et al., 2007) is used to reduce the dimensionality of the data to a smaller set of descriptive factors. Second, these factors are fed into a multivariate pattern classification algorithm. The dimensionality reduction and classification algorithms are two separate mathematical models with different underlying assumptions, which can result in a loss of relevant information during the classification procedure (Nguyen and Torre, 2010). In addition, the most commonly employed classifiers, such as SVMs (Moradi et al., 2015; Hojjati et al., 2017; Tong et al., 2017) and Gaussian Processes (Young et al., 2013), require kernels, or data transformations, which are often chosen from a limited and pre-specified set. The kernel maps the data to a new space in which the classes are presumed to be easier to separate. However, constructing or choosing an application-specific kernel that acts as a reasonable similarity measure for the classification task is not always possible or easy to achieve.
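To make the two-stage design concrete, the sketch below chains L1-regularized logistic regression feature selection with an SVM classifier. This is an illustrative reconstruction, not code from any of the cited studies: the data are synthetic stand-ins for voxel-wise MRI features, and all hyperparameters are arbitrary.

```python
# Two-stage pipeline sketch: L1-penalised logistic regression selects a
# sparse set of informative "voxels"; an SVM then classifies on the
# surviving features only. Synthetic data, arbitrary hyperparameters.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))             # 200 "subjects" x 500 "voxels"
y = (X[:, :5].sum(axis=1) > 0).astype(int)  # labels driven by 5 features

# Stage 1: dimensionality reduction via sparse (L1) logistic regression.
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5))
# Stage 2: kernel SVM on the reduced feature set.
clf = make_pipeline(selector, SVC(kernel="rbf"))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```

Note that, exactly as the paragraph above points out, the selector and the SVM are two separate models with different objectives: features discarded in stage 1 are irrecoverable in stage 2.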
The use of two separate, and methodologically disjoint, analytical pipelines, as well as the need to construct ad hoc kernels, can be avoided by employing deep-learning algorithms, which have greater representational flexibility than kernel-based methods and can automatically “learn” the data transformations that maximize an arbitrary performance metric. Recently, such deep-learning methods have been applied to AD vs. healthy controls classification problems (Hosseini-Asl et al., 2016; Liu et al., 2018; Payan and Montana, 2015) and pMCI vs. sMCI classification tasks (Choi and Jin, 2018; Lu et al., 2018a, b). Choi and Jin (2018) and Lu et al. (2018a) used deep learning to achieve some of the highest pMCI/sMCI classification performances to date (conversion-prediction accuracies of approximately 84% and 82%, respectively). Their predictions were based on a single (albeit highly informative) imaging modality (PET). A more formal summary of the recent studies and classification methods is presented in Table 3.
The superior representational capacity of deep-learning methods typically relies on a large number of neural network parameters. This frequently results in overfitting, i.e. an apparently highly satisfactory training performance which nevertheless fails to generalize to unseen samples during testing or deployment. Compounding this problem, medical databases are typically data-scarce, often containing too few samples to train such large architectures effectively.
This study therefore aims to develop a parameter-efficient neural network architecture based on the most recent convolutional layer designs from computer vision research (i.e. 3D separable and grouped convolutions). Furthermore, we implement a dual-learning approach which simultaneously learns multi-task classification of pMCI vs. sMCI and AD vs. Healthy Controls (HC) by combining several input streams: structural MRI measures as well as demographic, neuropsychological, and APOe4 genetic data (the APOe4 polymorphism is the strongest known genetic risk factor for sporadic AD). Such parameter-efficient network designs yield superior performance on generic visual discrimination tasks like ImageNet (Russakovsky et al., 2015; Chollet, 2017) while keeping the overall number of network parameters low, which helps limit the data-overfitting problem. Finally, we develop a novel feature extractor sub-network, combining the Tensorflow (Abadi et al., 2016) and Keras (Chollet et al., 2015) libraries with our own implementation of 3D separable convolutions, which is freely available at https://github.com/simeon-spasov/MCI.
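The parameter savings of depthwise-separable 3D convolutions can be illustrated with a simple back-of-the-envelope count. The kernel size and channel numbers below are illustrative, not the paper's exact configuration:

```python
# Parameter counts for a standard 3D convolution vs. a depthwise-separable
# one (biases omitted). Illustrative sizes, not the paper's architecture.

def conv3d_params(k, c_in, c_out):
    """Weights in a standard 3D convolution: one k^3 x c_in filter
    per output channel."""
    return k ** 3 * c_in * c_out

def separable_conv3d_params(k, c_in, c_out):
    """Depthwise 3D convolution (one k^3 spatial filter per input channel)
    followed by a 1x1x1 pointwise convolution mixing channels."""
    return k ** 3 * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128
standard = conv3d_params(k, c_in, c_out)             # 27 * 64 * 128 = 221,184
separable = separable_conv3d_params(k, c_in, c_out)  # 27 * 64 + 64 * 128 = 9,920
print(standard, separable, standard / separable)     # ~22x fewer parameters
```

Grouped convolutions achieve a similar effect by restricting each filter to a subset of the input channels, dividing the parameter count by the number of groups.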
Section snippets
Participants and data
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the …
Data preprocessing
Prior to classification, all T1-weighted (T1w) images were registered to a common space (i.e. a T1 template). In detail, two different T1 templates were used in order to assess the robustness of our classification methodology to coregistration inaccuracies. First, we built a custom T1 template specific to this study. To this end, we employed all T1w images, which (after N4 bias field correction) were nonlinearly co-registered to each other and averaged iteratively (i.e. the group average was …
Architecture overview
A high-level overview of the network design is shown in Fig. 1. In this paper, we developed a feature extractor sub-network (referred to as the multi-modal feature extractor in Fig. 1), inspired by the parameter-efficient separable and grouped convolutional layers presented in AlexNet (Krizhevsky et al., 2012) and Xception (Chollet, 2017; Velickovic et al., 2016). In detail, the layers of the feature extractor are shared between two tasks - MCI-to-AD conversion prediction and AD/HC …
Implementation
All experiments were conducted using Python version 2.7.12. The neural network was built with the Keras deep learning library using TensorFlow as the backend. TensorFlow, which is developed and supported by Google, is an open-source package for numerical computation with high popularity in the deep learning community. The library allows for easy deployment on multiple graphics processing units (GPUs) (CPU-based experimentation would be prohibitive because of time constraints). The Keras wrapper …
Performance evaluation
For the evaluation of the classifier, we repeated the sampling strategy to divide the samples into training, validation and test set splits. Since we have 32 more samples in the MCI dataset (16 for pMCI and 16 for sMCI) as compared to the AD/HC dataset, we used these 32 MCI subjects for testing purposes by randomly sampling 16 subjects from the pMCI and sMCI groups. The validation set comprised roughly 10% of the remaining dataset (36 subjects from MCI and AD/HC respectively) and was also …
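The sampling strategy described above can be sketched as follows. This is an illustration rather than the authors' code: subject IDs are synthetic and the cohort sizes are hypothetical, chosen only so that the fixed-size test set and ~10% validation split are visible.

```python
# Sketch of the split strategy: 16 pMCI + 16 sMCI subjects form a fixed
# held-out test set; ~10% of the remainder forms the validation set.
# Subject IDs and cohort sizes are synthetic stand-ins.
import random

random.seed(0)
pmci = ["pMCI_%d" % i for i in range(100)]   # hypothetical cohort sizes
smci = ["sMCI_%d" % i for i in range(100)]

test_set = random.sample(pmci, 16) + random.sample(smci, 16)
remaining = [s for s in pmci + smci if s not in test_set]
random.shuffle(remaining)

n_val = int(round(0.10 * len(remaining)))    # ~10% for validation
val_set, train_set = remaining[:n_val], remaining[n_val:]
print(len(test_set), len(val_set), len(train_set))  # 32 17 151
```

Re-running this sampling with different seeds, as the evaluation section describes, yields repeated random splits over which performance can be averaged.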
Results
Firstly, we consider the classification performance of our network on four different input biomarker combinations. The four input combinations are: 1) clinical features and T1w MRI images; 2) clinical features and Jacobian Determinant images; 3) clinical features and atlas-masked T1w MRI images; and 4) clinical features, Jacobian Determinant and T1w MRI images. We performed all of these experiments in custom template space. In order to assess the robustness of the neural network model to MRI …
Discussion
Deep-learning algorithms extract a hierarchy of features from the input data via flexible and non-linear transformations. These new data representations are learnt in a manner that maximizes an arbitrary performance metric, for example binary cross-entropy. Hence, instead of relying on a priori knowledge or dimensionality reduction algorithms which might result in non-optimal feature selection, deep-learning uses the gradient in the performance metric to directly guide feature extraction, which …
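As a minimal stand-alone illustration of the performance metric mentioned above, binary cross-entropy for a batch of predictions can be computed as:

```python
# Binary cross-entropy: the mean negative log-likelihood of binary labels
# under predicted probabilities. Its gradient w.r.t. the network outputs
# is what drives feature learning in the setting described above.
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean negative log-likelihood for binary labels y_true given
    predicted probabilities y_pred."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Confident correct predictions give low loss; confident wrong ones, high.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # ~0.105
print(binary_cross_entropy([1, 0], [0.1, 0.9]))  # ~2.303
```

Because this loss is differentiable in the predicted probabilities, its gradient can be backpropagated through every layer, so feature extraction and classification are optimized jointly rather than as two disjoint models.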
Acknowledgements
Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; …
References (52)
- et al., The optimal template effect in hippocampus studies of diseased populations, Neuroimage (2010)
- et al., A reproducible evaluation of ANTs similarity metric performance in brain image registration, Neuroimage (2011)
- et al., The projected effect of risk factor reduction on Alzheimer's disease prevalence, Lancet Neurol. (2011)
- et al., Classification of Alzheimer's disease and prediction of mild cognitive impairment-to-Alzheimer's conversion from structural magnetic resonance imaging using feature ranking and a genetic algorithm, Comput. Biol. Med. (2017)
- et al., Staging of Alzheimer's disease-related neurofibrillary changes, Neurobiol. Aging (1995)
- et al., Predicting cognitive decline with deep learning of brain metabolism and amyloid imaging, Behav. Brain Res. (2018)
- et al., Global prevalence of dementia: a Delphi consensus study, Lancet (2005)
- et al., Semi-supervised pattern classification of medical images: application to mild cognitive impairment (MCI), Neuroimage (2011)
- et al., Alzheimer's disease biomarker-guided diagnostic workflow using the added value of six combined cerebrospinal fluid candidates: Aβ1–42, total-tau, phosphorylated-tau, NFL, neurogranin, and YKL-40, Alzheimer's Dementia (2018)
- et al., Association between CSF biomarkers and incipient Alzheimer's disease in patients with mild cognitive impairment: a follow-up study, Lancet Neurol. (2006)
- Predicting conversion from MCI to AD using resting-state fMRI, graph theoretical approach and SVM, J. Neurosci. Methods
- FSL, NeuroImage
- Multiscale deep neural network based analysis of FDG-PET images for the early diagnosis of Alzheimer's disease, Med. Image Anal.
- Machine learning framework for early MRI-based Alzheimer's conversion prediction in MCI subjects, Neuroimage
- Early detection of Alzheimer's disease using neuroimaging, Exp. Gerontol.
- Optimal feature selection for support vector machines, Pattern Recogn.
- Accurate multimodal probabilistic prediction of conversion to Alzheimer's disease in patients with mild cognitive impairment, NeuroImage: Clinical
- TensorFlow: a system for large-scale machine learning
- Blood-based biomarker screening with agnostic biological definitions for an accurate diagnosis within the dimensional spectrum of neurodegenerative diseases
- Development of Alzheimer-related neurofibrillary changes in the neocortex inversely recapitulates cortical myelogenesis, Acta Neuropathol.
- High dimensional classification of structural MRI Alzheimer's disease data based on large scale regularization, Front. Neuroinf.
- Xception: deep learning with depthwise separable convolutions
- Keras
- Fast and accurate deep network learning by exponential linear units (ELUs)
- Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification, Neurobiol. Aging
- The biochemical pathway of neurofibrillary degeneration in aging and Alzheimer's disease, Neurology
- 1 These authors contributed equally to this publication.
- 2 Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.