
NeuroImage

Volume 189, 1 April 2019, Pages 276-287

A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer's disease

https://doi.org/10.1016/j.neuroimage.2019.01.031

Abstract

Some forms of mild cognitive impairment (MCI) are the clinical precursors of Alzheimer's disease (AD), while other MCI types tend to remain stable over time and do not progress to AD. To identify and choose effective and personalized strategies to prevent or slow the progression of AD, we need to develop objective measures that are able to discriminate the MCI patients who are at risk of AD from those who are at lower risk of developing it. Here, we present a novel deep learning architecture, based on dual learning and an ad hoc layer for 3D separable convolutions, which aims at identifying MCI patients who have a high likelihood of developing AD within 3 years.

Our deep learning procedures combine structural magnetic resonance imaging (MRI), demographic, neuropsychological, and APOe4 genetic data as input measures. The most novel characteristics of our machine learning model compared to previous ones are the following: 1) our deep learning model is multi-tasking, in the sense that it jointly learns to predict both MCI-to-AD conversion and AD vs. healthy controls classification, which facilitates relevant feature extraction for AD prognostication; 2) the neural network classifier employs fewer parameters than other deep learning architectures, which significantly limits data overfitting (we use ∼550,000 network parameters, orders of magnitude fewer than in other network designs); 3) both structural MRI images and their warp field characteristics, which quantify local volumetric changes in relation to the MRI template, were used as separate input streams to extract as much information as possible from the MRI data. All analyses were performed on a subset of the database made publicly available via the Alzheimer's Disease Neuroimaging Initiative (ADNI) (n = 785 participants: n = 192 AD patients, n = 409 MCI patients (including both MCI patients who convert to AD and those who do not), and n = 184 healthy controls).

The most predictive combination of inputs was the structural MRI images together with the demographic, neuropsychological, and APOe4 data. In contrast, the warp field metrics added little predictive value. The algorithm was able to distinguish the MCI patients developing AD within 3 years from those with stable MCI over the same time period with an area under the curve (AUC) of 0.925, a 10-fold cross-validated accuracy of 86%, a sensitivity of 87.5%, and a specificity of 85%. To our knowledge, this is the highest performance achieved so far using similar datasets. The same network provided an AUC of 1 and 100% accuracy, sensitivity, and specificity when classifying patients with AD versus healthy controls. Our classification framework was also robust to the use of different co-registration templates and potentially irrelevant features/image portions.

Our approach is flexible and can in principle integrate other imaging modalities, such as PET, and diverse other sets of clinical data. The convolutional framework is potentially applicable to any 3D image dataset and gives the flexibility to design a computer-aided diagnosis system targeting the prediction of several medical conditions and neuropsychiatric disorders via multi-modal imaging and tabular clinical data.

Introduction

More than 30 million people have a clinical diagnosis of Alzheimer's disease (AD) worldwide, and this number is expected to triple by 2050 (Barnes and Yaffe, 2011). This is due to increased life expectancy and improvements in general health care (Ferri et al., 2005). AD is a form of dementia characterized by β-amyloid peptide deposition and abnormal tau accumulation and phosphorylation which eventually lead to neuronal death and synaptic loss (Murphy and LeVine, 2010). AD-related neurodegeneration follows specific patterns which start from subcortical areas in early disease stages and spread to the cortical mantle in later stages of the disease (Braak and Braak, 1996). The classic clinical hallmark of the most common form of AD (i.e., the amnestic type) is represented by deficits in episodic memory, followed by visuo-spatial impairment, spatio-temporal orientation problems, and eventually frank dementia.

Mild cognitive impairment (MCI) is a broad, ill-defined, and highly heterogeneous phenotypic spectrum characterized by memory deficits that are less pronounced than those seen in AD. Around 10%–15% of MCI patients per year convert to AD over a relatively short time (Braak and Braak, 1995; Mitchell and Shiri-Feshki, 2008), although the annual conversion rate tends to diminish progressively; the mean conversion rate from MCI to AD is approximately 4% per year. MCI patients who do not develop AD tend to either remain stable, develop other forms of dementia, or even revert to a ‘healthy’ state, which suggests that MCI is a highly variable and common clinical conundrum, likely dependent on different etio-pathogenetic mechanisms.

AD-related neuropathology can be identified several years before frank clinical manifestation of AD (Braak and Braak, 1996; Delacourte et al., 1999; Morris et al., 1996; Serrano-Pozo et al., 2011; Mosconi et al., 2007), which suggests that the development of AD might be predicted before clinical onset via in vivo biomarkers (e.g. positron emission tomography (PET) and MR imaging as well as blood or cerebrospinal fluid (CSF) biomarkers) (Markesbery, 2010; Baldacci et al., 2018; Hampel et al., 2018; Teipel et al., 2018). Magnetic resonance imaging (MRI)-based biomarkers have attracted interest in the diagnosis of AD as well as in predicting MCI-to-AD conversion because, unlike PET, they do not involve ionizing radiation, are less expensive than PET, and are less invasive than CSF biomarkers. MRI-based indices can also provide multi-modal information regarding the structure and function of the brain within the same scanning session, which is typically advantageous in many clinical settings.

For these reasons, there has been growing interest in developing computational tools that are able, using MRI-based measures, to discriminate AD patients from healthy individuals or, most importantly, to discriminate patients with stable MCI (sMCI) from those MCI patients who, in contrast, progress and develop AD (pMCI). To these ends, different clinical data and imaging modalities have been used so far with variable success, including, for example, PET (Choi and Jin, 2018; Mosconi et al., 2004, 2007; Shaffer et al., 2013; Young et al., 2013), MRI (Filipovych and Davatzikos, 2011; Moradi et al., 2015; Mosconi et al., 2007; Tong et al., 2017; Young et al., 2013), cognitive testing (Casanova et al., 2011; Moradi et al., 2015), and CSF biomarkers (Davatzikos et al., 2011; Hansson et al., 2006; Riemenschneider et al., 2002; Sonnen et al., 2010). In this context, Moradi et al. (2015) and Tong et al. (2017) were amongst the first to: 1) perform feature selection to extract informative voxels from MRI volumes via regularized logistic regression, and 2) use the extracted voxels, along with cognitive measures, to produce support vector machine (SVM)-based predictions, achieving an area under the Receiver Operating Characteristic (ROC) curve (AUC) between 0.9 and 0.92. Similarly, Hojjati et al. (2017) employed baseline resting-state functional MRI data to achieve an AUC of 0.95. In their study, the features were engineered by constructing a brain connectivity matrix, treating it as a graph, and using the extracted graph measures as input to the SVM.

Most of these earlier studies employ a classification pipeline that relies on two independent steps. First, independent component analysis (ICA) (Shaffer et al., 2013), L1 regularization (Moradi et al., 2015; Tong et al., 2017), or morphometry (Davatzikos et al., 2011; Fan et al., 2007) is used to reduce the dimensionality of the data to a smaller set of descriptive factors. Second, these factors are fed into a multivariate pattern classification algorithm. The dimensionality reduction and classification algorithms are two separate mathematical models with different assumptions, which can result in a loss of relevant information during classification (Nguyen and Torre, 2010). In addition, the most commonly employed classifiers, such as SVMs (Moradi et al., 2015; Hojjati et al., 2017; Tong et al., 2017) and Gaussian Processes (Young et al., 2013), require the use of kernels, or data transformations, which are often chosen from a limited and pre-specified set. This process maps the data to a new space in which the classes are presumed to be easier to separate. However, constructing or choosing an application-specific kernel that acts as a reasonable similarity measure for the classification task is not always possible or easy to achieve.
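As a concrete illustration of this two-step pipeline, the sketch below (a minimal example on synthetic data, not the code of any cited study) uses L1-regularized logistic regression for voxel selection followed by a kernel SVM, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for flattened voxel data: 100 subjects x 500 voxels,
# where only the first 10 voxels carry any class signal.
X = rng.normal(size=(100, 500))
y = (X[:, :10].sum(axis=1) > 0).astype(int)

pipeline = Pipeline([
    # Step 1: sparse feature selection via L1-penalized logistic regression.
    ("select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0))),
    # Step 2: a separately specified kernel classifier on the reduced features.
    ("svm", SVC(kernel="rbf")),
])
pipeline.fit(X, y)
n_selected = int(pipeline.named_steps["select"].get_support().sum())
print(f"selected {n_selected} of {X.shape[1]} voxels")
```

Because selection and classification are fit as two separate models, information discarded in step 1 cannot be recovered in step 2, which is exactly the limitation described above.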

The use of two separate, methodologically disjoint analytical pipelines, as well as the need to construct ad hoc kernels, can be avoided by employing deep learning algorithms, which have greater representational flexibility than kernel-based methods and can automatically “learn” the data transformations that maximize an arbitrary performance metric. Recently, such deep-learning methods have been applied to AD vs. healthy controls classification problems (Hosseini-Asl et al., 2016; Liu et al., 2018; Payan and Montana, 2015) and pMCI vs. sMCI classification tasks (Choi and Jin, 2018; Lu et al., 2018a, b). Choi and Jin (2018) and Lu et al. (2018a) have used deep learning to achieve among the highest pMCI/sMCI classification performances to date (∼84% and ∼82% accuracy, respectively). Their predictions were based on a single (albeit highly informative) imaging modality (PET). A more formal summary of the recent studies and classification methods is presented in Table 3.

The superior representational capacity of deep-learning methods typically relies on a high number of neural network parameters. Frequently, this results in data overfitting: an apparently highly satisfactory training performance which does not generalize to unseen samples during testing or when the model is deployed. A related problem is that medical databases are typically data-scarce, providing too few samples to train such large network architectures reliably.

This study therefore aims to develop a parameter-efficient neural network architecture based on the most recent convolutional neural network layers (i.e. 3D separable and grouped convolutions) developed in the computer vision research field. Furthermore, we implement a dual-learning approach which performs multi-task classification of pMCI vs. sMCI and AD vs. healthy controls (HC) simultaneously by combining several input streams, such as structural MRI measures as well as demographic, neuropsychological, and APOe4 genetic data (the APOe4 gene polymorphism is the only known genetic risk factor for AD in sporadic cases of AD). This network design yields superior performance on generic visual discrimination tasks like ImageNet (Russakovsky et al., 2015; Chollet, 2017) while keeping the overall number of network parameters low to efficiently limit data overfitting. Finally, we develop a novel feature extractor sub-network, combining the TensorFlow (Abadi et al., 2016) and Keras (Chollet et al., 2015) libraries with our own implementation of 3D separable convolutions, which is freely available at https://github.com/simeon-spasov/MCI.
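The parameter savings behind this design choice follow from simple counting. The sketch below (illustrative channel and kernel sizes, not the exact layer shapes of the proposed network) compares a standard 3D convolution with its depthwise-separable factorization:

```python
def conv3d_params(c_in, c_out, k):
    """Weight count of a standard 3D convolution (bias terms omitted)."""
    return c_in * c_out * k ** 3

def separable_conv3d_params(c_in, c_out, k):
    """One k*k*k depthwise filter per input channel, then a 1x1x1 pointwise mix."""
    depthwise = c_in * k ** 3
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example: mapping 64 channels to 128 channels with a 3x3x3 kernel.
standard = conv3d_params(64, 128, 3)             # 221,184 weights
separable = separable_conv3d_params(64, 128, 3)  # 9,920 weights
print(standard, separable, round(standard / separable, 1))  # 221184 9920 22.3
```

Stacking many such factorized layers is what allows a full 3D network to stay within a small parameter budget such as the ∼550,000 parameters mentioned above.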

Section snippets

Participants and data

Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the …

Data preprocessing

Prior to classification, all T1-weighted (T1w) images were registered to a common space (i.e. a T1 template). In detail, two different T1 templates were used in order to assess the robustness of our classification methodology to co-registration inaccuracies. First, we built a custom T1 template specific to this study. To this end, we employed all T1w images, which (after N4 bias field correction) were nonlinearly co-registered to each other and averaged iteratively (i.e. the group average was …

Architecture overview

A high-level overview of the network design is shown in Fig. 1. In this paper, we developed a feature extractor sub-network (referred to as the multi-modal feature extractor in Fig. 1), inspired by the parameter-efficient separable and grouped convolutional layers presented in AlexNet (Krizhevsky et al., 2012) and Xception (Chollet, 2017; Velickovic et al., 2016). In detail, the layers of the feature extractor are shared between two tasks - MCI-to-AD conversion prediction and AD/HC …
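A shared feature extractor with two task heads is typically trained by summing one loss per task, so that both tasks drive the shared weights. The fragment below is an illustrative sketch of such a joint binary cross-entropy objective (hypothetical predictions, not the authors' implementation):

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy for one task head."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical sigmoid outputs from the two task heads.
loss_mci = binary_cross_entropy([1, 0, 1], [0.8, 0.3, 0.9])   # pMCI vs. sMCI head
loss_ad = binary_cross_entropy([1, 1, 0], [0.95, 0.7, 0.1])   # AD vs. HC head
joint_loss = loss_mci + loss_ad  # gradients from both tasks reach the shared extractor
```

Minimizing the summed loss is what lets the easier AD/HC task regularize feature learning for the harder conversion-prediction task.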

Implementation

All experiments were conducted using Python version 2.7.12. The neural network was built with the Keras deep learning library using TensorFlow as backend. TensorFlow, which is developed and supported by Google, is an open-source package for numerical computation with high popularity in the deep learning community. The library allows for easy deployment on multiple graphics processing units (GPUs) (CPU-based experimentation would be prohibitive because of time constraints). The Keras wrapper …

Performance evaluation

For the evaluation of the classifier, we repeated the sampling strategy to divide the samples into training, validation, and test splits. Since we have 32 more samples in the MCI dataset (16 pMCI and 16 sMCI) than in the AD/HC dataset, we used these 32 MCI subjects for testing purposes by randomly sampling 16 subjects from the pMCI and sMCI groups. The validation set comprised roughly 10% of the remaining dataset (36 subjects from MCI and AD/HC respectively) and was also …
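Given held-out test subjects, the reported accuracy, sensitivity, and specificity all derive from the confusion matrix. A minimal sketch with made-up counts (not the study's actual outcomes):

```python
def evaluate(tp, fn, tn, fp):
    """Sensitivity (recall on converters), specificity (recall on stable MCI),
    and overall accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical outcomes for 16 held-out pMCI and 16 held-out sMCI subjects.
sens, spec, acc = evaluate(tp=14, fn=2, tn=14, fp=2)
print(sens, spec, acc)  # 0.875 0.875 0.875
```

With balanced groups of 16 and 16, as in the held-out MCI test set described above, accuracy is simply the mean of sensitivity and specificity.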

Results

Firstly, we consider the classification performance of our network on four different input biomarker combinations. The four input combinations are: 1) clinical features and T1w MRI images; 2) clinical features and Jacobian Determinant images; 3) clinical features and atlas-masked T1w MRI images; and 4) clinical features, Jacobian Determinant and T1w MRI images. We performed all of these experiments in custom template space. In order to assess the robustness of the neural network model to MRI …

Discussion

Deep-learning algorithms extract a hierarchy of features from the input data via flexible and non-linear transformations. These new data representations are learnt in a manner that maximizes an arbitrary performance metric, for example binary cross-entropy. Hence, instead of relying on a priori knowledge or dimensionality reduction algorithms which might result in non-optimal feature selection, deep-learning uses the gradient in the performance metric to directly guide feature extraction, which …

Acknowledgements

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; …

References (52)

  • S.H. Hojjati et al.

    Predicting conversion from MCI to AD using resting-state fMRI, graph theoretical approach and SVM

    J. Neurosci. Methods

    (2017)
  • M. Jenkinson et al.

    FSL

    NeuroImage

    (2012)
  • D. Lu et al.

    Multiscale deep neural network based analysis of FDG-PET images for the early diagnosis of Alzheimer's disease

    Med. Image Anal.

    (2018)
  • E. Moradi et al.

    Machine learning framework for early MRI-based Alzheimer's conversion prediction in MCI subjects

    Neuroimage

    (2015)
  • L. Mosconi et al.

    Early detection of Alzheimer's disease using neuroimaging

    Exp. Gerontol.

    (2007)
  • M.H. Nguyen et al.

    Optimal feature selection for support vector machines

    Pattern Recogn.

    (2010)
  • J. Young et al.

    Accurate multimodal probabilistic prediction of conversion to Alzheimer's disease in patients with mild cognitive impairment

    Neuroimage: Clinical.

    (2013)
  • M. Abadi et al.

    TensorFlow: a system for large-scale machine learning

    (2016)

  • F. Baldacci et al.

    Blood-based biomarker screening with agnostic biological definitions for an accurate diagnosis within the dimensional spectrum of neurodegenerative diseases

    (2018)

  • H. Braak et al.

    Development of Alzheimer-related neurofibrillary changes in the neocortex inversely recapitulates cortical myelogenesis

    Acta Neuropathol.

    (1996)
  • R. Casanova et al.

    High dimensional classification of structural MRI Alzheimer's disease data based on large scale regularization

    Front. Neuroinf.

    (2011)
  • F. Chollet

    Xception: deep learning with depthwise separable convolutions

    (2017)

  • F. Chollet et al.

    Keras

    (2015)

  • D.-A. Clevert et al.

    Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

    (2015)
  • C. Davatzikos et al.

    Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification

    Neurobiol. Aging

    (2011)
  • A. Delacourte et al.

    The biochemical pathway of neurofibrillary degeneration in aging and Alzheimer's disease

    Neurology

    (1999)
1. These authors contributed equally to this publication.

2. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
