Technical note
Partial least squares for discrimination in fMRI data

https://doi.org/10.1016/j.mri.2011.11.001

Abstract

Multivariate methods for discrimination were used in the comparison of brain activation patterns between groups of cognitively normal women who are at either high or low Alzheimer's disease risk based on family history and apolipoprotein-E4 status. Linear discriminant analysis (LDA) was preceded by dimension reduction using principal component analysis (PCA), partial least squares (PLS) or a new oriented partial least squares (OrPLS) method. The aim was to identify a spatial pattern of functionally connected brain regions that was differentially expressed by the risk groups and yielded optimal classification accuracy. Multivariate dimension reduction is required prior to LDA when the data contain more feature variables than there are observations on individual subjects. Whereas PCA has been commonly used to identify covariance patterns in neuroimaging data, this approach only identifies gross variability and is not capable of distinguishing among-groups from within-groups variability. PLS and OrPLS provide a more focused dimension reduction by incorporating information on class structure and therefore lead to more parsimonious models for discrimination. Performance was evaluated in terms of the cross-validated misclassification rates. The results support the potential of using functional magnetic resonance imaging as an imaging biomarker or diagnostic tool to discriminate individuals with disease or high risk.

Introduction

Multivariate methods are likely to play an important role in the development of functional magnetic resonance imaging (fMRI) as an imaging biomarker or diagnostic tool to discriminate individuals with disease. The motivation is that spatial patterns of functionally connected brain regions may convey more information than individual regions do. For instance, a brain region may not by itself exhibit a significant difference in its level of fMRI activation between, say, a patient group and a control group. Yet, in combination with other brain areas, this region might become important for discrimination as an integral part of a distributed brain network. Neuroimaging data, however, typically have more features than there are observations on individual subjects, which makes the within-group covariance matrix singular and prevents direct use of linear discriminant analysis (LDA). The most common approach has been to use principal component analysis (PCA) as a first step to reduce the dimension of the data [1], [2], [3]. Unfortunately, PCA only identifies gross variability and cannot distinguish among-groups from within-groups variability. Partial least squares (PLS) for focused dimension reduction in discrimination was developed to circumvent this problem by incorporating information on class structure [4]. PLS was first used for spatial pattern analysis of functional brain images by McIntosh et al. [5]. Oriented partial least squares (OrPLS) is a new multivariate technique developed by our group [6]. It has been used in multivariate analysis of fMRI time series data as a way of incorporating information about the underlying experimental paradigm as well as noise and other confounds [7]. In the context of discrimination, OrPLS combines the tuning of PLS toward group separation with the ability to simultaneously orient away from within-group covariability [8], [9].
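The contrast between PCA and PLS directions can be illustrated with a small synthetic example. The sketch below is not the study's analysis: the data are hypothetical, with one feature carrying large variability unrelated to group and one carrying a small group difference. It uses the standard fact that, for a single response, the first PLS weight vector is proportional to the covariance between the centered features and the centered group indicator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-group data (not the study's data).
n_per_group = 12
labels = np.repeat([0.0, 1.0], n_per_group)

# Feature 0: large variability with the same spread in both groups.
nuisance = np.tile(np.linspace(-5.0, 5.0, n_per_group), 2)
# Feature 1: small noise plus a consistent group shift.
signal = 2.0 * labels + rng.normal(0.0, 0.25, 2 * n_per_group)

X = np.column_stack([nuisance, signal])
Xc = X - X.mean(axis=0)          # center the features
yc = labels - labels.mean()      # center the group indicator

# First PCA direction: leading right singular vector of the centered data.
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pca_dir = vt[0]

# First PLS weight vector for one response: proportional to Xc^T yc.
pls_dir = Xc.T @ yc
pls_dir /= np.linalg.norm(pls_dir)

print("PCA direction:", np.round(pca_dir, 3))
print("PLS direction:", np.round(pls_dir, 3))
```

In this contrived setting the PCA direction follows the high-variance feature, while the PLS direction follows the feature that separates the groups. The sign of a PCA direction is arbitrary.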

The emphasis of the present work is on eigenstructure-based dimension reduction, particularly focused dimension reduction as compared to the variance summaries of PCA, in combination with LDA for the classification of multivariate fMRI data from groups of subjects. fMRI studies frequently show large between-subject variability in the level of activation. In that setting, PCA may fail to capture the information on group separation that is essential for classification, even within a linear subspace spanned by components that account for a large proportion of total variability. Other approaches have used support vector machines (SVMs) for learning and classification of fMRI data in conjunction with an initial step of PCA-based dimension reduction [10]. Also, the literature on genetic algorithms contains a host of routines for variable selection and dimension reduction. As input, we may use region-of-interest (ROI)-, parcellation- [2] or voxel-based [3] functional neuroimaging data. The present work uses a parcellation of the brain into regions based on the Talairach atlas. The aim is to identify a spatial pattern of functionally connected brain regions that is differentially expressed by groups of subjects, as measured by the subject-wise discriminant scores, and that yields optimal classification accuracy.
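A minimal sketch of the general pipeline (dimension reduction followed by two-group LDA, scored by leave-one-out cross-validation) is given below. It is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function names are hypothetical, PLS is a plain single-response NIPALS variant, and OrPLS is not implemented. The reduction is refit inside every fold so that the held-out subject never influences the basis.

```python
import numpy as np

def pca_basis(Xc, k):
    """Top-k principal directions of centered data, as columns."""
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T

def pls_basis(Xc, yc, k):
    """Orthonormal single-response PLS weight vectors via deflation."""
    Xk = Xc.copy()
    weights = []
    for _ in range(k):
        w = Xk.T @ yc
        w /= np.linalg.norm(w)
        t = Xk @ w                    # component scores
        p = Xk.T @ t / (t @ t)        # loadings
        Xk = Xk - np.outer(t, p)      # deflate before the next component
        weights.append(w)
    # Projecting on these weights spans the same subspace as the PLS
    # scores, and LDA is invariant to that change of basis.
    return np.column_stack(weights)

def lda_predict(Z, y, z_new):
    """Two-group LDA with equal priors; returns 0 or 1."""
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    # Pooled within-group covariance of the reduced scores.
    S = ((Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)) / (len(Z) - 2)
    a = np.linalg.solve(S, m1 - m0)
    return int(a @ (z_new - (m0 + m1) / 2) > 0)

def loo_error(X, y, method, k):
    """Leave-one-out misclassification rate for 'pca' or 'pls' reduction."""
    errors = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        mean = X[train].mean(axis=0)
        Xc = X[train] - mean
        if method == "pca":
            B = pca_basis(Xc, k)
        else:
            B = pls_basis(Xc, y[train] - y[train].mean(), k)
        pred = lda_predict(Xc @ B, y[train], (X[i] - mean) @ B)
        errors += int(pred != y[i])
    return errors / len(y)

# Synthetic data: 24 subjects, 60 features (more features than subjects).
rng = np.random.default_rng(0)
n_per_group, n_features = 12, 60
y = np.repeat([0, 1], n_per_group)
X = rng.normal(0.0, 0.5, (2 * n_per_group, n_features))
# Feature 0: large between-subject variability unrelated to group.
X[:, 0] = np.tile(np.linspace(-4.0, 4.0, n_per_group), 2) \
    + rng.normal(0.0, 1.0, 2 * n_per_group)
# Features 1-5: modest group shift.
X[:, 1:6] += 1.5 * y[:, None]

pca_err = loo_error(X, y, "pca", 1)
pls_err = loo_error(X, y, "pls", 1)
print("LOO error, PCA + LDA (1 component):", pca_err)
print("LOO error, PLS + LDA (1 component):", pls_err)
```

With a single retained component, the PCA basis in this example is driven by the high-variance feature, so the comparison shows how a variance summary can miss the group separation that a class-informed reduction picks up.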

Data acquisition

fMRI was used in the present study to observe cortical activation during a confrontation naming task in 13 women with high Alzheimer's disease (AD) risk and 11 with low risk based on family history and apolipoprotein-E4 status [11]. A blocked experimental design was used, and activation to the naming task compared to the rest periods was evaluated by the general linear model. The echo-planar imaging time series data were preprocessed using standard methods including slice timing correction,

Results

Table 1 shows the comparison of principal component and PLS regression as an increasing number of components are retained and included in the model. The individual component scores $a_k^{T}x$ were used as predictor variables, with group membership encoded as an indicator variable being the dependent response measure. Linear regression has been shown to yield the same linear combination (the $\beta_k$ coefficients) of predictor variables as LDA for the two-group problem and offers some further insights [18].
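The equivalence between indicator regression and two-group LDA can be checked numerically. The sketch below uses synthetic component scores (hypothetical values, not the study's data) and compares the least-squares slope vector with the Fisher direction, that is, the inverse pooled within-group covariance applied to the difference in group means. The two vectors should agree up to a positive scale factor.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical component scores for two groups (synthetic).
n0, n1, k = 11, 13, 3
scores = np.vstack([
    rng.normal(0.0, 1.0, (n0, k)),
    rng.normal(0.8, 1.0, (n1, k)),
])
group = np.repeat([0.0, 1.0], [n0, n1])

# Least-squares regression of the group indicator on the scores,
# with an intercept column; keep only the slope coefficients.
design = np.column_stack([np.ones(n0 + n1), scores])
beta = np.linalg.lstsq(design, group, rcond=None)[0][1:]

# Fisher LDA direction from the pooled within-group covariance.
g0, g1 = scores[group == 0], scores[group == 1]
m0, m1 = g0.mean(axis=0), g1.mean(axis=0)
S = ((g0 - m0).T @ (g0 - m0) + (g1 - m1).T @ (g1 - m1)) / (n0 + n1 - 2)
lda_dir = np.linalg.solve(S, m1 - m0)

# Cosine of the angle between the two coefficient vectors.
cosine = beta @ lda_dir / (np.linalg.norm(beta) * np.linalg.norm(lda_dir))
print("cosine between regression and LDA directions:", cosine)
```

Only the direction is shared. The regression coefficients and the LDA coefficients differ in scale, and the classification threshold has to be set separately.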

Discussion

Principal component analysis has a long history as the workhorse for multivariate coordinate transformations and dimension reduction in statistical pattern recognition [24]. Yet, the variance summaries of PCA may not provide the best possible approach when discrimination is the goal and dimension reduction is required due to more feature variables than there are observations on individual subjects. We have illustrated here how PCA for dimension reduction of functional neuroimaging data may fail

Acknowledgments

This work was supported by a grant from the National Institute of Neurological Disorders and Stroke (R01-NS036660).

References (27)

  • Y. Liu et al. PLS and dimension reduction for classification. Comp Statist (2007)
  • W. Rayens et al. Using OrPLS to identify asymptomatic women at risk for Alzheimer's disease. J Chemometrics (2008)
  • C.D. Smith et al. Altered brain activation in cognitively intact individuals at high risk for Alzheimer's disease. Neurology (1999)