Elsevier

NeuroImage

Volume 57, Issue 2, 15 July 2011, Pages 416-422
A comparison of different automated methods for the detection of white matter lesions in MRI data

https://doi.org/10.1016/j.neuroimage.2011.04.053

Abstract

White matter hyperintensities (WMH) are the focus of intensive research and have been linked to cognitive impairment and depression in the elderly. Cumbersome manual outlining procedures make research on WMH labour intensive and prone to subjective bias.

This study compares fully automated supervised detection methods that learn to identify WMH from manual examples against unsupervised approaches on the combination of FLAIR and T1 weighted images. Data were collected from ten subjects with mild cognitive impairment and another set of ten individuals who fulfilled diagnostic criteria for dementia. Data were split into balanced groups to create a training set used to optimize the different methods. Manual outlining served as gold standard to evaluate performance of the automated methods that identified each voxel either as intact or as part of a WMH.

Otsu's approach for multiple thresholds, which is based only on voxel intensities of the FLAIR image, produced a high number of false positives at grey matter boundaries. Performance on an independent test set was similarly disappointing when simply applying a threshold to the FLAIR image that was found from training data. Among the supervised methods, precision–recall curves of support vector machines (SVM) indicated advantages over the performance achieved by K-nearest-neighbor classifiers (KNN). The curves indicated a clear benefit from optimizing the threshold of the SVM decision value and the voting rule of the KNN. Best performance was reached by selecting training voxels according to their distance to the lesion boundary and by repeating the training after replacing the feature vectors from those voxels that did not form support vectors of the SVM.
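To illustrate how such a precision–recall curve arises, the following minimal sketch sweeps a cut-off over per-voxel scores and compares the resulting binary labels with a manual gold standard. It is not the authors' code: the function names `precision_recall` and `pr_curve`, the toy scores and the convention of reporting 1.0 when a denominator is zero are all assumptions made for illustration. The scores could stand for FLAIR intensities, SVM decision values or KNN vote fractions.

```python
def precision_recall(predicted, truth):
    """Voxel-wise precision and recall for two equal-length binary sequences."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    # Assumed convention: report 1.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


def pr_curve(scores, truth, thresholds):
    """Return one (threshold, precision, recall) triple per candidate cut-off."""
    curve = []
    for thr in thresholds:
        predicted = [s > thr for s in scores]  # voxel is called WMH above the cut-off
        precision, recall = precision_recall(predicted, truth)
        curve.append((thr, precision, recall))
    return curve


# Toy data (made up): per-voxel scores and manual labels (1 = WMH, 0 = intact).
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
truth = [0, 0, 1, 1, 1, 0]
curve = pr_curve(scores, truth, thresholds=[0.3, 0.5])
for thr, precision, recall in curve:
    print(f"threshold={thr:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, raising the cut-off trades recall for precision, which is the trade-off that optimizing the SVM decision threshold or the KNN voting rule exploits.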

The study demonstrates advantages of SVM for the problem of detecting WMH at least for studies that include only FLAIR and T1 weighted images. Various optimization strategies are discussed and compared against each other.

Research highlights

► Comparison of automated methods for the identification of white matter hyperintensities.
► Support vector machines show best performance with only FLAIR and T1 available.
► Simple thresholding produces high numbers of false positives.
► Replacing non-support vectors during training further improves results.

Introduction

White matter hyperintensities (WMH) as detected using magnetic resonance imaging (MRI) have been linked to small vessel disease, degeneration of myelin and enlarged Virchow–Robin spaces, but also to reduced blood supply at the endzone of choroidal arteries and loss of ependymal cells (Schmahmann et al., 2008, Wahlund et al., 2001). They are found primarily in the periventricular white matter (WM) and increase with age, cognitive impairment and depression (Schmahmann et al., 2008). A number of studies have recently focused on the connection between the amount and location of WMH and these clinical symptoms, and several employed visual scales to rate WMH (Kapeller et al., 2002, Prins et al., 2004, Scheltens et al., 1993, Wahlund et al., 2001). These scales estimate the lesion severity rather than explicitly detecting every single lesion and therefore entail a loss of information.

A number of approaches to the automatic detection of WMH have been proposed. Morphologically, most lesions caused by large vessel disease (e.g., stroke) or multiple sclerosis show much sharper boundaries than those caused by small vessel disease. In addition, older stroke lesions are typically hypointense in the T1 weighted images (T1w) while those caused by smaller vessel alterations are not (Schmahmann et al., 2008). Detection methods are optimized for the simultaneous analysis of multiple images including Fluid Attenuated Inversion Recovery (FLAIR), T2 weighted (T2w) and proton density (PD) images that are typically acquired alongside the standard T1w images. FLAIR images have characteristics similar to those of T2w images but with cerebrospinal fluid (CSF) suppressed. WMH are brighter than intact tissue and CSF, which makes the FLAIR image suitable for simple thresholding or histogram based segmentation methods (Hirono et al., 2000, Jack et al., 2001). While these methods require the identification of a threshold or cut-point, Admiraal-Behloul and colleagues reported an excellent detection rate from the combination of PD, FLAIR and T2w imaging with a clustering algorithm (Admiraal-Behloul et al., 2005). Each of the images was segmented separately using a fuzzy C-means clustering that categorized voxels into bright, medium-bright and dark in case of the FLAIR image. The final identification of lesioned voxels was made using decision rules based on human experts. Although other approaches exist (Maillard et al., 2008, Wu et al., 2006), supervised learning methods are increasingly being used (Lao et al., 2008, Quddus et al., 2005). These methods are supervised in the sense that they learn from training data to separate intact WM voxels from those in a WMH. These training data usually consist of scans with manually outlined lesions. Each individual voxel is classified as either intact or affected based on various characteristics, often referred to as features.
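As an illustration of the histogram based thresholding mentioned earlier in this paragraph, the sketch below computes a single cut-point with Otsu's between-class variance criterion and labels every voxel above it as bright. This is the classic two-class variant, not the multiple-threshold extension evaluated in this study, and it is not the authors' implementation. The function name `otsu_threshold`, the bin count and the toy intensities are assumptions chosen for illustration.

```python
def otsu_threshold(values, n_bins=64):
    """Return the intensity cut-point that maximises between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # flat image: no meaningful split exists
    width = (hi - lo) / n_bins

    # Build the intensity histogram.
    hist = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        hist[idx] += 1

    total = len(values)
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    total_sum = sum(c * h for c, h in zip(centers, hist))

    best_var, best_cut = -1.0, lo
    w0 = 0      # number of voxels in the darker class
    sum0 = 0.0  # summed intensity of the darker class
    for i in range(n_bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var = between
            best_cut = lo + (i + 1) * width  # upper edge of bin i
    return best_cut


# Toy FLAIR-like intensities (made up): five dark voxels and three bright ones.
intensities = [10, 11, 12, 10, 11, 50, 52, 51]
cut = otsu_threshold(intensities)
bright_mask = [v > cut for v in intensities]
```

Because the criterion uses intensity alone, any tissue that happens to be bright in the image ends up in the bright class, which is one reason purely intensity-based cut-points can produce false positives.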
The feature vector (FV) from each sample usually includes the intensity of the index voxel and its neighborhood in all available modalities but may include other information such as the stereotactic coordinates (Anbeek et al., 2004a). Those coordinates may carry relevant information as WMH are not uniformly distributed throughout the brain (Schmahmann et al., 2008). K-nearest neighbor (KNN) classification is an example of a supervised learning technique and employs a voting strategy. It effectively assigns the index voxel to the group that is most common among the k most similar examples of the training set. Anbeek et al. (2004a, 2004b) demonstrated encouraging results with this approach on subjects with different vascular disorders based on T1w, T2w, PD, inversion recovery and FLAIR data. More recent studies have used support-vector machines (SVM) (Lao et al., 2008, Quddus et al., 2005), an approach that generates a decision boundary from the training data to separate voxel types (Müller et al., 2001, Vapnik, 1998). In addition to other applications at the voxel level (Schnell et al., 2009), this approach has been used to separate diagnostic groups (Fu et al., 2008, Klöppel et al., 2008) or to predict the future course of a disease (Davatzikos et al., 2009, Koutsouleris et al., 2009, Vemuri et al., 2009). The SVM should generalize better than KNN to new data not used in the training of the algorithm. This is because the SVM learns the relevance of each feature in the FV, while the KNN uses an identical weight for each feature. In other words, the SVM is less influenced by information that contains noise rather than signal relevant for the classification. Quddus et al. (2005) combined SVMs with a boosting technique to optimize the detection of white matter lesions (WML) in PD images. Lao et al. (2008) combined T1w, T2w, PD and FLAIR imaging from several centers and classified according to the neighborhood of the index voxel in all modalities.
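The feature vector and the KNN voting strategy described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline of any cited study: it uses 2D slices with a 3x3 neighborhood instead of 3D volumes, plain Euclidean distance without feature scaling, and the hypothetical names `feature_vector`, `knn_lesion_fraction`, `knn_classify` and `vote_threshold`. All data are made up.

```python
import math


def feature_vector(flair, t1, x, y):
    """Intensities of the 3x3 neighbourhood of (x, y) in both images, plus coordinates."""
    fv = []
    for image in (flair, t1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                fv.append(image[y + dy][x + dx])
    fv.extend([x, y])  # position, since lesions are not uniformly distributed
    return fv


def knn_lesion_fraction(fv, train_fvs, train_labels, k):
    """Fraction of the k nearest training examples labelled as lesion (1)."""
    order = sorted(range(len(train_fvs)), key=lambda i: math.dist(fv, train_fvs[i]))
    nearest = order[:k]
    return sum(train_labels[i] for i in nearest) / k


def knn_classify(fv, train_fvs, train_labels, k=3, vote_threshold=0.5):
    """Label a voxel as lesion when the vote fraction exceeds vote_threshold."""
    return knn_lesion_fraction(fv, train_fvs, train_labels, k) > vote_threshold


# Feature extraction on a toy 3x3 slice: one bright FLAIR voxel in the centre.
flair = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
t1 = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
fv = feature_vector(flair, t1, x=1, y=1)

# Voting on toy two-feature training data: three intact and two lesion examples.
train_fvs = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.8], [3.0, 5.0], [3.2, 4.9]]
train_labels = [0, 0, 0, 1, 1]
query = [2.9, 5.0]
fraction = knn_lesion_fraction(query, train_fvs, train_labels, k=3)
is_lesion = knn_classify(query, train_fvs, train_labels, k=3)
is_lesion_strict = knn_classify(query, train_fvs, train_labels, k=3, vote_threshold=0.9)
```

Every feature enters the distance with the same weight here, which is the property of KNN that the text contrasts with the SVM. Changing `vote_threshold` moves the classifier away from a simple majority vote toward a stricter or more lenient rule.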

Although all groups reported good performance of their respective approaches, they did not compare them against alternative strategies, and it is likely that the best method will depend on lesion type and imaging modality. We aimed to compare supervised and unsupervised approaches frequently used in previous studies on a dataset that is limited to FLAIR and T1w data. Such a limitation can be useful for studies where an extensive testing battery or the severity of the disease puts substantial constraints on the duration of an imaging session.

Section snippets

Materials and methods

Twenty patients with either mild cognitive impairment (MCI) or mild dementia were recruited into this study (see Table 1). The diagnoses were made on the basis of clinical and neuropsychological data. Subjects with MCI complained about cognitive deficits and showed a decline of cognitive abilities (> 1 SD in age-corrected standardized tests). Patients with mild dementia showed cognitive decline (> 1 SD) from a previous level in at least 2 domains affecting activities of daily living (Kornhuber et

Results

As shown in Fig. 3, the SVM approach gave the best performance of all automated approaches. Differences between the different FVs were subtle. A completely different picture emerged for KNN classifiers, where the full length FV outperformed all others. An FV with the neighborhood of the FLAIR image rated second best with the KNN approach (see supplementary Figs. 4 and 5). Default cut-off values for the SVM and KNN grouping (symbols on the graphs in Fig. 3) resulted in a good recall

Discussion

The aim of this study was a comparison between different automated methods for the problem of detecting WMH from T1w and FLAIR images. All methods presented here depend on a large number of internal parameters. This starts at the preprocessing stage, where several algorithms for segmentation and bias correction are available, continues with the scaling and feature extraction step, and ends with the value of k in the KNN approach and the kernel type of the SVM. It is not feasible to search the

Acknowledgment

We would like to thank Christian Gaser for helpful comments on the design of this study and the image processing pipeline. This study was supported by the following grants: Federal Ministry of Education and Research, 01GW0661 to MH, German Research Council MA 2343/4-1 to IM and German Federal and State Governments (EXC 294) to OR.

References (30)

  • C. Bishop. Pattern Recognition and Machine Learning (2006)
  • C.C. Chang et al. LIBSVM: a library for support vector machines
  • C. Davatzikos. Longitudinal progression of Alzheimer's-like patterns of atrophy in normal older adults: the SPARE-AD index. Brain (2009)
  • D. Gabor. Theory of communication. J. Inst. Electr. Eng. (1946)
  • N. Hirono. Impact of white matter changes on clinical manifestation of Alzheimer's disease: a quantitative study. Stroke (2000)