Individual Detection of Patients with Parkinson Disease using Support Vector Machine Analysis of Diffusion Tensor Imaging Data: Initial Results

BACKGROUND AND PURPOSE: Brain MR imaging is routinely performed in the work-up of suspected PD, yet its role is essentially limited to the exclusion of other pathologies. We performed a pattern-recognition analysis based on DTI data to detect subjects with PD at the individual level. MATERIALS AND METHODS: We included 40 consecutive patients with Parkinsonism suggestive of PD who had DTI at 3T, brain 123I ioflupane SPECT (DaTSCAN), and extensive neurologic testing including follow-up (17 PD: mean age, 67.8 ± 6.7 years; 9 women; 23 Other, consisting of atypical forms of Parkinsonism: mean age, 67.2 ± 9.7 years; 7 women). Data analysis included group-level TBSS and individual-level SVM classification. RESULTS: At the group level, patients with PD versus Other had a spatially consistent increase in FA and decrease in RD and MD in a bilateral network, predominantly in the right frontal white matter. At the individual level, SVM correctly classified patients with PD with accuracies up to 97%. CONCLUSIONS: Support vector machine–based pattern recognition of DTI data provides highly accurate detection of patients with PD among those with suspected PD at an individual level, which is potentially clinically applicable. Because most subjects with suspected PD undergo brain MR imaging, already existing MR imaging data may be reused, which makes this approach very cost-efficient.

PD is the most common degenerative movement disorder in the general population, named after the English doctor James Parkinson, who published its first detailed description in 1817. Brain MR imaging is routinely performed in the diagnostic work-up, yet its role is essentially limited to the exclusion of other pathologies such as normal-pressure hydrocephalus or chronic subdural hematoma, among others. One of the rare reported alterations visible on conventional MR imaging is narrowing or disappearance of the pars compacta of the substantia nigra on T2-weighted imaging, 1 yet this sign has low sensitivity and specificity and does not contribute to the diagnosis of PD, in particular at an early stage.
On the basis of the assumption that PD is associated with systematic changes in brain MR imaging, which are too subtle to be detected by visual analysis, we analyzed brain MR imaging of subjects with suspected PD by using an advanced computer-based method aiming to contribute to the diagnosis of PD at an individual level. There are 2 fundamental approaches of advanced MR imaging data analysis. The first and more frequently implemented type, group-level studies, typically compares ≥1 group of patients with healthy controls with the aim of detecting, for example, disease-related structural alterations. [2][3][4][5][6] Although fascinating from a research perspective, the disadvantage of these group-level studies is that the results cannot be applied to the diagnosis of individual patients in clinical neuroradiology. The second type of analysis aims to detect or classify individual patients. Because the composition of the included subjects may bias the classification accuracy, the study groups should ideally consist of unselected and consecutive patients with suspicion of a given disease (see "Discussion"). The individual classification is potentially clinically applicable. The disadvantage is the difficulty or even impossibility of interpreting the results from a neuropathologic perspective.
To achieve a potentially clinically applicable individual diagnosis, we performed a study of the second type, implementing a pattern-recognition approach. This pattern recognition can be illustrated briefly with the example of face recognition. Individual faces are not detected on the basis of single features such as the tip of the nose, ears, eyes, and so forth, but by the combination of multiple features, even though each individual feature may not differ significantly between groups. In the present study, the entire brain is included in the pattern-recognition analysis to obtain 1 individual predictive value per subject. More technically, classification analyses can be explained best for a simple example of only 2 features, which can be represented by an x-y plot. If one assumes that all subjects of group A are in the upper left part and all subjects of group B are in the lower right part of the plot, the 2 groups can be discriminated by an oblique ascending line. Of all possible lines to discriminate between groups, the SVM 7 identifies the line that best discriminates between groups. For a more detailed discussion, see a recent review of SVM classification in neurodegenerative diseases by Haller et al. 8
Because the most relevant clinical question is not the discrimination of PD versus healthy controls but the detection of idiopathic PD versus other atypical forms of Parkinsonism including MSA and PSP, we included 40 consecutive subjects with suspected PD rather than healthy control subjects. Inclusion criteria were brain 123I ioflupane SPECT (DaTSCAN; GE Healthcare, Buckinghamshire, United Kingdom) as a reference and extensive neurologic testing, including long-term follow-up. We analyzed DTI data because a number of recent investigations in various neurodegenerative disorders 6,[9][10][11][12] demonstrated that white matter DTI TBSS 13 analysis is more sensitive than gray matter VBM. 14 A recent study implementing an equivalent approach successfully discriminated between stable versus progressive mild cognitive impairment in the domain of dementia. 15 We show that the data analysis chain of TBSS preprocessing of DTI data followed by SVM classification provides a highly accurate individual detection of PD in consecutive subjects with Parkinsonism, despite the absence of visually detectable brain MR imaging differences.
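The 2-feature illustration given in the introduction can be sketched in a few lines of Python. This is not the analysis pipeline of the study: the data points are invented, and instead of a real SVM solver the sketch uses a brute-force search over candidate lines and keeps the one with the largest margin (the distance to the nearest subject of either group), which is the criterion a linear SVM optimizes. The names `margin` and `best_line` are hypothetical.

```python
import math

# Hypothetical toy data: 2 features per subject (not taken from the study).
group_a = [(1.0, 4.0), (1.5, 5.0), (2.0, 4.5)]  # upper left of the x-y plot
group_b = [(4.0, 1.0), (4.5, 2.0), (5.0, 1.5)]  # lower right of the x-y plot


def margin(theta, offset, a_points, b_points):
    """Smallest signed distance of any subject to the line w.x + offset = 0.

    w = (cos theta, sin theta) is a unit normal vector. The result is
    positive only if the line separates the groups, with group A on the
    positive side and group B on the negative side.
    """
    wx, wy = math.cos(theta), math.sin(theta)
    dists = [wx * x + wy * y + offset for x, y in a_points]
    dists += [-(wx * x + wy * y + offset) for x, y in b_points]
    return min(dists)


def best_line(a_points, b_points, n_angles=360, n_offsets=401, max_offset=10.0):
    """Grid search over line orientations and offsets for the widest margin."""
    best = (-math.inf, None, None)
    for i in range(n_angles):
        theta = 2 * math.pi * i / n_angles
        for j in range(n_offsets):
            offset = -max_offset + 2 * max_offset * j / (n_offsets - 1)
            m = margin(theta, offset, a_points, b_points)
            if m > best[0]:
                best = (m, theta, offset)
    return best


best_margin, best_theta, best_offset = best_line(group_a, group_b)
print(f"margin={best_margin:.3f}, normal angle={math.degrees(best_theta):.0f} deg, "
      f"offset={best_offset:.2f}")
```

Many lines separate the 2 toy groups, but only one maximizes the margin. A real SVM finds it by solving an optimization problem rather than by exhaustive search, and with a nonlinear kernel the separating boundary is no longer a straight line.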

Subjects
This retrospective study was approved by the local ethics committee. We included all consecutive patients in our institution between 2006 and 2011 with suspected PD who met the following criteria: 1) DTI at 3T without motion artifacts, 2) brain 123 I ioflupane SPECT (DaTSCAN) as a reference, 3) extensive neurologic testing including follow-up, and 4) the absence of morphologic findings on brain MR imaging. All patients were evaluated by an experienced movement disorders specialist.

MR Imaging
MR imaging was performed on a 3T clinical routine whole-body scanner (Magnetom Trio; Siemens, Erlangen, Germany). We used a standard DTI sequence: 30 diffusion directions isotropically distributed on a sphere, b = 1000 s/mm², 1 reference b = 0 s/mm² image with no diffusion-weighting, 128 × 128 × 64 matrix, 2 × 2 × 2 mm voxel size, TE = 92 ms, TR = 9000 ms, 1 average. Additional sequences (axial spin-echo T1-weighted or gradient-echo 3D T1-weighted, axial spin-echo T2-weighted, coronal FLAIR, axial gradient-echo T2*) were acquired and analyzed by an experienced radiologist during clinical work-up to exclude brain pathology, such as ischemic stroke, subdural hematomas, or space-occupying lesions. In particular, white matter lesions were analyzed according to the score of Fazekas et al. 16

Image Processing
Preprocessing of the DTI data was performed by using the standard procedure of TBSS, as described in detail before, 13,17 in the FSL software package (http://www.fmrib.ox.ac.uk/fsl/). 18 In principle, TBSS projects all subjects' FA data onto a mean FA tract skeleton by using nonlinear registration. The tract skeleton is the basis for voxelwise cross-subject statistics and reduces potential misregistrations as the source for false-positive or -negative analysis results. The other DTI-derived parameters, LD (also known as axial diffusivity, the first eigenvalue), RD (the average of second and third eigenvalues), and MD (the average of all 3 eigenvalues), were analyzed in the same way by using spatial transformation parameters that were estimated in the initial FA analysis.
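As a minimal sketch of how the DTI-derived parameters relate to the 3 eigenvalues of the diffusion tensor, the following Python function computes LD, RD, and MD as defined above, plus FA by using the standard fractional anisotropy formula. The function name `dti_scalars` and the example eigenvalues are hypothetical; in practice these maps are produced by the DTI fitting software rather than by hand.

```python
import math


def dti_scalars(l1, l2, l3):
    """Return LD, RD, MD, and FA from the tensor eigenvalues (l1 >= l2 >= l3)."""
    ld = l1                        # longitudinal (axial) diffusivity: first eigenvalue
    rd = (l2 + l3) / 2.0           # radial diffusivity: mean of second and third
    md = (l1 + l2 + l3) / 3.0      # mean diffusivity: mean of all 3
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # FA = sqrt(3/2) * sqrt(sum (l_i - MD)^2) / sqrt(sum l_i^2), bounded by 0 and 1
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return {"LD": ld, "RD": rd, "MD": md, "FA": fa}


# Hypothetical eigenvalues in mm^2/s for a strongly directional voxel.
example = dti_scalars(1.7e-3, 0.3e-3, 0.2e-3)
print(example)
```

FA is 0 when all 3 eigenvalues are equal (isotropic diffusion) and approaches 1 when diffusion occurs along a single direction only.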

Statistical Analysis
Analysis of Demographic and Clinical Data. The statistical analyses of the demographic and clinical data were performed in GraphPad Prism, Version 5 (GraphPad Software, www.graphpad.com). Age was analyzed by using parametric 2-sample 2-tailed t tests, while sex and Fazekas score were analyzed by using the nonparametric 2-sample 2-tailed Mann-Whitney U tests.
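For illustration only, the age comparison can be approximated from the group summaries reported in this paper (17 PD, 67.8 ± 6.7 years; 23 Other, 67.2 ± 9.7 years). The sketch assumes that the ± values are standard deviations and that an equal-variance (pooled) Student t test was used; the text specifies neither, and the function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """2-sample Student t statistic with pooled variance, from summary data."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


# Assumption: the reported +/- values are standard deviations.
t_stat, df = pooled_t(67.8, 6.7, 17, 67.2, 9.7, 23)
print(f"t = {t_stat:.3f}, df = {df}")
```

Under these assumptions the t statistic is small (about 0.22 with 38 degrees of freedom), which is in line with the reported absence of a significant age difference between groups.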
Group-Level TBSS Analysis. Voxelwise statistical analyses were corrected for multiple comparisons implementing threshold-free cluster enhancement, considering fully corrected P values <.05 as significant. 19 Age and sex were used as nonexplanatory coregressors. We used Johns Hopkins University DTI-based white matter atlases, which are distributed in the FSL package, for anatomic labeling of the suprathreshold voxels.
Individual-Level SVM Analysis. The individual SVM classification analysis is identical to that in a previous study. 15 The TBSS preprocessed DTI FA data were converted into a Waikato Environment for Knowledge Analysis-compatible data format. The individual classification was analyzed in the freely available Waikato Environment for Knowledge Analysis software package (http://www.cs.waikato.ac.nz/ml/weka, Version 3.6.1). The analysis included 2 steps. In a first step, we performed a RELIEFF 20 feature selection (http://rss.acs.unt.edu/Rdoc/library/dprep/html/relief.html). The rationale behind this step is that not all voxels discriminate between groups. Both the inclusion of nondiscriminative voxels and the exclusion of discriminative voxels reduce the classification accuracy. We selected the top 100, 250, 500, 750, and 1000 features implementing 10-fold cross-validation. The second step consisted of the "actual" classification analyses by using the SVM algorithm sequential minimal optimization 21 (distributed in the Waikato Environment for Knowledge Analysis package) with a radial basis function kernel. 22 There are 2 parameters while using radial basis function kernels: C and γ. γ represents the width of the radial basis function, and C represents the error/trade-off parameter that adjusts the importance of the separation error in the creation of the separation surface. On the basis of our previous experience, γ was iteratively explored from 0.01 to 0.09 with an increment of 0.01, while C was fixed to 1.00. We performed 10 repetitions of a 10-fold cross-validation technique.
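The parameter exploration described above can be sketched as follows. The sketch assumes the common form of the radial basis function kernel, K(x, y) = exp(-γ‖x - y‖²); the variable names and the way the feature counts are combined with γ into a single grid are illustrative assumptions, not a description of the Weka configuration actually used.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel, assumed form exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# gamma explored from 0.01 to 0.09 in steps of 0.01; C fixed to 1.00.
gammas = [round(0.01 * k, 2) for k in range(1, 10)]
C = 1.0
feature_counts = [100, 250, 500, 750, 1000]

# Assumption: every feature count is combined with every gamma value.
grid = [(n_features, gamma, C) for n_features in feature_counts for gamma in gammas]

# Hypothetical FA vectors of 2 subjects, restricted to 3 selected voxels.
subject_1 = [0.45, 0.52, 0.38]
subject_2 = [0.41, 0.55, 0.36]
for gamma in (gammas[0], gammas[-1]):
    print(gamma, rbf_kernel(subject_1, subject_2, gamma))
```

A larger γ makes the kernel value fall off faster with the distance between 2 subjects, so that each training subject influences a narrower neighborhood of the feature space.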

Clinical Data
The final sample included 17 PD subjects (mean age, 67.8 ± 6.7 years; 9 women) and 23 Other subjects (mean age, 67.2 ± 9.7 years; 7 women). Age, sex, and Fazekas score did not differ significantly between the 2 groups (Table 1). For the PD group, a diagnosis of PD was made in the presence of typical, asymmetric, and levodopa-responsive Parkinsonism meeting the UK Parkinson's Disease Society Brain Bank criteria, including at least 2 supportive criteria such as slow progression or peak-dose dyskinesia. PD was moderately advanced (mean Hoehn and Yahr stage 23 : 2.4 ± 0.6), and none of these patients had atypical features, even after at least 2.5 years of follow-up (mean follow-up duration, 6.3 ± 3.1 years). In addition, all had an asymmetrical decrease of 123I ioflupane uptake in the posterior aspect of 1 or both putamina on the DaTSCAN. The Other group was more heterogeneous, reflecting the prevalence of common PD-mimicking conditions in the daily activity of a movement disorders clinic. All of these patients exhibited Parkinsonism, defined as the presence of bradykinesia associated with resting tremor or rigidity. The group included pathologies as varied as MSA (n = 5), PSP (n = 1), dementia with Lewy bodies (n = 2), vascular Parkinsonism (n = 3), frontotemporal dementia with Parkinsonism (n = 1), drug-induced Parkinsonism (n = 2), atypical tremor (n = 2), traumatic brain injury (n = 1), and psychogenic Parkinsonism (n = 1). Diagnoses were established according to widely accepted clinical criteria, whenever available, including those by Gilman et al for MSA, 24 27 In 5 cases, a firm clinical diagnosis could not be established at last assessment and these patients were labeled as having unspecified Parkinsonism.

TBSS Group Differences
The PD group compared with the Other group had a significant increase in FA and a corresponding significant decrease in RD and MD, in particular in the right frontal white matter (Fig 1 and On-line Table). The level of significance of FA was slightly lower compared with RD and MD. LD had spatially overlapping changes, which were just below threshold (not illustrated).
The inverse comparisons yielded no suprathreshold clusters.

SVM Individual Classification Analysis
SVM analysis of FA provided a correct classification between PD versus Other with accuracies of up to 97.50 ± 7.54%. The spatial distribution of the most discriminative voxels (features) overlapped substantially with the results of the group-level TBSS analysis as illustrated in Fig 1 and Table 2.

Discussion
Despite the absence of visually evident alterations in brain MR imaging, computer-based SVM analysis of DTI data provides highly accurate individual detection of patients with PD and is potentially applicable in clinical neuroradiology. Brain MR imaging is routinely performed in the work-up of Parkinsonism, notably to exclude concomitant diseases. Available MR imaging data are reused by using advanced data analysis, making this a very cost-effective method, which is not intended to replace but to complement existing methods to obtain an early and specific diagnosis of PD.

Group-Level TBSS Analysis
The first part of the analysis was a group-level analysis of white matter changes in subjects with suspected PD.
Most previous advanced MR neuroimaging studies in the domain of PD compared subjects with PD with healthy controls, with the objective of identifying disease-related alterations in brain morphometry. For example, one of the rare DTI studies in PD demonstrated a lower FA in the substantia nigra in PD compared with healthy controls, which inversely correlated with the clinical severity of PD. 2 Two studies used manually defined ROIs in the substantia nigra showing diminished FA in PD versus healthy controls. 2,28 Another study used multiple ROIs in 10 patients with PD without dementia and 10 healthy controls, showing decrease in FA and increase in MD in the genu of the corpus callosum and in the superior longitudinal fasciculus. 5 Yet other studies found differences in DTI FA in patients with PD with olfactory impairment, 29 patients with PD and depression, 30 or between patients with PD and familial essential tremor 4 or corticobasal syndrome. 31 Another recent study assessed different MR imaging parameters including R2*-, R2-, and R1-mapping, magnetization transfer, and DTI in 31 patients with various forms of Parkinsonism and found that manually defined ROIs of DTI are most useful for identifying PSP, yet less useful for PD. 32 Most interesting, a recent study assessed white matter in 36 subjects with PD and 23 controls on the basis of another technique, notably magnetization transfer imaging. 33 PD-related changes in WM were more sensitive than gray matter volume or attenuation derived from T1-weighted images.
All of these studies have in common that they included very selected control groups.
In contrast to these studies, the aim of the present investigation was to use DTI as a marker to detect individual patients with PD in a group of subjects with suspected PD. We, therefore, deliberately included consecutive nonselected patients with various forms of Parkinsonism. We observed, in our sample, alterations in FA, a measure of axonal integrity, 34 as well as RD and MD in a bilateral right-dominant frontal network. The observed increase in FA in our study is in agreement in principle with the observed increase in FA in de novo patients with PD versus healthy controls in a recent combined T1-weighted and DTI study, 3 while another study in 12 subjects with nondemented PD and 13 controls demonstrated a decrease in FA bilaterally in the frontal lobes. 6 Differences in disease duration, severity, and characteristics might explain this discrepancy. Moreover, the increase in FA in PD versus Other subjects in our study is probably due to a decrease in FA in the Other group, rather than due to an increase in FA in the PD group. Due to the heterogeneous constitution of our control group, the results of our group-level comparison should be interpreted with caution and are presented mainly to visualize the presence of detectable DTI changes between subjects with PD versus Other, as a basis for the later individual-level pattern-recognition analysis.

Individual-Level SVM Classification Analysis
To obtain individual discrimination of subjects with PD, we adopted a complex methodology including a chain of TBSS preprocessing of DTI FA data, feature selection of the most discriminative voxels, and subsequent SVM classification. 15,35 The classification accuracy of approximately 97% across the 10 repetitions of the 10-fold cross-validation implies that on average, only 1 subject was incorrectly classified. Note that SVM 7 analyses for individual classification are fundamentally different from the group-level voxelwise analyses discussed above. Such voxelwise analyses are univariate tests, which separately analyze each included voxel between 2 (or more) groups. Given the multiple tests, in our dataset of approximately 150,000 voxels, it is necessary to implement, as a second step, a correction for multiple comparisons. In contrast, individual-level SVM analyses are multivariate tools that originate from a field called "machine learning" or multivoxel pattern analysis, a branch of artificial intelligence. The aim is to identify patterns that allow the discrimination of individual subjects. There is only 1 resulting parameter per subject; hence, there is no need for corrections for multiple comparisons. For a more detailed discussion of SVM classifiers, see a recent review by Haller et al. 8
There are only a few previous applications of SVM classification in the domain of PD. Most of these studies applied SVM classifiers to behavioral data of gait analysis, 36 fine-motor force tracking, 37 and analysis of wearable accelerometer sensors 38 or joint movement, 39 and even the recorded voice. 40 The only previous SVM application to MR imaging data in the domain of PD analyzed VBM-preprocessed gray matter in 21 patients with PD, 11 with progressive multisystem atrophy and 10 with PSP, and 22 healthy controls. 41 The best classification accuracy up to 96.8% was obtained for PSP versus PD, while the accuracy was 71.9% for MSA versus PD. However, patients with PD could not be discriminated from controls. These classification accuracies are consistent with the clinical neuroradiologic experience because PSP has the most pronounced visible alterations in brain MR imaging with atrophy of the mesencephalon referred to as the "penguin" or "hummingbird" sign, 42 while changes in progressive multisystem atrophy are already less pronounced and, as discussed above, PD-associated changes are very subtle.
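The statement that an accuracy of approximately 97% corresponds to about 1 misclassified subject follows from simple arithmetic on the sample size of 40, sketched here in Python. The per-fold breakdown is an assumption for illustration (equal folds of 4 test subjects, with a single error in 1 fold); the function name `expected_errors` is hypothetical.

```python
def expected_errors(n_subjects, accuracy):
    """Average number of misclassified subjects per pass over the sample."""
    return n_subjects * (1.0 - accuracy)


n_subjects = 40
errors = expected_errors(n_subjects, 0.975)

# Assumed illustration: 10 equal folds of 4 test subjects each. A single
# error makes 1 fold 75% accurate while the other 9 folds remain at 100%.
n_folds = 10
fold_size = n_subjects // n_folds
fold_accuracies = [1.0] * (n_folds - 1) + [(fold_size - 1) / fold_size]
mean_accuracy = sum(fold_accuracies) / n_folds
print(errors, mean_accuracy)
```

With only 4 test subjects per fold, the accuracy of an individual fold can only take the values 0%, 25%, 50%, 75%, or 100%, which is one reason the fold-to-fold variation is large relative to the mean in a sample of this size.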
Neuroradiologic research has been dominated for decades by group studies. The patient groups for such studies should ideally be preselected to have homogeneous and matched groups with typical disease. This is fundamentally different for individual-level classification studies because the preselection of cases might systematically confound the performance of a classifier in a typical clinical setting consisting of unselected and consecutive cases. The preselection of an "artificially" homogeneous atypical Parkinsonism group would train the classifier to detect regions that best discriminate idiopathic PD versus the group of preselected atypical Parkinsonism, yet these regions are not necessarily those that best discriminate between idiopathic PD versus unselected and consecutive patients with suspected PD in a clinical setting. Note that the prevalence of these diseases is very different. According to Medscape (http://emedicine.medscape.com), the prevalence in the United States of, for example, progressive multisystem atrophy is 1.9-4.9 cases per 100,000 population (http://emedicine.medscape.com/article/1154583-overview#a0199), yet around 120 (range, 18-328) cases per 100,000 population for PD (http://emedicine.medscape.com/article/1154583-overview).
With respect to a potential clinical application of a classifier, the study populations should ideally match the prevalence of the diseases: A very high accuracy of detection of a very rare disease may overestimate, while the incorrect classification of a rare disease may underestimate, the classification accuracy in "real-world" data. Moreover, preselection of patients typically results in the inclusion of "classic" cases, which may have stronger disease-related alterations, while "real-world" data may contain fewer "classic" cases. In other words, the preselection of patients with specific diseases might represent a systematic confound with respect to the performance of a classifier in "real-world" data, and we consequently included consecutive unselected patients from our institution.
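The dependence of overall accuracy on the case mix can be made concrete with a small calculation. All numbers below are invented for illustration: the per-class accuracies are not results of this or any cited study, and the "clinic-like" mix only loosely echoes the order of magnitude of the population prevalences quoted above, which are not referral frequencies. The function name `overall_accuracy` is hypothetical.

```python
def overall_accuracy(class_mix, per_class_accuracy):
    """Overall accuracy as the case-mix-weighted mean of per-class accuracies."""
    total = sum(class_mix.values())
    return sum(class_mix[k] / total * per_class_accuracy[k] for k in class_mix)


# Hypothetical classifier: good at recognizing PD, weaker on a rare mimic.
per_class_accuracy = {"PD": 0.95, "MSA": 0.70}

balanced_mix = {"PD": 50, "MSA": 50}      # preselected, balanced study groups
clinic_like_mix = {"PD": 120, "MSA": 4}   # assumed mix with a rare second class

print(overall_accuracy(balanced_mix, per_class_accuracy))
print(overall_accuracy(clinic_like_mix, per_class_accuracy))
```

The same classifier yields different overall accuracies under the 2 mixes, because the rare class contributes little to the weighted mean. This is why an accuracy estimated in preselected groups does not transfer directly to unselected, consecutive patients.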
Most previous applications of the SVM classification to brain MR imaging data were performed on gray matter VBM data in the domain of Alzheimer disease 43 and mild cognitive impairment, [44][45][46] with classification accuracies between 75% and 85%. One recent investigation also assessed DTI data in the domain of mild cognitive impairment with accuracies over 95%, 15 implementing equivalent methodology to that used in the current investigation. This suggests that DTI might be a very sensitive brain MR imaging parameter. Consistent with this observation, several recent group-level investigations in various neurodegenerative disorders demonstrated that white matter DTI TBSS analysis is more sensitive than gray matter VBM. 6,[9][10][11][12] This does not necessarily imply that WM pathology is more pronounced at a histopathologic level; it might simply be a methodologic difference in data acquisition and preprocessing sensitivity favoring white matter DTI TBSS over gray matter T1-weighted-based VBM. Note that FA is a directly assessed absolute parameter between 0 and 1. In contrast, VBM is based on relative 3D T1-weighted values. VBM consequently requires preprocessing that segments the brain into different compartments to provide an indirect gray matter probability per voxel. Additionally, at least the major tracts of the white matter skeleton (TBSS) are generally more linear and have less interindividual variation compared with the much more complex superficial gyral and sulcal folding pattern (VBM), which might imply a different quality of the spatial normalization of the data.

Limitations
The major limitation of the present investigation is the relatively small number of cases, which may affect the results of the SVM analysis, and we propose the current investigation as preliminary data. In fact, the very high accuracy rates of individual classification exceeded our expectations. Both training and testing were performed on the same dataset. A simple split of the data into 2 halves, by using one-half for training and the other half for testing is problematic in small sample sizes such as in the current study because this decreases the number of instances to train the classifier, and the classification accuracy might depend on the division of cases. The reported values were obtained by a well-established 10-fold cross-validation in which 9 parts were used for training and the remaining part was used for testing the classifier. This procedure was repeated 10 times, so that each dataset was used once for testing. This approach increases the sample size for the training of the classifier and at the same time reduces variation of the classification results due to division of cases.
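The repeated 10-fold cross-validation scheme described above can be sketched as an index generator. This is a simplified stand-in, not the Weka implementation: it assumes plain random shuffling without stratification by diagnosis, a fixed seed for reproducibility, and a hypothetical function name `repeated_kfold`.

```python
import random


def repeated_kfold(n_subjects, n_folds=10, n_repeats=10, seed=0):
    """Yield (repetition, fold, train_indices, test_indices) tuples.

    Within each repetition, every subject is used exactly once for testing
    and n_folds - 1 times for training.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        order = list(range(n_subjects))
        rng.shuffle(order)  # a new random division of cases per repetition
        folds = [order[i::n_folds] for i in range(n_folds)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield rep, k, train_idx, test_idx


splits = list(repeated_kfold(40))
print(len(splits), len(splits[0][2]), len(splits[0][3]))
```

With 40 subjects, each of the 100 train/test splits uses 36 subjects for training and 4 for testing, compared with 20 and 20 in a single split into halves.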
Even though this cross-validation approach is a standard method in the field of machine learning/multivoxel pattern analysis and appropriate for the number of subjects involved in our study, the present results may be too optimistic because of some degree of overfitting of the data. Future validation of the present findings is warranted in a larger and independent sample, which ideally should be acquired on different MR scanners. The consecutive and unselected composition of the Other group is another limitation, yet the rationale behind this selection is discussed in detail above. Additionally, the nonlinear (radial basis function-kernel) SVM does not provide an easy-to-interpret weight vector to identify the most discriminative brain areas. Another limitation is the retrospective nature of the present study.