Automated Integration of Multimodal MRI for the Probabilistic Detection of the Central Vein Sign in White Matter Lesions

BACKGROUND AND PURPOSE: The central vein sign is a promising MR imaging diagnostic biomarker for multiple sclerosis. Recent studies have demonstrated that patients with MS have higher proportions of white matter lesions with the central vein sign compared with those with diseases that mimic MS on MR imaging. However, the clinical application of the central vein sign as a biomarker is limited by interrater differences in the adjudication of the central vein sign as well as the time burden required for the determination of the central vein sign for each lesion in a patient's full MR imaging scan. In this study, we present an automated technique for the detection of the central vein sign in white matter lesions. MATERIALS AND METHODS: Using multimodal MR imaging, the proposed method derives a central vein sign probability, πij, for each lesion, as well as a patient-level central vein sign biomarker, ψi. The method is probabilistic in nature, allows site-specific lesion segmentation methods, and is potentially robust to intersite variability. The proposed algorithm was tested on imaging acquired at the University of Vermont in 16 participants who have MS and 15 participants who do not. RESULTS: By means of the proposed automated technique, participants with MS were found to have significantly higher values of ψ than those without MS (ψMS = 0.55 ± 0.18; ψnon-MS = 0.31 ± 0.12; P < .001). The algorithm was also found to show strong discriminative ability between patients with and without MS, with an area under the curve of 0.88. CONCLUSIONS: The current study presents the first fully automated method for detecting the central vein sign in white matter lesions and demonstrates promising performance in a sample of patients with and without MS.

Multiple sclerosis is an inflammatory demyelinating disease of the central nervous system characterized by lesions in the brain and spinal cord. Currently, MR imaging assessment factors heavily in the diagnosis of MS. Much importance is placed on the distribution (dissemination in space) and time course (dissemination in time) of lesions 1 in patients presenting with clinical symptoms typical of MS. However, current imaging-based diagnostic criteria favor sensitivity over specificity, making misdiagnosis of MS relatively common. 2,3 Misdiagnosis is especially prevalent among disorders that produce white matter lesions similar to those found in MS. 4,5 Inflammatory demyelination in MS white matter is perivenular. For this reason, the identification of a vein traversing the center of a lesion has been proposed as a tool for distinguishing MS lesions from white matter abnormalities arising from other diseases. 6,7 Recent developments in MR imaging pulse sequences have enabled detailed imaging of veins in the brain, advancing the potential for this marker to be used in the diagnosis of MS. 8-10 Using these sequences, researchers have provided strong evidence that higher proportions of MS lesions show the central vein sign (CVS) compared with lesions resulting from other disease processes commonly mistaken for MS. 7,11-16 This finding has been demonstrated for neuromyelitis optica spectrum disorder, systemic autoimmune diseases, cerebral small vessel disease, Susac syndrome, and migraine. Further replication in a prospective setting is still necessary. Nevertheless, a high proportion of brain MR imaging lesions demonstrating the CVS appears to have potential as a biomarker with high specificity for MS.
Unfortunately, important barriers limit the clinical application of the CVS. Two such limitations are intra- and interrater variability in the subjective assessment of the CVS and the time required to adjudicate the CVS in every MR imaging lesion per patient. Recent studies have attempted to reduce the time burden of CVS assessment by limiting the number of lesions examined. 13,17 However, these techniques have the potential to increase variability. They have also generally not been as successful as evaluating the proportion of lesions with the CVS among all MR imaging lesions per patient. 7,18 Most important, in studies that adjudicate all lesions per patient, optimal proportion cutoffs have differed across study sites and disease comparisons. 7,13,18 This variability highlights the need to compare and optimize these cutoffs thoroughly across samples and diseases. Yet the same issues of rater subjectivity and time burden make this type of research difficult. Thus, the current study introduces an algorithm for the automatic determination of the CVS in white matter lesions and presents a fully automated patient-level diagnostic biomarker. In this article, we describe the CVS-detection pipeline, present statistical measures of its diagnostic accuracy, and discuss the implications and next steps for this line of research.

CVS Detection Algorithm
To adjudicate the CVS for each lesion in a given participant, we perform several steps. We first present an overall summary and then address each step, with its rationale, in detail below. The algorithm requires 3 inputs: a T1-weighted volume, a T2-weighted FLAIR volume, and a T2*-weighted segmented echo-planar imaging (T2*-EPI) volume. The steps are as follows. 1) A map of the veins present in the T2*-EPI volume is created using a process referred to as "vesselness filtering," and the vein map is rigidly registered to the T1 volume. 2) White matter lesions are segmented using the T1- and T2-FLAIR volumes. 3) Clear lesion boundaries are then determined using a process that removes ambiguous boundary voxels. 4) Periventricular lesions are removed from candidacy, per guidelines given by the North American Imaging in Multiple Sclerosis Cooperative. 19 5) A permutation procedure determines whether identified veins occur in the center of a given lesion more than would be expected by chance. This yields a probability of a CVS for each lesion j in patient i's scan, denoted $\pi_{ij}$. Lesion-level CVS probabilities are then averaged to obtain a patient-level CVS biomarker, denoted $\psi_i$. 6) Optionally, the contribution of each lesion to the average can be weighted by the noise in its T2*-EPI intensities to account for scan motion. Figure 1 demonstrates the steps of the algorithm on a sample lesion. Most important, figures are necessarily presented in 2D, but all methods in this procedure are conducted in 3D volumetric space and simultaneously consider all 3 planes of the image.

Vesselness Filtering
Vein maps of the brain are created so that the presence or absence of veins in each lesion can later be determined. To do this, we applied the Frangi vesselness filter 20 to the unregistered T2*-EPI volume (for the application to data, this study used the Convert3D Tool; https://sourceforge.net/projects/c3d/). The filter produces a map of scores ≥0, in which a score of 0 indicates no vesselness qualities. The Frangi filter is a vessel-enhancement algorithm based on the Hessian matrix at each voxel. The second-order structure of the image is obtained through convolution with derivatives of Gaussian kernels. The scores are calculated from the eigenvalues of the Hessian matrix and respond specifically to tubular structures that are darker (or lighter, depending on the implementation) than their surroundings. These vesselness maps are computed in the unregistered T2*-EPI space and are then rigidly registered to the T1 space.
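The sketch below makes the eigenvalue-based scoring concrete. It computes the standard 3D Frangi vesselness measure for a single voxel from its 3 Hessian eigenvalues. It is a minimal sketch, not the Convert3D implementation used in this study. The parameter defaults (alpha = beta = 0.5 and a structureness scale c = 1.0) are assumptions; in practice, c is usually tied to the Hessian magnitudes of the image. Veins are hypointense on T2*-weighted images, so the sketch defaults to the dark-vessel sign convention.

```python
import math


def frangi_vesselness(eigenvalues, alpha=0.5, beta=0.5, c=1.0, dark_vessels=True):
    """Frangi vesselness for one voxel from its 3 Hessian eigenvalues.

    Eigenvalues are sorted so that |l1| <= |l2| <= |l3|. A dark tube on a
    bright background (e.g. a vein on T2*-weighted MRI) has l1 near 0 and
    l2, l3 large and positive; a bright tube has l2, l3 negative.
    """
    l1, l2, l3 = sorted(eigenvalues, key=abs)
    # Voxels whose cross-sectional curvature has the wrong sign score 0.
    if dark_vessels and (l2 <= 0 or l3 <= 0):
        return 0.0
    if not dark_vessels and (l2 >= 0 or l3 >= 0):
        return 0.0
    r_a = abs(l2) / abs(l3)                     # separates plates from lines
    r_b = abs(l1) / math.sqrt(abs(l2 * l3))     # separates blobs from lines
    s = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)  # overall second-order structure
    return ((1.0 - math.exp(-r_a ** 2 / (2 * alpha ** 2)))
            * math.exp(-r_b ** 2 / (2 * beta ** 2))
            * (1.0 - math.exp(-s ** 2 / (2 * c ** 2))))
```

With these defaults, a tube-like eigenvalue pattern such as (0.01, 2.0, 2.1) scores roughly 0.82. A blob-like pattern (2.0, 2.0, 2.0) scores roughly 0.12. Computing the Hessian itself, by Gaussian-derivative convolution at one or more scales, is omitted here.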

Lesion Segmentation
To determine the location and shape of white matter lesions, we performed automatic lesion segmentation on coregistered T1- and T2-FLAIR volumes. For the application to data, this study used the Method for InterModal Segmentation Analysis (MIMoSA) model 21 in the R statistical environment. 22 The lesion segmentation algorithm produces a map containing the probability that each voxel is part of a lesion. For the results presented in this article, a threshold of 0.30 was applied to this probability map to create a binary lesion mask. The threshold of 0.30 was chosen because previous work has found it to be a conservative cutoff that limits the amount of false-positive lesion tissue. 23,24 We followed the definition of a lesion positive for CVS (CVS+) given by the North American Imaging in Multiple Sclerosis Cooperative. 19 Accordingly, we removed from candidacy lesions detected by the MIMoSA model that measured <3 mm in any plane.
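The thresholding and size-exclusion steps can be sketched in Python as follows. This is an illustrative stand-in, not the MIMoSA or NAIMS tooling. It assumes the probability map is stored as a dictionary from voxel indices to probabilities. It groups suprathreshold voxels into 26-connected components and approximates a lesion's extent "in any plane" by its bounding-box length along each image axis. The function name, the connectivity, and the extent approximation are all assumptions.

```python
from collections import deque


def candidate_lesions(prob, threshold=0.30, voxel_mm=(1.0, 1.0, 1.0), min_mm=3.0):
    """Threshold a lesion-probability map and drop lesions < min_mm along any axis.

    prob: dict mapping (x, y, z) voxel indices to lesion probability.
    Returns a list of lesions, each a set of (x, y, z) voxel indices.
    """
    # Assumption: voxels at or above the threshold count as lesion.
    mask = {v for v, p in prob.items() if p >= threshold}
    lesions, seen = [], set()
    for start in mask:
        if start in seen:
            continue
        # Breadth-first search over 26-connected neighbors (assumed connectivity).
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            x, y, z = queue.popleft()
            component.add((x, y, z))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for dz in (-1, 0, 1):
                        nb = (x + dx, y + dy, z + dz)
                        if nb in mask and nb not in seen:
                            seen.add(nb)
                            queue.append(nb)
        # Bounding-box extent along each image axis, in millimeters.
        extents = [(max(v[a] for v in component) - min(v[a] for v in component) + 1)
                   * voxel_mm[a] for a in range(3)]
        if min(extents) >= min_mm:  # exclude lesions < min_mm in any direction
            lesions.append(component)
    return lesions


# Toy map: a 3 x 3 x 3 lesion, a single-voxel spot, and a 1-voxel-thick slab.
prob = {(x, y, z): 0.9 for x in range(3) for y in range(3) for z in range(3)}
prob[(10, 10, 10)] = 0.9
prob.update({(x, y, 20): 0.8 for x in range(5) for y in range(5)})
kept = candidate_lesions(prob)
```

With 1-mm isotropic voxels, only the 27-voxel cube is retained. The single voxel and the slab are each only 1 mm long along at least one axis.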

Lesion-Boundary Determination
Thresholding the lesion-probability map often leaves pathologically distinct lesions connected by ambiguous boundary voxels. For these lesions to be properly assessed for the CVS, they must first be separated. The proposed algorithm addresses this pseudoconfluence with a recently described technique that removes voxels connecting pathologically distinct lesions. 24 The technique works by finding regions in which the texture of the lesion-probability map resembles the center of a lesion. The lesion centers it identifies are retained and used for the remaining CVS assessment steps. Further detail on the implementation of this method can be found in the original publication. 24 The North American Imaging in Multiple Sclerosis guidelines call for the exclusion of confluent lesions. Removing connecting voxels may therefore deviate from these guidelines in cases of true confluence. However, automated segmentation methods often merge lesions that expert raters would judge to be discrete. 25 This merging can produce drastic and unrealistic degrees of pseudoconfluence in automated lesion masks. In some cases, ≥50 distinct lesions are merged into <10 lesion components. 24 Thus, relying on automated determinations of confluence in automated lesion masks would likely result in the exclusion of many or most eligible lesions.
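The published lesion-center technique is described in reference 24 and is not reproduced here. The 2D Python sketch below is a rough, hypothetical illustration of how probability-map "texture" can separate pseudoconfluent lesions. It keeps suprathreshold pixels at which the map is locally concave, meaning the finite-difference Hessian is negative definite. We assume this concavity test as a proxy for lesion centers; the function names are also hypothetical.

```python
import math


def lesion_center_mask(prob, threshold=0.30):
    """Suprathreshold pixels where a 2D probability map is locally concave.

    prob: list of rows (indexed prob[y][x]). Border pixels are skipped.
    A symmetric 2 x 2 Hessian is negative definite iff f_xx < 0 and det > 0.
    """
    h, w = len(prob), len(prob[0])
    centers = set()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            p = prob[y][x]
            if p < threshold:
                continue
            # Central finite differences for the second derivatives.
            f_xx = prob[y][x + 1] - 2 * p + prob[y][x - 1]
            f_yy = prob[y + 1][x] - 2 * p + prob[y - 1][x]
            f_xy = (prob[y + 1][x + 1] - prob[y + 1][x - 1]
                    - prob[y - 1][x + 1] + prob[y - 1][x - 1]) / 4
            if f_xx < 0 and f_xx * f_yy - f_xy ** 2 > 0:
                centers.add((x, y))
    return centers


def count_components(pixels):
    """Number of 8-connected components in a set of (x, y) pixels."""
    seen, count = set(), 0
    for start in pixels:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            x, y = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (x + dx, y + dy)
                    if nb in pixels and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
    return count


# Two overlapping Gaussian "lesions" (sigma = 3 pixels) whose centers are 10 pixels apart.
prob = [[math.exp(-((x - 5) ** 2 + (y - 5) ** 2) / 18)
         + math.exp(-((x - 15) ** 2 + (y - 5) ** 2) / 18)
         for x in range(21)] for y in range(11)]
merged = {(x, y) for y in range(11) for x in range(21) if prob[y][x] >= 0.30}
n_thresholded = count_components(merged)
n_centers = count_components(lesion_center_mask(prob))
```

After thresholding at 0.30, the 2 blobs form a single connected component (n_thresholded is 1). Their concave cores, however, form 2 components (n_centers is 2). In 3D, the analogous test would require all 3 eigenvalues of the 3 × 3 Hessian to be negative.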

Periventricular Lesion Exclusion
The density and branching nature of veins near the ventricles make CVS assessment difficult in periventricular lesions, especially when >1 distinct vein traverses the lesion. Thus, the North American Imaging in Multiple Sclerosis Cooperative recommends excluding lesions with >1 vein or with branching veins. 19 Periventricular lesions typically contain multiple veins, so the proposed algorithm addresses this recommendation by excluding them. First, tissue-class segmentation is performed on the T1 volumes (for the application to data, this study used the FMRIB Automated Segmentation Tool [FAST]; http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/fast). 26 Next, the CSF region of the brain is expanded by 3 mm. Finally, lesions in the lesion-center mask that overlap the expanded CSF region are eliminated. The 3-mm expansion was chosen by visually inspecting randomly selected T2*-EPI volumes. In these volumes, 3 mm appeared to include most of the branching vein structure discussed in the consensus statement without removing too much of the deep white matter. Notably, although this technique excludes periventricular lesions, it does not exclude other lesions that may have multiple veins. This represents a second deviation from the North American Imaging in Multiple Sclerosis recommendations. Future advances in methods for segmenting and counting distinct veins could potentially address it.
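The periventricular exclusion can be sketched as a morphological dilation followed by an overlap test. The Python function below is a hypothetical illustration, not the FAST-based code used in the study. It assumes CSF and lesion-center voxels are given as sets of voxel indices. It also assumes "expanding by 3 mm" means adding every voxel whose center lies within 3 mm (Euclidean) of a CSF voxel.

```python
def exclude_periventricular(lesions, csf, voxel_mm=(1.0, 1.0, 1.0), margin_mm=3.0):
    """Drop lesions that touch the CSF mask dilated by margin_mm (Euclidean ball).

    lesions: list of sets of (x, y, z) voxel indices (e.g. lesion centers).
    csf: set of (x, y, z) voxel indices labeled CSF by a tissue segmentation.
    """
    # Integer voxel offsets whose physical length is within margin_mm.
    reach = [int(margin_mm // s) for s in voxel_mm]
    offsets = [(dx, dy, dz)
               for dx in range(-reach[0], reach[0] + 1)
               for dy in range(-reach[1], reach[1] + 1)
               for dz in range(-reach[2], reach[2] + 1)
               if (dx * voxel_mm[0]) ** 2 + (dy * voxel_mm[1]) ** 2
               + (dz * voxel_mm[2]) ** 2 <= margin_mm ** 2]
    expanded = {(x + dx, y + dy, z + dz)
                for (x, y, z) in csf for (dx, dy, dz) in offsets}
    # Keep only lesions with no voxel inside the expanded CSF region.
    return [lesion for lesion in lesions if not (lesion & expanded)]
```

Consider 1-mm voxels and a single CSF voxel at the origin. Lesions at (3, 0, 0) and (2, 2, 0), at 3.0 mm and about 2.8 mm away, are excluded. Lesions at (4, 0, 0) and (2, 2, 2), at 4.0 mm and about 3.5 mm away, are retained.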

CVS Permutation Procedure
In lesions that contain central veins, one would expect above-average coherence between the centrality of voxels within the lesion and their vesselness scores. The proposed permutation procedure uses that expectation to test whether the most veinlike voxels of a lesion are more concentrated in its center than would be expected by chance. First, a vein-center coherence score, $C_{ij}$, is calculated for lesion j in patient i's scan. Each voxel v has a centrality score, $d_{ijv}$, defined as its distance to the nearest lesion boundary, and a Frangi vesselness score, $f_{ijv}$. The coherence score is the sum of their products over the lesion:
$$C_{ij} = \sum_{v \in V} d_{ijv} \, f_{ijv},$$
where V is the set of all voxels in lesion j. Higher values of this score therefore indicate that the highest vesselness values within the lesion tend to occur in the same voxels as the highest centrality values.
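A minimal Python sketch of the coherence score follows. It is our illustration, not the authors' R code. Lesions are assumed to be sets of voxel indices. Centrality is taken as the Euclidean distance (in mm) from each lesion voxel to the nearest voxel outside the lesion, which is one reasonable reading of "distance to the nearest lesion boundary." The brute-force search scans only the 1-voxel shell around the lesion. This is sufficient because the nearest outside voxel is always a 26-neighbor of some lesion voxel.

```python
import math


def centrality(lesion, voxel_mm=(1.0, 1.0, 1.0)):
    """d_ijv: distance (mm) from each lesion voxel to the nearest non-lesion voxel."""
    # Candidate outside voxels: the 26-connected shell around the lesion.
    shell = {(x + dx, y + dy, z + dz)
             for (x, y, z) in lesion
             for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)} - lesion

    def dist(a, b):
        return math.sqrt(sum(((a[k] - b[k]) * voxel_mm[k]) ** 2 for k in range(3)))

    return {v: min(dist(v, s) for s in shell) for v in lesion}


def coherence(d, f):
    """C_ij: sum over lesion voxels of centrality d_ijv times vesselness f_ijv."""
    return sum(d[v] * f[v] for v in d)


# A 5 x 5 x 5 voxel lesion with a vein-like line through its center or along an edge.
cube = {(x, y, z) for x in range(5) for y in range(5) for z in range(5)}
d = centrality(cube)
c_central = coherence(d, {v: 1.0 if v[:2] == (2, 2) else 0.0 for v in cube})
c_edge = coherence(d, {v: 1.0 if v[:2] == (0, 0) else 0.0 for v in cube})
```

Here c_central is 9.0 and c_edge is 5.0. The same vein-like line scores higher when it runs through the middle of the lesion.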
A lesion-specific null distribution of coherence scores is created from 1000 random permutations. This distribution shows how far the observed score deviates from what would be expected if there were no biologic correspondence between vesselness and location within the lesion. For each permutation, p, the vesselness scores of the voxels in lesion j are randomly resampled without replacement. This yields a randomly ordered set of values, $V^{*}_{p}$, in which voxel v is assigned the permuted vesselness score $f^{*}_{ijvp}$. A null coherence score is then calculated as
$$C^{*}_{ijp} = \sum_{v \in V} d_{ijv} \, f^{*}_{ijvp}.$$
Repeating this procedure 1000 times yields a sample of 1000 null coherence scores. The lesion-level CVS probability, $\pi_{ij}$, is the proportion of null coherence scores that are smaller than the observed score:
$$\pi_{ij} = \frac{1}{1000} \sum_{p=1}^{1000} \mathbb{1}\left( C^{*}_{ijp} < C_{ij} \right),$$
where $\mathbb{1}(\cdot)$ equals 1 when its argument is true and 0 otherwise. To obtain a subject-level CVS biomarker, $\psi_i$, these probabilities are averaged over all candidate lesions in patient i's scan:
$$\psi_i = \frac{1}{N_L} \sum_{j \in L} \pi_{ij},$$
where L is the set of candidate lesions and $N_L$ is the number of candidate lesions in patient i's scan. The biomarker, $\psi_i$, can be roughly interpreted as the proportion of the patient's lesions that demonstrate the CVS.
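The permutation step and the patient-level average can be sketched as below. The sketch assumes that per-voxel centrality and vesselness scores for one lesion are already available as equal-length Python lists, in matching voxel order. The function names are hypothetical. A seeded random.Random instance is used only to make the sketch reproducible.

```python
import random


def cvs_probability(d, f, n_perm=1000, rng=None):
    """pi_ij: share of permuted (null) coherence scores below the observed C_ij.

    d, f: per-voxel centrality and vesselness scores for one lesion, same order.
    """
    rng = rng or random.Random()
    observed = sum(di * fi for di, fi in zip(d, f))
    below = 0
    for _ in range(n_perm):
        permuted = rng.sample(f, len(f))  # resample vesselness without replacement
        null_score = sum(di * fi for di, fi in zip(d, permuted))
        if null_score < observed:
            below += 1
    return below / n_perm


def patient_biomarker(lesion_probs):
    """psi_i: mean CVS probability over the N_L candidate lesions."""
    return sum(lesion_probs) / len(lesion_probs)


rng = random.Random(0)
d = [1.0, 2.0, 3.0, 2.0, 1.0]  # centrality along a toy 1D lesion profile
pi_central = cvs_probability(d, [0.0, 0.0, 1.0, 0.0, 0.0], rng=rng)  # vein at center
pi_edge = cvs_probability(d, [1.0, 0.0, 0.0, 0.0, 0.0], rng=rng)     # vein at edge
psi = patient_biomarker([pi_central, pi_edge])
```

In this example, pi_central is close to 0.8, because the shuffled vein lands in a less central voxel in 4 of 5 equally likely positions. pi_edge is exactly 0, because no permutation can score below a vein already at the boundary. psi is their mean.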

Optional Noise Weighting
When the CVS probabilities of a patient's lesions are averaged, some lesions may have more reliable estimates than others. A more stable biomarker can potentially be obtained by weighting each lesion's contribution according to the noise in its voxels on the T2*-EPI volume, so that noisier lesions contribute less. To estimate the level of noise in a lesion, we first constructed a "noiseless" T2*-EPI volume by performing anisotropic diffusion on the original scan. 27 This procedure produces a smoothed volume that preserves tissue boundaries and other image gradients. Then, for each voxel in the lesion, the difference between the original and smoothed T2*-EPI intensities is obtained. Finally, a noise value is calculated by dividing the sum of the squared voxel differences by the total number of voxels in the lesion. For lesion j, this value is defined as
$$n_{ij} = \frac{1}{|V|} \sum_{v \in V} \left( I_v - I^{s}_v \right)^2,$$
where V is the set of voxels in lesion j, and $I_v$ and $I^{s}_v$ are the intensity and smoothed intensity of voxel v, respectively. The reliability weight is simply the inverse of this noise value, $w_{ij} = 1 / n_{ij}$. A weighted subject-level biomarker, $\psi^{w}_i$, is then calculated by summing the products of the lesions' CVS probabilities and their weights and dividing by the sum of the weights:
$$\psi^{w}_i = \frac{\sum_{j \in L} \pi_{ij} \times w_{ij}}{\sum_{j \in L} w_{ij}}.$$
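Once an edge-preserving smoothed volume is available, the reliability weighting takes only a few lines. The Python sketch below assumes that the anisotropic-diffusion output has already been computed by some implementation. It also assumes that each lesion's raw and smoothed T2*-EPI intensities are supplied as equal-length lists with nonzero noise. The function names are hypothetical.

```python
def lesion_noise(raw, smoothed):
    """n_ij: mean squared difference between raw and smoothed intensities in lesion j."""
    return sum((a - b) ** 2 for a, b in zip(raw, smoothed)) / len(raw)


def weighted_biomarker(lesion_probs, noise_values):
    """psi_i^w: inverse-noise weighted mean of the lesion CVS probabilities."""
    weights = [1.0 / n for n in noise_values]  # w_ij = 1 / n_ij (noise must be > 0)
    return sum(p * w for p, w in zip(lesion_probs, weights)) / sum(weights)


noise_a = lesion_noise([10.0, 12.0, 8.0], [10.0, 10.0, 10.0])  # quiet lesion
noise_b = lesion_noise([10.0, 16.0, 4.0], [10.0, 10.0, 10.0])  # noisy lesion
psi_w = weighted_biomarker([1.0, 0.0], [noise_a, noise_b])
```

Here psi_w is 0.9, compared with an unweighted mean of 0.5. The noisier lesion has a noise value 9 times larger (24 vs 8/3), so it receives one-ninth of the weight.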

Implementation and Software
Accompanying this article, code for the central vein detection algorithm has been made freely available online (https://github.com/jdwor/cvs). One file, centralveins_full.R, contains code to run all preprocessing and analysis steps described in the previous section. This file is intended to clarify every step used in this study and to provide a straightforward tool that can be applied to raw images. A second file, centralveins_simple.R, contains code to be run directly on a probability map and a vein map. This file is intended to facilitate implementation across different sites and scanners, where researchers and clinicians may have preferred pipelines for preprocessing and lesion segmentation. Following preprocessing and structure segmentation, the centralveins_simple function took an average of 17.7 ± 9.1 minutes when run without parallelization. This time can be roughly broken down into a 10-minute baseline plus an additional 20 seconds per lesion. Finally, a third file, helperfunctions.R, provides additional functions used within the previous 2 files.
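For planning purposes, the rough timing breakdown above can be written as a linear model. The helper below is a back-of-the-envelope illustration and is not part of the released code; its constant and function names are hypothetical. It assumes that the 10-minute baseline and the 20 seconds per lesion hold exactly. Under that assumption, the reported 17.7-minute mean corresponds to roughly 23 candidate lesions per run on average.

```python
BASELINE_MIN = 10.0        # fixed overhead per run, in minutes (rough figure from the text)
PER_LESION_MIN = 20 / 60   # 20 seconds per candidate lesion, in minutes


def estimated_runtime_min(n_lesions):
    """Approximate serial runtime of centralveins_simple for n_lesions lesions."""
    return BASELINE_MIN + PER_LESION_MIN * n_lesions


def implied_lesion_count(runtime_min):
    """Invert the linear model: number of lesions implied by an observed runtime."""
    return (runtime_min - BASELINE_MIN) / PER_LESION_MIN
```

For example, a scan with 30 candidate lesions would be expected to take about 20 minutes without parallelization.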

Validation
Data. For this study, data were analyzed for 40 research participants recruited from the University of Vermont neurology clinic as part of a study aiming to improve the diagnostic specificity for MS. 17 Participants were between 20 and 67 years of age, and 37 were women. The sample comprised 4 groups of 10 participants each (Table 1):
1) MS and no comorbidities known to produce MR imaging white matter abnormalities;
2) MS and comorbidities known to produce MR imaging white matter abnormalities;
3) migraine with MR imaging white matter abnormalities and no other white matter comorbidities;
4) a previous misdiagnosis of MS, with MR imaging white matter abnormalities and a variety of diagnoses.
Whole-brain 3D T2-FLAIR, T1, and T2*-EPI 28 volumes were acquired on a 3T dStream MR imaging scanner (Philips Healthcare, Best, the Netherlands) with a 32-channel dStream head coil. FLAIR and T1 volumes were obtained with 1-mm isotropic resolution, and T2*-EPI volumes were obtained with 0.55-mm isotropic resolution. N4 bias correction 29 was performed on all images. The T2-FLAIR volume for each participant was interpolated to a voxel size of 1 mm³ and rigidly coregistered to the T1 volume. Extracerebral voxels were removed from the T1 volume using a skull-stripping procedure, 30 and the resulting brain mask was applied to the T2-FLAIR volume.
Motion-Exclusion Criteria. Because head motion might occur during the T2*-EPI scan, potentially producing uninterpretable images, each participant's T2*-EPI scan was manually rated for motion in the relevant white matter regions. Scans were scored from 1 to 5: One indicated "perfect, no artifacts, and excellent signal-to-noise," 2 indicated "only 1 minor artifact that does not obscure any vessels in supratentorial white matter," 3 indicated "more than 1 artifact that does not obscure any vessels in supratentorial white matter," 4 indicated "more than 1 artifact that does obscure some vessels in supratentorial white matter," and 5 indicated "severe artifacts or bad signal-to-noise that does obscure most vessels in supratentorial white matter." It was decided a priori that scans that were rated 5 would be removed for the primary analysis because scans with that degree of motion may be unusable in clinical practice as well.
Performance Assessment. Because the CVS shows great promise as a diagnostic biomarker, the algorithm's performance in distinguishing MS from non-MS is of primary interest. Previous work found that the distribution of manually adjudicated central vein proportions differs between MS and its mimics. To determine whether the automated biomarkers, $\psi_i$ and $\psi^{w}_i$, replicate this finding, we used t tests to compare the automated CVS values of patients with and without MS. To determine the diagnostic utility of $\psi_i$ and $\psi^{w}_i$, we estimated the areas under their receiver operating characteristic curves. Differences in performance between $\psi_i$ and $\psi^{w}_i$ were tested with the DeLong test for comparing the areas under correlated receiver operating characteristic curves 31 using the pROC package in the R statistical environment. 22,32 Sensitivity and specificity were calculated using the 40% cutoff, 7 under which inflammatory demyelination is diagnosed if ≥40% of white matter lesions exhibit the CVS. They were also calculated using the more recently proposed 50% cutoff. 18 Additionally, locally optimal cutoffs were determined, and their sensitivity and specificity values were compared with those obtained using the established cutoffs.
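The cutoff-based metrics and the area under the curve can be computed without special libraries. The Python sketch below is illustrative only: the study used the pROC package in R, and the DeLong comparison is not reproduced. The sketch classifies a participant as having MS when the biomarker is at or above the cutoff, mirroring the ≥40% rule. It computes the empirical area under the ROC curve as the Mann-Whitney probability that a randomly chosen MS score exceeds a randomly chosen non-MS score, with ties counted as one-half. The scores and function names in the example are made up.

```python
def sensitivity_specificity(ms_scores, non_ms_scores, cutoff):
    """Call a participant MS when the biomarker is >= cutoff (e.g. 0.40 or 0.50)."""
    sens = sum(s >= cutoff for s in ms_scores) / len(ms_scores)
    spec = sum(s < cutoff for s in non_ms_scores) / len(non_ms_scores)
    return sens, spec


def empirical_auc(ms_scores, non_ms_scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    wins = 0.0
    for a in ms_scores:
        for b in non_ms_scores:
            wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / (len(ms_scores) * len(non_ms_scores))


# Hypothetical biomarker values, not data from this study.
ms = [0.60, 0.45, 0.70, 0.35]
non_ms = [0.20, 0.42, 0.30]
sens_40, spec_40 = sensitivity_specificity(ms, non_ms, 0.40)
auc = empirical_auc(ms, non_ms)
```

For these made-up values, the 0.40 cutoff gives a sensitivity of 0.75 and a specificity of 0.67, and the area under the curve is 11/12 ≈ 0.92.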
Finally, these cutoffs were compared with the performance of proportion cutoffs applied to manual determinations of the CVS in previous research. 7,13,18 They were also compared with 3 recently proposed clinical decision rules that do not require assessment of the full set of lesions in a scan. The first such rule, referred to as the rule of 6, 13 states that inflammatory demyelination is diagnosed if >6 lesions show the CVS or if more than half of lesions show the CVS. The second and third rules are referred to as select3 15 and select3*. 17 Under these rules, inflammatory demyelination is diagnosed if the CVS is found in at least 2 of 3 lesions preselected on T2-FLAIR and FLAIR* 9 imaging, respectively.
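The subset-based decision rules can be written as simple predicates on per-lesion CVS calls. The Python sketch below encodes the rules exactly as worded above. For the rule of 6, that means >6 CVS-positive lesions or more than half of all lesions CVS-positive. The sketch does not cover how lesions are preselected for select3 and select3*, and the function names are hypothetical.

```python
def rule_of_six(cvs_flags):
    """Positive if >6 lesions show the CVS or if more than half of all lesions do."""
    n_positive = sum(cvs_flags)
    return n_positive > 6 or n_positive > len(cvs_flags) / 2


def select3(preselected_flags):
    """Positive if the CVS is seen in at least 2 of 3 preselected lesions.

    The same check applies to select3 (T2-FLAIR preselection) and
    select3* (FLAIR* preselection); only the preselection differs.
    """
    if len(preselected_flags) != 3:
        raise ValueError("select3 expects exactly 3 preselected lesions")
    return sum(preselected_flags) >= 2
```

For example, 5 CVS-positive lesions out of 15 is negative under the rule of 6. By contrast, 3 of 5 is positive because it exceeds half.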

RESULTS
Following manual ratings of scan noise due to motion, 9 participants were excluded, and 31 remained for the primary analysis. Of these 31 participants, 16 had MS and 15 did not. Automated CVS detection was performed on these 31 participants using the algorithms and software packages described in the previous section. Two-sample t tests were run to determine whether the automated CVS scores differed between the 16 participants with MS and the 15 participants without MS.
To determine the diagnostic utility of the automated biomarkers, $\psi_i$ and $\psi^{w}_i$, we estimated receiver operating characteristic curves and calculated their areas under the curve. For the unweighted case, $\psi_i$ yielded an area under the curve of 0.84 (Fig 3A). On the basis of the 40% rule, applying a cutoff of 0.40 to $\psi_i$ yielded a sensitivity of 0.94 and a specificity of 0.67. On the basis of the 50% rule, applying a cutoff of 0.50 to this biomarker yielded a sensitivity of 0.56 and a specificity of 0.80. Three locally optimal cutoffs appeared to occur (Table 2):
- at 0.38, with a sensitivity of 1.00 and a specificity of 0.67;
- at 0.44, with a sensitivity of 0.75 and a specificity of 0.73;
- at 0.50, with a sensitivity of 0.56 and a specificity of 0.80.
For the noise-weighted case, $\psi^{w}_i$ yielded an area under the curve of 0.88 (Fig 3B). Applying a cutoff of 0.40 to $\psi^{w}_i$ yielded a sensitivity of 0.75 and a specificity of 0.73. Applying a cutoff of 0.50 yielded a sensitivity of 0.56 and a specificity of 0.93. Two locally optimal cutoffs for $\psi^{w}_i$ appeared to occur (Table 2). At 0.37, sensitivity was 0.94 and specificity was 0.73; at 0.46, sensitivity was 0.63 and specificity was 0.93. The weighting appeared to improve performance marginally, but the DeLong test found no significant difference (Z = 0.77, P = .22). A robustness analysis reintroduced the motion-obscured scans to give the full sample of 40 participants. In this sample, the areas under the curve were 0.77 for $\psi_i$ and 0.81 for $\psi^{w}_i$.
Previous studies that used CVS proportions within patients' full sets of lesions obtained optimal sensitivity/specificity of 1.00/1.00 in several comparisons. These compared cases of MS with undiagnosed cases without MS, 7 patients with microangiopathic lesions, 13 and patients with inflammatory vasculopathies. 18 Prior research on a subset of the current sample could not obtain perfect discrimination between patients with MS and those with migraine when adjudicating the CVS for all lesions. 15 Decision rules based on a subset of lesions were generally less discriminative between participants with and without MS than cutoffs that used the full set of lesions. The rule of 6 did obtain a sensitivity/specificity of 1.00/1.00 for distinguishing patients with MS from those with small-vessel ischemia. 13 In the current sample of patients with MS, migraine, and misdiagnosis, however, the subset-based rules performed less well. The select3 procedure obtained a sensitivity/specificity of 0.81/0.95, and the select3* procedure obtained 0.81/0.83. 17

DISCUSSION
Preliminary studies have proposed and validated the CVS as a promising biomarker for differentiating MS from other diseases that cause MR imaging white matter abnormalities. 7,15,19 Yet concerns remain about the heavy time burden of manual CVS adjudication. There are also concerns about subjective differences arising from variation in adjudicators' time constraints and intuition. This study sought to address these issues by introducing an algorithm for automated CVS detection. Following further validation, the algorithm could in principle be applied in clinical practice.
In the primary analysis, the algorithm was tested on a cohort of 16 patients with MS (8 with and 8 without other white matter comorbidities) and 15 patients without MS (8 with migraine and 7 misdiagnosed with MS). The fully automated technique replicated previous work that used manual adjudications 7,11-16: proportions of lesions with the CVS differed significantly between MS and its mimics. Additionally, the automated biomarkers, $\psi_i$ and $\psi^{w}_i$, showed strong diagnostic ability. Their areas under the curve were 0.84 and 0.88, respectively, and their optimal sensitivity/specificity was approximately 0.94/0.70. The algorithm may also perform consistently across study sites and MR imaging scanners. In-house preprocessing and lesion-segmentation methods can easily be substituted, and the remaining steps do not require parameter tuning. These steps are obtaining vesselness scores, finding lesion centers, and calculating CVS probabilities.
Most important, the automated biomarkers presented in this study did not perform as well as previously reported CVS proportions based on manual ratings of all lesions in patients' scans. Specifically, the 40% and 50% cutoffs used in prior manually rated studies often achieved perfect discrimination between patients with and without MS. 7,18 The automated biomarkers could not replicate this result. However, previous work in a subset of the current sample found that manual ratings of all lesions did not fully distinguish patients with migraine from those with MS and no white matter comorbidities. 15 The patients without MS in the current sample may therefore be harder to distinguish from those with MS using the CVS alone. In this respect, they may differ from the patients without MS in studies that did obtain perfect discrimination.
Additionally, the sensitivity and specificity obtained by these biomarkers were lower than those of manually obtained CVS proportions. Nevertheless, the biomarkers performed comparably to decision rules that use only a subset of lesions in a scan. 17 Automated adjudication of every lesion in a scan is not yet as accurate as manual adjudication of every lesion. Even so, the proposed automated method shows promise as an alternative to other clinically feasible methods for identifying inflammatory demyelination. Further study and refinement of this technique could yield biomarkers that are both feasible for clinical use and comparable in accuracy and reliability to CVS proportions obtained by manual adjudication.
There are several important limitations to the proposed algorithm. First, biomarker values were lower than previously reported CVS proportions for patients with MS and higher than previously reported CVS proportions for those without MS. This effect may be due to errors in lesion segmentation. Noninformative false-positive lesions would pull the values of participants with and without MS toward each other. Because this method allows in-house lesion segmentation algorithms to be applied, the impact of false-positive lesions could potentially be mitigated in practice. The effect may also be due to false-positives or false-negatives in automated CVS assessment. Future work will use manual lesion-level assessments to tease apart these potential sources.
Second, 9 of the 40 participants were excluded because of noise in their T2*-EPI scans, which represents a potential weakness of this automated method. However, a robustness analysis found that performance on the full sample (areas under the curve of 0.77 and 0.81) was not drastically lower than on the high-quality subset. This finding suggests that, in clinical practice, substantial motion may not render a scan useless. Instead, it may be an additional consideration for clinicians when interpreting the results of the algorithm.

CONCLUSIONS
Although the potential clinical implications of an automated tool for CVS adjudication call for further study and refinement of such techniques, the current study demonstrates the promising performance of a fully automated method for detecting CVS in white matter lesions. To our knowledge, this is the first automated technique for this challenging aspect of MS diagnosis and represents an important step forward toward a specific MR imaging biomarker for MS lesions.