Evaluation of a Practical Visual MRI Rating Scale of Brain White Matter Hyperintensities for Clinicians Based on Largest Lesion Size Regardless of Location

BACKGROUND AND PURPOSE: Age-related white matter hyperintensities (WMHs) have prognostic implications, but no accepted clinical standard exists for their assessment. We propose a simple objective visual rating system by using 3T brain MR imaging. MATERIALS AND METHODS: MR imaging from 559 participants was processed by using an automated method to determine WMH volumes and evaluated with a new visual rating scale based on the single largest WMH lesion diameter regardless of location. The reproducibility of the visual system was assessed. The associations of WMH visual scores and of automated volumes with cognitive scores from the Montreal Cognitive Assessment (MoCA), which was available for 510 participants, were then compared. RESULTS: Inter-reader reproducibility was good for subsamples with both high (n = 52) and low (n = 40) prevalence of large automated WMH volumes (agreement of 67% and 87.5%, κ = 0.71 and 0.76, respectively). Correlation between increased WMH and cognitive deficit measurements was comparable for our visual ratings and automated volumes (Spearman ρ = 0.118 and 0.109; P = .008 and .014, respectively). The visual scale retained a significant association with MoCA score after adjusting for age, sex, and education (standardized β = −0.087, P = .042). CONCLUSIONS: We propose a simple visual WMH scoring system suitable for use as a baseline evaluation in clinical practice.

Brain WMHs are increasingly detected due to greater imaging utilization, the aging of our population, and the higher sensitivity of 3T MR imaging. 1 Objective determination of disease severity is increasingly important, as mounting evidence implicates WMH severity as a risk factor for motor and cognitive decline, dementia, stroke, and death. 2-7 We have developed a simple rating system with the aim of reducing the considerable variability in white matter hyperintensity assessment in clinical practice. 8 We adapted our system from the objective grading criteria for deep WMH developed by Gouw et al. Their system has a significant association with cognitive and physical impairment, equivalent to that of a complex visual scale with WMH localization and that of WMH volume quantification. 9 Their use of simple size measurements for grading is desirable as it obviates the need for standard reference images or expert instruction and reduces subjectivity. Their scale is limited, however, in that it does not incorporate rating of periventricular WMH. This is problematic as several large studies evaluating cognitive outcomes have found a significant association with periventricular but not deep WMH, 3,10-12 though this may be due to periventricular WMH more closely correlating with total white matter burden. 13 Periventricular WMH was initially thought to have a completely distinct etiology from deep WMH. 14 Later, Fazekas et al 15 found that advanced (defined as "irregular") periventricular WMH demonstrated ischemic changes at pathology similar to those seen in advanced deep WMH, indicating equivalent disease severity. Subsequent studies have also demonstrated that periventricular WMHs extending farther from the ventricular surface are associated with lipohyalinosis and ischemic changes, whereas thin bands of periventricular WMH <3 mm in thickness are likely a normal finding or associated with alterations in the subependymal lining. 1,13,15
DeCarli et al 13 evaluated visual methods of WMH grading and found lesion classification as periventricular or deep, based on axial images, to be inaccurate. Lesions characterized as deep were often seen to abut the ventricle when examined in multiple planes. Spatial analysis of segmented WMH failed to identify distinct populations of deep versus periventricular disease. Periventricular and deep WMH were found to correlate highly with each other and with the overall WMH burden, favoring a common underlying etiology. Thus, the rationale for differentiating periventricular and deep WMH is increasingly unclear, and its implementation is often imprecise. This argues for rating deep and periventricular WMH jointly.
We assessed a modification of the objective grading system developed by Gouw et al 9 by applying its simple size criteria to the assessment of WMH in determining a global disease score based on the largest lesion identified without regard to deep or periventricular location. Reproducibility was assessed among groups with relatively high and low median WMH volume by automated segmentation, as lesion load may affect test reliability. 16 We compared the associations of our WMH visual scale and of automated volumes with scores from the MoCA. 17

Participants
A subject sample of 563 individuals was drawn from a subset of the Dallas Heart Study. 18 Each participant gave written consent to participate in the study under a protocol approved by the Institutional Review Board. Sample size was chosen to be comparable to or greater than that used in similar published work. 9,19 We excluded from further analysis 3 participants with encephalomalacia and 1 with probable vascular malformation, resulting in 559 participants evaluated for this study by a neuroradiologist. Of these, a second radiologist also reviewed 20 images jointly with the first reader as a training set and, for reproducibility testing, independently read 52 from a sample enriched for larger WMH volumes (discussed in the WMH analysis methods section) and 40 from a random sample of the 559 total evaluated. Of the 559 participants, 510 were also evaluated by using the MoCA.

WMH Analysis
Quantification of WMH volume (mL) was performed by using a previously described automated segmentation algorithm that we developed 20 with the FMRIB Software Library (FSL; http://www.fmrib.ox.ac.uk/fsl). 21,22 Visual grading was based on the size of the single largest lesion as shown in On-line Table 2. All measurements were taken from axial FLAIR images. The intensity threshold of WMH was defined by an internal standard as greater than that of cortical gray matter. Periventricular and deep WMH are equivalent in our system, and one overall global score is given reflecting the single largest WMH lesion regardless of location. The size for periventricular WMH is the point of greatest thickness taken perpendicular to the ventricle (On-line Fig 1). The size for deep WMH is the diameter of the largest lesion.
Two investigators jointly reviewed 20 images for a baseline training set, then independently read a test set of studies from 52 participants selected to have a range of WMH volumes based on results of our automated analysis. One reviewer reread the first set of images in a randomized order after 1 week. Review of the initial test set revealed disagreement regarding grading of lesions <3 mm in diameter. A clarification was incorporated that lesions <3 mm in diameter or thickness would not be counted as WMHs. A second group of 40 participants was then chosen randomly for review. One author reread these studies after several months by using a different computer and different lighting to assess the maximum intrarater variability. The 3-mm threshold was applied to the grading of all 559 images used for subsequent analysis.
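The grading rule described above (one global score from the single largest lesion, with lesions <3 mm ignored) can be sketched in a few lines of Python. This is only an illustration. The actual size criteria are given in On-line Table 2, which is not reproduced here, so the cutoff values in `EXAMPLE_CUTOFFS_MM`, the 0-3 numbering of the grades, and the inclusive treatment of boundary values are all assumptions made for the example.

```python
from typing import Iterable, Sequence

# From the text: lesions <3 mm in diameter or thickness are not counted as WMH.
MIN_LESION_MM = 3.0

# HYPOTHETICAL cutoffs (mm) between grades 1|2 and 2|3. The real criteria are
# in On-line Table 2 and may differ; replace these before any real use.
EXAMPLE_CUTOFFS_MM = (10.0, 20.0)


def visual_grade(lesion_sizes_mm: Iterable[float],
                 cutoffs_mm: Sequence[float] = EXAMPLE_CUTOFFS_MM) -> int:
    """Return one global grade from the single largest lesion.

    Each size is a deep-lesion diameter or a periventricular thickness
    measured perpendicular to the ventricle; location is ignored.
    """
    counted = [s for s in lesion_sizes_mm if s >= MIN_LESION_MM]
    if not counted:
        return 0  # assumed lowest grade: no lesion reaches the 3-mm threshold
    largest = max(counted)
    grade = 1
    for cutoff in sorted(cutoffs_mm):
        if largest >= cutoff:  # inclusive boundary is an assumption
            grade += 1
    return grade
```

With the placeholder cutoffs, `visual_grade([4.0, 12.0])` yields grade 2 because only the 12-mm lesion determines the score; the 4-mm lesion has no effect.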

Cognitive Assessment
The MoCA is a brief 30-point screening test of global cognitive function. The published cutoff for cognitive impairment is ≤25. 17 For our study, an additional cutoff score of ≤19 points was evaluated, which was one standard deviation below our observed population mean of 23 points. 23
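As a small illustration of how the two cutoffs dichotomize participants, the following Python sketch labels a total MoCA score as impaired or not at a given cutoff. The function and constant names are hypothetical and chosen only for this example.

```python
PUBLISHED_CUTOFF = 25  # published cutoff: impaired if total score <= 25
STUDY_CUTOFF = 19      # additional cutoff evaluated in this study


def is_impaired(moca_total: int, cutoff: int) -> bool:
    """Return True when a total MoCA score falls at or below the cutoff."""
    if not 0 <= moca_total <= 30:
        raise ValueError("MoCA total score must be between 0 and 30")
    return moca_total <= cutoff
```

A participant scoring 23, the observed population mean, would be classified as impaired at the published cutoff but not at the study cutoff.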

Statistics
Calculations were made by using SAS software, version 9.2 (SAS Institute, Cary, North Carolina). Inter- and intrarater reliability was assessed by using a weighted κ coefficient reflecting the ascending order of our visual rating scale, with values below 0.40 reflecting poor agreement, 0.40 to 0.75 reflecting fair to good agreement, and values above 0.75 reflecting excellent agreement. 24 Visual ratings by a neuroradiologist (K.S.K.) for 559 participants were correlated with automated WMH volumes (mL), age (years), and cognition (total MoCA score) by using the Spearman rank correlation. Automated WMH volumes were also correlated with MoCA score by using Spearman rank correlation. Pearson χ² and Mann-Whitney U tests were used to assess differences in WMH visual scale and automated volumes (respectively) between normal and cognitively impaired groups by using MoCA cutoffs of ≤19 and ≤25. A multivariate linear regression was performed to measure linear association of visual grade with total MoCA score while controlling for age, sex, and education as covariates.
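For readers who want to reproduce the agreement statistic outside SAS, a weighted κ for two raters on an ordinal scale can be sketched as follows. This is a minimal illustration, not the SAS implementation. The weighting scheme used in the study is not stated in the text, so the linear default here is an assumption, and grades are assumed to be coded as integers from 0 to `n_categories - 1`.

```python
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                   n_categories: int, weighting: str = "linear") -> float:
    """Weighted Cohen kappa for two raters using ordinal codes 0..n_categories-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k, n = n_categories, len(rater_a)

    # Observed joint proportions for each pair of grades.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        if not (0 <= a < k and 0 <= b < k):
            raise ValueError("rating outside 0..n_categories-1")
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weighting == "linear" else 2
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # disagreement weight in [0, 1]
            obs_disagreement += w * observed[i][j]
            exp_disagreement += w * row[i] * col[j]
    if exp_disagreement == 0.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - obs_disagreement / exp_disagreement


def interpret_kappa(kappa: float) -> str:
    """Apply the agreement bands quoted in the Statistics section."""
    if kappa < 0.40:
        return "poor"
    if kappa <= 0.75:
        return "fair to good"
    return "excellent"
```

Applied to the inter-reader values reported in the abstract, `interpret_kappa(0.71)` falls in the fair-to-good band and `interpret_kappa(0.76)` in the excellent band.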

RESULTS
Among the 559 participants, the mean age was 50.7 ± 9.7 years (mean ± SD), with a range from 25 to 72 years; 52.5% were women. The ethnic distribution was 43% black non-Hispanic, 38.9% white non-Hispanic, 15.5% Hispanic, and 2.5% other. WMH volume in our sample was not normally distributed, with a median of 0.90 mL (1st and 3rd quartile, Q1-Q3, of 0.58-1.29 mL and range 0.20-83.18 mL). Fifty-two participants in an initial test sample (mean age 56.3 ± 10.9 years) enriched for larger automated WMH volume had a median WMH volume of 1.64 mL (Q1-Q3, 0.77-4.88 mL; range 0.20-45.48 mL). Forty participants in a second visual reading test group (mean age 48.6 ± 10.6 years) were randomly selected and had a lower median WMH volume of 0.93 mL (Q1-Q3, 0.57-1.24 mL; range 0.29-2.01 mL). Scores for the 559 participants rated by a neuroradiologist with corresponding age and WMH volume are shown in Table 1.
Correlations between cognition and WMH were comparable across methods of WMH measurement (Spearman r = 0.118 and 0.109; P = .008 and .014 for the visual rating scale and volumetric assessment, respectively). WMH load by visual and volumetric methods was compared between normal and cognitively impaired groups for both MoCA cutoff points in Table 2. The visual scale retained a significant linear association with MoCA score as the dependent variable in a multivariate linear regression controlling for age, sex, and education (standardized β = −0.087, P = .042).
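The Spearman rank correlation used for these comparisons can be sketched in plain Python as the Pearson correlation of average ranks, which handles the many ties that arise when one variable is a 4-point visual grade. This is an illustrative sketch rather than the SAS procedure used in the study, and it does not compute P values.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(vx * vy)
```

Because only ranks enter the calculation, the skewed distribution of WMH volumes does not need to be transformed before the correlation is computed.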

DISCUSSION
Our WMH visual rating system for MR imaging based on the single largest lesion size regardless of location showed good reproducibility and had significant association with cognitive performance, equivalent to that of automated WMH volumes. The association between the visual WMH rating and MoCA scores persisted after controlling for age, sex, and education.
A defining and controversial aspect of our rating system is that it does not distinguish between periventricular and deep WMH. This reflects the work of DeCarli et al, 13 which argued that, excluding thin periventricular caps and rims, evidence does not support categorizing WMH as deep or periventricular on MR imaging. We direct readers to their article for a comprehensive discussion. In brief, their study did not identify distinct populations with periventricular versus deep WMH. Rather, it showed WMH burden in both locations to be highly correlated with each other and with total WMH burden. Similarly, a prior pathologic study of WMH noted the distinction between deep and periventricular WMH was often "blurred," preventing a clear distinction. 25 Evidence supports the presence of a vascular zone at risk for ischemia fed by arterioles extending centripetally from the cortex into the deep white matter then on toward the ventricles without appreciable flow centrifugally out from the ventricles. 26-28 This at-risk vascular zone encompasses both advanced periventricular and deep WMH as described on MR imaging. 13
DeCarli et al 13 also demonstrated the inaccuracy of localizing WMH based solely on axial imaging, raising the possibility that many rating systems that purport to distinguish deep from periventricular WMH may not in fact achieve that result (On-line Fig 2). It is possible that this deficiency could account for 2 meta-analyses that were not able to verify the reproducibility of reported localized effects for deep versus periventricular WMH, though reproducible associations between total WMH and cognitive outcomes were shown. 29,30 We do not dispute that more complex grading systems with further localization will have utility for specialists and researchers. A comprehensive accounting of effects attributable to white matter disease by lesion location may need to identify which tracts and functional circuits are disrupted. 31-33 Our system is not meant to preclude more comprehensive analysis, however, and may in fact be useful by identifying which individuals deserve more detailed evaluation. The prospect that WMH may have localized effects also does not invalidate the utility of assessing global disease burden. WMHs are indicative of diffuse underlying microvascular disease and are associated with frontal hypometabolism and executive dysfunction regardless of their location. 34
During the initial evaluation of our scale it became apparent that intensity and size thresholds for WMH were also needed to obtain agreement between different raters. On 3T FLAIR images, focal, well-defined regions of WMH were often surrounded by less well-defined and less intense regions of increased signal (On-line Fig 3). Diffusion tensor imaging demonstrates a penumbra of decreasing severity of derangement extending out from WMH lesions. 35 At higher field strength, more subtle disease becomes apparent on FLAIR imaging, 1 which may account for the intensity gradient we observed. We chose to include only more intense regions of white matter abnormality on our 3T scans to more closely correlate with prior WMH work conducted on lower field strength MR imaging. We set an internal reference that WMH lesions must be more intense than cortical gray matter, an inversion of the normal relationship. We also found that a 3-mm size threshold was necessary to reliably distinguish punctate WMH lesions from background variations in white matter signal intensity. With implementation of these changes, our system demonstrated good to excellent reproducibility in assessing participants with either low or high prevalence for large volumes of WMH.
A potential weakness of our system is that a 4-point scale will not represent the WMH burden with the same fidelity as more volumetric scales, thereby limiting the strength of associations with outcomes. We did not see this in our study, but the makeup of our study population may have influenced this result. Our community-based sample did not include those in institutions such as nursing homes, which likely limited the degree of impairment we observed. It is also possible that a more extensive cognitive assessment would yield improved associations with WMH volumes compared with our scale. Our rating scale will also not likely reflect interval changes in WMH, 36 but these changes can be noted separately by the routine practice of reviewing prior studies alongside the new ones.
The intent of our grading system is to reduce the variability of white matter hyperintensity assessment in clinical readings. Adoption of a standardized approach to assessing and reporting lesion severity is necessary to translate knowledge gained from WMH research into useful prognostic information. The results of this study indicate that our simple WMH grading system based on the single largest lesion size regardless of location has good reproducibility and a significant association with cognitive performance, equaling that of automated WMH volumes. Our work supports the validity of our scale in the baseline evaluation of WMH on MR imaging of the brain.

CONCLUSIONS
We propose a practical visual grading system for WMH based on largest lesion size regardless of location, which has good reproducibility and a significant association with cognitive function equivalent to that of automated WMH volumes. Application of this method using simple size criteria will add clarity to the clinical assessment and communication of WMH severity.