Reproducibility, Interrater Agreement, and Age-Related Changes of Fractional Anisotropy Measures at 3T in Healthy Subjects: Effect of the Applied b-Value

BACKGROUND AND PURPOSE: There is no reproducibility study of fractional anisotropy (FA) measurements at 3T using regions of interest (ROIs). Our purpose was to establish the extent and statistical significance of the interrater variability, of the variability observed with 2 different b-values, and of the variability between 2 separate scanning sessions. MATERIALS AND METHODS: Twelve healthy volunteers underwent MR imaging twice. MR imaging was performed on a 3T unit, and FA maps were analyzed independently by 2 observers using ROIs positioned in the corpus callosum, internal capsules, corticospinal tracts, and right thalamus. Changes in FA values (×10³) measured with 2 b-values (700 and 1000 s/mm²), age-related differences, interobserver agreement, and measurement reproducibility were assessed. RESULTS: In the right internal capsule genu (FA = 702/728; b = 1000/700 s/mm²) and the left anterior limb of the internal capsule (AIC; FA = 617/745; b = 1000/700 s/mm²), the FA values were significantly different between the 2 b-values (P = .02 and .05, respectively). Significant age-related differences in FA were observed in the genu of the corpus callosum and in the left AIC. Interrater measurements showed fair-to-moderate agreement for most anatomic structures. The lowest significant change for a single subject regarding any FA values between the 2 sessions was in the corpus callosum (4%), whereas the highest one was in the corticospinal tracts (27%). The Bland-Altman plot analysis showed that the 1000-s/mm² b-value gave satisfactorily reproducible measurements equally good or better than the 700-s/mm² b-value. CONCLUSION: The reproducibility of FA estimates using ROIs was satisfactory. Measurements with a b-value at 1000 s/mm² showed superior reproducibility in most anatomic locations.

Diffusion tensor imaging (DTI) refers to a group of MR imaging techniques for investigating water mobility (diffusion) and the microstructure and organization of different tissues in relation to the directionality of diffusion [1]. In brain tissue, water mobility is not equal in all directions (isotropic) but is greater in one direction than in another (anisotropic). This anisotropy is usually expressed by 2 metrics: trace and fractional anisotropy (FA). The trace represents the total measured diffusion, whereas the FA quantifies the degree to which a single diffusion orientation is dominant by measuring related coherence within a voxel. Initial evidence supporting the use of this technique has been reported in brain maturation and aging, in demyelinating disorders, in stroke, in brain tumors, and in brain trauma [2-9]. Because DTI is increasingly included in standard clinical MR imaging protocols, the reliability of DTI measurements is an important concern that is related to the stability of the equipment parameters and examination conditions and, perhaps, even to true variations in the measurements over time. Initial evidence showed more pronounced variability of FA across different scanners than across similar sequences on the same scanner [10]. Under the assumption of a stable examination environment, the next serious concern is the availability of normative data, which are essential for the interpretation of pathologic findings [8,11]. Normative DTI data have been acquired for infants, children, and adolescents in narrow, as well as in wider, age intervals [12-15]. The aforementioned studies were mostly performed on 1.5T MR imaging scanners (whereas 3T systems are increasingly available) and used a voxelwise method for FA analysis. However, voxelwise analysis requires intersubject registration and smoothing, which may be a source of errors in the acquired FA values.
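As a minimal illustration of the 2 metrics named above, the following sketch computes the trace and the FA from the 3 eigenvalues of a diffusion tensor by using the standard textbook definition of FA. The function name `trace_and_fa` and the example eigenvalues are hypothetical and are not taken from the study data or from the scanner software.

```python
import math


def trace_and_fa(l1, l2, l3):
    """Return (trace, FA) for the three eigenvalues of a diffusion tensor.

    FA = sqrt(3/2) * sqrt(sum((l_i - mean)^2) / sum(l_i^2)),
    which is 0 for isotropic diffusion and approaches 1 when a single
    eigenvalue dominates.
    """
    trace = l1 + l2 + l3
    mean = trace / 3.0
    num = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0.0:
        # No measurable diffusion: define FA as 0 to avoid division by zero.
        return trace, 0.0
    return trace, math.sqrt(1.5 * num / den)


# Hypothetical eigenvalues in units of 10^-3 mm^2/s (illustrative only).
print(trace_and_fa(1.0, 1.0, 1.0))  # isotropic case, FA is 0
print(trace_and_fa(1.7, 0.3, 0.3))  # one dominant direction, FA well above 0
```

Multiplying the returned FA by 1000 gives values on the ×10³ scale used for the FA values reported in this study.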
The reproducibility of quantitative FA measurements is essential to detect changes in the white matter of patients during follow-up in longitudinal DTI studies. In this case, any changes associated with certain pathology or aging can be attributed to the evolving disease or aging and be separated from the usual deviation and variability that characterize any repeated measurements [16,17]. To our knowledge, the evidence of DTI reproducibility on 3T units in adults is limited and covers a narrow age range investigated by using a voxel-to-voxel analysis [18]. Another important aspect of the reliability of the DTI quantitative method is the presence of satisfactory interrater and intrarater coefficients of variation that would guarantee the robustness of the method. However, the literature data on this issue are scarce and refer mostly to patients [19,20]. Finally, to the best of our knowledge, there is no study of the possible role of different b-values in the reliability of DTI measurements.
In the present study, we sought to address the aforementioned lack of data in the literature by performing a study of the reproducibility of FA in healthy adults of a wide age range by using region of interest (ROI)-based analysis. Our null hypothesis was that the interrater variability, as well as the variability observed with 2 different b-values and in 2 scanning sessions, would not be statistically significant.

Subjects
Institutional review board approval was obtained for the study. Twelve healthy volunteers with no history of neurologic or psychiatric illness were enrolled into the study. All of the subjects underwent MR imaging twice, with the 2 scans obtained 2 weeks apart. The age, sex, and handedness of each subject were noted. The mean age was 38.4 ± 11.1 years (95% confidence interval: 31.3-45.5), 6 subjects were male, and only 1 subject was left-handed.

MR Imaging
MR imaging was performed on a 3T unit (Intera; Philips, Eindhoven, the Netherlands) by using a standard head coil. All of the subjects were closely monitored for head motion with the surveillance system of the scanner. The subject's head was sufficiently cushioned to avoid any involuntary movement, and no head motion was noticed. The acquisition of DTI data in the axial plane was the last series after a T1-weighted pulse sequence (TR, 400 ms; TE, 12 ms; flip angle, 75°; section thickness, 5 mm with 1 mm gap; FOV, 240 mm; matrix, 304 × 244) in the sagittal plane and fluid-attenuated inversion recovery images (TR, 11 000 ms; TE, 105 ms; inversion time, 2800 ms; turbo spin factor, 21; section thickness, 5 mm with 1 mm gap; FOV, 230 mm; matrix, 320 × 240) in the axial plane. The DTI sections were positioned parallel to the anterior/posterior commissure line, from the centrum semiovale to the medulla oblongata, with the following imaging parameters: spin-echo echo-planar single-shot sequence, with parallel acquisition (sensitivity encoding) factor of 2; 2 acquisitions (averages); TR, 5000 ms; TE, 90 ms; flip angle, 90°; section thickness, 3 mm with no gap; FOV, 230 mm; acquisition matrix of 112 × 112 interpolated to 256 × 256; b-value, 1000 s/mm²; and 16 diffusion gradient orientations. DTI was then repeated with a b-value of 700 s/mm². The same MR imaging parameters were used for the second scan performed 2 weeks later.

FA Measurements
Measurements were performed independently by 2 experienced observers (neuroradiologists with 8 and 25 years of experience), who were blinded to the subject identity and scan number. The commercially available software provided on the scanner was used. ROIs were drawn on the FA images and correlated with the anatomic images. The ROI sizes were […] and 0.20 cm² (in the corpus callosum and thalamus) and were identical between the 2 sessions to reduce variability that could affect the validity of the results (Fig 1). The average FA value (×10³) and SD within an ROI were recorded.

Statistical Analysis
The means, SDs, and the 95% confidence intervals of the FA values in each anatomic structure for both b-values and scanning sessions were determined. The t test was applied to detect changes in FA values between hemispheres. Pearson correlation coefficients were applied to detect any age- or sex-related correlation of the FA values. P < .05 was considered to indicate a statistically significant difference. Interobserver agreement was assessed by using intraclass correlation coefficients. Measurement reproducibility was assessed by using the Bland-Altman method to define the agreement between replicate measurements [21]. The mean difference, SD of the differences, and 95% limits of agreement (ie, mean difference − 2 × SD and mean difference + 2 × SD) were calculated for each FA value in all of the anatomic structures for both b-values. Measurement error and repeatability were assessed for each anatomic region. An estimate of the precision of the measurement was derived by calculating the within-subject SD, by using the formula dSD/√2 (where dSD is the square root of the mean squared difference). Measurement error relative to the size of the FA values was estimated via the within-subject coefficient of variation, which was calculated by dividing the within-subject SD by the overall mean of the parameter studied and then expressed as a percentage. The repeatability coefficient, which represents the threshold value below which the absolute difference between 2 measurements on the same subject is expected to lie for 95% of the measurement pairs, was assessed by using the formula 1.96 × dSD, and the 95% confidence limits for spontaneous change for an individual subject were calculated by using the formula (1.96 × dSD)/√n, where n = 1, and were expressed as a percentage of the mean parameter value. Finally, the t test was applied to evaluate changes in FA values in each anatomic region between the 2 scans.
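The reproducibility statistics described above can be sketched in a few lines of Python. This is a minimal illustration under the formulas stated in this section (limits of agreement as mean difference ± 2 × SD, within-subject SD as dSD/√2, repeatability coefficient as 1.96 × dSD, and n = 1 for the single-subject change); the function name `repeatability_summary`, the dictionary keys, and the example FA values are hypothetical and do not reproduce the study data or the software actually used.

```python
import math
from statistics import mean, stdev


def repeatability_summary(session1, session2):
    """Summarize agreement between paired FA measurements from 2 sessions."""
    diffs = [a - b for a, b in zip(session1, session2)]
    n_pairs = len(diffs)

    # Bland-Altman: mean difference and limits of agreement (mean +/- 2 SD).
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    limits = (mean_diff - 2 * sd_diff, mean_diff + 2 * sd_diff)

    # dSD: square root of the mean squared difference.
    d_sd = math.sqrt(sum(d * d for d in diffs) / n_pairs)
    within_sd = d_sd / math.sqrt(2)

    # Overall mean of the parameter across both sessions.
    overall_mean = mean(list(session1) + list(session2))
    cv_percent = 100 * within_sd / overall_mean

    # Repeatability coefficient and significant change for a single subject
    # ((1.96 * dSD) / sqrt(n) with n = 1), as a percentage of the mean.
    repeatability = 1.96 * d_sd
    single_subject_change = 100 * (repeatability / math.sqrt(1)) / overall_mean

    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "limits_of_agreement": limits,
        "within_subject_sd": within_sd,
        "cv_percent": cv_percent,
        "repeatability_coefficient": repeatability,
        "single_subject_change_percent": single_subject_change,
    }


# Hypothetical FA values (x10^3) for 4 subjects in 2 sessions.
first = [702, 715, 690, 728]
second = [698, 722, 701, 719]
for key, value in repeatability_summary(first, second).items():
    print(key, value)
```

Because the within-subject coefficient of variation is scaled by the overall mean, 2 regions with the same repeatability coefficient can show different coefficients of variation when their mean FA values differ.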

Age-Related and Sex-Related Changes in FA
DTI and calculation of the FA were feasible in all of the subjects, and the FA maps were consistent with the known anatomy. The following descriptive statistics and the calculation of any statistically significant difference between the FA values derived by the 2 b-values, as well as between the FA values obtained in each hemisphere, refer to the FA values calculated by the first observer after the first measurement session. The mean values of the FA in all of the anatomic structures, including SD and 95% confidence intervals, for both b-values are shown in Table 1. Apart from the genu of the right internal capsule (mean FA = 702/728; b = 1000/700 s/mm²) and the anterior limb of the left internal capsule (mean FA = 617/745; b = 1000/700 s/mm²), where a statistically significant difference was found between the 2 b-values (P = .02 and P = .05, respectively), there was no statistically significant difference between the FA values derived by the 2 b-values. Regarding any differences between the 2 hemispheres, no statistically significant difference was found in any bilateral anatomic structure apart from the corticospinal tract (FA = 636/653 [right/left] for b-value at 1000 s/mm², P = .02; FA = 587/616 [right/left] for b-value at 700 s/mm², P = .04). There was also no sex-related difference in our study population (P = .7). Significant negative age-related differences in FA were observed only in the genu of the corpus callosum (r = −0.6; P = .04) and in the left anterior limb of the internal capsule (r = −0.78; P = .004).

Interrater Agreement
Intraclass correlation coefficients (Table 2) showed fair agreement (.01 ≤ P < .05) between the 2 raters for the genu of the left internal capsule, the posterior limb of the right internal capsule, and both corticospinal tracts on the basis of a b-value of 1000 s/mm². This fair agreement was transformed to moderate agreement (P = .01) for the corticospinal tracts by using a b-value of 700 s/mm². Moderate agreement (P = .012) between the 2 raters (b-value = 1000 s/mm²) was observed in the genu of the right internal capsule, as well as in the anterior and posterior limbs of the left internal capsule. However, the application of a b-value of 700 s/mm² deteriorated the intraclass correlation coefficients in the genu of the right internal capsule, as well as in the anterior limb of the left internal capsule (.02 ≤ P < .05). Anatomic structures where the FA values between the 2 raters were in good agreement (P = .001) were the anterior limb of the right internal capsule, the right thalamus, and the genu of the corpus callosum measured on the basis of the 1000-s/mm² b-value. Interestingly, the 700-s/mm² b-value showed nearly identical intraclass correlation coefficients (P = .001) in these ROI locations, except for the genu of the corpus callosum. Only the FA values in the splenium of the corpus callosum showed very good agreement (P = .0012) with both b-values.
Table 1 shows between-scan FA values for both b-values. Statistically significant differences were found in the right corticospinal tract (FA = 651/570, first/second session; P = .048) and right thalamus (FA = 349/316, first/second session; P = .024) only for a b-value of 1000 s/mm². ROI analysis applying a b-value of 700 s/mm² did not demonstrate any statistically significant between-scan difference.
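For readers who wish to reproduce an interrater analysis of this kind, the sketch below computes an intraclass correlation coefficient from a subjects × raters table. The form of the coefficient is not specified in this report, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is used here purely as an assumption; the function name `icc_2_1` and the example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater.

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical FA values (x10^3): 5 subjects measured by 2 raters.
example = [[702, 710], [650, 640], [728, 735], [617, 630], [745, 738]]
print(icc_2_1(example))
```

This form penalizes a systematic offset between raters as well as random disagreement; a consistency-type coefficient would ignore such an offset and would generally give a higher value for the same data.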
Table 3 shows the precision parameters of the FA measurement method, as well as the reproducibility between the 2 scanning sessions in terms of within-subject SD, within-subject coefficient of variation (percent), repeatability coefficient, and significant change for a single subject (percent). Satisfactory precision parameters were found for all of the anatomic structures by using the 1000-s/mm² b-value. Notably, the corticospinal tracts and the left internal capsule showed the largest within-subject coefficients of variation (for b-value = 1000 s/mm², 8% for the right corticospinal tract, 7% for the left corticospinal tract, and 3%-6% for the examined parts of the left internal capsule). In other terms, by performing repeated measurements of the FA in the right corticospinal tract among various subjects, we may find approximately 7% intrasubject variability; this parameter is of major concern when determining the precision and repeatability of data by using a certain measurement method. In all of the anatomic structures, apart from the anterior and posterior limbs of the internal capsule, a b-value of 1000 s/mm² demonstrated a smaller within-subject SD and, thus, a smaller within-subject coefficient of variation compared with a b-value of 700 s/mm². Finally, no measurement showed a within-subject coefficient of variation greater than 50%, a level that would indicate an unacceptable measurement error. The repeatability coefficient was markedly higher in the genu and anterior limb of the internal capsules, as well as in the corticospinal tracts, whereas it was very low in the corpus callosum. This means that subtle changes in the FA values of the corpus callosum (more than 6-7, in units of FA × 10³) may indicate disease- or therapy-induced changes. In the right corticospinal tract, this would require an FA value change of more than 16-17 (in the same units).
In a similar manner, the lowest significant change (percent) of the FA values for a single subject was in the corpus callosum (4%-7%), whereas the highest one was observed in the corticospinal tracts (≥20%). The Bland-Altman plot analysis and the 95% limits of agreement for the FA values obtained by both b-values in each anatomic location revealed, similar to the coefficients of variation and repeatability coefficients, a slight superiority of the 700-s/mm² b-value in the reproducibility of the measurements in the internal capsule, especially on the left side (Table 3).

Reproducibility of FA Measurements between 2 Scanning Sessions
In the anterior limb of the left internal capsule, the mean differences (SD; 95% limits of agreement) were −4.1 (59.45; −199.0 to 190.8) and −20.7 (43.58; −106.2 to 64.0) for b-values of 1000 and 700 s/mm², respectively. The 95% limits of agreement are clearly narrower for the b-value at 700 s/mm², and, thus, the reproducibility is presumably better. Otherwise, the 1000-s/mm² b-value gave satisfactorily reproducible measurements equally good or better than the 700-s/mm² b-value.

Discussion
We investigated FA values in various white matter tracts, hemispheric changes, and age-related changes based on a ROI analysis by using a 3T system. The above results, as well as the interrater agreement and the reproducibility of the measurements, were assessed with 2 different b-values so as to detect any dependency of the FA on the applied b-value. The mean values of FA in our subjects were higher in the ROIs located in the internal capsule and almost identical in the genu of the corpus callosum and in the corticospinal tract to those obtained by Bonekamp et al [16] (FA values in that study were 514-529 in the anterior limb of the internal capsule and 751-755 and 626-646 in the genu of the corpus callosum and in the corticospinal tract, respectively) with ROI analysis in children and adolescents examined at 1.5T. A higher FA value in the splenium of the corpus callosum than in the genu was found in all of the subjects, and the values, as well as this pattern, are in agreement with those found in the literature [22] and in a recent study conducted at 3T [23]. The higher values in the splenium may also be explained on technical grounds. FA is a measure of the degree to which the 3D probability density function of water diffusion within a single voxel is oblong (FA closer to 1) or spherical (FA closer to 0). Even under normal circumstances, the measured FA in the internal capsule, as well as in the corticospinal tract, is expected to decrease where local tract curvature is high relative to voxel size. Moreover, it may be helpful to place the findings in the context of the appearance of the ROIs on color-coded directional diffusion maps to determine the directional contributions from the main eigenvector at each voxel; this may shed some light on ROIs in which decreases in FA may be related to multidirectionality or crossing of fiber tracts within individual voxels in the ROIs.
The significantly different mean FA values between the DTI acquisitions at a b-value of 700 s/mm² and those at a b-value of 1000 s/mm² in the genu of the right internal capsule and the anterior limb of the left internal capsule are to be expected, because the signal intensity-to-noise ratio (SNR), but not the contrast-to-noise ratio, has been shown to decrease with increasing b-value [24], and decreases in SNR have, in turn, been associated with upward bias in FA values [25]. A further reason for the limited impact of the b-value in the internal capsule may be the relatively long TE in our study, which makes the DTI prone to field strength (B0) inhomogeneities and magnetic susceptibility gradients. Regarding the acquired FA values in the corpus callosum, our values are considerably higher compared with studies conducted at 1.5T [10]. In our opinion, this may also be an effect of the higher SNR, which is known to be proportional to B0, and has been described in initial reports of DTI at 3T [26]. A side-dependent statistically significant difference in our study was demonstrated in the FA values between the corticospinal tracts. The asymmetry in the corticospinal tract is a finding that remains controversial among authors. Our finding is in agreement with results of a recent study (where the asymmetry is more pronounced in patients with multiple sclerosis) [27], but it is at odds with the results of previous studies in pediatric and adult populations [16,17]. For the first time in the literature, it was demonstrated that b-values may have an effect on the calculated FA in the region of the internal capsule.
In the second part of our study, we assessed the interrater agreement for the obtained FA measurements on the basis of 2 b-values. Unlike the previous studies (conducted at 1.5T), which found satisfactory to very good interobserver precision of FA measures [16,17,20], our results indicate fair-to-moderate interrater agreement for most anatomic locations by using a b-value of 1000 s/mm². When applying a b-value of 700 s/mm², the interrater agreement seems to be substantially improved only for the left corticospinal tract and deteriorates for the internal capsule on both sides. On the other hand, a good to very good agreement was found for the corpus callosum and the anterior limb of the right internal capsule. Therefore, as pointed out by other researchers [20], the interobserver variability, which may be 3 times higher than the intraobserver variability, remains a serious problem in the interpretation of the FA maps. A voxelwise analysis, compared with a ROI-based analysis, may seem attractive to eliminate operator-dependent ROI misplacement, partial volume averaging, and interobserver or intraobserver variability, but the required smoothing procedure may mix the values of spatially nearby voxels and result in spatial noise suppression. Notably, the issues concerning spatial smoothing are still evolving [28], and the voxel-based methods can be corrected for multiple comparisons.
The reproducibility of the FA measurements in our study is in essential agreement with initial work, acquired at 1.5T, in children and adolescents [16], where the highest coefficients were observed in the cerebral peduncle, as well as in the anterior and posterior limbs of the internal capsule. The marked coefficient of variation in the internal capsule, as well as in the right thalamus, found in our study, may be attributed to the high level of iron in the basal ganglia (with pronounced susceptibility artifacts at 3T) and the possible ROI misregistration in the internal capsule. Interestingly, a b-value of 1000 s/mm² seemed to improve the reproducibility of the FA measurements by reducing the coefficients of variation. The FA measurements obtained in the right internal capsule showed a better within-subject coefficient of variation compared with the repeatability coefficients and the significant change for a single subject, where the reproducibility was equal between both sides. This discrepancy may be attributed to the dependence of the coefficients of variation on the magnitude of the measured value. Our statistical method resembles the one used by Lau and Goodyear [23] to define the minimum detectable change in FA between 2 sessions. In contrast to that study, we found the lowest detectable changes, which may indicate pathologic changes, in the corpus callosum, followed by the internal capsule. The higher significant changes for a single subject were observed in the corticospinal tract on both sides. This implies higher thresholds of FA values in detecting any pathologic value or therapeutic effect in these regions.
Finally, in our attempt to detect significant age-related differences in FA, we observed a significant negative correlation only in the genu of the corpus callosum and in the left anterior limb of the internal capsule. A statistically insignificant increase in FA with age in the other parts of the internal capsule follows the remarks of Schmithorst et al [15], who reported an increase in FA with age for 4 fiber tracts (internal capsule, corticospinal tract, left arcuate fasciculus, and right inferior longitudinal fasciculus). Our finding adds evidence to previous observations in children and adolescents, in which there were age-related FA changes in the splenium of the corpus callosum, as well as in the corticospinal tract and the internal capsule [16]. Our results are consistent with recent findings for the corpus callosum [29,30], as well as with those of other authors who reported a decrease in the FA and a concomitant increase in the apparent diffusion coefficient as adults age [31,32]. This may be attributed to a gradual demyelination, loss of neural density, changes in water content, or changes in the organization of nerve fibers.
A limitation of the present study is the small number of examined subjects. The examination of more healthy subjects may allow for the detection of more significant correlations between FA values and age, as well as a better estimation of the method reproducibility. Second, we did not perform any study in patients to investigate any difference in the reproducibility pattern (pathologic structures have generally higher interrater agreement); thus, our results may not apply directly to subjects with white matter disease. Third, we used a limited number of directions to determine the FA values; FA calculations with more directions may have eliminated some FA errors. The results of the present study indicate the need for awareness of various extrinsic (motion, artificial noising, poor registration across sections, and imaging parameters) and intrinsic (MR scanner and field homogeneity, as well as coil and gradient features) factors, which inevitably influence the reproducibility and precision of the obtained FA values, thus affecting the reliability of DTI and restricting the wide use of normative data in the design of disease studies [33].
In conclusion, our results show a satisfactory reproducibility of the FA in all of the examined white matter tracts. A b-value of 1000 s/mm² showed superior reproducibility of measurements in most anatomic locations. However, the interrater agreement proved mostly to be fair to moderate. Thus, this study highlights an important problem with reliance on a single DTI metric, such as FA. An age-related change of the FA values was observed in the genu of the corpus callosum and in the left anterior limb of the internal capsule. Our results describe the sample distribution and the reproducibility of the FA under nonpathologic conditions and may serve as a baseline for longitudinal studies in patients.