Diffusion MRI Microstructural Abnormalities at Term-Equivalent Age Are Associated with Neurodevelopmental Outcomes at 3 Years of Age in Very Preterm Infants

BACKGROUND AND PURPOSE: Microstructural white matter abnormalities on DTI using Tract-Based Spatial Statistics at term-equivalent age are associated with cognitive and motor outcomes at 2 years of age or younger. However, neurodevelopmental tests administered at such early time points are insufficiently predictive of mild-moderate motor and cognitive impairment at school age. Our objective was to evaluate the microstructural antecedents of cognitive and motor outcomes at 3 years’ corrected age in a cohort of very preterm infants.
MATERIALS AND METHODS: We prospectively recruited 101 very preterm infants (<32 weeks’ gestational age) and performed DTI at term-equivalent age. The Differential Ability Scales, 2nd ed, Verbal and Nonverbal subtests, and the Bayley Scales of Infant and Toddler Development, 3rd ed, Motor subtest, were administered at 3 years of age. We correlated DTI metrics from Tract-Based Spatial Statistics with the Bayley Scales of Infant and Toddler Development, 3rd ed, and the Differential Ability Scales, 2nd ed, scores with correction for multiple comparisons.
RESULTS: Of the 101 subjects, 84 had high-quality DTI data, and of these, 69 returned for developmental testing (82%). Their mean (SD) gestational age was 28.4 (2.5) weeks, and birth weight was 1121.4 (394.1) g. DTI metrics were significantly associated with Nonverbal Ability in the corpus callosum, posterior thalamic radiations, fornix, and inferior longitudinal fasciculus and with Motor scores in the corpus callosum, internal and external capsules, posterior thalamic radiations, superior and inferior longitudinal fasciculi, cerebral peduncles, and corticospinal tracts.
CONCLUSIONS: We identified widespread microstructural white matter abnormalities in very preterm infants at term that were significantly associated with cognitive and motor development at 3 years’ corrected age.
ABBREVIATIONS: Bayley-III = Bayley Scales of Infant and Toddler Development, 3rd ed; CA = corrected age; DAS-II = Differential Ability Scales, 2nd ed; FA = fractional anisotropy; IFOF = inferior fronto-occipital fasciculi; ILF = inferior longitudinal fasciculus; MD = mean diffusivity; PTR = posterior thalamic radiations
Received October 16, 2020; accepted after revision February 18, 2021.
From the Perinatal Institute (M.N.P., J.K., L.H., N.A.P.), Imaging Research Center (M.C., A.B., W.Y.), and Center for ADHD (L.T.), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio; Department of Electronic Engineering and Computer Science (M.C.), College of Engineering and Applied Science, University of Cincinnati, Cincinnati, Ohio; Center for Perinatal Research (K.M., J.W.L.), The Research Institute at Nationwide Children’s Hospital, Columbus, Ohio; Department of Psychology (K.O.Y.), Alberta Children’s Hospital Research Institute and Hotchkiss Brain Institute, University of Calgary, Alberta, Canada; and Departments of Radiology (W.Y.) and Pediatrics (L.T., L.H., N.A.P.), University of Cincinnati College of Medicine, Cincinnati, Ohio.
This work was supported by National Institutes of Health grants R01NS094200 (to N.A.P.), R01 NS096037 (to N.A.P.), and R21HD094085 and a Trustee Grant from the Cincinnati Children’s Hospital Medical Center (to L.H.).
Please address correspondence to Nehal A. Parikh, DO, MS, Cincinnati Children’s Hospital, 3333 Burnet Ave, MLC 7009, Cincinnati, OH 45229; e-mail: Nehal.Parikh@cchmc.org
Indicates open access to non-subscribers at www.ajnr.org
Indicates article with online supplemental data.
http://dx.doi.org/10.3174/ajnr.A7135
AJNR Am J Neuroradiol 42:1535–42 Aug 2021 www.ajnr.org
Single-shot echo-planar diffusion MR imaging with 64 noncollinear gradient directions was performed with parameters that varied among subjects. To mitigate variability in the FA/MD maps arising from the different b-shells and TE/TR values across subjects, we used an advanced harmonization method using ComBat, a batch-effect correction tool with an open-source Matlab (MathWorks) implementation. Each subject was assigned a “batch” value based on having the same b-shell (b = 800 or b = 2000) and highly similar TE/TR parameters (maximum difference of 63/1600 for TE/TR parameters within the same batch). We used 3 batches: batch 1 (TR = 7500 ms, TE = 77 ms, flip angle = 90°, resolution = 1.97 × 1.97 × 2.00 mm³, b-value = 800 s/mm²); batch 2 (mean TR = 9553 ms, mean TE = 89 ms, flip angle = 90°, resolution = 2.23 × 2.23 × 2.00 mm³, b-value = 2000 s/mm²); and batch 3 (mean TR = 3996 ms, TE = 104 ms, flip angle = 80°, resolution = 2 × 2 × 2 mm³, b-value = 2000 s/mm²).

Premature birth is associated with a significantly increased risk of brain abnormalities and long-term neurodevelopmental impairment. Injuries or maturational delays affecting the WM are observed in 50%-80% of very preterm infants. [1][2][3] These abnormalities are associated with serious neurodevelopmental impairment. 1,3,4 However, such abnormalities are challenging to detect using conventional MR imaging techniques alone. Fortunately, DTI, a specialized form of MR imaging that can sensitively query the brain's microstructure, offers a novel approach for identifying these WM injuries. In preterm brains, the evolution of fractional anisotropy (FA) and mean diffusivity (MD), 2 metrics derived from DTI, varies from that of normative populations, and underlying brain injury may lead to neurodevelopmental impairment later in life. [4][5][6] The Tract-Based Spatial Statistics (TBSS; http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/TBSS) tool from the FMRIB Software Library uses observer-independent voxelwise statistical analysis to process the complex information contained within diffusion-weighted images. [4][5][6][7][8][9][10][11][12][13][14][15][16] TBSS can be used to identify specific WM tracts and structures in the infant brain that correlate with later developmental outcomes. 8,10,[13][14][15] Previous studies have used TBSS to objectively assess WM microstructure following clinical events such as infection, sports injury, or preterm brain injury (eg, intraventricular hemorrhage) and to relate the associated WM alterations to outcomes. 9,11,[16][17][18][19] In addition, studies have used TBSS to identify brain regions and tracts in which FA significantly correlates with cognitive and motor outcomes at 2 years of age or younger. 10,13,15 These studies have consistently concluded that higher FA is associated with better motor, cognitive, and language functioning.
Past studies emphasizing the value of TBSS correlated DTI parameters with neurodevelopmental outcomes derived from the Bayley Scales of Infant and Toddler Development, 3rd ed (Bayley-III) collected at 2 years of age or younger. Such standardized assessments are administered between 18 and 24 months of age, representing the earliest time point at which cognitive, language, and motor development can be reliably ascertained. However, assessment at these earliest ages is not necessarily predictive of school-age outcomes. [20][21][22] For example, the Bayley-III Motor subscale at 2 years of age significantly underestimates rates of motor impairment at 4 years of age in preterm infants. 22 Spencer-Smith et al 23 showed that cognitive delay, as assessed by the Bayley-III administered at 2 years of age, was not strongly associated with cognitive impairment at 4 years of age as assessed by the Differential Ability Scales, 2nd ed (DAS-II). 24 We propose that correlating FA from term-equivalent age MR imaging with 3-year outcomes may provide a more robust understanding of the early changes in WM microstructure that are also significantly associated with cognitive development.
Our objective was to test the hypothesis that WM microstructure, assessed using TBSS at term-equivalent age, is associated with neurodevelopmental performance at 3 years' corrected age (CA) in a regional cohort of very preterm infants.

MATERIALS AND METHODS
Participants
All infants born at 31 weeks' gestational age or earlier between November 2014 and March 2016 who were cared for in 1 of 4 level III neonatal intensive care units in the Columbus, Ohio, region were eligible for inclusion, with a few exceptions. Infants with congenital or chromosomal anomalies that affected their central nervous system and infants who remained hospitalized at 44 weeks' postmenstrual age unless cared for at the Nationwide Children's Hospital, the sole site of imaging, were excluded (n = 7). We also excluded infants with severe ventriculomegaly, because it can interfere with proper MR imaging registration (n = 9). We prospectively enrolled 101 infants from the Nationwide Children's Hospital, Ohio State University Medical Center, Riverside Hospital, and Mount Carmel St. Ann's Hospital, which together care for approximately 80% of all infants born very preterm in the Columbus, Ohio, region. Data were collected between January 2015 and July 2018. The institutional review board of the Nationwide Children's Hospital approved the study. Written informed consent was obtained from a parent or guardian of every study infant.

MR Imaging Acquisition
For all infants, structural MR imaging was performed at the Nationwide Children's Hospital using a 3T Magnetom Skyra MR imaging scanner (Siemens) and a 32-channel phased array head coil. All scans were completed between 39 and 44 weeks' postmenstrual age. Inpatients were transported to MR imaging by a skilled neonatal nurse and neonatologist. Infant heart rate and oxygen saturation were monitored during every scan. All imaging was performed during natural sleep and without sedation by feeding infants immediately before the scan, providing hearing protection, and using an immobilization device. All infants were imaged safely without any adverse events. Axial T2-weighted imaging used the following parameters: TE = 147 ms, TR = 9500 ms, flip angle = 150°, resolution = 0.93 × 0.93 × 1.0 mm³, scan time = 4 minutes 9 seconds.

Baseline Clinical and MR Imaging Variables
As previously described, 27 all MR imaging scans were read by pediatric neuroradiologists qualitatively for the degree of brain injury/maturation and objective quantitative biometric measurements, using a standardized scoring system per Kidokoro et al. 28 This approach yielded a global brain abnormality score, which was categorized as normal (total score, 0-3), mild (total score, 4-7), moderate (total score, 8-11), or severe abnormality (total score, ≥12). All readings were unblinded to clinical history but blinded to outcomes.
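The score bands above can be expressed as a small lookup. The sketch below is only an illustration of the categorization stated in the text; the function name is hypothetical, and it assumes the total score is a non-negative integer.

```python
def categorize_global_abnormality(total_score: int) -> str:
    """Map a global brain abnormality total score to the category bands in the text.

    Hypothetical helper for illustration; assumes a non-negative integer score.
    """
    if total_score < 0:
        raise ValueError("total_score must be non-negative")
    if total_score <= 3:
        return "normal"      # total score 0-3
    if total_score <= 7:
        return "mild"        # total score 4-7
    if total_score <= 11:
        return "moderate"    # total score 8-11
    return "severe"          # total score >= 12


if __name__ == "__main__":
    for score in (2, 5, 9, 14):
        print(score, categorize_global_abnormality(score))
```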
Severe bronchopulmonary dysplasia was defined using the National Institutes of Health definition: 29 the need for >30% effective fractional inspiratory oxygen concentration via nasal cannula or the need for any positive-pressure support at 36 weeks' postmenstrual age. We defined severe retinopathy of prematurity as any of the following: stage 3 retinopathy of prematurity (international classification 30), any stage retinopathy of prematurity with plus disease (presence of dilated and tortuous posterior pole vessels), or any prethreshold retinopathy of prematurity requiring treatment with laser ablation or bevacizumab intraocular injections.

Data Preprocessing and Harmonization
For each subject, eddy current distortions were corrected by registering the diffusion-weighted images to the B0 image, and skulls were removed using the FSL Brain Extraction Tool (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/BET). 31,32 For all subjects before ComBat, we first nonlinearly warped their FA and MD maps to a common neonatal template image using Advanced Normalization Tools (ANTs; http://stnava.github.io/ANTs/). 33 This step was necessary because ComBat performs batch harmonization by fitting a model at the voxel level. After ComBat harmonization of the FA/MD maps between batches was complete, we verified that the FA/MD histograms between batches were aligned and that the skew of Bland-Altman plots of the FA/MD values across batches had been significantly reduced, similar to Fortin et al. 25
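To give a rough sense of what voxel-level batch harmonization does, the sketch below aligns each batch's per-voxel mean and standard deviation to pooled values. It is a simplified location-and-scale adjustment and not ComBat itself: it omits ComBat's empirical Bayes shrinkage and its preservation of biological covariates. The function name and the (subjects × voxels) array layout are assumptions made for illustration.

```python
import numpy as np


def harmonize_location_scale(values, batch):
    """Simplified per-voxel batch adjustment (NOT the full ComBat model).

    values: array of shape (n_subjects, n_voxels), e.g. FA in template space.
    batch:  array of shape (n_subjects,) with one batch label per subject.
    Each batch needs at least 2 subjects so a sample SD can be computed.
    """
    values = np.asarray(values, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)

    grand_mean = values.mean(axis=0)

    # Pooled within-batch variance per voxel.
    ss_within = np.zeros(values.shape[1])
    for b in labels:
        block = values[batch == b]
        ss_within += ((block - block.mean(axis=0)) ** 2).sum(axis=0)
    pooled_sd = np.sqrt(ss_within / (values.shape[0] - len(labels)))

    harmonized = np.empty_like(values)
    for b in labels:
        idx = batch == b
        block = values[idx]
        mu = block.mean(axis=0)
        sd = block.std(axis=0, ddof=1)
        sd = np.where(sd > 0, sd, 1.0)  # guard against constant voxels
        # Remove the batch mean/scale, then restore the pooled mean/scale.
        harmonized[idx] = (block - mu) / sd * pooled_sd + grand_mean
    return harmonized
```

After this adjustment every batch shares the same per-voxel mean, which is the kind of alignment one would then inspect with histograms or Bland-Altman plots.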

Neurodevelopmental Outcomes
At 3 years' CA, the DAS-II was administered by a psychometrician blinded to the child's history and imaging results. The administered DAS-II subtests included Verbal Comprehension, Naming Vocabulary, Picture Similarities, and Pattern Construction. Verbal Comprehension and Naming Vocabulary are combined to derive the Verbal Ability score, which evaluates verbal concepts and knowledge. Picture Similarities and Pattern Construction make up the Nonverbal Ability score, which estimates nonverbal inductive reasoning. The DAS-II yields age-standardized scores (mean = 100 [SD, 15]), with higher scores indicating better cognitive functioning. 24 The psychometric properties of the DAS-II are strong in the 3- to 3.5-year age range, with high internal reliability coefficients and test-retest reliability. 34 In terms of validity, DAS-II scores are highly consistent with other standardized cognitive measures.
The Bayley-III is also a widely used and validated test that yields age-standardized scores (mean = 100 [SD, 15]), with higher scores indicating better motor functioning. 35
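Because both instruments report standard scores with a mean of 100 and an SD of 15, a score can be read as a number of SDs from the normative mean. The helper below is a hypothetical illustration of that arithmetic and is not part of either test's scoring procedure.

```python
def standard_score_to_z(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Convert an age-standardized score (mean 100, SD 15) to a z score."""
    return (score - mean) / sd


if __name__ == "__main__":
    # A standard score of 85 lies 1 SD below the normative mean.
    print(standard_score_to_z(85))
```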

Tract-Based Spatial Statistics
Voxelwise statistical analyses of the FA data were performed using TBSS. TBSS projects all subjects' FA data onto a mean FA tract skeleton, before applying voxelwise cross-subject statistics. Each subject's brain was aligned to a neonatal-optimized study-specific template created from all 84 study subjects. 12 The FA template was created using the multivariate template construction 2 protocol in ANTs, in which symmetric group normalization is used to create a group template and deform each image to that template, updating the group template after each deformation. 33 We repeated this procedure until all registrations to the group template yielded minimally deformed transformations. 36 Next, we produced a mean FA skeleton representing the centers of all tracts common to the group. Each subject's aligned FA data were then projected onto this skeleton with a threshold of 0.15, and the resulting data were fed into voxelwise cross-subject statistics, which determined the correlation between FA and continuous scores from the Bayley-III and DAS-II, adjusted for postmenstrual age at MR imaging and sex. This was done using the FSL Randomize tool (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Randomize) with the threshold-free cluster enhancement option with 5000 permutations, followed by correction for multiple comparisons. Significant voxels were defined via a P value of <.05 following this correction. The same analysis was performed for MD.
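The general idea of a covariate-adjusted, permutation-based voxelwise correlation can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions and not the FSL tool: it does not implement threshold-free cluster enhancement, it adjusts for covariates by residualizing both the skeletonized data and the score (one simple choice among several), and it uses a max-statistic family-wise correction as a stand-in for the study's correction. All function names are hypothetical.

```python
import numpy as np


def residualize(y, covariates):
    """Remove the linear effect of covariates (plus an intercept) from y."""
    covariates = np.asarray(covariates, dtype=float)
    design = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def voxelwise_corr(data, score):
    """Pearson correlation between score and each column (voxel) of data."""
    d = data - data.mean(axis=0)
    s = score - score.mean()
    denom = np.sqrt((d ** 2).sum(axis=0) * (s ** 2).sum())
    return (d * s[:, None]).sum(axis=0) / denom


def permutation_pvalues(fa, score, covariates, n_perm=5000, seed=0):
    """One-sided (positive correlation) corrected p values via the max statistic.

    fa: (n_subjects, n_voxels); score: (n_subjects,); covariates: (n_subjects, k).
    """
    rng = np.random.default_rng(seed)
    fa_res = residualize(np.asarray(fa, dtype=float), covariates)
    score_res = residualize(np.asarray(score, dtype=float), covariates)
    observed = voxelwise_corr(fa_res, score_res)

    # Null distribution of the maximum correlation across voxels.
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        max_null[i] = voxelwise_corr(fa_res, rng.permutation(score_res)).max()

    # Builds an (n_voxels, n_perm) comparison; fine for a sketch, heavy for real data.
    exceed = (max_null[None, :] >= observed[:, None]).sum(axis=1)
    return observed, (1 + exceed) / (n_perm + 1)
```

In this sketch a voxel would be called significant when its corrected p value falls below .05, mirroring the threshold stated in the text.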

Statistical Analysis
We compared the baseline characteristics of infants with and without 3 years' CA data using independent samples t tests, Mann-Whitney U tests, or Fisher exact tests, as appropriate. Following TBSS, we verified the associations shown using linear regression. For each image, we extracted the mean FA/MD for all voxels that were significantly associated with each developmental score. A linear regression was then used to examine the relationship between the mean FA/MD values from all these significant voxels combined and the associated outcome score. All analyses were also corrected for multiple comparisons. We used the traditional 2-sided P value < .05 to indicate statistical significance. All analyses were performed using STATA 16.0 (StataCorp).
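The verification step described above (average the metric over the significant voxels, then regress the outcome on that average) can be sketched as follows. The study used STATA; this NumPy version is only an illustration, and the function names and the boolean-mask representation of significant voxels are assumptions.

```python
import numpy as np


def mean_over_mask(maps, mask):
    """Per-subject mean of a metric over the voxels flagged in a boolean mask.

    maps: (n_subjects, n_voxels); mask: (n_voxels,) boolean.
    """
    maps = np.asarray(maps, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    return maps[:, mask].mean(axis=1)


def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with R squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = ((y - predicted) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return slope, intercept, 1.0 - ss_res / ss_tot
```

The returned R squared is the proportion of variance in the outcome score explained by the mean metric, which is the quantity reported as "variance explained".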

RESULTS
Of the original 101 subjects, a total of 17 were excluded due to unusable scans: 14 did not fit into the 3 established harmonized batches, 2 had missing T2 images, and 1 was excluded due to poor image quality. Of the 84 subjects with harmonized DTI data, 69 returned for developmental testing (82%). Their mean (SD) gestational age was 28.4 (2.5) weeks, and birth weight was 1121.4 (394.1) g (Table). Of these 69 infants, 7 (10.1%) had moderate brain abnormalities (global brain abnormality score = 8-11), 15
In TBSS analyses correlating whole-brain FA with DAS-II developmental scores at 3 years' CA, adjusted for sex and postmenstrual age at MR imaging and corrected for the false discovery rate, several significant positive correlations between FA and Nonverbal Ability scores were identified. Nonverbal scores were significantly correlated with FA in the genu, splenium, and body of the corpus callosum; the posterior thalamic radiations (PTR); the fornix; and the inferior longitudinal fasciculus (ILF)/inferior fronto-occipital fasciculi (IFOF) (Fig 1, Online Supplemental Data). Verbal scores were not significantly correlated with FA in any brain voxels. There were no significant negative correlations with either outcome. Whole-brain MD was not significantly associated with DAS-II scores after adjustment for covariates. The linear regression analyses showed that the mean FA from all significant TBSS voxels explained 15% (P = .001) of the variance in the DAS-II Nonverbal scores at 3 years' CA (Fig 2).
Bayley-III Motor scores were significantly positively correlated with FA in the genu, splenium, and body of the corpus callosum; posterior limb of the internal capsule; anterior limb of the internal capsule; external capsule; PTR; superior longitudinal fasciculi; ILF/IFOF; cerebral peduncles; and corticospinal tracts (Fig 3, Online Supplemental Data). FA was not negatively correlated with Motor scores in any region. We also identified several significant negative correlations between MD and motor development in approximately the same regions as those correlated with FA (Fig 3, Online Supplemental Data). There were no positive correlations between MD in any region and Motor scores. Linear regression analysis showed that the mean FA and MD from all significant voxels, respectively, explained 20% (P < .001) and 25% (P < .001) of the variance in Bayley-III Motor outcomes at 3 years' CA (Fig 2).

DISCUSSION
We used TBSS, a well-established, objective, whole-brain tool, to quantify microstructural development in infants born very preterm at term-equivalent age and to identify brain regions in which 2 DTI metrics were associated with cognitive and motor development at 3 years' CA. Using voxelwise correlations between FA/MD and DAS-II Nonverbal and Bayley-III Motor scores, we identified several WM regions in which FA and MD were associated with neurodevelopmental outcomes. In comparison with prior studies that evaluated neurodevelopment at an earlier age (18-24 months' CA), 10,13,15,37 we were able to correlate early WM development with more reliable measures of cognitive and motor outcomes at 3 years' CA. We identified several regions of WM in which FA at term was significantly positively correlated with 3-year cognitive and motor development in very preterm infants. This finding validates prior findings correlating increased FA with better cognitive and motor outcomes, confirming that greater microstructural development and integrity is associated with better function even at older ages. Previous studies have shown that decreased MD or its components, axial and radial diffusivity, were associated with better motor development. 10,15 However, in this study, we did not identify any significant relationship between MD and cognitive development. FA is known to be a useful metric in neonates due to its ability to assess regional myelination. MD, being an aggregate of both myelination and axonal integrity, has not been strongly associated with cognitive development, which has been significantly more difficult to predict than motor development.
The DAS-II is a standardized, well-validated test of verbal and nonverbal abilities, both of which are keys to outcomes such as academic success and behavioral functioning. 38 We found that the genu, splenium, and body of the corpus callosum; PTR; posterior limb of the internal capsule; fornix; and ILF/IFOF are important for nonverbal cognitive abilities as assessed by the DAS-II at 3 years' CA. Our findings are generally consistent with studies that examined Bayley-III cognitive subscores in younger children at or before 2 years' CA. Specifically, the 3 largest studies reported an association between FA of the corpus callosum, 10,13,15 posterior limb of the internal capsule, 13,15 and/or PTR 13 with Bayley-III cognitive outcomes at 18-24 months of age. However, neither the FA of the ILF/IFOF nor the fornix was explicitly linked to cognitive development, as in our study. The ILF/IFOF fibers overlap posteriorly and are, therefore, not easy to distinguish using TBSS, and it is still debated whether the IFOF is a distinct pathway or just a portion of a bigger bundle that includes the ILF. 39 Functionally, both tracts are known to be important for several higher-order cognitive processes. 34,35 The fornix is considered a part of the limbic system and is associated with memory and emotions. 36 For the DAS-II Verbal Abilities outcome, we did not identify any correlations with white matter microstructure. Two prior studies of TBSS-derived DTI metrics and language development at earlier ages also did not find significant associations. 15,37 In a large cohort, Barnett et al 13 identified limited regions where FA was significantly associated with Bayley language scores. Our findings, using arguably more accurate testing at 3 years of age, suggest that TBSS-assessed microstructure is not a strong predictor of language outcome.
We identified several significant WM regions that correlated with Bayley-III Motor scores at 3 years of age that have known associations with sensorimotor function, including the corpus callosum, corticospinal tract, corona radiata, PTR, and superior longitudinal fasciculi. All these regions, except the PTR, were also identified by a prior neonatal TBSS study that followed infants until 24 months' CA. 10 The other 2 neonatal studies with follow-up before 24 months' CA also did not identify the PTR or the superior longitudinal fasciculi. 13,15 In older children with cerebral palsy, injury to the PTR appears to be as common or more common than injury to the corticospinal tract. 37,38 However, some studies have found no relationship between FA in the PTR at term-equivalent age and motor outcomes. 40,41 Similarly, prior studies have not shown consistently significant relationships between superior longitudinal fasciculi integrity and cerebral palsy. 42,43 Because up to 50% of children with cerebral palsy also have associated cognitive impairment, it is possible that tracts such as the PTR and superior longitudinal fasciculi are identified as significant because of their overlap and known association with cognitive impairment. 42,43 Indeed, FA in the PTR at term was also associated with nonverbal scores in our cohort. Similarly, the ILF and IFOF were significantly associated with Motor and Nonverbal scores.

Figure: Relationships between diffusion metrics and neurodevelopmental outcomes. The linear relationship between mean FA or MD of all voxels identified by TBSS is significantly associated with developmental outcomes, including the Nonverbal score on the DAS-II or Motor scores on the Bayley-III at 3 years' corrected age. The dashed red lines and shaded region represent the 95% confidence intervals for the linear relationship.
We speculate that some of the inconsistencies in the literature result from differences between the Bayley-III and the DAS-II. In a study of extremely preterm infants, the Bayley-III cognitive and language scores at 2 years of age were not strongly predictive of scores on the DAS-II at 4 years of age (24%-37% variance explained) or on the Wechsler Intelligence Scale for Children at 6 years of age (24%-26% variance explained). 23,44 The DAS-II generally has high correlations with the Wechsler scales in infants, while the Bayley-III has been shown to be a poor predictor of school-age cognitive impairment in extremely preterm children. 20,21,34 These findings are not surprising considering that cognitive/language functions are just beginning to manifest at 2 years of age; therefore, they are difficult to accurately evaluate at this early time point, and the Bayley-III overestimates scores and underestimates impairment relative to the Bayley-II. 45,46 Even the Bayley-III Motor subscale at 2 years of age significantly underestimates rates of motor impairment at 4 years of age in preterm infants. 22 Overall, these prior studies suggest a need to shift to more accurate test instruments and increase the follow-up testing period to at least 3 years of age for studies of preterm infants.
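To put the variance-explained figures above in more familiar terms, they can be converted to approximate correlation magnitudes. This is only a rough sketch: it assumes each percentage is the R² of a simple linear regression with a single predictor, which the cited studies may not have used.

```latex
\[
|r| = \sqrt{R^2}: \qquad
\sqrt{0.24} \approx 0.49, \quad
\sqrt{0.26} \approx 0.51, \quad
\sqrt{0.37} \approx 0.61
\]
```

Under that assumption, the 2-year Bayley-III scores would correlate only moderately (roughly 0.5 to 0.6) with the later DAS-II and Wechsler scores.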
A strength of our study is that it is population-based, unlike most prior neonatal studies that have been derived only from tertiary care referral hospitals, thus increasing the generalizability of our findings. We also performed developmental testing at 36 months' CA. This allowed us to identify more robust structure-function relationships than neonatal studies that reported outcomes before 36 months. A weakness of our study is that we did not examine the effect of the postnatal environmental factors on developmental outcomes. Prior studies have shown a strong relationship between maternal education and socioeconomic status and neurodevelopmental scores. In addition, the heterogeneity of our sample with regard to imaging parameters, even with extensive harmonization, may still have impacted our results. A larger study, such as the one we are currently performing, is needed to validate our findings and improve the prediction of neurodevelopmental impairment using diffusion MR imaging and other advanced MR imaging modalities. Additionally, while TBSS offers a significant advantage in reliability and whole-brain assessment over ROI-based techniques, it is still based on tensor models, which can result in erroneously lower FA measurements in regions of crossing fibers (eg, the centrum semiovale). Automated, whole-brain, probabilistic tractography methods using higher-order diffusion models, which are currently lacking for neonates, are needed to overcome this limitation.
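The crossing-fiber limitation of the tensor model can be illustrated with a minimal Python sketch. It uses the standard definitions of FA and MD in terms of the 3 tensor eigenvalues. The eigenvalues are illustrative values chosen for this example, not measurements from this study, and the crossing region is modeled crudely as the equal-weight average of 2 perpendicular single-fiber tensors, a simplifying assumption rather than a full signal model.

```python
import math


def mean_diffusivity(eigenvalues):
    """MD is the mean of the three diffusion tensor eigenvalues."""
    return sum(eigenvalues) / 3.0


def fractional_anisotropy(eigenvalues):
    """Standard FA definition: sqrt(3/2) * ||lambda - MD|| / ||lambda||."""
    md = mean_diffusivity(eigenvalues)
    numerator = sum((lam - md) ** 2 for lam in eigenvalues)
    denominator = sum(lam ** 2 for lam in eigenvalues)
    return math.sqrt(1.5 * numerator / denominator)


# Illustrative eigenvalues in units of 1e-3 mm^2/s (hypothetical values).
fiber_along_x = (1.7, 0.3, 0.3)  # diagonal tensor, principal axis along x
fiber_along_y = (0.3, 1.7, 0.3)  # same fiber rotated 90 degrees, along y

# Crude crossing model (assumption): average the two diagonal tensors.
# The average of diagonal tensors is diagonal, so its diagonal entries
# are also its eigenvalues.
crossing = tuple((a + b) / 2.0 for a, b in zip(fiber_along_x, fiber_along_y))

fa_single = fractional_anisotropy(fiber_along_x)
fa_crossing = fractional_anisotropy(crossing)
md_single = mean_diffusivity(fiber_along_x)
md_crossing = mean_diffusivity(crossing)

print(f"single fiber: FA={fa_single:.2f}, MD={md_single:.2f}")
print(f"crossing:     FA={fa_crossing:.2f}, MD={md_crossing:.2f}")
```

In this toy setting, FA falls from about 0.80 to about 0.48 and MD is unchanged, even though each fiber population is as anisotropic as before. A single tensor therefore reports lower FA in a crossing region without any loss of microstructural integrity.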

CONCLUSIONS
In a cohort of very preterm infants, we validated the presence of widespread microstructural perturbations, particularly in FA, from several brain regions that were associated in previous studies with neurodevelopmental outcomes at 2 years of age or younger. We used the DAS-II scores at 3 years' CA to validate these findings and also identified significant associations in a few previously unreported white matter regions. These diffusion MR imaging biomarkers can potentially be combined with additional promising biomarkers from other advanced MR imaging modalities to enhance the prediction of long-term neurodevelopmental outcomes in very preterm infants.

ACKNOWLEDGMENTS
We thank Jennifer Notestine, RN, and Valerie Marburger, NNP, for serving as the study coordinators; Josh Goldberg, MD, for assisting with recruitment; and Mark Smith, MS, for serving as the study MR imaging technologist. We are also grateful to the families, neonatal intensive care unit personnel, and High-Risk Clinic staff who made this study possible.