Validation of an MRI Brain Injury and Growth Scoring System in Very Preterm Infants Scanned at 29- to 35-Week Postmenstrual Age

BACKGROUND AND PURPOSE: The diagnostic and prognostic potential of brain MR imaging before term-equivalent age is limited until valid MR imaging scoring systems are available. This study aimed to validate an MR imaging scoring system of brain injury and impaired growth for use at 29 to 35 weeks postmenstrual age in infants born at <31 weeks gestational age.
MATERIALS AND METHODS: Eighty-three infants in a prospective cohort study underwent early 3T MR imaging between 29 and 35 weeks' postmenstrual age (mean, 32+2 ± 1+3 weeks; 49 males, born at median gestation of 28+4 weeks; range, 23+6–30+6 weeks; mean birthweight, 1068 ± 312 g). Seventy-seven infants had a second MR scan at term-equivalent age (mean, 40+6 ± 1+3 weeks). Structural images were scored using a modified scoring system which generated WM, cortical gray matter, deep gray matter, cerebellar, and global scores. Outcome at 12-months corrected age (mean, 12 months 4 days ± 1+2 weeks) consisted of the Bayley Scales of Infant and Toddler Development, 3rd ed. (Bayley III), and the Neuro-Sensory Motor Developmental Assessment.
RESULTS: Early MR imaging global, WM, and deep gray matter scores were negatively associated with Bayley III motor (regression coefficient for global score β = −1.31; 95% CI, −2.39 to −0.23; P = .02), cognitive (β = −1.52; 95% CI, −2.39 to −0.65; P < .01) and the Neuro-Sensory Motor Developmental Assessment outcomes (β = −1.73; 95% CI, −3.19 to −0.28; P = .02). Early MR imaging cerebellar scores were negatively associated with the Neuro-Sensory Motor Developmental Assessment (β = −5.99; 95% CI, −11.82 to −0.16; P = .04). Results were reconfirmed at term-equivalent-age MR imaging.
CONCLUSIONS: This clinically accessible MR imaging scoring system is valid for use at 29 to 35 weeks postmenstrual age in infants born very preterm. It enables identification of infants at risk of adverse outcomes before the current standard of term-equivalent age.

ing WM abnormality was associated with poorer motor and cognitive outcomes. 1,2,5,7,[14][15][16] Scoring systems of MR imaging at TEA were further developed to include quantitative biometrics to measure the impact of secondary brain maturation and growth following preterm brain injury. 17 These brain metrics correlated with brain volumes and differentiated preterm and term-born infants at TEA MR imaging. 17 At TEA, transcerebellar diameter was associated with fidgety general movements at 3-month corrected age (CA), 18 poorer cognitive outcomes at 12-month CA, 19 and poorer motor and cognitive outcomes at 2-year CA. 20 Reduced deep gray matter area at TEA was associated with poorer motor and cognitive outcomes, 19 and an increased interhemispheric distance independently predicted poorer cognitive development at 2-year CA. 3 Reduced biparietal width at TEA predicted both motor and cognitive outcomes at 2-year CA in infants born very preterm. 3,21 Term-equivalent age MR imaging scoring systems have been further developed to include evaluation of deep gray matter (DGM) structures and the cerebellum. 22 At TEA, global brain abnormality scores were significantly associated with motor outcomes at 2-years CA 23 and with cognitive outcomes at 7 years. 24,25 Deep gray matter scores were significantly associated with poorer attention and processing speeds, memory, and learning. 24,25 With safe earlier MR imaging now possible with MR-compatible incubators, valid scoring systems for use earlier than TEA are required. The aim of this study was to validate an MR imaging scoring system previously developed for very preterm infants at TEA in a cohort of infants born at <31-weeks gestational age with MR imaging between 29 and 35 weeks' postmenstrual age (PMA). 22 The study aimed to establish predictive validity for motor and cognitive outcomes at 12-months CA.
Secondary aims were to examine inter- and intrarater reproducibility and to examine relationships between global brain abnormality categories and known perinatal risk factors. It was hypothesized that the scoring system would be valid and reliable for use at this earlier time point but with more infants classified with brain abnormalities, due to immaturity rather than injury.

Study Design and Participants
This prospective cohort study of infants born at <31-weeks gestational age was conducted at the Royal Brisbane and Women's Hospital, Brisbane, Australia, between February 2013 and April 2015. Preterm infants were eligible if they had no congenital abnormality and their parents/caregivers were English-speaking and lived within a 200-km radius of the hospital. 26 A reference sample of healthy term-born infants was simultaneously recruited to generate reference values and cut-points for the regional brain measurements that form part of the scoring system. Inclusion criteria for term-born infants were a gestational age at birth of 38-41 weeks; birthweight above the 10th percentile; an uncomplicated pregnancy, delivery, and postpartum period; and normal neurologic examination findings. 26 Ethics approval was obtained from the Royal Brisbane and Women's Hospital Human Research Ethics Committee (HREC/12/QRBW/245) and The University of Queensland (2012001060), and the trial was registered with the Australian New Zealand Clinical Trials Registry (ACTRN12613000280707). Brain MR imaging was performed during sleep without sedation between 30 and 32 weeks PMA or when the infant was medically stable (range, 29

MR Imaging Scoring
A standardized MR imaging scoring system according to Kidokoro et al 22 was used to score all MRIs. An independent neurologist with training in radiology and experienced in neonatal MR imaging scoring (S.F.) performed the scoring. The scorer had no knowledge of any clinical characteristics of the infants except PMA at the time of scanning. Scoring was confirmed by a senior neuroradiologist (A.C.). Modifications to scoring cut-points were made by using the term reference data means and SDs. 27,28 Scoring items and parameters are detailed in On-line Table 1, a scoring proforma is included in On-line Table 2, and On-line Figs 1-18 provide examples of lesion types and regional measurements.
Cerebral WM abnormality was rated on 6 components, with a maximum total score of 15: cystic degeneration, focal signal abnormalities, delayed myelination, thinning of the corpus callosum, dilated lateral ventricles, and reduction of WM volume. 22 Myelination of the corpus callosum and posterior limb of the internal capsule was expected by 36-week PMA, so all infants were given a score of 2 for this item on early MR imaging. Cortical gray matter (CGM) was rated on 3 components with a maximum total score of 8: signal abnormality, delayed gyration, and dilated extracerebral CSF space. Cerebellar and DGM abnormalities were rated on signal abnormality and volume reduction with maximum total scores of 6 for each. 22 The sum of the WM, CGM, DGM, and cerebellar scores yielded a global brain abnormality score (0-35). 22 Each of the WM, CGM, DGM, cerebellar, and global scores could be further categorized into no, mild, moderate, or severe brain abnormality categories. 22 The WM total scores were categorized as none (0-2), mild (3-4), moderate (5-6), or severe (≥7) WM abnormalities. Cortical GM, DGM, and cerebellar categories used the following total scores: none (0), mild (1), moderate (2), and severe (≥3). Total global scores were classified as normal (0-3), mild (4-7), moderate (8-11), or severe (≥12) brain abnormalities.
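The category cut-points above can be expressed as a short lookup. The following Python sketch is illustrative only and is not the authors' software: the function and constant names are hypothetical, and only the maximum subscale scores and category boundaries are taken from the text.

```python
# Lowest total score for the mild, moderate, and severe categories,
# as stated in the text.
WM_CUTS = (3, 5, 7)          # none 0-2, mild 3-4, moderate 5-6, severe >=7
SUBSCALE_CUTS = (1, 2, 3)    # CGM, DGM, cerebellum: none 0, mild 1, moderate 2, severe >=3
GLOBAL_CUTS = (4, 8, 12)     # normal 0-3, mild 4-7, moderate 8-11, severe >=12

# Maximum total score for each subscale, as stated in the text.
MAX_SCORES = {"wm": 15, "cgm": 8, "dgm": 6, "cerebellum": 6}


def categorize(score, cuts, lowest_label="none"):
    """Map a total score to a category by counting the cut-points it reaches."""
    labels = (lowest_label, "mild", "moderate", "severe")
    return labels[sum(score >= cut for cut in cuts)]


def score_scan(wm, cgm, dgm, cerebellum):
    """Return the global score and the category for each subscale and the global score."""
    subscores = {"wm": wm, "cgm": cgm, "dgm": dgm, "cerebellum": cerebellum}
    for name, value in subscores.items():
        if not 0 <= value <= MAX_SCORES[name]:
            raise ValueError(f"{name} score {value} outside 0-{MAX_SCORES[name]}")

    global_score = sum(subscores.values())
    categories = {
        name: categorize(value, WM_CUTS if name == "wm" else SUBSCALE_CUTS)
        for name, value in subscores.items()
    }
    # The text labels the lowest global category "normal" rather than "none".
    categories["global"] = categorize(global_score, GLOBAL_CUTS, lowest_label="normal")
    return global_score, categories


if __name__ == "__main__":
    # Hypothetical subscale totals, for illustration only.
    print(score_scan(wm=4, cgm=1, dgm=0, cerebellum=2))
```

Keeping the cut-points in plain tuples makes it simple to substitute different boundaries if the scale is recalibrated against another reference sample.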
Six regional measurements form part of the scoring: thickness of the corpus callosum (genu, body, and splenium), ventricular diameter, biparietal width, interhemispheric distance, DGM area, and transcerebellar diameter. These measurements change with PMA at the time of MR imaging as a result of head and brain growth. To address this change and minimize the risk of confounding, we examined the relationship of each of these measures with PMA at MR imaging to derive a correction method for PMA at MR imaging. The PMA was determined on the basis of the obstetric estimate measure of gestation at delivery. 29 In the preterm group, early and term MR imaging data were pooled for each of the regional measures, and cases with focal brain lesions were removed to ensure that any linear relationship found was the result of age and not confounded by brain injury. For each measure that demonstrated a linear relationship with PMA at MR imaging, the regression coefficient (slope) was used to generate an equation for correction, written as: Corrected Value = Measured Value + Regression Coefficient × (40 − PMA at MR Imaging). The correction was then applied to the full cohort. On-line Figs 8-10 and 15 provide instructions for conducting regional measurements, correcting the raw values, and scoring.
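As a worked illustration of the correction equation, the sketch below projects a raw regional measurement to 40 weeks PMA. The helper names and the numeric values in the example (slope, measurement, and PMA) are hypothetical and chosen only to make the arithmetic easy to follow; the study's regression coefficients are reported in its on-line tables and are not reproduced here.

```python
def pma_in_weeks(weeks, days=0):
    """Convert a PMA written as weeks+days (e.g. 32+2) to decimal weeks."""
    return weeks + days / 7.0


def correct_for_pma(measured, slope, pma_weeks, reference_pma=40.0):
    """Corrected Value = Measured Value + Regression Coefficient x (40 - PMA at MR Imaging)."""
    return measured + slope * (reference_pma - pma_weeks)


if __name__ == "__main__":
    # Hypothetical values: a 40.0-mm measurement at 32+0 weeks PMA and an
    # assumed slope of 1.5 mm per week.
    corrected = correct_for_pma(measured=40.0, slope=1.5, pma_weeks=pma_in_weeks(32, 0))
    print(corrected)  # 40.0 + 1.5 * (40 - 32) = 52.0
```

A scan acquired at exactly 40 weeks PMA is left unchanged, and a scan acquired after 40 weeks is adjusted downward when the slope is positive.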
The regional measurements were also obtained for the term reference sample, and examination of the relationship with PMA at MR imaging was performed separately from that of the preterm group. When linear relationships were found, measurements were corrected as per the equation above. Following correction of the term reference sample regional scores, means and SDs were calculated, and these were used to create cut-points for scoring each of the respective regional measurements.
Interrater reproducibility of MR imaging scoring was tested on a separate sample with 20 MR scans from each time point scored by a second blinded rater, a pediatric radiologist (J.B.). Intrarater reproducibility was tested with 20 MR scans from each time point rescored 1 month apart (S.F.).

Neurodevelopmental Outcome at 12-Months CA
All infants underwent neurodevelopmental assessment at 12-months CA by an experienced physiotherapist blinded to MR imaging findings and medical history. The Bayley Scales of Infant and Toddler Development, 3rd ed. (Bayley III), was performed, and composite scores for motor and cognitive performance were generated. 30 The Neuro-Sensory Motor Developmental Assessment (NSMDA) evaluates neurologic and sensory motor function in addition to gross and fine motor performance, with total scores and functional classifications used. 31,32 The NSMDA at 12-months CA has good predictive validity for motor and cognitive outcomes and cerebral palsy at 4-years CA for very preterm infants 33,34 and 24-month motor and functional outcomes for infants with cerebral palsy. 35

Statistical Analysis
Sample size calculations were based on qualitative evaluation of MR images at TEA predicting 12-month outcomes, 4 with 69 infants required to reject the null hypothesis with 90% power (at P < .05). A sample of 80 infants was recruited to account for attrition and the earlier PMA at MR imaging (29 to 35 weeks PMA).
The association between each of the 6 regional measurements and PMA at MR imaging was analyzed by using mixed-effects regression models for the preterm sample data and separately for the term reference sample data with linear regression. When a linear relationship was found, data were centered around the mean and the relationship was examined to determine whether it was quadratic. Correction equations were then applied to the raw regional measures. Term reference sample mean and SD data were used to generate scoring cut-points for each of the regional measures. Paired t tests were used to determine statistically significant differences between early and term MR imaging item scores in the preterm group.
The association between early MR imaging scores and 12-month outcomes and term MR imaging scores and 12-month outcomes was evaluated with univariable and multivariable linear regression. Multivariable regression included potential confounders of sex, social risk, and, for the NSMDA only, CA at assessment.
To examine the predictive validity of both early and term MR imaging, we calculated sensitivity, specificity, and accuracy (percentage of cases correctly classified). Dichotomized MR imaging and outcome data were used to construct 2 × 2 tables. MR imaging category scores were dichotomized into normal/mild or moderate/severe categories for each of the subscales and global scores. Bayley motor and cognitive composite scores were dichotomized at < −1 SD, and the NSMDA functional classification scores were dichotomized as normal/minimal versus mild/moderate/severe/profound.
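The 2 × 2 calculations described above can be sketched as follows. This is a minimal illustration, not the authors' Stata code: the function names and example data are hypothetical, and the Bayley III threshold assumes the test's normative composite mean of 100 and SD of 15 (so < −1 SD corresponds to a composite below 85), which the text does not state explicitly.

```python
def is_mri_abnormal(category):
    """Dichotomize an MR imaging category: moderate/severe versus normal/mild."""
    return category in ("moderate", "severe")


def is_poor_bayley(composite, mean=100.0, sd=15.0):
    """Dichotomize a Bayley III composite at < -1 SD (assumed normative mean and SD)."""
    return composite < mean - sd


def diagnostic_accuracy(test_positive, outcome_positive):
    """Sensitivity, specificity, and accuracy from paired boolean lists."""
    pairs = list(zip(test_positive, outcome_positive))
    tp = sum(t and o for t, o in pairs)            # abnormal MRI, poor outcome
    fn = sum((not t) and o for t, o in pairs)      # normal MRI, poor outcome
    fp = sum(t and (not o) for t, o in pairs)      # abnormal MRI, good outcome
    tn = sum((not t) and (not o) for t, o in pairs)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "accuracy": (tp + tn) / len(pairs) if pairs else None,
    }


if __name__ == "__main__":
    # Hypothetical data for six infants, for illustration only.
    mri_categories = ["severe", "mild", "none", "moderate", "mild", "normal"]
    bayley_motor = [70, 80, 100, 95, 110, 90]
    result = diagnostic_accuracy(
        [is_mri_abnormal(c) for c in mri_categories],
        [is_poor_bayley(s) for s in bayley_motor],
    )
    print(result)
```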
Inter- and intrarater reliability was evaluated by using intraclass correlation coefficients (ICCs) (type 3, 1). Agreement was evaluated by using the percentage level of accuracy, in which the definition for accuracy was exact score ±1 for the subscale scores and exact score ±2 for the global scores.
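Both reproducibility measures can be illustrated in a few lines. The sketch below assumes that "type 3, 1" refers to the Shrout and Fleiss ICC(3,1) (two-way mixed effects, consistency, single rater), computed from ANOVA mean squares; the function names and example ratings are hypothetical.

```python
def percent_agreement(scores_a, scores_b, tolerance):
    """Percentage of paired scores that differ by no more than `tolerance`."""
    pairs = list(zip(scores_a, scores_b))
    within = sum(abs(a - b) <= tolerance for a, b in pairs)
    return 100.0 * within / len(pairs)


def icc_3_1(ratings):
    """ICC(3,1) for a list of rows, one row per scan and one column per rater.

    ICC(3,1) = (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error)
    """
    n = len(ratings)        # number of scans
    k = len(ratings[0])     # number of raters (or rating occasions)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


if __name__ == "__main__":
    # Hypothetical subscale scores from two raters on four scans.
    rater_1 = [3, 5, 7, 2]
    rater_2 = [3, 6, 9, 2]
    print(percent_agreement(rater_1, rater_2, tolerance=1))  # subscale rule: exact score +/-1
    print(icc_3_1([list(pair) for pair in zip(rater_1, rater_2)]))
```

Because ICC(3,1) measures consistency rather than absolute agreement, a rater who scores every scan exactly 1 point higher than another still yields an ICC of 1, which is one reason a separate agreement percentage is informative.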
When we investigated perinatal risk factors, differences across global brain abnormality score categories were determined by using Mann-Whitney U tests (dichotomous perinatal risk factors) and Kruskal-Wallis 1-way ANOVAs (continuous perinatal risk factors). Analysis was performed by using the STATA statistical package, Version 14 (StataCorp, College Station, Texas).

Participants
Of 214 eligible preterm infants, parents or guardians of 110 consented to the study, of whom 83 had early MR imaging and 12-month outcomes available and were included in this analysis (16 with no early MR imaging: 5 medically unstable, 1 death, 4 cancellations due to MR imaging equipment failure, 3 with no MR imaging slots, 1 withdrawn, 2 with movement artefacts; 11 failed to return for 12-month follow-up). Of these, 77/83 had a second MR scan at term. Thirty-eight term-born infants were included in the reference sample. Demographic data and MR imaging scores are summarized in Tables 1-3; 12-month outcomes are summarized in Table 4. There were minimal differences between those participants with both early and term MR imaging and those with only early MR imaging, except that all 6 participants who did not undergo their term MR imaging were classified with a higher social risk. 36,37 Given the established relationship between higher social risk and poorer neurodevelopmental outcome and an increased risk of cerebral palsy and to address this difference in our cohort between early and term MR imaging, all multivariable analyses included social risk as a potential confounder. 38,39 All term reference sample infants had a normal global brain abnormality category score.

Associations between Regional Brain Measurements and PMA at MR Imaging
All preterm regional measures except the body of the corpus callosum demonstrated linear relationships with PMA at MR imaging (P < .01). In the term reference sample, linear relationships were found only for transcerebellar diameter and the genu of the corpus callosum. Results of regression analyses and corrected regional measures for the early, term, and term reference sample MRIs are presented in On-line Tables 3 and 4.

Findings in Each Scoring Domain at Early and Term MR Imaging
Results for scoring items are presented in On-line Table 1. The incidence of WM cystic lesions, CGM signal abnormality, and WM volume reduction as measured by corrected biparietal width remained stable between early and term MR imaging. A proportion of signal abnormalities in the WM and DGM resolved between early and term MR imaging. A propensity to score worse at term compared with early MR imaging was evidenced for each of the following: ventricular dilation, interhemispheric distance, volume reduction of the DGM and cerebellum, and thinning of the corpus callosum. More infants had delayed gyral maturation at early MR imaging compared with term MR imaging.

Predictive Validity of Early MR Imaging
Results of univariable and multivariable regression analyses between early MR imaging scores and neurodevelopmental outcomes are presented in Fig 1 (first row); sensitivity, specificity, and accuracy, in Table 5. Global, WM, and DGM scores on early MR imaging were negatively associated with Bayley III motor and cognitive outcomes and with NSMDA outcomes. The sensitivity of early MR imaging global scores to predict motor, cognitive, and NSMDA outcomes ranged from 33% to 50%, specificity ranged from 86% to 87%, with the percentage of accurately classified cases ranging from 77% to 83%.

Inter- and Intrarater Reproducibility
Reliability and agreement results are presented in On-line Table 5. At early MR imaging, intrarater reliability ranged from 0.82 to 0.97 (ICC), and agreement, from 90% to 100%. Interrater reliability was low for CGM (ICC = 0.08) but excellent for the other
Fig 1. Associations between early (first row) and term (second row) MR imaging scores and neurodevelopmental outcome at 12-months corrected age for the preterm cohort. Solid lines represent univariable regression analyses, and dashed lines represent multivariable analyses for which sex, social risk and, for NSMDA only, corrected age at assessment were added.

Perinatal Risk Factors
Perinatal risk factors were associated with increasing severity of the MR imaging global brain abnormality category scores (On-line Table 6). Early MR imaging was associated with gestational age at birth, birth weight, patent ductus arteriosus, retinopathy of prematurity, postnatal corticosteroids, ventilation, and oxygen therapy. Term MR imaging was associated with gestational age at birth, birth weight, higher social risk, retinopathy of prematurity, ventilation, oxygen requirement at 36-weeks PMA, and the requirement for home oxygen.

DISCUSSION
This clinically accessible scoring system of structural brain MR imaging for use at 29 to 35 weeks PMA for infants born at <31-weeks gestational age is valid. Early MR imaging WM, DGM and global brain abnormality scores were associated with Bayley III motor and cognitive scores and outcome on the NSMDA at 12-months CA. Early cerebellar scores were also associated with the NSMDA outcome. These associations were reconfirmed at term MR imaging. In addition, term MR imaging cerebellar scores were associated with Bayley III motor and cognitive outcomes. Early MR imaging was more strongly associated with cognitive than motor outcomes. The scoring system on which this study was based has been used in 2 studies examining the relationships between TEA MR imaging and cognitive outcomes at 7 years. 24,25 Our results support previous findings at TEA and suggest that the brain changes associated with adverse cognitive outcomes are already present as early as 29 to 35 weeks PMA. 7 Of all MR imaging subscale scores, at early and term MR imaging, DGM demonstrated the strongest relationship with outcome. This finding supports inclusion of DGM evaluation in qualitative and semiquantitative scoring systems in this population. Cerebellar scores on early MR imaging were associated with NSMDA scores but not the Bayley III motor score. This finding is interesting because the Bayley III motor scale focuses on motor achievement, while the NSMDA evaluates the quality of motor performance, including balance and postural reactions, functions known to be modulated by the cerebellum. The NSMDA also includes assessment of muscle tone, reflexes, and sensory motor function, and at 12 months CA, has been shown to predict motor and cognitive outcomes and cerebral palsy at 4 years in preterm infants. 33,34
The specificity of the scoring system is reasonable, indicating that those infants whose global scoring category is moderate or severe have a high probability of poor motor and cognitive outcomes at 12-months CA. The sensitivity is relatively low, so not all infants who progress to poor motor and cognitive outcomes will be identified by this scoring system at early or term MR imaging; however, the reasonable specificity also means that the risk of false-positives is low. Parents indicate a desire for prognostication and early identification of outcomes, 40 and a low false-positive rate is preferable to the prolonged distress of a false-positive result, which causes parents to spend years waiting for an adverse outcome that does not occur. 41,42 A combination of TEA MR imaging findings and 3-month CA general movement assessment demonstrates improved predictive validity over TEA MR imaging alone, [43][44][45] so evaluation of the relationships between this early MR imaging scoring system and concurrent clinical measures and the combination of early MR imaging and clinical measures to predict later outcomes is warranted.
Our results indicate that term MR imaging scores demonstrate stronger associations with 12-month outcomes than early MR imaging scores. Term MR imaging associations described here are stronger than those found by another group using the original scoring system 23 ; this finding suggests that the modified scoring cut-points, based on term-born reference sample data, may be an improvement over the original scale. 27 Their outcome was at 2-years CA rather than 12-months CA in the present study. Stronger associations of term MR imaging with outcomes may be due to small focal lesions evident on early MR imaging having resolved by term MR imaging or volume reduction becoming more apparent. Both of these require further exploration. Term MR imaging scores presented here show a lower incidence of myelination delay compared with the cohort on which the scale was originally based. In the present study, the T1 sequence was performed at the end of the MR imaging when infants were often beginning to wake up; therefore, it had a higher incidence of motion artefacts. For this reason, T2-weighted images were used to score myelination delay with their improved contrast, and this may have resulted in an overestimation of myelination compared with the earlier study. 22

CONCLUSIONS
This study presents a clinically accessible MR imaging scoring system of brain injury and growth for use from 29 to 35 weeks' PMA in infants born at <31-weeks gestational age that has good reproducibility and significant associations with motor and cognitive outcomes at 12-months CA. The tool is suitable for use in research and for assisting clinical patient management.