Assessment of MRI-Based Automated Fetal Cerebral Cortical Folding Measures in Prediction of Gestational Age in the Third Trimester

BACKGROUND AND PURPOSE: Traditional methods of dating a pregnancy based on history or sonographic assessment have a large variation in the third trimester. We aimed to assess the ability of various quantitative measures of brain cortical folding on MR imaging in determining fetal gestational age in the third trimester. MATERIALS AND METHODS: We evaluated 8 different quantitative cortical folding measures to predict gestational age in 33 healthy fetuses by using T2-weighted fetal MR imaging. We compared the accuracy of the prediction of gestational age by these cortical folding measures with the accuracy of prediction by brain volume measurement and by a previously reported semiquantitative visual scale of brain maturity. Regression models were constructed, and measurement biases and variances were determined via a cross-validation procedure. RESULTS: The cortical folding measures are accurate in the estimation and prediction of gestational age (mean of the absolute error, 0.43 ± 0.45 weeks) and perform better than (P = .024) brain volume (mean of the absolute error, 0.72 ± 0.61 weeks) or sonography measures (SDs approximately 1.5 weeks, as reported in literature). Prediction accuracy is comparable with that of the semiquantitative visual assessment score (mean, 0.57 ± 0.41 weeks). CONCLUSIONS: Quantitative cortical folding measures such as global average curvedness can be an accurate and reliable estimator of gestational age and brain maturity for healthy fetuses in the third trimester and have the potential to be an indicator of brain-growth delays for at-risk fetuses and preterm neonates.

Gestational age (GA) estimation for unborn babies is crucial to any assessment of pregnancy, fetal development, and neonatal care. Obstetricians routinely use the last menstrual period to estimate the beginning date of gestation and calculate the estimated date of delivery. However, it has been reported that 20%-40% of women cannot determine their last menstrual period with certainty for reasons such as bleeding in the first trimester and pregnancy following the use of oral contraceptives. 1 Fetal growth measurements from sonographic biometry are used to estimate the gestation length, if one assumes a normative growth trajectory at the time of the examination. 2 The most widely used fetal biometrics assessed in the first 2 trimesters 3 have been shown to be more accurate than last menstrual period-based estimates. 4,5 Consequently, sonographic measurements are recommended in place of the last menstrual period if there is a large discrepancy between the last menstrual period and sonography-based estimates before 20 weeks of gestation. 6 Sonography-based GA estimation in the third trimester (26 weeks to birth) should be interpreted with great caution because of the markedly increased variability in organ size in the later stages of pregnancy. 3,7 For example, biparietal diameter varies by approximately 7 days, 14 days, and 21-28 days when measured at 14-20 weeks, 21-30 weeks, and after 30 weeks GA, respectively. 7 Consequently, it has been recommended that menstrual dates be used to establish GA if it is within the error range of these biometric markers during the third trimester. 3 This limitation in sonographic assessment stems mostly from the inherent variability of organ size and the intrinsic signal properties of ultrasonography. In this article, we attempt to address this important problem from a different perspective: brain cortical folding.
During the second half of gestation, the cortical folding of the neocortex initiates at around 20 weeks and drastically changes the brain shape throughout the third trimester, from a largely smooth surface to a complex convoluted one. 8 The process appears to be genetically controlled and largely consistent across the healthy population. 9,10 Primary, secondary, and tertiary gyri and sulci emerge in order, and newly evolved sulci related to higher functions, such as auditory, visual, and linguistic functions, appear later than phylogenetically more primitive allocortical folding, such as the hippocampus and olfactory sulcus. 8 Several MR imaging studies on healthy fetuses have found a linear relation between folding measures and GA. 11,12 There was evidence that female and male preterm neonates did not differ in folding while differing in size. 13 Data also showed that the standard deviation (SD) of a folding measure was approximately 5% of the mean of the measure in healthy preterm neonates at around 32 weeks' GA. 14 In comparison, the sonography-based measure, biparietal diameter, has an SD (1.5 mm) equal to 16% of the average biparietal diameter (9 mm) at 30-36 weeks' GA. 3 The lower variance of folding when controlled for GA potentially makes folding a better estimator of GA than sonography in late gestation.

Subjects and Data Acquisition
The study cohort included healthy pregnant women who were recruited in a previously published study with local institutional review board approval. 15 Exclusion criteria included GA of >36 completed weeks, multiple gestations, congenital infection, gestational diabetes, or any maternal contraindication to MR imaging. We also excluded fetuses with sonographic findings of dysmorphic features, dysgenic brain lesions, or anomalies of other organ systems. 15 Written informed consent from mothers was obtained according to a protocol approved by Boston Children's Hospital Committee for Clinical Investigation, and the study was compliant with the Health Insurance Portability and Accountability Act. A cohort of 33 healthy fetuses with high-resolution reconstructed MRI was included in this study, of which 10 subjects were female and 23 were male. The cohort has been previously reported in studies of normative brain development 16 but has not been used to estimate gestational age.
The mean GA at the time of MR imaging was 29.1 ± 2.8 weeks (range, 25.2-35.4 weeks). Estimated gestational age was based on maternal dates and first-trimester sonography measurements, if available, by the pregnant mother's referring obstetrician. The working gestational age as determined by the referring obstetrician at the time of MR imaging was used. All fetal MR imaging studies were performed on a 1.5T scanner (Achieva; Philips Healthcare, Best, the Netherlands) with a 5-channel phased array cardiac coil. Multiplanar single-shot turbo spin-echo imaging was performed (TE = 120 ms, TR = 12,500 ms, 0.625 signal averages, FOV = 330 mm, section thickness = 2 mm, no intersection gap, acquisition matrix = 256 × 204, acquisition time = 25-55 seconds). All fetal and postnatal MR imaging findings were normal. Performance on the Vineland Adaptive Behavior Scale, 17 used to assess functional performance in communication, daily living, socialization, and motor skills, was age-appropriate in the subjects between 18 and 24 months of age.

Motion Correction and Manual Brain Segmentation
Spontaneous fetal motion during scanning poses a challenge for the computation of 3D measures. The imaging data used in this study were reconstructed to isotropic volumes (1 mm³) on the basis of a superresolution method. 18 Other methods for correcting motion in fetal MR imaging exist, 19,20 but we did not compare the differences among them in this article.
We created a manual mask of the intracranial region for each subject. The mask excluded all the maternal tissue and nonbrain tissue of the fetus. In addition, we also defined a cerebral mask for each subject, which was constructed manually by tracing the interface between developing white matter and cortical gray matter. The mask region excluded the cerebellum and brain stem. The boundary of the mask was used as a representation of the cortex. This mask was also used to calculate the brain/cerebral volume of the subject. Masks for the cortical lobes (frontal, temporal, parietal, occipital, and insular) and left and right hemispheres were generated by registering subject images with a manually labeled template. This template was based on a publicly available atlas, brain-development.org (www.brain-development.org), 21 which was an average of preterm infants at a postmenstrual age of 29 weeks. Manual tracings of lobes and hemispheres were performed and visually inspected in the free software ITK-SNAP (www.itksnap.org). 22

Cortical Folding Measures
We implemented and evaluated 8 different quantitative measures of cortical folding that have been used in the literature. 23-26 These were average curvedness (AC), Gaussian curvature L2 norm, mean curvature L2 norm, intrinsic curvature index, extrinsic curvature index, convexity ratio, isoperimetric index, and average sulcal depth. The convexity ratio and isoperimetric index are globally defined, while the other measures are an average over each point on the cortex. Gaussian curvature L2 norm, intrinsic curvature index, mean curvature L2 norm, convexity ratio, isoperimetric index, and AC are all invariant to translation, rotation, and scaling of the cortical surface. 25,27 The extrinsic curvature index is invariant to translation and scaling, but rotation may cause it to change sign. The convexity ratio is also shown to be area-independent (ie, invariant to the change of surface area in question). 26 The implementation of the folding measure computations was based on a previously established processing pipeline designed for pediatric data. 27,28 The pipeline took in the cerebral mask and upsampled it to a higher resolution (eg, 0.45³ mm³). Our numeric scheme for computing the quantitative folding measures depended on reliably estimating the cortical surface and sampling it fairly uniformly. The adopted numeric scheme did so by obtaining an implicit surface parameterization of the cortical surface, as a level set of a distance transform on a Cartesian grid. We used the sparse field level set representation, which was numerically efficient, on a grid of isotropic voxels with submillimeter dimension. A previous article 28 validated such a scheme for adult brains as well, which exhibited more complex and more folded cortical surfaces than fetal brains, thus indicating the suitability of the scheme for fetal brains. The flowchart of the processing pipeline is shown in On-line Fig 1.
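To make the pointwise measures more concrete, the following is a minimal sketch of how curvature-based quantities can be averaged over sampled surface points. It assumes that principal curvatures (k1, k2) have already been estimated at roughly uniformly sampled points, and it uses the common Koenderink definition of curvedness, sqrt((k1² + k2²)/2). The function names, the plain (unweighted) averaging, and the lack of any scale normalization are illustrative assumptions and are not taken from the pipeline described above, which reports scale-invariant versions of these measures.

```python
import math


def curvedness(k1, k2):
    """Koenderink curvedness at one surface point from its principal curvatures."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)


def folding_summaries(principal_curvatures):
    """Average pointwise curvature quantities over sampled surface points.

    principal_curvatures: list of (k1, k2) pairs. The points are assumed to be
    sampled roughly uniformly, so a plain mean stands in for a surface average.
    No scale normalization is applied (an illustrative simplification).
    """
    n = len(principal_curvatures)
    avg_curvedness = sum(curvedness(k1, k2) for k1, k2 in principal_curvatures) / n
    # Root-mean-square of Gaussian curvature K = k1 * k2
    gauss_rms = math.sqrt(sum((k1 * k2) ** 2 for k1, k2 in principal_curvatures) / n)
    # Root-mean-square of mean curvature H = (k1 + k2) / 2
    mean_rms = math.sqrt(
        sum(((k1 + k2) / 2.0) ** 2 for k1, k2 in principal_curvatures) / n
    )
    return {
        "average_curvedness": avg_curvedness,
        "gaussian_rms": gauss_rms,
        "mean_rms": mean_rms,
    }


# Sanity check on a sphere of radius 20 mm, where k1 = k2 = 1/20 at every point
sphere = [(1 / 20.0, 1 / 20.0)] * 100
summary = folding_summaries(sphere)
```

On a sphere, curvedness equals the reciprocal of the radius at every point, which gives a simple check that the averaging behaves as expected before applying it to a folded surface.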

Statistical Analysis
To examine whether the various cortical folding measures provided more information about GA than brain volume, we constructed a linear model with brain volume and each folding measure as 2 predictors of GA and compared the model with the reduced model with only brain volume as the predictor of GA. More specifically, we tested the null hypothesis by using analysis of variance that the following 2 linear regressions are equal: 1) $\mathrm{GA} = a_0 + a_1 \times (\text{Brain Volume}) + a_2 \times (\text{Folding Measure})$, 2) $\mathrm{GA} = a_0 + a_1 \times (\text{Brain Volume})$.
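For reference, the comparison of these 2 nested regressions is usually summarized by the standard partial F statistic. The expression below is the textbook form rather than a formula quoted from this study. The symbols $\mathrm{RSS}_1$ and $\mathrm{RSS}_2$ (residual sums of squares of regressions 1 and 2) and $n$ (number of subjects) are notation introduced here for illustration.

```latex
F = \frac{(\mathrm{RSS}_2 - \mathrm{RSS}_1) / (3 - 2)}{\mathrm{RSS}_1 / (n - 3)}
```

Regression 1 has 3 coefficients ($a_0$, $a_1$, $a_2$) and regression 2 has 2, so the numerator has 1 degree of freedom. Under the null hypothesis $a_2 = 0$, and assuming all 33 fetuses enter the fit, $F$ follows an F distribution with 1 and 30 degrees of freedom.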
To assess how well the GA estimate generalized across samples, we adopted a leave-k-out analysis, where k out of n subjects were chosen as a test set and the regression was performed on the remaining n − k subjects as the training set. Note that when k = 1, this cross-validation reduces to the commonly used leave-one-out analysis scheme. A prediction error was calculated by averaging the absolute difference between the predicted GA and the known GA in the test set. For a fixed k > 1, a random test set was generated k × n times. The differences in the mean and variance of the absolute error of predicted GA were assessed by t test, F-test, and the Levene test. 29 To further examine the biases and variances related to the curvedness and volume-based predictions, we performed statistical tests on signed prediction errors in addition to absolute prediction errors. Bias was defined as the mean of the signed errors for a prediction. Variance referred to the variance of the predicted GA. The difference in biases was assessed by t test, and the difference in variances was assessed by the F-test and Levene test.
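The leave-k-out procedure can be sketched as follows. This is a minimal illustration using a single predictor and ordinary least squares. The function names, the fixed random seed, and the synthetic data are assumptions made for the example and do not come from the study. The sketch also draws random test sets for every k, whereas for k = 1 the study describes the usual scheme in which each subject is held out once.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def leave_k_out_error(measure, ga, k, repeats, seed=0):
    """Mean absolute GA prediction error over `repeats` random test sets of size k."""
    rng = random.Random(seed)
    n = len(ga)
    errors = []
    for _ in range(repeats):
        test = set(rng.sample(range(n), k))
        train = [i for i in range(n) if i not in test]
        # Fit on the n - k training subjects only
        b0, b1 = fit_line([measure[i] for i in train], [ga[i] for i in train])
        # Average absolute error on the k held-out subjects
        abs_err = [abs(b0 + b1 * measure[i] - ga[i]) for i in test]
        errors.append(sum(abs_err) / k)
    return sum(errors) / len(errors)


# Synthetic, perfectly linear data (made up for illustration only)
ga = [25.0 + 0.3 * i for i in range(33)]
measure = [0.10 + 0.01 * (g - 25.0) for g in ga]
err = leave_k_out_error(measure, ga, k=3, repeats=3 * len(ga))
```

Because the synthetic measure is an exact linear function of GA, the held-out error here is essentially zero. With real data, the same loop would be run once per predictor (for example, a folding measure and brain volume) so that their error distributions can be compared.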

Relations between Folding and GA
A progressive increment of cortical folding complexity can be observed with increasing GA (On-line Fig 2). The Pearson correlation coefficient (r) between each individual quantitative cortical folding measure and GA, which is shown in the Table, indicates that these measures have a very strong linear relation with GA (P < 10⁻¹⁰ for all measures). For example, average curvedness and average sulcal depth accounted for 96% and 93% of the variance of GA as calculated by r² in the data. Three scatterplots with regression lines are shown in Fig 1. Polynomial fitting suggested that there was some slight degree of nonlinear correlation between the folding measures and age, but linear regression appeared to fit the current data well. We also examined the folding complexity in the left and right hemispheres separately and did not find significant asymmetries in terms of hemispheric folding (see left subfigure in On-line Fig 3). When the cerebrum was divided into 5 lobes, we found that the frontal lobe had the largest rate of folding, while the insular region demonstrated the lowest rate of folding (see right subfigure in On-line Fig 3).
The null hypothesis that $a_2 = 0$ in the linear model (equation 1) was rejected by the F-test (F > 18.10, P < .0002), and the first regression was significantly better than the second regression for any folding measure. This finding indicates that the quantitative cortical folding measures add complementary information to brain volume in the prediction equation for GA. In contrast, when we swapped folding measures with brain volume in the equations, the null hypothesis was not rejected (F < 2.34, P > .13) for AC, Gaussian curvature L2 norm, mean curvature L2 norm, intrinsic curvature index, or sulcal depth. This result implied that some of these folding measures could completely replace brain volume measurements when it came to predicting GA because brain volume provided no extra information about GA compared with AC, Gaussian curvature L2 norm, mean curvature L2 norm, intrinsic curvature index, or sulcal depth.
Note: EC indicates extrinsic curvature index; GC, Gaussian curvature L2 norm; IC, intrinsic curvature index; MC, mean curvature L2 norm; CR, convexity ratio; IP, isoperimetric index; SD, sulcal depth; BVol, brain volume; r (R), regression; Poly., polynomial. a The first and second rows are correlation coefficients between each measure and GA in linear and polynomial regressions. The third row is the linear correlation coefficients between folding measures and total maturation score.
An independent assessment of brain maturity in the same dataset was performed by a pediatric neurologist and a pediatric neuroradiologist (D.J.L. and A.V.), each with 8 years of fetal MR imaging experience. A single fetal total maturation score (fTMS) was calculated by summing up 6 semiquantitative subscores characterizing various visual sulcation observations, the extent and location of myelination, and evolution of the germinal matrix. 16 We found that the folding measures obtained by the proposed framework were in high agreement with the average visual fTMS (Fig 2).

Prediction of GA
The resultant predictions by quantitative folding measures, brain volume, and the visual fTMS assessment in a leave-one-out scheme are shown in Fig 3. The AC-based prediction appears closer to true GA than brain volume or the fTMS-based prediction. The final errors (averaged across the k × n test sets) based on different sizes of the training set are shown in On-line Fig 4. It can be observed that using the cortical folding measure to predict GA always resulted in higher prediction accuracy (ie, smaller errors) than using brain volume. Leave-one-out cross-validation showed that the mean of absolute errors is 0.43 ± 0.45 weeks (range, 0.01-1.24 weeks) for AC and 0.72 ± 0.61 weeks (range, 0.02-2.63 weeks) for brain volume. The difference between the 2 prediction errors was statistically significant (P = .024). The difference between the 2 variances of the 2 predictions was also significant based on an F-test (P = .002), borderline on a Levene test (P = .06) for training size equal to 32, and significant on a Levene test (P < 10⁻⁴) for any other training size larger than 20. The curvedness predictor also had a slightly lower absolute error than the fTMS predictor (mean, 0.57 ± 0.41 weeks), but the difference was not significant.
The difference between the biases from curvedness and volume-based predictions was insignificant on the basis of a 2-sample t test when the training set size was reasonably large (>20). Regarding variance, the curvedness predictor had consistently smaller variance than the volume predictor (about half) when the training set size was large (>20). The 2-sample F-test showed that the variance difference between curvedness and volume predictors was significant (P = .0041, for the largest training size 32; for smaller training sizes, the significance was higher with a smaller P value). The Levene test also showed that the variance difference was significant (L = 4.380, P = .040, for training size 32; for smaller training sizes, the significance was higher). In summary, the biases of the 2 predictors are comparable, and the cortical folding predictor has a smaller variance than the volume predictor, which means the former is more reliable than the latter in the prediction of GA. The quantitative cortical folding predictor and visual fTMS predictor are close in both bias and variance.
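The bias and variance comparison can be illustrated with a short sketch. It assumes that bias is the mean of the signed errors (predicted minus reference GA), as defined in the Statistical Analysis section, and, as a simplifying assumption of this example, that variance is taken as the sample variance of those signed errors. The function names and all numbers are made up for illustration and are not results from the study.

```python
import statistics


def bias_and_variance(predicted, reference):
    """Return (bias, variance) of signed errors, predicted minus reference."""
    signed = [p - r for p, r in zip(predicted, reference)]
    return statistics.mean(signed), statistics.variance(signed)


def variance_ratio(var_a, var_b):
    """F statistic for a 2-sample variance comparison: ratio of sample variances."""
    return var_a / var_b


# Made-up reference GAs (weeks) and predictions from 2 hypothetical predictors
reference = [26.0, 28.0, 30.0, 32.0, 34.0]
pred_folding = [26.2, 27.9, 30.1, 31.8, 34.0]
pred_volume = [26.4, 27.8, 30.2, 31.6, 34.0]

bias_f, var_f = bias_and_variance(pred_folding, reference)
bias_v, var_v = bias_and_variance(pred_volume, reference)
ratio = variance_ratio(var_v, var_f)
```

In this toy example both predictors are unbiased while the second has errors twice as large, so the variance ratio is 4. A P value for the ratio would come from an F distribution with the appropriate degrees of freedom, which is left out here to keep the sketch dependency-free.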

DISCUSSION
To our knowledge, this article is among the first works in the literature to use various MR imaging-based computational cortical folding measures as a means to estimate GA in healthy human fetuses. All the previous studies on fetal MR imaging 11,12,25,30-32 have regarded clinically acquired GA as an independent variable or a known reference point as opposed to a dependent variable to be predicted or estimated. The current work is also unique in the wide GA range of the cohort, which includes the period close to full term (33-36 weeks), when it is more difficult to date pregnancy.
Many studies have characterized cortical folding in preterm neonates. 13,26,33-35 Although premature neonates are often studied as a surrogate of prenatal development of full-term infants, preterm neonates are exposed to extrauterine stressors and risk factors, which often make their neurodevelopment dissimilar to normal intrauterine development. 14 On the basis of the data and results, it became clear that linearity was sufficient to account for the relation between GA and folding. More than 90% of variance in gestational age can be explained by the variance in a cortical folding measure. The calculation of folding measures does not use the information of gestational age at all, so it is unlikely that the linearity manifested in the results comes from a bias in the processing that favors a linear model or any other model between GA and folding.
In embryology, GA is defined as postconception time, but in human obstetrics, it is often defined as the time since the first day of the mother's last menstrual period. The latter is approximately 2 weeks longer than the former. In this article, we used the definition of GA involving the last menstrual period, which is the common clinical practice. It is possible that there were errors in the estimation of true GA despite the best clinical practice. The potential inaccuracy of ground truth GA in these data, however, does not greatly diminish the value of the results because the folding process reflects an aspect of brain development that is complementary to size increase, which is the basis of clinical GA estimation. The fact that 2 independently obtained measures of GA (folding-based and size) are so consistent with each other indicates the improbability that either measure is erroneous. The chance that both measures are very imprecise but in high agreement is slim, if one assumes that the errors are random and independent.
The sample size of the current study is relatively small and may reduce the generalizability of the results. When dealing with a small sample size, it is important not to overfit a model by creating too many features/measures or by using a complex type of regression. By using a single whole-brain cortical folding measure and simple linear regression, we sought to avoid the overfitting problem, in which the agreement between the model and data is increased but the generalizability to more data is decreased. Linear regression strikes a good balance between the goodness of fit in the current data and the generalizability to unseen datasets. Although cortical folding may be nonlinear in its entire course, it can be approximated by a linear process within the gestational period of interest (25-36 weeks), as demonstrated in other independent studies. 12,13,30 Another limitation of this study is that there could be some "healthy" fetuses who are later found not healthy because abnormalities were not obvious at short-term follow-ups.

CONCLUSIONS
This article demonstrated that automated quantitative measurement of fetal cerebral cortical folding can be used to estimate gestational age in the third trimester with high accuracy and reliability on a single-case basis by using clinical fetal MR imaging. The folding measures accurately predict the gestational age of a fetus in the third trimester (mean error, 0.43 ± 0.45 weeks), which can be a major challenge for sonography-based measures. Improved accuracy and reliability in GA estimation in late gestation can have a positive impact on prenatal care for underserved populations. It is also important to be able to estimate the fetal brain maturity because chronologic GA is no longer a suitable gauge for fetuses with aberrant neurodevelopment later in gestation. Cortical folding measurement offers a potential way to accomplish that as well.
Disclosures: James C. Gee-RELATED: Grant: National Institutes of Health,* University of Pennsylvania*; UNRELATED: Consultancy: University of Texas Southwestern Medical Center, Comments: consultant on National Institutes of Health-funded project; Grants/Grants Pending: National Institutes of Health*; Stock/Stock Options: part of retirement planning; Travel/Accommodations/Meeting Expenses Unrelated to Activities Listed: advisory board meetings, international task force meetings, academic lectures, grant review panels. *Money paid to the institution.