Artificial Intelligence-Assisted Evaluation of the Spatial Relationship between Brain Arteriovenous Malformations and the Corticospinal Tract to Predict Postsurgical Motor Defects

BACKGROUND AND PURPOSE: Preoperative evaluation of brain AVMs is crucial for the selection of surgical candidates. Our goal was to use artificial intelligence to predict postsurgical motor defects in patients with brain AVMs involving motor-related areas.
MATERIALS AND METHODS: Eighty-three patients who underwent microsurgical resection of brain AVMs involving motor-related areas were retrospectively reviewed. Four artificial intelligence-based indicators were calculated with artificial intelligence on TOF-MRA and DTI, including FN 5mm/50mm (the proportion of fiber numbers within 5-50 mm from the lesion border), FN 10mm/50mm (the same but within 10-50 mm), FP 5mm/50mm (the proportion of fiber voxel points within 5-50 mm from the lesion border), and FP 10mm/50mm (the same but within 10-50 mm). The association between the variables and long-term postsurgical motor defects was analyzed using univariate and multivariate analyses. Least absolute shrinkage and selection operator regression with the Pearson correlation coefficient was used to select the optimal features to develop the machine learning model to predict postsurgical motor defects. The area under the curve was calculated to evaluate the predictive performance.
RESULTS: In patients with and without postsurgical motor defects, the mean FN 5mm/50mm, FN 10mm/50mm, FP 5mm/50mm, and FP 10mm/50mm were 0.24 (SD, 0.24) and 0.03 (SD, 0.06), 0.37 (SD, 0.27) and 0.06 (SD, 0.08), 0.06

In our previous studies using fMRI and DTI, lesion-to-corticospinal tract distance (LCD) was used to evaluate the relationship between white fibers and lesions, which has proved to be an important predictive factor for postsurgical motor defects (MDs) in BAVMs involving motor-related areas (M-AVMs). 6,7 However, because BAVM lesions and the corticospinal tract (CST) are both stereoscopic objects, LCD obviously has some limitations in describing the spatial relationship between them. First, LCD is measured on the basis of a 2D plane and is a measurement that is too simple and crude to describe the complicated 3D spatial relationship between CST and BAVM lesions. 8,9 Second, LCD is measured manually, which may lead to relatively inaccurate and subjective measurements. 10 Finally, as a basic step of LCD calculation, fiber tracking skills may have confounding bias, which might vary with the operators' different levels of clinical experience or in different clinical centers. 11 Artificial intelligence (AI) technology, with its powerful processing and analysis ability, is emerging to develop a predictive model for clinical prognosis. 12 AI-based tools have the capability of overcoming the shortcomings of manual evaluation, such as time-consuming workflows and substantial clinical experience requirements. 13 Meanwhile, AI also has great potential in calculating the sophisticated 3D spatial relationship between stereoscopic objects. 14,15 Whether exploiting AI technology to evaluate BAVMs has a better ability to predict postoperative motor function in M-AVMs remains unknown.
In this study, 83 patients who underwent microsurgical resection of M-AVMs were retrospectively reviewed. Four AI technology-based indicators were calculated followed by a machine learning (ML) model to predict the long-term postsurgical MDs of patients. Our study indicated that AI technology is an excellent method for predicting postoperative motor function in patients with M-AVMs.

Patients
All patients were retrospectively reviewed from our BAVM database of a prospective clinical trial (ClinicalTrials.gov Identifier: NCT02868008) conducted between July 2015 and December 2020. Inclusion criteria were the following: 1) The lesions were near the CST (LCDs were <10 mm based on DTI tractography), 2) patients underwent microsurgical resection of the BAVMs, 3) patients were followed up for 6 months with complete information of motor function, and 4) patients' TOF-MRA and DTI data were acquired before microsurgery. The exclusion criteria included the following: 1) poor-quality neuroimaging, 2) atypical BAVM images after stereotactic radiosurgery or interventional therapy, and 3) incomplete imaging or follow-up data. Postsurgical MDs were defined as patients not recovering to the level of being able to carry out all their usual activities (mRS > 2) 6 months after microsurgery, as previously reported. 5,16 Finally, 83 patients with BAVM lesions near the CST who underwent microsurgery were enrolled. The study was reviewed and approved by the ethics committee of the Beijing Tiantan Hospital Affiliated with Capital Medical University (approval No. KY2016-031-01). Before the study, the patients were informed about the design of the study, and each participant provided his or her informed consent. The design and all procedures adhered to the latest version of the Helsinki Declaration. The flow diagram of patient selection, AI-based indicator calculation, model development, and predictive ability evaluation is shown in Fig 1.

Baseline Data
The clinical factors including patients' age, sex, hemorrhage, and seizure information were collected from the prospectively collected database and the electronic medical records system by a neurosurgeon (J. Zhang). Variables such as BAVM size, lobe, diffuseness, deep venous drainage, deep perforating artery supply, the Spetzler-Martin (S-M) score, and the hemorrhagic presentation, diffuseness, deep venous drainage, and lesion-to-eloquence distance (HDVL) score were determined with consensus by 2 experienced neurosurgeons (H. Li and J. Weng) from preoperative angiograms, traditional MR imaging scans, and TOF-MRA images. 5-7 As described in our previous study, the LCD was defined as the nearest distance from the BAVM lesions to the CST. The LCD Manual, serving only as an inclusion criterion (LCD Manual <10 mm), was measured with consensus by 2 experienced neurosurgeons (H. Li and Y. Jiao) on the iPlan 3.0 workstation (Brainlab). Sections for measurement were manually selected where the tracked CST appeared to be nearest to the margin of the lesions on the TOF-MRA images fused with fMRI and DTI. 17 Meanwhile, the lesion-to-corticospinal tract distance measured by the AI method (LCD AI) was used for developing a predictive model. The distances between every pair of points, one on the surface of the segmented lesion and one on the tracked CST, were computed by matrix operations, and the minimum of these distances was taken as the LCD AI.
FIG 1. Flow diagram of patient selection, AI-based indicator calculation, ML model development, and predictive ability evaluation. AI-based indicators include FN 5mm/50mm, FN 10mm/50mm, FP 5mm/50mm, and FP 10mm/50mm. Clinical variables include age, sex, side, lobe, size, deep perforating artery supply, deep venous drainage, hemorrhage, seizure, and S-M grading score. LASSO indicates least absolute shrinkage and selection operator.
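The minimum-distance definition of the LCD AI can be illustrated with a minimal sketch. It assumes that the lesion surface and the tracked CST are already available as lists of 3D points in millimeter coordinates; the function name and the sample points are hypothetical and are not taken from the study.

```python
import math


def nearest_distance(lesion_points, tract_points):
    """Return the smallest Euclidean distance between two 3D point sets."""
    if not lesion_points or not tract_points:
        raise ValueError("both point sets must be non-empty")
    # Compare every lesion point with every tract point and keep the minimum.
    return min(math.dist(p, q) for p in lesion_points for q in tract_points)


# Hypothetical coordinates in mm, for illustration only.
lesion_surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cst_points = [(4.0, 4.0, 0.0), (1.0, 3.0, 4.0)]
lcd_ai = nearest_distance(lesion_surface, cst_points)
```

For large point sets, the same all-pairs comparison is usually vectorized as a distance matrix, which matches the "matrix operations" wording in the text.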

MR Image Acquisition
The MR images were obtained using a 3T MR imaging scanner (Magnetom Trio; Siemens). The sagittal T1 anatomic image was acquired with a gradient-echo sequence: TR = 2300 ms, TE = 2.98 ms, section thickness = 1 mm, slices = 176, FOV = 256 mm, flip angle = 9°, matrix = 64 × 64, voxel size = 1 × 1 × 1 mm³, and bandwidth = 240 kHz. Axial TOF-MRA was performed using a 3D TOF gradient-echo acquisition sequence: TR = 22 ms, TE = 3.86 ms, section thickness = 1 mm, slices = 36 × 4, FOV = 220 × 220 mm², flip angle = 120°, and matrix = 512 × 512. DTI was performed using the DWI-EPI technique: TR = 6100 ms, TE = 93 ms, section thickness = 3 mm, slices = 45, FOV = 230 × 230 mm², and matrix = 128 × 128 with a motion-probing gradient in 30 orientations. 6,7

Lesion Segmentation, Fiber Tracking, Registration, and Lesion Dilation
Lesion Segmentation Assisted by AI. In the segmentation phase, we used the U-Net model proposed in our previous study to delineate BAVM lesions automatically. Before we trained the model, all AVM lesions on the TOF-MRA images were manually labeled with lesion masks by 3 neuroradiologists (H. Li, J. Wang, and J. Zhang) as the ground truth references according to their signal, shape, course, texture, and so forth. The manually labeled images were used to train and test the U-Net model. In the process of developing the U-Net segmentation model, we set the number of training epochs as 300 and evaluated the model after each epoch with a Dice score. The criterion used to stop the iteration was no more than a 0.5% increase of the Dice score in 50 consecutive epochs. The results of AI segmentation were checked and verified by 2 neurosurgeons (H. Li and J. Wang). Finally, the Dice score of the model reached 0.80.
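The Dice evaluation and the stopping criterion can be sketched as follows. This is an illustration under stated assumptions, not the training code of the study: masks are flattened 0/1 sequences, the "0.5% increase" is read as an absolute gain of 0.005 in Dice score, and the function names `dice_score` and `should_stop` are hypothetical.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def should_stop(history, patience=50, min_gain=0.005):
    """Stop when the last `patience` epochs improved the best Dice by <= min_gain.

    Assumption: the gain is measured in absolute Dice units.
    """
    if len(history) <= patience:
        return False
    earlier_best = max(history[:-patience])
    recent_best = max(history[-patience:])
    return recent_best - earlier_best <= min_gain


example_dice = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```

In a training loop, `history` would hold one validation Dice score per epoch, and the loop would end at 300 epochs or as soon as `should_stop(history)` returns `True`.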
In addition, the goal of our study was to build a fully AI-based automatic prediction model for convenient clinical application: inputting the raw imaging data into the model without the time-consuming manual segmentation by neurosurgeons and directly going to the prediction of the prognosis. Therefore, in the present study, we used the U-Net model for segmentation rather than using manual segmentation. The details of the segmentation method were described in our previous work. 18
Fiber Tracking. Fiber tracking was based on DTI, which could reflect the anisotropy of the diffusion motion of water molecules. The Constant Solid Angle model and the Euler Delta Crossing (EuDX) algorithm were used to track the fibers on DTI with the Python library Dipy (https://github.com/dipy). 19,20 The Constant Solid Angle model was used to obtain the orientation distribution function of each voxel on DTI scans. The orientation distribution function was used to express the probability of the fiber distribution in finite directions per voxel. 21 The EuDX algorithm used the obtained orientation distribution function to connect the directions of these voxels with the highest probabilities to form a complete fiber tract. The results of AI fiber tracking were checked and verified by neurosurgeons (H. Li and J. Wang).
Registration Assisted by AI. In this study, the registration method included affine registration and deformable registration; both aimed to find a coordinate transformation that brings the 2 images as close as possible. Affine registration performed a linear transformation in which every voxel of the image underwent the same transformation, so the 2 images had the same size and position after affine registration. 22 Affine registration can be expressed as
$$\begin{pmatrix} x_A \\ y_A \\ z_A \\ 1 \end{pmatrix} = T_A \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix},$$
where $T_A$ is a 4 × 4 matrix that represents the linear transformation in affine registration, $(x, y, z)$ is the coordinate of a voxel in an image, and $(x_A, y_A, z_A)$ is the coordinate of that voxel after affine registration.
On the other hand, deformable registration performed nonlinear transformations for which each voxel of the image had its own transformation direction, so the 2 images had the same shape and texture after deformable registration. 23 The whole process is represented as follows:
$$(x_F,\, y_F,\, z_F) = (x_A + \Delta X,\; y_A + \Delta Y,\; z_A + \Delta Z),$$
where $(\Delta X, \Delta Y, \Delta Z)$ indicates the offset in each dimension of each voxel and $(x_F, y_F, z_F)$ is the coordinate of a voxel in an image after deformable registration. The nonlinear transformation operation was performed on the coordinates $(x_A, y_A, z_A)$ obtained after affine registration.
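The two-stage coordinate mapping described above, an affine transform followed by a per-voxel offset, can be sketched for a single voxel. The matrix and offset values below are hypothetical; in the study both are predicted by neural networks, which this sketch does not reproduce.

```python
def apply_affine(matrix, point):
    """Map (x, y, z) through a 4 x 4 homogeneous affine matrix."""
    vec = (point[0], point[1], point[2], 1.0)
    # Only the first three rows are needed; the last row is (0, 0, 0, 1).
    return tuple(sum(row[k] * vec[k] for k in range(4)) for row in matrix[:3])


def apply_offset(point, offset):
    """Add a per-voxel displacement (dX, dY, dZ) to an affine-registered point."""
    return tuple(c + d for c, d in zip(point, offset))


# Hypothetical transform: scale x and y by 2, then translate.
t_a = [
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -3.0],
    [0.0, 0.0, 0.0, 1.0],
]
p_affine = apply_affine(t_a, (1.0, 2.0, 3.0))
p_final = apply_offset(p_affine, (0.5, -0.5, 0.0))
```

The affine step uses one matrix for the whole image, whereas the deformable step would look up a different offset for each voxel from a 3-channel displacement field.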
To obtain $T_A$ and $(\Delta X, \Delta Y, \Delta Z)$, we used U-Net neural networks to build a model for the 2 registration processes. 24 In the affine registration model, the decoder was designed as a fully connected neural network, and the output was a 3 × 4 matrix. In the deformable registration model, the output was a 3-channel matrix, and the channels represented $\Delta X$, $\Delta Y$, and $\Delta Z$. To achieve end-to-end training, we added interpolation operations into the neural network model so that the output of the network was a registered image. In addition, the affine registration and deformable registration models could be trained in a cascade so that the images input into the model underwent both affine and deformable registration.
The registration method was used to complete the functional segmentation of the medical images. The ICBM-152 (https://nist.mni.mcgill.ca/icbm-152-nonlinear-atlases-2009/), which has detailed anatomic structural segmentation, was used as an atlas. 25,26 We first extracted all the data for fibers of patients in DTI scans through the Constant Solid Angle model and the EuDX algorithm. We performed affine registration and deformable registration operations on the patients' DTI scans with the atlas, and the patients' anatomic structural segmentation data were obtained. Then, we identified the ROIs related to motor function (the anterior half of the lower pons on the ipsilateral side and the precentral gyrus) in the patients' anatomic structural segmentation. 27 The fibers between the 2 ROIs were extracted from the patients' DTI scans; these were the CST of the patient. When we combined the results of segmentation and fiber tracking on the patients' DTI scans, the fibers related to motor function could be distinguished from all the fibers. Finally, the CST tracked on DTI scans was registered to TOF-MRA scans with BAVM segmentation by affine registration so that the AI-based indicators reflecting the spatial relationship of the CST and lesion could be calculated (Fig 2).
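The step that extracts the fibers connecting the 2 ROIs can be illustrated with a small sketch. It assumes that each tracked fiber is a list of points in voxel coordinates and that each ROI is a set of integer voxel indices; rounding a point to its nearest voxel is an assumption of this sketch, and all names and values are hypothetical.

```python
def passes_through(streamline, roi_voxels):
    """True if any point of the streamline falls in a voxel of the ROI."""
    return any(
        tuple(int(round(c)) for c in point) in roi_voxels for point in streamline
    )


def select_tract(streamlines, roi_a, roi_b):
    """Keep only the streamlines that visit both ROIs."""
    return [
        s for s in streamlines
        if passes_through(s, roi_a) and passes_through(s, roi_b)
    ]


# Hypothetical ROIs and fibers in voxel coordinates.
pons_roi = {(0, 0, 0)}
precentral_roi = {(0, 0, 5)}
fibers = [
    [(0.1, 0.0, 0.2), (0.0, 0.0, 2.6), (0.0, 0.1, 5.2)],  # visits both ROIs
    [(0.0, 0.0, 0.0), (3.0, 0.0, 1.0), (6.0, 0.0, 2.0)],  # visits only the first
]
cst = select_tract(fibers, pons_roi, precentral_roi)
```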
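Lesion Dilation. To evaluate the spatial relationship, we performed a dilation operation on the lesion mask segmented by the U-Net model to obtain the AI-based indicators. The dilation operation was implemented using affine transformation. The centroid position $(x_c, y_c, z_c)$ of the lesion mask was calculated, and the position was used as the center of the affine transformation. Then, all the parameters related to the degree of dilation in the affine transformation matrix were calculated. We recorded the size of each voxel in an image as $(x_v, y_v, z_v)$, which represented the size of each voxel in 3 dimensions. If the size of the area where the lesion was located was $(h, w, d)$ and we wanted to dilate the lesion by $b$ mm, the dilated lesion was in an $(h + 2b/x_v,\; w + 2b/y_v,\; d + 2b/z_v)$-sized area; that is, $b/x_v$, $b/y_v$, and $b/z_v$ voxels were expanded at both ends of the region. Therefore, the lesions were dilated along each axis by the factors
$$s_x = \frac{h + 2b/x_v}{h}, \qquad s_y = \frac{w + 2b/y_v}{w}, \qquad s_z = \frac{d + 2b/z_v}{d}.$$
The affine transformation matrix, centered on the centroid, was expressed as
$$T_D = \begin{pmatrix} s_x & 0 & 0 & (1 - s_x)\,x_c \\ 0 & s_y & 0 & (1 - s_y)\,y_c \\ 0 & 0 & s_z & (1 - s_z)\,z_c \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
The dilation operation can be expressed as
$$\begin{pmatrix} x_D \\ y_D \\ z_D \\ 1 \end{pmatrix} = T_D \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix},$$
where $(x, y, z)$ is the coordinate of a voxel of the lesion mask and $(x_D, y_D, z_D)$ is its coordinate after dilation.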
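The dilation by scaling about the centroid can be sketched numerically. The sketch assumes that the lesion extent is given in voxels and the voxel size in millimeters, as in the text; it maps individual coordinates only and does not resample a mask, and the function names and sample values are hypothetical.

```python
def dilation_factors(extent_vox, voxel_size_mm, margin_mm):
    """Per-axis scale factors that grow a region by margin_mm on each side."""
    return tuple(
        (n + 2.0 * margin_mm / v) / n for n, v in zip(extent_vox, voxel_size_mm)
    )


def dilate_point(point, centroid, factors):
    """Scale a coordinate away from the centroid by the per-axis factors."""
    return tuple(c + s * (p - c) for p, c, s in zip(point, centroid, factors))


# Hypothetical lesion: 20 x 10 x 5 voxels, voxel size 0.5 x 0.5 x 1.0 mm,
# dilated by 5 mm in every direction.
factors = dilation_factors((20, 10, 5), (0.5, 0.5, 1.0), 5.0)
moved = dilate_point((12.0, 5.0, 4.0), (10.0, 5.0, 2.0), factors)
```

In this example a 5 mm margin equals 10 voxels along x and y and 5 voxels along z, so the extent grows from 20 × 10 × 5 to 40 × 30 × 15 voxels. Applying the same scaling to a full mask would require resampling the volume, which is omitted here.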

AI-Based Indicator Calculations in Patients with M-AVMs
In this study, 4 AI-based indicators were proposed to predict postsurgical MDs: fiber number (FN) 5mm/50mm, FN 10mm/50mm, proportion of fiber voxel points (FP) 5mm/50mm, and FP 10mm/50mm. FN 5mm/50mm and FN 10mm/50mm indicate the proportion of fiber numbers from 5 to 50 mm and from 10 to 50 mm from the lesion border, respectively. FP 5mm/50mm and FP 10mm/50mm indicate the proportion of fiber voxel points from 5 to 50 mm and from 10 to 50 mm from the lesion border, respectively. The 4 indicators may reflect the lesion-to-fiber spatial relationship and the degree of potential CST damage caused by BAVM lesions. These indicators were automatically calculated by AI algorithm-assisted methods based on the results of lesion segmentation, fiber tracking, registration, and lesion dilation. In the process of indicator calculations, we used the method of lesion dilation to illustrate the distances of 5 mm, 10 mm, and 50 mm from the borders of the lesion (Fig 3).
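One possible reading of these indicators can be sketched in code. The sketch assumes that FN is the number of fibers entering the near zone (5 or 10 mm) divided by the number entering the 50 mm zone, and that FP is the same ratio computed on fiber voxel points. It also replaces the dilated lesion masks with precomputed point-to-border distances. Both choices are assumptions of this sketch, and the function name and sample values are hypothetical.

```python
def fiber_indicators(fiber_distances, near_mm, far_mm=50.0):
    """Return (FN, FP) for one near threshold relative to the far threshold.

    fiber_distances: one non-empty list per fiber, holding the distance (mm)
    of each fiber point to the lesion border.
    """
    fibers_near = sum(1 for f in fiber_distances if min(f) <= near_mm)
    fibers_far = sum(1 for f in fiber_distances if min(f) <= far_mm)
    points_near = sum(1 for f in fiber_distances for d in f if d <= near_mm)
    points_far = sum(1 for f in fiber_distances for d in f if d <= far_mm)
    fn = fibers_near / fibers_far if fibers_far else 0.0
    fp = points_near / points_far if points_far else 0.0
    return fn, fp


# Hypothetical distances for 4 fibers, for illustration only.
fibers = [
    [3.0, 12.0, 40.0],
    [8.0, 20.0, 60.0],
    [30.0, 45.0, 70.0],
    [55.0, 80.0],
]
fn_5, fp_5 = fiber_indicators(fibers, 5.0)
fn_10, fp_10 = fiber_indicators(fibers, 10.0)
```

The example shows why the two families of indicators can diverge: a fiber counts fully toward FN as soon as a single point enters the near zone, whereas FP grows only with the number of points inside it.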

Development of a Machine Learning Model
In our study, we used the least absolute shrinkage and selection operator (LASSO) regression with the Pearson correlation coefficient to select the optimal features to develop the ML model among clinical features (including age, sex, lobe, size, deep perforating artery supply, deep venous drainage, diffuseness, hemorrhage, seizure, S-M score, and LCD AI) and AI-based features (including FN 5mm/50mm, FN 10mm/50mm, FP 5mm/50mm, and FP 10mm/50mm). The aim of this process was to minimize the prediction error, determined by the following equation:
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t,$$
where $y_i$ is the outcome of patient $i$, $x_{ij}$ is the value of variable $j$ for patient $i$, $\beta_j$ is the regression coefficient of variable $j$, and $t$ is the bound on the L1-norm of the regression coefficients that controls the degree of penalty. 28 The impact of the penalty function was that coefficient estimates with a small contribution to the model were forced to be exactly zero.
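How the L1 penalty forces small coefficients to exactly zero can be shown with a minimal coordinate-descent sketch. It uses the penalized (Lagrangian) form with a weight `lam`, which is equivalent to the constrained form for a suitable bound; it assumes centered features and no intercept, and it is not the software used in the study. The data are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * sum of squared residuals + lam * sum |beta_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


# Hypothetical data: the outcome depends only on the first feature.
X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y = [-3.0, -1.0, 1.0, 3.0]
beta = lasso_coordinate_descent(X, y, lam=0.5)
```

In this example the informative coefficient is shrunk from 2.0 to 1.6, and the uninformative one is set to exactly 0.0, which is the selection behavior the text describes.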
On the basis of LASSO regression, a nomogram was built to provide a quantitative tool for clinical use. Additionally, different ML algorithms were used to build the model to find the best algorithm, including logistic regression (LR), random forest, Extreme Gradient Boosting (XGBoost) (https://www.geeksforgeeks.org/xgboost/), and support vector machine.
In this study, nested cross-validation comprising an outer cross-validation loop and an inner cross-validation loop was used for model optimization and evaluation. In the outer cross-validation loop, the data set was split into 10 equally sized folds based on k-fold stratified cross-validation. For each cross-validation iteration, 9 folds of data were used as the training set and 1 was used as the testing set. The training set was used as input in the inner cross-validation loop and was split into 5 folds (4 were used for training and 1 was used for validation) for feature selection and hyperparameter optimization. 29 The nested cross-validation was randomly repeated 50 times, and 50 different experimental results were obtained to ensure the robustness and stability of the model. The mean areas under the curve (AUCs) of the models built by different algorithms were compared to find the best performing algorithm to build the ML model.
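The index bookkeeping of the nested scheme can be sketched with the standard library alone. The sketch only generates the stratified splits (10 outer folds, 5 inner folds); model fitting, feature selection, and the 50 repetitions are left out. The label counts are hypothetical placeholders and are not the outcome distribution of the study.

```python
import random


def stratified_folds(labels, k, rng):
    """Split positions 0..len(labels)-1 into k folds with similar class mix."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def nested_cv_splits(labels, outer_k=10, inner_k=5, seed=0):
    """Return a list of (train_idx, test_idx, inner_splits) tuples."""
    rng = random.Random(seed)
    splits = []
    for test_idx in stratified_folds(labels, outer_k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        train_labels = [labels[i] for i in train_idx]
        inner_splits = []
        for val_pos in stratified_folds(train_labels, inner_k, rng):
            val_set = set(val_pos)
            # Inner folds hold positions within train_idx; map them back.
            inner_train = [train_idx[p] for p in range(len(train_idx)) if p not in val_set]
            inner_val = [train_idx[p] for p in val_pos]
            inner_splits.append((inner_train, inner_val))
        splits.append((train_idx, test_idx, inner_splits))
    return splits


# Hypothetical outcome labels for 83 patients (1 = motor defect).
labels = [1] * 20 + [0] * 63
splits = nested_cv_splits(labels)
```

Repeating the procedure with a different `seed` each time would give the repeated experiments described above; the outer test fold is never seen during the inner loop, which keeps the AUC estimate separate from feature selection and hyperparameter tuning.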

Statistical Analysis
Statistical analyses were performed with SPSS, Version 20.0.0 (IBM). Nomograms, calibration plots, and decision curves were generated using R statistical and computing software, Version R-4.0.5 (http://www.r-project.org). Receiver operating characteristic (ROC) plots were generated using Python, Version 3.6.0. The association between the variables and postsurgical MDs was analyzed using univariate and multivariate analyses. The AUCs of different ML algorithms were compared to select the optimum algorithm with the highest AUC to build the ML model. The AUCs were calculated to compare the predictive ability of the AI-based indicator and ML model with reported predictive methods (LCD AI, S-M score, and HDVL score). 7,30 The DeLong test was performed to compare the AUC between FN 10mm/50mm and other predictive methods. The accuracy, specificity, sensitivity, precision, recall, and F1 score of different predictive methods were calculated to evaluate their predictive abilities. A calibration plot was used to graphically represent the agreement between the probability predicted by the nomogram and the actual probability of postsurgical MDs. The Brier Score of the calibration plot was calculated to evaluate the predictive ability of the nomogram. All statistical tests were 2-sided with a significance level of P < .05.
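The evaluation measures named above can be made concrete with a short sketch. It computes the AUC as the probability that a patient with MDs receives a higher predicted score than a patient without, the Brier score as the mean squared error of the predicted probabilities, and the threshold-based measures from the confusion matrix. The 0.5 threshold, the function names, and the sample values are assumptions of this sketch; the DeLong test is not covered.

```python
def safe_div(a, b):
    """Divide a by b, returning 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def auc_score(y_true, y_prob):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return safe_div(wins, len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix measures at a fixed probability threshold."""
    pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 1)
    tn = sum(1 for t, q in zip(y_true, pred) if t == 0 and q == 0)
    fp = sum(1 for t, q in zip(y_true, pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 0)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # recall is the same quantity as sensitivity
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "sensitivity": recall,
        "specificity": safe_div(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical outcomes and predicted probabilities for 5 patients.
y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.2, 0.1]
auc = auc_score(y_true, y_prob)
brier = brier_score(y_true, y_prob)
metrics = classification_metrics(y_true, y_prob)
```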

Clinical Characteristics
A total of 83 surgically treated patients with M-AVM were included in this study. Among all the patients, there were 53 (62.4%) male and 30 (37.6%) female patients, with a mean age of 30.2 (SD, 11.5) years. The mean lesion diameter was 35

Development of the ML Model and Nomogram
FN 10mm/50mm, the S-M score, and diffuseness were selected by LASSO regression to establish the ML model. The Pearson coefficient between any 2 of these factors was <0.35 (Fig 4A). The AUCs of LR, random forest, XGBoost, and support vector machine algorithms in prediction were 0.88 (SD, 0.07), 0.86 (SD, 0.07), 0.86 (SD, 0.07), and 0.77 (SD, 0.12), respectively. The LR with the highest AUC of 0.88 (SD, 0.07) was selected as the algorithm for the ML model (Fig 4B). A nomogram was created combining FN 10mm/50mm, the S-M score, and diffuseness based on LASSO regression (Online Supplemental Data). The graphic preliminary score for each of the 3 factors was summed to generate the total score, which indicates the probability of postsurgical MDs. The calibration curve with a Brier Score of 0.131 showed excellent agreement between the predicted probabilities of the nomogram and the observed probabilities for postsurgical MDs (Online Supplemental Data).

DISCUSSION
To the best of our knowledge, there is no satisfactory method that is able to accurately reflect the 3D spatial relationship between BAVM lesions and the CST to predict postsurgical motor function. In this study, we proposed a novel indicator, FN 10mm/50mm, acquired by AI, which has excellent performance for predicting postsurgical MDs. Meanwhile, an ML model consisting of FN 10mm/50mm , the S-M score, and diffuseness was developed, having better prediction performance.
Currently, AI is widely used to develop a predictive model for clinical prognosis because of its irreplaceable advantages. 12 First, AI has the ability to analyze diverse data types (eg, demographic data, imaging data, and doctors' free-text notes) and incorporate them into predictions for prognosis. 13 Second, AI can alleviate the subjectivity and need for expertise in the interpretation of medical images and clinical evaluation. Oermann et al 31 used an ML model to predict outcomes after radiosurgery for BAVMs and achieved good performance. Third, AI is able to reconstruct the complex geometry of stereoscopic objects captured through sophisticated imaging instruments and calculate the quantitative indicators reflecting their spatial relationship that go beyond those measured by human readers. 14,15 In this study, we used the FN 10mm/50mm automatically calculated by AI to indicate the potential for CST damage due to surgery, which could reflect the spatial relationship of the 3D orientation of the CST and the border of the BAVM lesion. Furthermore, an ML model consisting of FN 10mm/50mm , diffuseness, and S-M grading was also developed for prognosis prediction and achieved a better performance.
The distances of 5 and 10 mm were chosen as variables on the basis of our previous studies. 6,17 In our previous study, we found that for the BAVMs involving the eloquent areas, the LCDs of 5 and 10 mm were the cutoff values for predicting postoperative dysfunction. 6,17 This finding suggested that the fibers within 5 or 10 mm from the border of AVM lesions may be most likely to be injured during the resection of AVMs. 6,17 Meanwhile, according to the anatomic evaluation, for the recruited AVMs adjacent to CSTs (the shortest distance between AVM lesions and CSTs of ≤10 mm), a range of 50 mm from the boundary of AVM lesions may include all CST fibers. Therefore, we used 5 mm/50 mm and 10 mm/50 mm to reflect the potentially injured fiber proportion of CSTs in surgery. In addition, consistent with our previous study, we used the mRS score at 6 months after microsurgery to define the postoperative MDs (mRS > 2). 5,32 Patients' visual or language deficits were also taken into consideration when grading the mRS. 33
Most interesting, our results showed that the proportion of potentially affected fiber numbers was more effective than the proportion of potentially affected voxel points in predicting long-term postsurgical MDs. This finding suggested that once the CSTs were injured at 1 point, the injury affected the whole length of the CSTs. We speculated that this situation might be related to the characteristics of nerve axons. According to previous reports, the interruption of axon anatomic continuity will destroy the function of the whole nerve fiber, resulting in partial or total loss of its functions due to the degeneration of nerve fibers distal to the lesion and eventual death of axotomized neurons. 34 This outcome means that neural fibers have less plasticity. 35 Therefore, neurosurgeons should be more cautious and alert when crossing a section perpendicular to CSTs in an operation to reduce the possibility of postsurgical MDs.
Our study presents novelty in the algorithm. First, we proposed a method of automatically tracking white matter (WM) fibers on DTI. 36 DTI has been indicated as the only noninvasive way to display WM tracts in vivo and has a unique advantage in identifying and estimating neural fibers at the subcortical level. 37,38 However, in clinical work, the differences in fiber tracking techniques in diverse clinical centers and the experiences of diverse clinicians may lead to a divergence in the accuracy of fiber tracking among different clinicians. 8-11 In this study, the Constant Solid Angle model and the EuDX algorithm were used to track the CST automatically; this approach could reduce the bias of identification and improve the accuracy and efficiency of fiber tracking in different clinical centers. 21,39 Second, for our registration method, compared with traditional registration methods, we used a neural network to perform both affine and deformable registration in a cascade. The deep learning-based registration has higher accuracy and shorter execution time than conventional registration methods. 40,41 Third, the lesion-dilation operation in our study was implemented using affine transformation, which can accurately control the expansion of the lesions at any distance in all directions. Finally, the LR algorithm with L2 regularization was used for model establishment. The algorithm also unified the LASSO characteristics and ridge characteristics to generate sparse weights to eliminate irrelevant features, possibly preventing overfitting and improving the generalization of the model. 42
Our study had several limitations. First, it was conducted in a single center with a limited number of patients due to a relatively small number of this kind of surgery performed. Second, due to its retrospective nature, it may be difficult to avoid information bias, selection bias, and confounding factors. Third, the automated segmentations and fiber tracking may have some errors because of the unclear boundary of AVM lesions themselves, rough surface of the manually section-by-section labeled lesions, and so forth. The model will be trained with more cases to minimize these errors in our following study. Finally, although internal cross-validation was performed, further external validation studies including more patients and from other centers are needed.

CONCLUSIONS
Surgical treatment for M-AVMs is challenging for neurosurgeons. An accurate prediction of the possibility of postsurgical MDs would help guide neurosurgeons in the selection of surgical candidates. In this study, a new AI-based indicator, FN 10mm/50mm, and a corresponding ML model are proposed, both of which benefit from the assistance of AI techniques. This approach has potential for the development of predictive software and for assisting doctors with different levels of clinical experience in various clinical centers, providing more precise consultation to patients with M-AVMs.