Research Article: Brain Tumor Imaging

Development and Evaluation of Automated Artificial Intelligence–Based Brain Tumor Response Assessment in Patients with Glioblastoma

Jikai Zhang, Dominic LaBella, Dylan Zhang, Jessica L. Houk, Jeffrey D. Rudie, Haotian Zou, Pranav Warman, Maciej A. Mazurowski and Evan Calabrese
American Journal of Neuroradiology April 2025, DOI: https://doi.org/10.3174/ajnr.A8580
From the Department of Electrical and Computer Engineering (J.Z., M.A.M.) and Department of Computer Science (M.A.M.), Duke University, Durham, North Carolina; Duke Center for Artificial Intelligence in Radiology (J.Z., E.C.), Department of Radiation Oncology (D.L.), and Department of Radiology (D.Z., J.L.H., M.A.M., E.C.), Duke University Medical Center, Durham, North Carolina; Department of Biostatistics and Bioinformatics (H.Z., M.A.M.) and Duke University School of Medicine (P.W.), Duke University School of Medicine, Durham, North Carolina; and Department of Radiology (J.D.R.), University of California San Diego, San Diego, California.

SUMMARY:

This project aimed to develop and evaluate an automated, AI-based, volumetric brain tumor MRI response assessment algorithm on a large cohort of patients treated at a high-volume brain tumor center. We retrospectively analyzed data from 634 patients treated for glioblastoma at a single brain tumor center over a 5-year period (2017–2021). The mean age was 56 ± 13 years; 372/634 (59%) patients were male, and 262/634 (41%) were female. Study data consisted of 3403 brain MRI examinations and corresponding standardized, radiologist-based brain tumor response assessments (BT-RADS). An artificial intelligence (AI)-based brain tumor response assessment (AI-VTRA) algorithm was developed by using automated, volumetric tumor segmentation. AI-VTRA results were evaluated for agreement with radiologist-based response assessments and for the ability to stratify patients by overall survival. Agreement metrics were computed by using BT-RADS as the ground truth, fixed time point survival analysis was conducted to evaluate survival stratification, and associated P values were calculated. For all BT-RADS categories, AI-VTRA showed moderate agreement with radiologist response assessments (F1 = 0.587–0.755). Kaplan-Meier analysis revealed statistically worse overall fixed time point survival for patients assessed as imaging worsening equivalent to RANO progression by human alone compared with AI alone (log-rank P = .007). Cox proportional hazards model analysis showed a disadvantage for AI-based assessments in overall survival prediction (P = .012). In summary, our proposed AI-VTRA, following BT-RADS criteria, yielded moderate agreement in replicating human response assessments and slightly worse stratification by overall survival.

ABBREVIATIONS:

2D = 2-dimensional; AI = artificial intelligence; AI-VTRA = artificial intelligence volumetric tumor response assessment; BT-RADS = Brain Tumor Reporting and Data System; C-index = concordance index; FeTS = Federated Tumor Segmentation; GBM = glioblastoma; IDH = isocitrate dehydrogenase; NLP = natural language processing; OS = overall survival; RANO = Response Assessment in Neuro-Oncology; RECIST = Response Evaluation Criteria in Solid Tumors; SD = standard deviation; VDET = volumetric differences for enhancing tumor; VDFLAIR = volumetric differences for FLAIR

Glioblastoma (GBM) is the most common primary brain malignancy in adults and remains difficult to treat even with the benefit of decades of experience.1 Despite improved understanding of the genetic underpinnings of brain malignancies, treatment options for GBM are limited, and survival remains poor.2-4 GBM management is further complicated by the complexity and frequency of clinical and radiologic response assessments, which may occur as often as every 4 weeks during active treatment.5 Brain MRI plays a critical role in GBM treatment response assessments and, along with comprehensive clinical assessment, is central for determining treatment response and/or disease progression.6,7

Given the importance of MRI for GBM treatment monitoring, there have been extensive efforts to develop standardized MRI response assessment criteria.8 Originally proposed in 1990, the McDonald criteria were widely considered the standard for GBM MRI response assessments, particularly for clinical trials.9 While similar to other solid tumor response assessment criteria, such as the Response Evaluation Criteria in Solid Tumors (RECIST),10 the McDonald criteria employed 2-dimensional (2D) tumor measurements to better capture the complex shape that is typical of GBM. In the following decades, the Response Assessment in Neuro-Oncology (RANO) criteria and its variations6,11 superseded the McDonald criteria, with their primary advantage being the consideration of both enhancing and nonenhancing tumors in addition to relevant treatment modalities. While RANO continues to be widely used in clinical trials, it is not commonly used for routine clinical assessments owing to its complexity.7 RANO 2.0 updates RANO by providing unified criteria to assess gliomas regardless of grade and recommends volumetric assessments.27

More recent efforts toward response assessment standardization have included the Brain Tumor Reporting and Data System (BT-RADS), a standardized MRI reporting system designed to simplify brain MRI reporting for routine clinical follow-up of patients with GBM.12-14 Similar to RANO, BT-RADS relies on measurements of both enhancing and nonenhancing tumors, and the BT-RADS 4 category was designed to be equivalent to the primary imaging criterion for RANO progression.6,12 The main advantage of BT-RADS is its ease of use and implementation. In contrast to RANO, BT-RADS has seen more rapid adoption for routine clinical use and has been implemented at several major brain tumor centers since it was first proposed in 2018.13 RANO 2.0 and BT-RADS differ in scope (RANO 2.0 is primarily focused on clinical trials and BT-RADS on routine assessments) and in approach. Specifically, RANO 2.0 proposes a unified set of criteria for high- and lower-grade gliomas, while BT-RADS was designed for high-grade gliomas. Both criteria acknowledge changes in enhancing and nonenhancing tumors, and both share similar criteria for tumor progression (a 25% increase in enhancing tumor). However, other RANO 2.0 categories do not have straightforward relationships to BT-RADS categories. For example, RANO 2.0 "partial response" requires a 50% 2D/linear decrease in enhancing tumor, while BT-RADS 1 (imaging improvement) does not specify an enhancing tumor decrease threshold. However, BT-RADS, like its predecessors, relies on 2D measurements, which may not accurately capture the complex 3D shape of GBM.15 While previous volumetric (3D) response assessment criteria have been proposed, implementation has been hindered by the difficulty of translating volumetric changes into response assessment categories. In addition, it should be acknowledged that human BT-RADS assessments are an imperfect reference standard, as they are somewhat subjective and depend on manual measurements and interpreting radiologists' adherence to published guidelines.

Automated artificial intelligence (AI)-based volumetric brain tumor MRI segmentation has recently matured into a clinically viable tool, principally because of large collaborative efforts such as the multimodal brain tumor segmentation challenge16 and the global Federated Tumor Segmentation (FeTS) initiative.17 This has led several groups to explore the use of AI-based segmentation tools for automated volumetric GBM MRI response assessment.18-21 In this work, we evaluate an automated, AI-based, volumetric brain tumor response assessment tool on a large cohort of patients treated at a high-volume brain tumor center. We compare AI-based results with standardized neuroradiologist response assessments in 2 key domains: the ability to recapitulate human response assessments and the ability to stratify patients by overall survival (OS).

MATERIALS AND METHODS

Study Population

This was a single-center, retrospective, Institutional Review Board–approved study with a waiver of informed consent. Candidate participants were identified by a systematic search of electronic health encounter records from 2017–2021 for all adult patients with a diagnosis of "glioblastoma" at a high-volume academic brain tumor center by using Centers for Medicare and Medicaid Services Hierarchical Condition Category codes (n = 4689). This included both isocitrate dehydrogenase (IDH) mutant and wild-type grade 4 astrocytomas, in line with the WHO classification current at the time of diagnosis (referred to as "GBM" henceforth for conciseness). Exclusion criteria were patients lacking at least 1 brain MRI examination with and without intravenous contrast (n = 3199) and patients lacking at least 1 standardized neuroradiologist response assessment (n = 856). The final study population consisted of 634 patients. A patient flow diagram is provided as Fig 1.

FIG 1. Patient flow diagram for study inclusion.

Neuroradiologist Response Assessments

Formal neuroradiologist-based GBM MRI response assessments by using the BT-RADS structured reporting system were available as part of routine clinical care. BT-RADS scores and baseline comparison examination dates were extracted from radiology reports by using a custom semisupervised natural language processing (NLP) algorithm with near-perfect internal validation performance. The full data curation pipeline is shown in Fig 2. For each patient, we searched for all reports containing BT-RADS scores. Then, for each BT-RADS report, the NLP algorithm retrieved the baseline comparison date and matched it to the corresponding prior examination (complete methodologic details and performance assessment are provided as Supplemental Data). This yielded 2446 pairs of examinations (current and baseline prior) with BT-RADS scores. One baseline prior can be paired with multiple follow-up examinations. BT-RADS scores included the following numerical categories: 1 = imaging improvement, 2 = no appreciable imaging change, 3 = imaging worsening, and 4 = imaging worsening with >25% increase in 2D enhancing tumor measurements (equivalent to RANO progression).
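The report-mining step can be illustrated with a minimal sketch. The study's actual extraction used a custom semisupervised NLP algorithm (see Supplemental Data); the regular expressions, function name, and report phrasing below are hypothetical, illustrating only the kind of pattern matching involved:

```python
import re

# Hypothetical patterns; the paper's NLP pipeline is semisupervised and
# more robust than a single pair of regular expressions.
BTRADS_RE = re.compile(r"BT-RADS(?:\s+(?:score|category))?[:\s]+([1-4][ab]?)",
                       re.IGNORECASE)
PRIOR_RE = re.compile(r"[Cc]omparison(?:\s+is)?\s+(?:made\s+)?(?:to|with)\s+"
                      r"(?:MRI\s+)?(?:dated\s+)?(\d{1,2}/\d{1,2}/\d{4})")

def extract_assessment(report_text):
    """Return (bt_rads_score, prior_exam_date); None where not found."""
    score = BTRADS_RE.search(report_text)
    prior = PRIOR_RE.search(report_text)
    return (score.group(1) if score else None,
            prior.group(1) if prior else None)

report = ("Comparison is made to MRI dated 03/14/2019. "
          "IMPRESSION: Increased enhancing tumor. BT-RADS score: 4.")
print(extract_assessment(report))  # ('4', '03/14/2019')
```

Each extracted (score, prior date) pair would then be matched against the examination list to form the current/baseline exam pairs described above.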

FIG 2. Pipeline of the data curation process aided by NLP and image segmentation methods.

MRI Data

All routine brain tumor MRI examinations were performed with a protocol compliant with the consensus Brain Tumor Imaging Protocol,22 including 3D, gradient-echo, T1-weighted pre- and postcontrast sequences and 2D T2-weighted and T2-FLAIR sequences. MRI data were retrieved for each pair of examinations corresponding to the BT-RADS scores identified in the previous section, which resulted in 3403 unique MRI examinations. Scanner information is included in the Supplemental Data.

Image Processing and Automated Tumor Segmentation

MRI data underwent standard image preprocessing steps, including translation-only alignment to the Montreal Neurological Institute brain atlas (MNI352) for FOV standardization23 and skull stripping by using a publicly available deep learning method.24 Preprocessed images then underwent automated, volumetric tumor segmentation by using a 3D convolutional segmentation neural network. This model was specifically designed for posttreatment examinations and segments 4 distinct compartments: resection cavity, enhancing tumor, necrotic tumor core, and surrounding nonenhancing T2-FLAIR signal abnormality. The final model was pretrained on an external postoperative brain MRI data set. We utilized nnU-Net36 to train and validate the model. Internal validation results showed a mean ± standard deviation (SD) of 0.8861 ± 0.2476 for enhancing tumor and 0.9833 ± 0.0372 for surrounding nonenhancing FLAIR signal abnormality (complete methodologic details and performance assessment are provided as Supplemental Data).
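Once a labeled segmentation mask is available, per-compartment volumes and percent changes follow from simple voxel counting. The label id, voxel spacing, and helper names below are hypothetical, not the study's implementation:

```python
# Minimal sketch: tumor volume and percent volume change from a labeled
# segmentation mask. The paper's model outputs 4 compartments (cavity,
# enhancing tumor, necrotic core, nonenhancing FLAIR abnormality);
# the label id and voxel size here are hypothetical.
ENHANCING_LABEL = 2  # hypothetical label id

def volume_ml(mask, label, voxel_mm3=1.0):
    """Volume in mL of one compartment: voxel count x voxel volume."""
    n_voxels = sum(1 for v in mask if v == label)  # mask as flat iterable
    return n_voxels * voxel_mm3 / 1000.0  # mm^3 -> mL

def percent_change(baseline_ml, followup_ml):
    """VDET-style percent difference relative to the baseline exam."""
    return 100.0 * (followup_ml - baseline_ml) / baseline_ml

baseline = volume_ml([ENHANCING_LABEL] * 2000, ENHANCING_LABEL)  # 2.0 mL
followup = volume_ml([ENHANCING_LABEL] * 3000, ENHANCING_LABEL)  # 3.0 mL
print(percent_change(baseline, followup))  # 50.0, above the 40% threshold
```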

Artificial Intelligence Volumetric Tumor Response Assessment (AI-VTRA)

An AI scoring system (AI-VTRA) based on volumetric differences for enhancing tumor (VDET) and surrounding nonenhancing FLAIR hyperintensity (VDFLAIR) was computed for each pair of examinations in the data set and was used to develop AI-based volumetric equivalents to BT-RADS scores. BT-RADS 4 was defined as a ≥40% increase in VDET, the extrapolated volumetric threshold derived from 2D measurements, for measurable disease (enhancing tumor volume greater than 1 mL), consistent with multiple previously published studies.25-27,38 Other relevant volumetric thresholds (notably a ±10% threshold for no significant change) were determined empirically, as previously published values did not exist. BT-RADS 3 was defined as either 1) VDET between a 10% and 40% increase or 2) VDET < 10% change and VDFLAIR ≥ 40% increase. BT-RADS 2 was defined as either 1) VDET < 10% change or 2) VDET ≥ 10% increase and VDFLAIR ≥ 40% increase. BT-RADS 1 was defined as either 1) VDET ≥ 10% decrease or 2) VDET < 10% change and VDFLAIR ≥ 40% decrease. Complete criteria for AI-VTRA are presented in Table 1. To assess the importance of including VDFLAIR, we also evaluated AI-VTRAET, which was based solely on VDET (Supplemental Data).
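The rules above can be sketched as a small decision function. The thresholds follow the text; the evaluation order (4, then 3, then 2, then 1) and the fallback are assumptions, since the exact precedence is given by Table 1 of the paper:

```python
def ai_vtra_score(vdet_pct, vdflair_pct, baseline_et_ml):
    """Map volumetric differences to an AI-VTRA (BT-RADS-equivalent) score.

    vdet_pct / vdflair_pct: percent change in enhancing tumor / FLAIR volume.
    Thresholds follow the paper's text; the evaluation order (4 -> 3 -> 2 -> 1)
    and the fallback are assumptions (Table 1 defines the exact precedence).
    """
    if vdet_pct >= 40 and baseline_et_ml > 1.0:      # measurable disease
        return 4  # equivalent to RANO progression
    if 10 <= vdet_pct < 40 or (abs(vdet_pct) < 10 and vdflair_pct >= 40):
        return 3  # imaging worsening
    if abs(vdet_pct) < 10 or (vdet_pct >= 10 and vdflair_pct >= 40):
        return 2  # no appreciable imaging change
    if vdet_pct <= -10 or (abs(vdet_pct) < 10 and vdflair_pct <= -40):
        return 1  # imaging improvement
    return 2  # fallback for uncovered combinations (hypothetical)

print(ai_vtra_score(50, 0, 2.5))   # 4: >=40% enhancing tumor increase
print(ai_vtra_score(20, 0, 2.5))   # 3: 10-40% enhancing tumor increase
print(ai_vtra_score(3, 2, 2.5))    # 2: no significant change
print(ai_vtra_score(-30, 5, 2.5))  # 1: enhancing tumor decrease
```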

Table 1: Relationship between BT-RADS score and AI-VTRA for each glioblastoma MRI follow-up assessment score

AI Performance for Recapitulating Human BT-RADS Scores

Performance of automated volumetric criteria for replicating human BT-RADS scores was evaluated across the entire data set. Composite performance for all BT-RADS categories was assessed with the macro-F1 score. Performance for individual BT-RADS categories was assessed with sensitivity, specificity, precision, micro-F1 score (calculated globally across all categories), and macro-F1 score (calculated for each category and then averaged).
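The distinction between macro-F1 (per-category average) and micro-F1 (globally pooled) can be made concrete with a from-scratch sketch; the toy score lists are hypothetical, not study data:

```python
def f1_scores(y_true, y_pred, labels=(1, 2, 3, 4)):
    """Per-class F1, macro-F1 (mean of per-class), micro-F1 (global)."""
    per_class = {}
    tp_all = fp_all = fn_all = 0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
    macro = sum(per_class.values()) / len(labels)
    micro = 2 * tp_all / (2 * tp_all + fp_all + fn_all)
    return per_class, macro, micro

# toy data (hypothetical radiologist vs AI scores)
bt_rads = [2, 2, 4, 3, 1, 2, 4, 3]
ai_vtra = [2, 3, 4, 3, 1, 2, 3, 3]
per_class, macro, micro = f1_scores(bt_rads, ai_vtra)
print(round(macro, 3), round(micro, 3))  # 0.783 0.75
```

Note that for single-label multiclass predictions, micro-F1 equals overall accuracy (6 of 8 correct here), while macro-F1 weights each category equally regardless of its prevalence.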

AI Performance for Survival Stratification

Performance for survival stratification was assessed based on the highest response assessment category assigned within the first 6 months of MRI follow-up, which typically (though not necessarily) corresponded to the second postoperative MRI examination. Time from initial diagnosis was not available for all patients and was not included in the analysis. Three hundred twenty-three of 634 (51%) patients had at least 1 BT-RADS assessment in the first 6 months of follow-up and were included in this subanalysis. This cohort was substratified by response score and by whether the score was assigned by human alone, by AI alone, or by both human and AI simultaneously. We plotted Kaplan-Meier survival curves for each substratum to visualize survival probability. Patients who were still alive at the last available follow-up were censored. Log-rank tests were used to determine pair-wise differences between survival curves.
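The survival curves can be illustrated with a from-scratch Kaplan-Meier estimator (the study used the lifelines package; the toy follow-up times below are hypothetical):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    Returns (time, survival probability) at each distinct event time.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv, curve, i = 1.0, [], 0
    while i < len(order):
        t = times[order[i]]
        deaths = censored = 0
        while i < len(order) and times[order[i]] == t:
            if events[order[i]]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:  # survival drops only at event times
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored  # censored patients leave the risk set
    return curve

# toy cohort (hypothetical days), not study data
times = [100, 200, 200, 300, 400, 450]
events = [1, 1, 0, 1, 0, 1]
print(kaplan_meier(times, events))
```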

Multivariate Survival Modeling

Multivariate Cox proportional hazards models were applied to human (Eq 1) and AI assessments (Eq 2) separately to assess their relative predictive value for survival. Besides the scores, we included normalized age, sex, race, and ethnicity in the model. Time between baseline and follow-up examinations was treated as the time-varying covariate in the Cox model. We removed observations with unknown IDH status before fitting the Cox models. The concordance index (C-index) was calculated for each Cox model. To compare the difference in C-index between the 2 Cox models, we applied statistical tests that account for the paired data (see Supplemental Data for details).

h(t | X) = h0(t) exp(β1 BT-RADS(t) + β2 Age + β3 Sex + β4 Race + β5 Ethnicity + β6 IDH) (Eq 1)

h(t | X) = h0(t) exp(β1 AI-VTRA(t) + β2 Age + β3 Sex + β4 Race + β5 Ethnicity + β6 IDH) (Eq 2)
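The C-index used to compare the two models can be sketched directly: it is the fraction of comparable patient pairs in which the patient with the higher predicted risk dies earlier. The toy risk scores below are hypothetical:

```python
def c_index(times, events, risks):
    """Harrell's concordance index.

    A pair (i, j) is comparable if patient i had an event strictly before
    patient j's last observed time. The pair is concordant if the model
    assigned i the higher risk; risk ties count as 0.5.
    """
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# toy data: higher risk always dies earlier -> perfect concordance
times = [100, 200, 300, 400]
events = [1, 1, 1, 0]  # last patient censored
risks = [4.0, 3.0, 2.0, 1.0]
print(c_index(times, events, risks))  # 1.0
```

A C-index of 0.5 corresponds to random ordering; the study's values (0.637 for human versus 0.594 for AI assessments) sit between random and perfect concordance.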

Statistical Analyses

Statistical analyses were performed in Python Version 3.8 and R Version 4.2. Kaplan-Meier estimates were computed by using the "lifelines" package in Python. Cox modeling was performed in R by using the "survival" package. The scale function in R was used to normalize age. We set the confidence level at 95%, and P values less than .05 were considered significant.

RESULTS

Patient Characteristics

Basic study participant demographic data are reported in Table 2. The mean age was 56 ± 13 years. Three hundred seventy-two of 634 (59%) patients were men, and 262/634 (41%) were women. Five hundred sixty-six of 634 (89%) patients listed their primary self-reported race as White, 41/634 (7%) as Black or African American, and 9/634 (1%) as Asian. Eight of 634 (1%) patients reported a secondary race, and 10/634 (2%) did not report race. Four hundred seventy-nine of 634 (76%) patients had an IDH wild-type tumor, 63/634 (10%) had an IDH mutant tumor, and 92/634 (14%) had missing or inconclusive IDH testing.

Table 2: Basic demographics for the 634 patients included in the study cohort

MRI Data and Segmentation

The 634 included patients had 3403 qualifying brain MRI examinations (an average of 3.85 examinations per patient). The average time between baseline and follow-up studies was 160 days, with an SD of 236 days. Automated volumetric tumor segmentation was successfully completed for all examinations without errors. The average segmentation time was 11.5 seconds per examination. Representative segmented MR images from 4 patients' examination pairs, one for each assessment category, are presented in Fig 3.

FIG 3. Example MR images, radiologist response assessment categories, and volumetric changes for 4 patients at 2 different time points.

AI Performance for Recapitulating Human BT-RADS Scores

For recapitulating human BT-RADS scores, AI-VTRA had a higher macro-F1 score (macro-F1 = 0.548) compared with AI-VTRAET (macro-F1 = 0.535). Performance metrics for predicting each of the individual BT-RADS scores are provided in Table 3. AI-VTRAET demonstrated improved performance compared with AI-VTRA for a single score, BT-RADS 2 (no significant change). Overall, automated volumetrics yielded moderate performance (F1 > 0.7) for predicting neuroradiologist BT-RADS scores of 1, 2, and 4, and lower performance (F1 > 0.55) for predicting BT-RADS 3. Total counts and percentages for each score and an analysis of major discrepancies between human and AI assessments are provided in the Supplemental Data.

Table 3: Performance metrics (macro-F1, micro-F1, sensitivity, specificity, and precision) for AI-VTRA/AI-VTRAET predictions of radiologist-based response assessments. Within each category, BT-RADS and AI predictions were binarized by the target score before the metrics were computed

Fixed Time Point Survival Analysis

Four hundred sixty-five of 634 (73%) patients died during the follow-up period. Median OS for the cohort was 443 days from the first available MRI examination, and median survival after the 6-month time point selected for the fixed time point survival analysis (S6mo) was 401 days. Median S6mo stratified by the highest human (BT-RADS) response category assessed during the first 6 months of follow-up was 401 days for BT-RADS 1, 625 days for BT-RADS 2, 394 days for BT-RADS 3, and 207 days for BT-RADS 4. Median S6mo stratified by the highest AI (AI-VTRA) category assessed during the first 6 months of follow-up was 450 days for imaging improvement, 501 days for no significant change, 346 days for imaging worsening, and 305 days for imaging worsening equivalent to RANO progression. Survival curves for each BT-RADS and AI-VTRA category are presented in Fig 4. There was statistically worse overall S6mo for patients assessed as imaging worsening equivalent to RANO progression by human alone compared with AI alone (log-rank P = .007). For the other assessment categories, S6mo was not significantly different when assessed by AI alone versus human alone.

FIG 4. Fixed time point Kaplan-Meier survival curves for each response assessment category, stratified by AI- and radiologist-based assessment methods. Asterisk indicates a statistically significant difference.

Multivariate Survival Modeling

A multivariate Cox proportional hazards model for S6mo yielded a higher C-index for human assessments than for AI assessments (0.637 [0.600, 0.674] versus 0.594 [0.555, 0.633], P = .012), indicating significantly better predictive ability for human BT-RADS assessments. Hazard ratios and 95% CIs of the fitted fixed effects are shown in Tables 4 and 5 for BT-RADS and AI-VTRA, respectively. Both models suggested that Imaging RANO Progression (score of 4) had significantly worse survival than No change (score of 2). The model that included BT-RADS also suggested significantly worse survival for Improving (score of 1) and Worsening (score of 3) than for No change.

Table 4: Hazard ratios, confidence intervals, and P values of BT-RADS, age, sex, primary self-reported race, self-reported ethnicity, and IDH

Table 5: Hazard ratios, confidence intervals, and P values of AI-VTRA, age, sex, primary self-reported race, self-reported ethnicity, and IDH

DISCUSSION

The goal of this study was to compare AI-based volumetric GBM MRI response assessment with standardized radiologist response assessments. First, we addressed the ability of AI to recapitulate radiologist response assessments. Our results show that AI-based volumetric response assessment yielded overall moderate performance (macro-F1 ≈ 0.7) for recapitulating most human response assessment categories (BT-RADS 1, 2, and 4). Performance was lowest (macro-F1 ≈ 0.6) for predicting BT-RADS 3. This is likely related to the high variability of this assessment category, which ranges from minimal changes to relatively large tumor volume increases that do not meet the threshold for RANO progression. Prediction of this category is further complicated by the need to specify a volumetric threshold for "no significant change," which is incongruous with human response assessments, in which this threshold may differ depending on the clinical scenario. For example, radiologists may intuitively ignore T2/FLAIR signal attributed to posttreatment changes, whereas the volumetric segmentation model does not explicitly distinguish nonenhancing tumor from treatment effect. These results suggest that AI-based volumetric response assessments may be better suited as a clinical decision support adjunct than as a replacement for radiologists' assessments.28 While different thresholds may ultimately be relevant for IDH mutant versus wild-type grade 4 tumors, they are currently treated the same by BT-RADS. A subgroup analysis of the IDH wild-type data set (included in the IDH Subgroup Analysis section of the Supplemental Data) showed that, by using the same thresholds, the composite AI-VTRA outperformed AI-VTRAET in both the IDH wild-type data set and the original data set.

A separate but related domain for evaluating AI-based volumetric response assessment is its ability to stratify patients by OS. Compared with a similar survival analysis of BT-RADS stratification conducted by Kim et al,39 our study found similarly nonsignificant hazard ratios for IDH status and significantly elevated hazard ratios for score 4. For all assessment categories other than BT-RADS 4, there was no statistically significant difference in OS whether assigned by AI alone or human alone. However, OS was statistically lower for patients assessed as BT-RADS 4 by human alone compared with AI alone, with a median S6mo of 207 versus 305 days, respectively. Interestingly, when BT-RADS 4 was assigned by both AI and human assessments, survival was more similar to that of patients assigned BT-RADS 4 by AI alone. In addition, though there were no statistically significant survival differences in the other assessment categories whether assigned by AI alone or human alone, human assessment resulted in larger differences in survival between assessment categories. These findings suggest that human assessments often draw on additional findings or clinical history not captured by the proposed AI method, such as progression of nonenhancing tumor in the setting of anti-angiogenic therapy. Incorporating these additional data will likely be important for improving AI-based GBM assessment methods in the future. We also observed poorer S6mo for BT-RADS 1 (401 days) than for BT-RADS 2 (625 days), almost equivalent to BT-RADS 3 (394 days). We suspect that this may be due to bevacizumab pseudoresponse in patients with late-stage recurrent disease, which is consistent with a prior published report40 on survival following BT-RADS assessment.

GBM MRI response assessments are highly complex owing to the highly variable appearance of recurrent tumor and treatment changes. There are several well-known issues with current response assessments that could be addressed with AI, including the inherent inaccuracies and high interrater variability of 2D measurements.11,29-32 The results of this study add to a growing body of literature focused on AI-based GBM MRI response assessments,28,33,34 which, like many applications of AI in neuro-oncology, have yet to deliver their promised benefits in a meaningful way.35 However, our results highlight 3 important observations: 1) simple rule-based AI volumetric response assessments yield only moderate performance for predicting human response assessments, 2) with this approach, human assessments yielded a small but significant improvement in survival stratification performance, and 3) major discrepancies between human and AI assessments were rare, and both human and AI errors were identified as causes. Overall, these results highlight the need for better AI models that can incorporate additional clinical and imaging variables into the response assessment. Though incomplete lesion segmentation by the AI model could contribute to the survival discrepancy between AI and human assessments, we do not believe it was a major factor in our study. Based on the defined rules for AI-VTRA and our evaluations, we believe the primary factors are 1) that a 25% 2D increase does not precisely correspond to a 40% volumetric increase and 2) that human determination of "no significant change" does not necessarily correspond to any specific volumetric threshold.

Several prior studies have investigated automated brain tumor MRI segmentation as a means of assessing longitudinal tumor burden and even predicting time to progression and OS.18,19,21,40,41 However, these studies have largely focused on automated volumetrics as an alternative to standard response assessment criteria rather than as a comprehensive method for automating those criteria. For example, Kickingereder et al21 compared brain tumor growth dynamics derived from automated segmentation with central RANO assessment in a longitudinal multi-institution cohort of 532 patients and found that the volumetric assessment was superior for predicting OS. However, to our knowledge, no prior work has evaluated automated volumetrics for predicting human BT-RADS scores or RANO progressive disease assessments. Our study differs from prior work in that it includes a larger number of patients and focuses on recapitulating human response assessments, reflecting a paradigm of automating existing assessments rather than proposing new ones.

Pseudoprogression of GBM is a posttreatment phenomenon, with a reported incidence of at least 9%, in which treatment-related changes mimic tumor growth and confound interpretation.37 This study focused exclusively on objective imaging change rather than on subjective interpretation of the reason for that change. As such, the problem of pseudoprogression (and pseudoresponse) is not directly addressed and remains a major limitation of this approach. Future work will be required to automate the distinction between true progression and pseudoprogression and will likely require additional inputs such as treatment history and advanced imaging such as perfusion-weighted MRI.37,38

This study has several important limitations. First, this was a single-center retrospective study, which limits the generalizability of its results. In particular, our data set was imbalanced toward unchanged/improving assessments, leaving progression cases under-represented; to partially account for this imbalance, we reported multiple classification metrics. Second, this study used a relatively simplistic logic-based approach for assigning tumor volumetric differences to response categories. Third, this study relied on BT-RADS scores for radiologist-based response assessments; although BT-RADS has been validated in several studies,13 it is not yet as widely used as other response assessment criteria such as RANO. Fourth, although our NLP algorithm reached 99% accuracy in our internal validation, minor NLP- and human-induced errors in information retrieval from reports are expected and may cause unexplained discrepancies between AI and human evaluations. Fifth, we did not assess new lesions, which are part of the RANO progression criteria.7 In future studies, we propose applying a connected-component algorithm to evaluate the growth of each separate lesion and incorporating this analysis into the AI-VTRA rules. Finally, the AI-based response assessments did not use any information on treatment (such as radiation or antiangiogenic therapy), which fundamentally limits their ability to replicate radiologist-based response assessments.
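The connected-component analysis proposed above for handling new lesions could be sketched as follows. This is our illustration, not the authors' implementation; the `min_voxels` size filter and its threshold are hypothetical parameters for suppressing segmentation noise:

```python
import numpy as np
from scipy import ndimage

def new_lesion_components(baseline: np.ndarray, followup: np.ndarray,
                          min_voxels: int = 5):
    """Label connected components in the follow-up mask and flag those that
    share no voxels with the baseline mask as candidate new lesions.
    Assumes the two masks are co-registered; min_voxels (arbitrary here)
    suppresses tiny components that are likely segmentation noise."""
    labels, n = ndimage.label(followup.astype(bool))
    new = []
    for i in range(1, n + 1):
        component = labels == i
        if component.sum() < min_voxels:
            continue  # too small to call a lesion
        if not (component & baseline.astype(bool)).any():
            new.append(int(component.sum()))  # voxel count of each new lesion
    return new

# Toy example: one enlarging lesion plus one spatially separate new focus
base = np.zeros((20, 20, 20), dtype=bool); base[2:5, 2:5, 2:5] = True
post = base.copy(); post[2:6, 2:5, 2:5] = True  # growth of existing lesion
post[14:17, 14:17, 14:17] = True                # new 27-voxel lesion
print(new_lesion_components(base, post))  # [27]
```

Per-component bookkeeping of this kind would let a rule-based system apply RANO's new-lesion criterion and track the growth of each lesion separately rather than pooling all voxels into one total volume.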

CONCLUSIONS

AI-based volumetric GBM MRI response assessment following BT-RADS criteria provides moderate performance for replicating human response assessments and comparable performance for OS stratification. Although this approach is unlikely to be useful as a stand-alone response assessment, it may be useful in scenarios where radiologist interpretation is infeasible or as an adjunct to radiologist-based assessment.

Footnotes

  • This research has been supported in part by an award from the Foundation of the American Society of Neuroradiology to Dr. Evan Calabrese titled “Prospective Evaluation of Automated Pre- and Postoperative Tumor Segmentation for Patients with Glioblastoma.”

  • Disclosure forms provided by the authors are available with the full text and PDF of this article at www.ajnr.org.

References

  1. Tamimi AF, Juweid M. Epidemiology and outcome of glioblastoma. In: De Vleeschouwer S, ed. Glioblastoma. Exon Publications; 2017. doi:10.15586/codon.glioblastoma.2017.ch8
  2. Louis DN, Perry A, Wesseling P, et al. The 2021 WHO Classification of Tumors of the Central Nervous System: a summary. Neuro Oncol 2021;23:1231–51. doi:10.1093/neuonc/noab106 pmid:34185076
  3. Delgado-López PD, Corrales-García EM. Survival in glioblastoma: a review on the impact of treatment modalities. Clin Transl Oncol 2016;18:1062–71. doi:10.1007/s12094-016-1497-x pmid:26960561
  4. Marenco-Hillembrand L, Wijesekera O, Suarez-Meade P, et al. Trends in glioblastoma: outcomes over time and type of intervention: a systematic evidence based analysis. J Neurooncol 2020;147:297–307. doi:10.1007/s11060-020-03451-6 pmid:32157552
  5. Reardon DA, Ballman KV, Buckner JC, et al. Impact of imaging measurements on response assessment in glioblastoma clinical trials. Neuro Oncol 2014;16 Suppl 7:vii24–35. doi:10.1093/neuonc/nou286 pmid:25313236
  6. Wen PY, Macdonald DR, Reardon DA, et al. Updated response assessment criteria for high-grade gliomas: Response Assessment in Neuro-Oncology Working Group. J Clin Oncol 2010;28:1963–72. doi:10.1200/JCO.2009.26.3541 pmid:20231676
  7. Chukwueke UN, Wen PY. Use of the Response Assessment in Neuro-Oncology (RANO) criteria in clinical trials and clinical practice. CNS Oncol 2019;8:CNS28. doi:10.2217/cns-2018-0007 pmid:30806082
  8. Ramakrishnan D, von Reppert M, Krycia M, et al. Evolution and implementation of radiographic response criteria in neuro-oncology. Neurooncol Adv 2023;5:vdad118. doi:10.1093/noajnl/vdad118 pmid:37860269
  9. Macdonald DR, Cascino TL, Schold SC, et al. Response criteria for phase II studies of supratentorial malignant glioma. J Clin Oncol 1990;8:1277–80. doi:10.1200/JCO.1990.8.7.1277 pmid:2358840
  10. Eisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 2009;45:228–47. doi:10.1016/j.ejca.2008.10.026 pmid:19097774
  11. Ellingson BM, Wen PY, Cloughesy TF. Modified criteria for radiographic response assessment in glioblastoma clinical trials. Neurotherapeutics 2017;14:307–20. doi:10.1007/s13311-016-0507-6 pmid:28108885
  12. Weinberg BD, Gore A, Shu H-KG, et al. Management-based structured reporting of posttreatment glioma response with the Brain Tumor Reporting and Data System. J Am Coll Radiol 2018;15:767–71. doi:10.1016/j.jacr.2018.01.022 pmid:29503151
  13. Gore A, Hoch MJ, Shu H-KG, et al. Institutional implementation of a structured reporting system: our experience with the Brain Tumor Reporting and Data System. Acad Radiol 2019;26:974–80. doi:10.1016/j.acra.2018.12.023 pmid:30661977
  14. Zhang JY, Weinberg BD, Hu R, et al. Quantitative improvement in brain tumor MRI through structured reporting (BT-RADS). Acad Radiol 2020;27:780–84. doi:10.1016/j.acra.2019.07.028 pmid:31471207
  15. Chappell R, Miranpuri SS, Mehta MP. Dimension in defining tumor response. J Clin Oncol 1998;16:1234. doi:10.1200/JCO.1998.16.3.1234 pmid:9508213
  16. Menze BH, Jakab A, Bauer S, et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging 2015;34:1993–2024. doi:10.1109/TMI.2014.2377694 pmid:25494501
  17. Pati S, Baid U, Edwards B, et al. Federated learning enables big data for rare cancer boundary detection. Nat Commun 2022;13:7346. doi:10.1038/s41467-022-33407-5 pmid:36470898
  18. Rudie JD, Calabrese E, Saluja R, et al. Longitudinal assessment of posttreatment diffuse glioma tissue volumes with three-dimensional convolutional neural networks. Radiol Artif Intell 2022;4:e210243. doi:10.1148/ryai.210243 pmid:36204543
  19. Chang K, Beers AL, Bai HX, et al. Automatic assessment of glioma burden: a deep learning algorithm for fully automated volumetric and bidimensional measurement. Neuro Oncol 2019;21:1412–22. doi:10.1093/neuonc/noz106 pmid:31190077
  20. Suter Y, Notter M, Meier R, et al. Evaluating automated longitudinal tumor measurements for glioblastoma response assessment. Front Radiol 2023;3:1211859. doi:10.3389/fradi.2023.1211859 pmid:37745204
  21. Kickingereder P, Isensee F, Tursunova I, et al. Automated quantitative tumour response assessment of MRI in neuro-oncology with artificial neural networks: a multicentre, retrospective study. Lancet Oncol 2019;20:728–40. doi:10.1016/S1470-2045(19)30098-1 pmid:30952559
  22. Ellingson BM, Bendszus M, Boxerman J; Jumpstarting Brain Tumor Drug Development Coalition Imaging Standardization Steering Committee, et al. Consensus recommendations for a standardized brain tumor imaging protocol in clinical trials. Neuro Oncol 2015;17:1188–98. doi:10.1093/neuonc/nov095 pmid:26250565
  23. Mazziotta JC, Toga AW, Evans A, et al. A probabilistic atlas of the human brain: theory and rationale for its development. The International Consortium for Brain Mapping (ICBM). Neuroimage 1995;2:89–101. doi:10.1006/nimg.1995.1012 pmid:9343592
  24. Pati S, Baid U, Edwards B, et al. The Federated Tumor Segmentation (FeTS) tool: an open-source solution to further solid tumor research. Phys Med Biol 2022;67:204002. doi:10.1088/1361-6560/ac9449
  25. Gahrmann R, van den Bent M, van der Holt B, et al. Comparison of 2D (RANO) and volumetric methods for assessment of recurrent glioblastoma treated with bevacizumab—a report from the BELOB trial. Neuro Oncol 2017;19:853–61. doi:10.1093/neuonc/now311 pmid:28204639
  26. Wang M-Y, Cheng J-L, Han Y-H, et al. Measurement of tumor size in adult glioblastoma: classical cross-sectional criteria on 2D MRI or volumetric criteria on high resolution 3D MRI? Eur J Radiol 2012;81:2370–74. doi:10.1016/j.ejrad.2011.05.017 pmid:21652157
  27. Wen PY, van den Bent M, Youssef G, et al. RANO 2.0: update to the Response Assessment in Neuro-Oncology criteria for high- and low-grade gliomas in adults. J Clin Oncol 2023;41:5187–99. doi:10.1200/JCO.23.01059 pmid:37774317
  28. Vollmuth P, Foltyn M, Huang RY, et al. Artificial intelligence (AI)-based decision support improves reproducibility of tumor response assessment in neuro-oncology: an international multi-reader study. Neuro Oncol 2023;25:533–43. doi:10.1093/neuonc/noac189 pmid:35917833
  29. Vos MJ, Uitdehaag BMJ, Barkhof F, et al. Interobserver variability in the radiological assessment of response to chemotherapy in glioma. Neurology 2003;60:826–30. doi:10.1212/01.wnl.0000049467.54667.92 pmid:12629241
  30. Galanis E, Buckner JC, Maurer MJ, et al. Validation of neuroradiologic response assessment in gliomas: measurement by RECIST, two-dimensional, computer-assisted tumor area, and computer-assisted tumor volume methods. Neuro Oncol 2006;8:156–65. doi:10.1215/15228517-2005-005 pmid:16533757
  31. Dempsey MF, Condon BR, Hadley DM. Measurement of tumor “size” in recurrent malignant glioma: 1D, 2D, or 3D? AJNR Am J Neuroradiol 2005;26:770–76. pmid:15814919
  32. Yang D. Standardized MRI assessment of high-grade glioma response: a review of the essential elements and pitfalls of the RANO criteria. Neurooncol Pract 2016;3:59–67. doi:10.1093/nop/npv023 pmid:31579522
  33. Ellingson BM. On the promise of artificial intelligence for standardizing radiographic response assessment in gliomas. Neuro Oncol 2019;21:1346–47. doi:10.1093/neuonc/noz162 pmid:31504809
  34. Sotoudeh H, Shafaat O, Bernstock JD, et al. Artificial intelligence in the management of glioma: era of personalized medicine. Front Oncol 2019;9:768. doi:10.3389/fonc.2019.00768 pmid:31475111
  35. Rudie JD, Rauschecker AM, Bryan RN, et al. Emerging applications of artificial intelligence in neuro-oncology. Radiology 2019;290:607–18. doi:10.1148/radiol.2018181928 pmid:30667332
  36. Isensee F, Jaeger PF, Kohl SA, et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021;18:203–11. pmid:33288961
  37. Thust SC, van den Bent MJ, Smits M. Pseudoprogression of brain tumors. J Magn Reson Imaging 2018;48:571–89
  38. Linhares P, Carvalho B, Figueiredo R, et al. Early pseudoprogression following chemoradiotherapy in glioblastoma patients: the value of RANO evaluation. J Oncol 2013;2013:690585. doi:10.1155/2013/690585 pmid:24000284
  39. Kim S, Hoch MJ, Peng L, et al. A brain tumor reporting and data system to optimize imaging surveillance and prognostication in high-grade gliomas. J Neuroimaging 2022;32:1185–92. doi:10.1111/jon.13044 pmid:36045502
  40. Bianconi A, Rossi LF, Bonada M, et al. Deep learning-based algorithm for postoperative glioblastoma MRI segmentation: a promising new tool for tumor burden assessment. Brain Inform 2023;10:26. doi:10.1186/s40708-023-00207-6 pmid:37801128
  41. Bangalore Yogananda CG, Wagner B, Nalawade SS, et al. Fully automated brain tumor segmentation and survival prediction of gliomas using deep learning and MRI. In: Crimi A, Bakas S, eds. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2019. Lecture Notes in Computer Science. Springer; 2020;11993:99–112
  • Received July 28, 2024.
  • Accepted after revision October 19, 2024.
  • © 2025 by American Journal of Neuroradiology
Cite this article
Jikai Zhang, Dominic LaBella, Dylan Zhang, Jessica L. Houk, Jeffrey D. Rudie, Haotian Zou, Pranav Warman, Maciej A. Mazurowski, Evan Calabrese
Development and Evaluation of Automated Artificial Intelligence–Based Brain Tumor Response Assessment in Patients with Glioblastoma
American Journal of Neuroradiology Apr 2025, DOI: 10.3174/ajnr.A8580
