ABSTRACT
BACKGROUND AND PURPOSE: The Radiological Society of North America (RSNA) has actively promoted artificial intelligence (AI) challenges since 2017. Algorithms emerging from the recent RSNA 2022 Cervical Spine Fracture Detection Challenge demonstrated state-of-the-art performance on the competition’s dataset, surpassing results from prior publications. However, their performance in real-world clinical practice is not known. As an initial step toward assessing the feasibility of these models in clinical practice, we conducted a generalizability test using one of the leading algorithms of the competition.
MATERIALS AND METHODS: The deep learning algorithm was selected for its performance, portability, and ease of use, and was installed locally. One hundred examinations (50 consecutive cervical spine CT scans with at least one fracture present and 50 consecutive negative CT scans) from a Level 1 trauma center not represented in the competition dataset were processed at 6.4 seconds per examination. Ground truth was established from the radiology report, with retrospective confirmation of positive fracture cases. Sensitivity, specificity, F1 score, and AUC were calculated.
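The abstract does not describe how these metrics were computed; the following is a minimal sketch, assuming per-examination binary ground truth labels and continuous model scores binarized at an arbitrary 0.5 threshold, of how sensitivity, specificity, F1 score, and AUC could be obtained with scikit-learn. The variable names, values, and threshold are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the metric calculations; labels and scores are made up.
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                            # 1 = fracture present
y_score = [0.91, 0.12, 0.48, 0.77, 0.60, 0.05, 0.83, 0.33]   # model probabilities
y_pred = [int(s >= 0.5) for s in y_score]                    # assumed 0.5 threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)            # true-positive rate
specificity = tn / (tn + fp)            # true-negative rate
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)    # threshold-independent

print(f"Sensitivity {sensitivity:.2f}, Specificity {specificity:.2f}, "
      f"F1 {f1:.2f}, AUC {auc:.2f}")
```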
RESULTS: The external validation dataset comprised older patients than the competition set (53.5 ± 21.8 years in the competition set vs 58 ± 22.0 years in the external set; p < .05). Sensitivity and specificity were 86% and 70% in the external validation group and 85% and 94% in the competition group, respectively. Fractures misclassified by the CNN were frequently associated with advanced degenerative disease, subtle nondisplaced fracture lines not easily identified in the axial plane, and malalignment.
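For orientation, and assuming the reported external-validation rates were computed on exactly 50 fracture-positive and 50 fracture-negative examinations, 86% sensitivity corresponds to roughly 43 true positives and 7 false negatives, and 70% specificity to roughly 35 true negatives and 15 false positives; the exact counts are not stated in the abstract.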
CONCLUSIONS: The model performed with similar sensitivity on the competition test set and the external dataset, suggesting that such a tool could potentially generalize as a triage tool in the emergency setting. Discordant factors such as age-associated comorbidities may affect the accuracy and specificity of AI models when they are used in certain populations. Further research should be encouraged to help elucidate the potential contributions and pitfalls of these algorithms in supporting clinical care.
ABBREVIATIONS: AI = artificial intelligence; CNN = convolutional neural network; RSNA = Radiological Society of North America
Footnotes
The authors declare no conflicts of interest related to the content of this article.
- © 2025 by American Journal of Neuroradiology