Incidental Findings from 16,400 Brain MRI Examinations of Research Volunteers

BACKGROUND AND PURPOSE: Incidental findings are discovered in neuroimaging research, ranging from trivial to life-threatening. We describe the prevalence and characteristics of incidental findings from 16,400 research brain MRIs, comparing spontaneous detection by nonradiology scanning staff versus formal neuroradiologist interpretation. MATERIALS AND METHODS: We prospectively collected 16,400 brain MRIs (7782 males, 8618 females; younger than 1 to 94 years of age; median age, 38 years) under an institutional review board directive intended to identify clinically relevant incidental findings. The study population included 13,150 presumed healthy volunteers and 3250 individuals with known neurologic diagnoses. Scanning staff were asked to flag concerning imaging findings seen during the scan session, and neuroradiologists produced structured reports after reviewing every scan. RESULTS: Neuroradiologists reported 13,593/16,400 (83%) scans as having normal findings, 2193/16,400 (13.3%) with abnormal findings without follow-up recommended, and 614/16,400 (3.7%) with "abnormal findings with follow-up recommended." The most common abnormalities prompting follow-up were vascular (263/614, 43%), neoplastic (130/614, 21%), and congenital (92/614, 15%). Volunteers older than 65 years of age were significantly more likely to have scans with abnormal findings (P < .001); however, among all volunteers

There is poor consensus on whether the baseline prevalence of clinically significant brain abnormalities in the general population justifies the routine use of neuroradiologists to review research MRIs. Standard practices for research MRI interpretation differ by institution and by country, but budgetary and workflow constraints have historically limited expert review solely to scans flagged by scanning technologists and research personnel. These nonradiologists have variable experience and, in most circumstances, lack formal training in diagnostic MR imaging reporting; nonetheless, they are tasked with screening and referring concerning findings for further review, leaving most scans without formal interpretation.
In this prospective cross-sectional study, we describe the prevalence and characteristics of incidental findings and compare the abnormality detection rate of nonradiologists with that of neuroradiologists in a series of 16,400 consecutive research brain MRIs collected at a single institution across 18 years.

MATERIALS AND METHODS
All research activities performed and described were conducted in accordance with an institutional review board-approved protocol at the University of Wisconsin-Madison.

Population Recruitment and Inclusion
Brain MRIs were collected from 17,010 consecutive volunteers from research studies conducted at the University of Wisconsin-Madison from April 2002 to March 2020. The final study population included 16,400 scans (7782 males, 8618 females; younger than 1 year of age to 94 years; median age, 38 years) after excluding 610 whose participant intake forms lacked age and/or sex. The overall study database compiled neuroimaging from volunteers in >300 research protocols and 73 principal investigator (PI) groups. All volunteers or their guardians provided informed consent before participation. Participants were recruited by each individual PI on the basis of eligibility criteria for their respective studies. Most studies recruited healthy age-matched control volunteers, while a minority recruited individuals with pre-existing conditions such as stroke, MS, and dementia.
Each scan was treated as a unique case, though some participants were scanned more than once. We are unable to quantify how many participants were serially scanned because research scans were anonymized with a code that sometimes changed with time for the same individual. Typical workflow required that all scans be read unless a prior MRI in the same protocol had been read within the past year, in which case the PI was not required to submit the scan for radiologist interpretation. We encountered significant abnormalities on follow-up scans in some previously healthy subjects, justifying review of new studies. Most important, volunteers with known pre-existing medical conditions, including those with disease-related neuroimaging findings, were not excluded. Therefore, volunteers with known conditions were categorized as having either normal findings or abnormal findings with no follow-up recommended, unless other previously unknown brain abnormalities were discovered. If a volunteer had previously been informed of a clinically significant finding and this was seen again at follow-up, this duplicate was placed in the "abnormal, no follow-up" category unless there had been clear-cut interval worsening. Volunteers with normal anatomic variants and common incidental findings of doubtful significance were categorized as having normal findings.

Brain MRI Acquisition and Analysis
MRIs were performed on GE Healthcare MRI scanners at multiple research sites. Most scans were performed at 3T (15,888/16,400, 97%). Each PI chose pulse sequences on the basis of individual study needs, leading to a heterogeneous variety of scan protocols. Virtually all included T1-weighted images (mostly volumetric acquisitions), and additional sequences were included for most protocols, particularly in those older than 45 years of age for aging and dementia research. Examinations containing brain anatomy and already postprocessed parameter maps (eg, perfusion if available) were sent to the PACS for neuroradiologist interpretation. Advanced imaging techniques and raw data files including PET, 4D flow MRA, fMRI, and diffusion tensor maps were not interpreted.

MRI Interpretation and Reporting
Nonradiologists including scanning staff (MRI technologists and nurses) and research personnel (PhD scientists and neuropsychologists) were instructed to document concerns at the time of scanning using the same Web-based intake form they had used to upload cases to the reading queue of neuroradiologists. All scanning technicians were certified for MRI safety and technical proficiency, as verified by more senior technicians and ultimately the PI. The technicians in our neuroscience centers were specialty research personnel, most without a radiologic technologist degree, typically with 3-15 years of experience. The technicians in our combined clinical/research site were mostly formally certified radiologic technologists with 2-20 years of experience. Excluding the 202/16,400 scans for which scanner location was unspecified on the intake form, 9944/16,198 (61%) scans were obtained on scanners designated for research only, while 6254/16,198 (39%) scans were acquired on clinical scanners. All scans were anonymously coded, sent to the PACS, and formally interpreted by a neuroradiologist. Intake forms provided readers with volunteers' age and sex, study diagnosis, and known medical conditions. Each neuroradiologist (H.A.R. with >30 years of experience; A.S.F. with >20 years of experience; V.P. with >20 years of experience; L.E.W. with >10 years of experience) independently reviewed scans and generated reports using a structured form linked to the volunteer's research examination on the PACS (Online Supplemental Data). In each report, the neuroradiologist classified each examination finding as 1) normal, 2) abnormal, no follow-up, or 3) abnormal, follow-up recommended.
Our main aim while categorizing scans was to identify the full range of incidental findings in our population, but to only recommend follow-up for potentially clinically significant abnormalities. A clinically significant abnormality was defined as an unexpected MRI finding the radiologist considered serious enough to prompt notification of the research subject and review by their medical practitioner. Trivial changes, normal variants, and lesions within expectation were not recommended for follow-up or notification to limit anxiety and potential expense of follow-up, while identifying clear-cut concerning findings with potential clinical implications. The normal/normal variant scans included commonly encountered conditions in the general population such as inflammatory changes of the paranasal sinus, reactive-appearing cervical lymph nodes, small pineal and arachnoid cysts, uncomplicated developmental venous abnormalities, mild WM changes in the elderly, and slightly low cerebellar tonsils. Examples of "abnormal/no follow-up" would include lesions related to known conditions listed on the intake form (eg, MS, prior trauma, or stroke), excess hippocampal atrophy in a volunteer with dementia, or concerning-but-stable conditions for which the volunteer was already notified on the basis of earlier abnormal findings on a research scan. The "abnormal, follow-up recommended" scans contained more concerning lesions that we thought the volunteer should be aware of, even if there were no immediate treatment implications.

Follow-up on Incidental Findings
All volunteers or their guardians signed informed consent/assent under an institutional review board-approved protocol in which they addressed disclosure of incidental findings. On categorizing a scan as abnormal, follow-up recommended, the neuroradiologist informed the PI team, who unblinded the file and referred to the volunteer's informed-consent document to determine the volunteer's preference. The lead investigator would either directly communicate the finding to the volunteer, ask the neuroradiologist to contact the volunteer to disclose the findings, or respect the wishes of the volunteer not to be informed of incidental findings. All clinically relevant findings were communicated to both the participant and his or her physician if requested.

Statistical Analysis
After we acquired and interpreted 17,010 brain MRIs, those without documented age and/or sex were excluded, resulting in a final study population of 16,400. Examinations marked abnormal with follow-up recommended were further subcategorized on the basis of abnormality type using information in each structured report (Online Supplemental Data).

Descriptive Statistical Analysis
Most variables are either categoric or binary. Variables are summarized by the percentage of volunteers in each group. Correlations between categoric or binary variables were evaluated using χ² tests. Continuous variables are presented as mean (SD) and compared using ANOVA for multiple groups and the Student t test for 2 groups. The association between scans with abnormal findings and those with normal findings was determined by univariate logistic regression adjusted for age and sex. All analyses were performed using R statistical and computing software (Version 3.5.2; http://www.rproject.org/), and P values < .05 were considered statistically significant.
We compared written initial concerns by nonradiologist reviewers with the neuroradiologist's scan classification. Our other descriptive analyses divide scans with abnormal findings on the basis of whether follow-up was recommended; however, this analysis treated all scans with abnormalities as 1 classification because we intended to determine the ability of nonradiologist reviewers to classify scans as having abnormal-versus-normal findings on the basis of whether they perceived at least 1 incidental finding to be present or absent, respectively. Initial concerns were considered relevant to the analysis if the text described a presumptive abnormality (eg, "cyst," "meningioma") and were excluded if the text listed only known lesions or was irrelevant (eg, "subject motion," "anxiety meds given before scan"). Relevant initial concerns were compared with the final neuroradiologist classification (normal versus abnormal) and were presented in terms of sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).

Inferential Statistical Analysis
An ordinal logistic regression model was constructed to investigate how sex and age affect scan classification. Results of this analysis are presented as log odds with standard error (SE) and 95% CIs. An increase in log odds represents an increased likelihood of a scan having abnormal findings if a given variable was present.
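For orientation, one common form of ordinal logistic regression is the proportional odds model. The text does not state which parameterization was used, so the following is an assumed form, with the 3 scan categories ordered as normal (1), abnormal without follow-up (2), and abnormal with follow-up recommended (3). The symbols θ_j and β are notation introduced here for illustration.

```latex
\operatorname{logit} P(Y \le j \mid x)
  = \theta_j - \left(\beta_{\text{sex}}\, x_{\text{sex}} + \beta_{\text{age}}\, x_{\text{age}}\right),
  \qquad j = 1, 2
```

Under this sign convention, a positive coefficient shifts probability toward the higher (more abnormal) categories, which matches the interpretation given above.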

RESULTS
Study Characteristics
This study comprised 16,400 research volunteers enrolled in studies for which brain MRIs were acquired at the University of Wisconsin-Madison and included both typical volunteers (ie, those without previously identified intracranial abnormalities) as well as individuals with a known brain lesion or congenital predisposition to neuropathology detectable by imaging (eg, excess mineralization in trisomy 21). Study demographic characteristics are shown in Table 1.
Except as detailed below, changes commonly encountered in the general population were placed into the "normal/normal variant" category. This included paranasal sinus mucosal changes (n = 2891, 17.6%), prominent perivascular spaces (n = 2857, WM changes were assessed with each volunteer's age and known risk factors in mind but were quantified only for some aging studies using the 10-point Cardiovascular Health Studies (CHS) score.⁷ Overall, 5089/16,400 (31%) volunteers were noted to have WM changes. Although most volunteers with WM changes were not prospectively scored using CHS methods, we retrospectively estimated that most volunteers (4116/5089, 81%) had mild disease (CHS 2-4). A minority (973/5089, 19%) had moderate-to-severe WM disease (CHS 5-9); of these patients, 95/5089 (2%) were considered to have abnormal findings with follow-up recommended to assess treatable vascular risk factors. Developmental venous anomalies were considered abnormal only if they showed adjacent parenchymal changes including gliosis or cavernoma.
Scans recommended for follow-up were subcategorized by 2 independent reviewers on the basis of the most concerning finding in each examination (Online Supplemental Data). Vascular pathologies were most common (43%), and of these, WM hyperintensities were the leading cause for referral. Examples of scans with abnormal findings with follow-up recommended are shown in Fig 2. A detailed breakdown of findings is found in the Online Supplemental Data.

Disclosure of Potentially Serious Abnormalities
Reports were released only for abnormal scans with follow-up recommended. With rare exceptions, volunteers in the abnormal, follow-up category were first informed by telephone. Most "cold calls" were made by one of the neuroradiologists (H.A.R.) who is also board-certified in neurology. Volunteers were provided a brief, written report containing selected images. Results, reports, and recommendations were communicated to the participant's physician if requested in writing. Original data files were not released.

Abnormality Detection Analysis of Nonradiologists
Initial concerns at the time of scanning were noted for 133/16,400 (<1%) scans (Table 2). Overall, nonradiologists showed very low sensitivity to abnormalities, flagging only 52/2807 (2%) scans later considered to have abnormal findings by a neuroradiologist, regardless of whether follow-up was recommended. Among scans flagged by nonradiologists and confirmed to contain an abnormality, 22/52 (42%) contained an abnormality warranting further clinical evaluation. Therefore, nonradiologists detected 22/2807 (<1%) scans in which a clinically significant abnormality was confirmed and recommended for follow-up. Under the assumption that nonradiologist reviewers omitted comments if they considered a scan to have normal findings, nonradiologist reviewers demonstrated high specificity for examinations with normal findings (99%). Furthermore, nonradiologist reviewers demonstrated modest positive predictive value (39%) for examinations with confirmed abnormalities and good negative predictive value (83%) for examinations with normal findings.
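The figures in this section can be reproduced from a 2 × 2 table. The following is a minimal sketch, assuming (as stated above) that an unflagged scan counts as a "normal" call by the nonradiologist. The cell counts are derived from the totals reported in the text, and the function and variable names are illustrative only.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV for a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Totals reported in the text
total_scans = 16_400
abnormal = 2_807          # abnormal per neuroradiologist (2193 + 614)
flagged = 133             # scans with a relevant initial concern
flagged_abnormal = 52     # flagged scans confirmed abnormal

# Derived cells (assumption: no flag means the scan was considered normal)
tp = flagged_abnormal
fp = flagged - flagged_abnormal
fn = abnormal - flagged_abnormal
tn = (total_scans - abnormal) - fp

metrics = diagnostic_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, these values agree with the 2%, 99%, 39%, and 83% reported above.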

Inferential Analysis
χ² tests for independence were performed to identify categoric variables significantly associated with scan category classification. The nominal level of significance α = .05 was used as a threshold for statistical significance. Sex was not significantly associated with category classification (P = .37), while age dichotomized as "younger than 65" and "65 or older" was associated with category classification (P < .001). The Cramér V statistic was computed to determine the effect size of this association (V = 0.16), indicating a small effect. The highly statistically significant result from the χ² test most likely results from the large sample size rather than from a strong effect of age on category classification.
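To illustrate how a small effect can still yield a very small P value at this sample size, the sketch below inverts the usual Cramér V formula, V = sqrt(χ² / (n · min(r − 1, c − 1))), to recover the χ² statistic implied by the reported V. It assumes a 2 × 3 table (2 age groups by 3 scan categories), so that min(r − 1, c − 1) = 1 and the test has 2 degrees of freedom. The function names and the comparison sample size of 200 are hypothetical and are not taken from the study.

```python
import math

def implied_chi2(v, n, k=1):
    """Chi-square statistic implied by Cramer's V, where k = min(r - 1, c - 1)."""
    return v ** 2 * n * k

def chi2_pvalue_df2(chi2):
    """Upper-tail P value of a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-chi2 / 2)

V = 0.16  # effect size reported in the text (rounded)

for n in (16_400, 200):  # 200 is a hypothetical smaller sample for comparison
    chi2 = implied_chi2(V, n)
    print(f"n={n}: chi2={chi2:.2f}, P={chi2_pvalue_df2(chi2):.3g}")
```

Under these assumptions, the same V corresponds to P < .001 with 16,400 scans but to P > .05 with 200 scans, which is in line with attributing the small P value largely to sample size.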
An ordinal logistic regression model was constructed to understand how sex and age affect scan classification. Male volunteers were more likely to be classified as having normal findings than female volunteers (log odds = −0.201; SE = 0.07; 95% CI, −0.34 to −1.14). Considering those with abnormal findings, young volunteers (younger than 65 years of age) were more likely to be recommended for follow-up than volunteers older than 65 years of age (log odds = 1.014; SE = 0.06; 95% CI, 0.89 to 1.14).
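Log odds can be easier to interpret as odds ratios. The sketch below exponentiates the estimates reported above and computes a Wald-type 95% interval (estimate ± 1.96 × SE). This is an illustration only: the function name and labels are hypothetical, the inputs are the rounded values given in the text, and the published intervals may have been derived differently.

```python
import math

def odds_ratio_with_wald_ci(log_odds, se, z=1.96):
    """Convert a log-odds estimate and its SE to an odds ratio with a Wald 95% CI."""
    lower, upper = log_odds - z * se, log_odds + z * se
    return math.exp(log_odds), (math.exp(lower), math.exp(upper))

# Estimates as reported in the text; the labels are descriptive only
estimates = {
    "sex (male vs female)": (-0.201, 0.07),
    "age group": (1.014, 0.06),
}

for label, (b, se) in estimates.items():
    ratio, (ci_low, ci_high) = odds_ratio_with_wald_ci(b, se)
    print(f"{label}: OR={ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An odds ratio below 1 corresponds to a negative log odds, and an odds ratio above 1 to a positive log odds.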

DISCUSSION
Incidental findings are previously unknown abnormalities of potential clinical significance discovered on research brain examinations that are unrelated to the research study aims and distinct from a volunteer's clinical history. There is significant public interest in knowing the baseline prevalence of brain abnormalities, yet routine screening of brain MRIs for asymptomatic individuals has not been recommended.⁸ Furthermore, it is unclear whether expert review of research brain imaging examinations is prudent or if, instead, nonradiologists can detect abnormalities to facilitate expert review. Therefore, in 2002, the neuroradiology section at the University of Wisconsin Department of Radiology implemented a system for documenting incidental findings in research brain MRIs. As part of this initiative, nonradiologists were instructed to report any concerns at the time of scanning before formal interpretation by a neuroradiologist.
Consistent with other studies examining incidental findings in research MRIs, our study found about 4% of volunteers had at least 1 potentially serious brain abnormality.⁴ In a study examining incidental findings in 1867 healthy young adults, a similar prevalence of potentially serious brain abnormalities was reported.⁶ However, we consider that study's approach to detecting abnormalities insufficient because some scans were screened only by nonexperts viewing only T1- and T2-weighted images; only after being flagged during this initial screening step would a scan undergo expert review by an experienced clinical neuroradiologist reviewing all acquired sequences. In contrast, in our study, every research volunteer underwent expert review of all sequences acquired per each specific study protocol.
Similar to the results of our study, an analysis of 2000 individuals older than 55 years of age from the Rotterdam Study (a prospective, population-based cohort study of age-related brain changes) found that the most common incidental findings were subclinical vascular pathologies and that the prevalence of abnormalities increased with age.² In contrast, while potential malignancies represented roughly half of incidental findings in a meta-analysis of studies with incidental findings,⁴ in our study, neoplastic phenomena were found in only 21% of MRIs recommended for follow-up. This discrepancy may be due to the emphasis on aging research at our institution, which could bias results toward nonspecific, age-associated WM hyperintensities.⁷ The authors of the Rotterdam Study claimed that a major strength of their study was its uniform MRI protocol, which indeed strengthens its internal validity. However, our study has greater external validity because of the variety of brain MRI protocols used across studies at our institution as well as the age range from infancy to elderly, reflecting the realistic heterogeneity of research neuroimaging protocols.
This study also examined and compared detection rates of abnormalities for all brain MRIs between nonradiologist reviewers and neuroradiologists. We prospectively collected this information to estimate how a workflow system using a selective "flag and refer" approach would compare with the "read every scan" approach. Our study found that nonradiologists flagged <2% of scans containing abnormalities, regardless of whether follow-up was recommended. However, among scans flagged and later confirmed to contain an abnormality, 22/52 (42%) were recommended for further clinical evaluation, demonstrating a poor PPV (22/133, 0.16) for flagging scans containing abnormalities warranting further evaluation. Nonradiologists were more likely to detect large abnormalities of variable clinical significance (eg, cystlike lesions, ventriculomegaly) and miss subtle, potentially serious abnormalities (eg, aneurysms, infiltrative gliomas) and virtually all head and neck pathology (eg, parotid tumors, pathologic cervical adenopathy). There were several cases flagged for innocuous findings (eg, cerebellar vermis cyst) and normal variant anatomy (eg, mega cisterna magna) that contained additional undetected abnormalities (eg, ICA aneurysm). These results are expected on the basis of training and experience and particularly because the neuroradiologists' interpretations were considered ground truth. Ultimately, the results offer insight into the prevalence and characteristics of significant lesions that would be potentially missed by using a flag and refer screening approach alone.
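To make the comparison between the 2 workflows concrete, the sketch below estimates how many scans with follow-up recommended would have gone unread under a strict flag and refer policy. It assumes that only flagged scans would reach a neuroradiologist and that the 22 flagged scans with follow-up recommended are a subset of the 614 identified by reading every scan. Variable names are illustrative.

```python
follow_up_recommended = 614   # scans classified "abnormal, follow-up recommended"
flagged_total = 133           # scans flagged by nonradiologists
flagged_follow_up = 22        # flagged scans later recommended for follow-up

# Assumption: under flag and refer, unflagged scans are never read by a neuroradiologist
missed = follow_up_recommended - flagged_follow_up
missed_fraction = missed / follow_up_recommended
ppv_follow_up = flagged_follow_up / flagged_total

print(f"missed: {missed}/{follow_up_recommended} ({missed_fraction:.1%})")
print(f"PPV for follow-up findings: {ppv_follow_up:.3f}")
```

Under these assumptions, most scans with follow-up recommended would not have been referred for expert review.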
In the United Kingdom Biobank study, a large-scale, multimodal (abdominal, cardiac, and brain MRI) population-based cohort study of adults 40-69 years of age examining incidental findings, radiographers were trained and tasked with identifying "incidental findings that might be clinically serious or life-threatening" for referral to a specialist radiologist to review.⁹ The workflow for detection of incidental findings was examined by comparing study findings with those in the systematic radiologist review of the first 1000 imaged participants. This study found that radiographers flagged 179/1000 (18%) scans for further review by a radiologist. Radiographers detected fewer overall incidental findings than the radiologists performing systematic review (18/1000, 1.8%, versus 179/1000, 17.9%, respectively) but a relatively greater percentage with serious final diagnoses (5/18, 28%, versus 21/179, 12%). Radiographers also missed 16/21 serious final diagnoses (false-negatives), whereas a systematic radiologist review led to many final diagnoses of doubtful clinical significance (158/179, false-positives).
There are 3 crucial caveats when comparing the United Kingdom Biobank study with our study. First, only the first 1000 participants' scans were systematically reviewed by radiologists and compared with radiographer impressions, whereas nonradiologist reviewers in our study had the opportunity to flag every scan and a neuroradiologist reviewed every scan regardless of whether it was flagged. Second, the multimodal nature of the United Kingdom Biobank study enables comparison of the abnormality detection rate for incidental findings throughout the body, whereas our study focused solely on those detectable by brain MRI. Last, our research protocols prevent verification of final diagnoses via supplemental diagnostic studies. When comparing studies, nonradiologist reviewers in the United Kingdom Biobank study flagged scans at greater rates (179/1000, 17.9%) versus our study (133/16,400, <1%). They also flagged scans in which abnormalities were detected and confirmed by radiologists at similar rates (21/179, 12%, versus 22/133, 16%). Overall, both studies demonstrated that nonradiologists flagged few scans with potentially serious abnormalities.
Our study has several limitations. First, we could not verify provisional neuroradiologic diagnoses on the basis of research brain scans, but these were, nonetheless, considered the ground truth. This limitation arose because subsequent clinical evaluations prompted by incidental findings were separate institutional review board-approved study activities and the anonymized research protocol forbade follow-up communication with participants receiving follow-up. Second, some participants were scanned more than once, potentially leading to overrepresentation of findings in any given volunteer. However, the authors estimated that fewer than 2000 participants were serially scanned. In the context of 16,400 volunteers, it is unlikely that serially scanned participants had a statistically significant impact on summary results, and some serial scans revealed new significant findings, justifying independent analysis of all scans. Third, research brain MRIs are not performed for diagnostic purposes. Although acquired on high-quality MR imaging scanners and interpreted by neuroradiologists, research brain MRIs contain only the sequences necessary to suit the purpose of each study. Therefore, it is likely that some clinically significant brain abnormalities went undetected due to limited research imaging protocols.
Few protocols included MRA, resulting in a lower-than-expected detection rate for aneurysms in this large population. Conversely, "soft calls," or provisional diagnoses based on limited information and/or with low confidence, were more likely to occur out of caution on the part of the neuroradiologist interpreting each scan. Last, our discovery that nonradiologists showed very low sensitivity to abnormalities compared with neuroradiologists may be biased because nonradiologists knew that every scan underwent expert review. Accordingly, initial appraisals of scans by nonradiologists may have been more cursory and thus less sensitive compared with a scenario in which scans are expertly interpreted only on request. We emphasize that the comparison of nonradiologists with neuroradiologists was performed not to compare diagnostic performance per se but to help quantify the effect on discovery of significant lesions using either approach.

CONCLUSIONS
Incidental findings are previously unknown lesions of potential clinical significance found in brain MRIs performed for research volunteers. In a large series of research volunteers, incidental findings were found in roughly 4% of brain MRIs. The most common type of incidental finding was vascular disease followed by neoplastic and congenital lesions. When asked to note any concerning lesions on the initial image acquisition, scanning staff and research personnel flagged <2% of scans later found to contain at least 1 significant finding by neuroradiologists. Given the frequency of clinically relevant abnormalities coupled with a low abnormality detection rate by nonradiologists, routine neuroradiologist review of all research brain MRI scans should be considered to ensure that potentially serious abnormalities are detected.

FIG 1. Violin plots stratified by scan category. Boxplots within each plot have medians and interquartile ranges. The median age and interquartile range of volunteers with normal examination findings were 28 and 42 years, respectively. The median age and interquartile range of volunteers with abnormal examination findings for which follow-up was not recommended were 61 and 27 years, respectively. The median age and interquartile range of volunteers with abnormal examinations for which follow-up was recommended were 58 and 37 years, respectively.

FIG 2. Illustrative cases of incidental findings for which clinical follow-up was recommended. Case examples of the 3 most frequently encountered abnormality categories, lesions marked by arrows, all reportedly asymptomatic at the time of scan. A, Vascular: a 38-year-old participant with trisomy 21 and normal scan findings 2 years earlier, now found to have bihemispheric ischemic lesions, suspected to be cardioembolic versus Moyamoya vasculopathy (axial T2-FLAIR). B, Neoplastic: a 68-year-old participant with normal research scan findings 3 years earlier now has an infiltrative left parietal mass, later proven to be a glioblastoma (axial T2-FLAIR). C, Congenital: a 29-year-old participant with extensive left posterior Sylvian polymicrogyria (sagittal T1).

Table 1: Characteristics of study volunteers

Table 2: Comparison between concerns of nonradiologists about initial imaging versus impressions of neuroradiologists