Impact of Software Modeling on the Accuracy of Perfusion MRI in Glioma

BACKGROUND AND PURPOSE: Relative cerebral blood volume, as measured by T2*-weighted dynamic susceptibility-weighted contrast-enhanced MRI, represents the most robust and widely used perfusion MR imaging metric in neuro-oncology. Our aim was to determine whether differences in modeling implementation will impact the correction of leakage effects (from blood-brain barrier disruption) and the accuracy of relative CBV calculations as measured on T2*-weighted dynamic susceptibility-weighted contrast-enhanced MR imaging at 3T field strength. MATERIALS AND METHODS: This study included 52 patients with glioma undergoing DSC MR imaging. Thirty-six patients underwent both non-preload dose– and preload dose–corrected DSC acquisitions, with 16 patients undergoing preload dose–corrected acquisitions only. For each acquisition, we generated 2 sets of relative CBV metrics by using 2 separate, widely published, FDA-approved commercial software packages: IB Neuro and nordicICE. We calculated 4 relative CBV metrics within tumor volumes: mean relative CBV, mode relative CBV, percentage of voxels with relative CBV > 1.75, and percentage of voxels with relative CBV > 1.0 (fractional tumor burden). We determined Pearson (r) and Spearman (ρ) correlations between non-preload dose– and preload dose–corrected metrics. In a subset of patients with recurrent glioblastoma (n = 25), we determined receiver operating characteristic area under the curve for fractional tumor burden accuracy to predict the tissue diagnosis of tumor recurrence versus posttreatment effect. We also determined correlations between rCBV and microvessel area from stereotactic biopsies (n = 29) in 12 patients. RESULTS: With IB Neuro, relative CBV metrics correlated highly between non-preload dose– and preload dose–corrected conditions for fractional tumor burden (r = 0.96, ρ = 0.94), percentage > 1.75 (r = 0.93, ρ = 0.91), mean (r = 0.87, ρ = 0.86), and mode (r = 0.78, ρ = 0.76). 
These correlations dropped substantially with nordicICE. With fractional tumor burden, IB Neuro was more accurate than nordicICE in diagnosing tumor versus posttreatment effect (area under the curve = 0.85 versus 0.67) (P < .01). The highest relative CBV–microvessel area correlations required preload dose and IB Neuro (r = 0.64, ρ = 0.58, P = .001). CONCLUSIONS: Different implementations of perfusion MR imaging software modeling can impact the accuracy of leakage correction, relative CBV calculation, and correlations with histologic benchmarks.

crosis), [8][9][10][11] and predict tumoral response and patient survival after targeted therapy. [12][13][14][15][16] Despite the potential clinical impact of pMRI, broad-scale integration has been slowed by the need to define optimal methodologic conditions to maximize rCBV accuracy. While a number of factors can affect rCBV measurements (eg, image acquisition, motion correction, signal fitting, and mathematic modeling), most methodologic studies have focused on techniques that correct for T1-weighted leakage errors from blood-brain barrier disruption and T2/T2*-weighted residual errors from contrast recirculation within tortuous microvasculature. [17][18][19][20][21][22][23][24] Specifically, DSC relies on the assumptions that gadolinium-based contrast agents (GBCA) transit through tissue as a single bolus and remain within the vascular lumen. Yet, these premises are often violated in the setting of high-grade glioma, increasing the likelihood of rCBV inaccuracies.
On the basis of previous comparison studies, the administration of a GBCA preload dose (PLD) and the subsequent use of software modeling (during image postprocessing) offer the most effective methods for rCBV correction. [17][18][19] PLD, given before DSC acquisition, minimizes T1 leakage effects by presaturating tissue T1 signal and decreasing subsequent GBCA extravascular diffusion. [17][18][19][20][21][22]25,26 Because of theoretic dose-dependent risks of nephrogenic systemic fibrosis, the GBCA dose is generally minimized, with most studies showing effective T1 leakage correction with a PLD as low as 0.05-0.1 mmol/kg. 19 Additionally, modeling correction has proved necessary to correct residual T1 errors and T2/T2*-weighted recirculation effects following PLD. While a number of modeling algorithms have been proposed, the method published by Boxerman et al 17 remains the most highly cited and validated algorithm to date, and it is widely considered the standard for DSC-pMRI.
Generally speaking, modeling correction requires implementation of mathematic algorithms through computer software programs developed either in-house by individual academic centers or incorporated within vendor-supplied commercial packages. Vendor-supplied options offer the advantage of wide availability and ease of standardization across multiple institutions, but the methods by which the algorithms are implemented can vary by vendor. While we generally assume negligible differences in how various software programs incorporate mathematic modeling to calculate rCBV, this assumption has not been directly tested, particularly with validation against standard benchmarks such as histology.
In this study, we compared 2 commonly published, commercially available implementations of the Boxerman algorithm, 17 as integrated within the IB Neuro (IBN, Version 1.1; Imaging Biometrics, Elm Grove, Wisconsin) and nordicICE (NICE, Version 2.3.13; NordicNeuroLab, Bergen, Norway) software packages. 8,9,[14][15][16][17][18]20,[25][26][27][28] We present data from a cohort of 52 patients with glioma who underwent DSC-pMRI acquisition at the time of clinical MR imaging. The goals of this study are to determine the equivalency of modeling implementation and rCBV calculation across platforms and to assess whether rCBV variations, if present, will significantly impact correlations with histologic benchmarks. Our overarching goal is to provide information that will help work toward consensus and standardization of pMRI methodology.

Subjects
We searched our database (2007-2013) for patients with histopathologically confirmed glioma who had conventional 3T MR imaging with pMRI at 2 different institutions. We included patients in whom the same examination contained 2 separate DSC-pMRI acquisitions (with separate bolus contrast injections) and/or in whom MR imaging was performed preoperatively for stereotactic resection and/or biopsy, with surgery within 1 day after imaging. Subjects were pooled from 2 separate institutions: Barrow Neurological Institute at St. Joseph's Hospital and Medical Center and Mayo Clinic, Arizona. All patient data were anonymized for Health Insurance Portability and Accountability Act compliance. The institutional review board approved our study. All patients undergoing pMRI had estimated glomerular filtration rates of >60 mL/min/1.73 m².

Perfusion MR Imaging Data Acquisition
Each 3T examination was performed on 1 of 2 MR imaging scanners (Signa HDx; GE Healthcare, Milwaukee, Wisconsin; or Magnetom Skyra; Siemens, Erlangen, Germany). All patients received an initial preload dose, which allowed acquisition of PLD-corrected DSC-pMRI data during a second GBCA injection (0.05 mmol/kg of gadodiamide or gadobenate dimeglumine) by using previously described methods. 8,19 In all patients, the PLD totaled 0.1 mmol/kg, administered as either a single bolus injection or 2 separate 0.05-mmol/kg bolus injections, depending on the departmental protocol at the time of imaging. In a subset of patients, we acquired non-PLD-corrected DSC-pMRI data during the initial PLD bolus injection by using either 0.05- or 0.1-mmol/kg GBCA injections, depending on the clinical perfusion MR imaging protocol used at the time of acquisition. We performed a separate subanalysis to determine the impact of different injection doses, as shown in On-line Table 1.
All DSC data (gradient-echo echo-planar imaging with TR/TE/flip angle = 1500-2000/20 ms/60°, FOV = 24 × 24 cm, matrix = 128 × 128, 5-mm section thickness, no gap) were acquired over 3 minutes, with the bolus injection occurring 1 minute after the start of the DSC sequence. All GBCA injections were delivered via power injector at 3-5 mL/s, followed by a 20-mL normal saline flush. The final cumulative GBCA dose for all patients (irrespective of the method of PLD administration) was 0.15 mmol/kg of body weight.
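The acquisition parameters above fix how many dynamics each run contains and how the doses add up. The sketch below is a hypothetical helper, not part of either software package; it assumes the first dynamic is acquired at t = 0 and that the agent is formulated at 0.5 mmol/mL (an assumption supplied here, not stated in the source). It derives the frame count at TR = 1.5 s and 2.0 s, how many frames are dropped when the first 5 seconds of baseline are removed (as described under Perfusion MR Imaging Data Analysis), where the 1-minute bolus falls in the trimmed series, and the cumulative dose.

```python
import math

# Hypothetical helper, not part of IB Neuro or nordicICE. Assumes the first
# dynamic is acquired at t = 0 s and dynamics are spaced exactly one TR apart.
def dsc_frames(tr_s: float, duration_s: float = 180.0,
               bolus_time_s: float = 60.0, discard_s: float = 5.0):
    """Return (frames acquired, frames discarded, bolus frame index after discard)."""
    n_total = int(duration_s // tr_s)          # dynamics in the 3-minute run
    n_discard = math.ceil(discard_s / tr_s)    # frames acquired at t < 5 s
    bolus_index = math.ceil(bolus_time_s / tr_s) - n_discard
    return n_total, n_discard, bolus_index

# Cumulative dose: 0.1 mmol/kg preload plus the 0.05 mmol/kg DSC bolus.
PRELOAD_MMOL_PER_KG = 0.10
DSC_BOLUS_MMOL_PER_KG = 0.05
TOTAL_MMOL_PER_KG = PRELOAD_MMOL_PER_KG + DSC_BOLUS_MMOL_PER_KG

def injection_volume_ml(dose_mmol_per_kg: float, weight_kg: float,
                        concentration_mmol_per_ml: float = 0.5) -> float:
    """Injected volume; the 0.5 mmol/mL concentration is an assumption."""
    return dose_mmol_per_kg * weight_kg / concentration_mmol_per_ml

print(dsc_frames(1.5), dsc_frames(2.0))
print(round(TOTAL_MMOL_PER_KG, 3), injection_volume_ml(DSC_BOLUS_MMOL_PER_KG, 70.0))
```

A longer TR leaves fewer prebolus frames for baseline estimation, which is one reason baseline selection is worth checking when TR varies across scanners.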

Perfusion MR Imaging Data Analysis
After transferring all MR imaging data to an off-line workstation and removing baseline points collected during the first 5 seconds, we generated whole-brain rCBV maps by using 2 commonly published commercial software packages, nordicICE (Version 2.3.13) and IB Neuro (Version 1.1), both approved by the US Food and Drug Administration. For NICE, we used all available default options and included leakage correction in all cases. Default options consisted of automatic selection of the prebolus baseline and integration intervals, followed by noise-threshold adjustment to maximize the brain tissue used for CBV calculation. To help maintain data integrity and limit potential confounding factors, we did not use spatial or temporal smoothing with either software package. For NICE, we performed rCBV calculations both with γ-variate fitting (applied before leakage correction) and without γ-variate fitting. For IBN, we used all default options, including leakage correction: 1) automated detection of a brain tissue mask defining the voxels used in CBV calculation, 2) automated detection of contrast arrival within brain mask voxels to define the prebolus baseline and integration intervals, and 3) leakage correction based on Boxerman et al. 17 For rCBV maps generated with either NICE or IBN, we coregistered the maps with stereotactic anatomic images by using registration methods implemented in the Insight Segmentation and Registration Toolkit (www.itk.org) within the IB Suite (Version 1.0.454; Imaging Biometrics), as previously described. 17,18,29,30 We normalized all rCBV maps to the mean CBV from two 3 × 3-voxel square ROIs within the contralateral frontal and parietal normal-appearing white matter. 8,19 To reduce variability, we used identical normal-appearing white matter ROIs for both software package analyses to generate all rCBV metrics.
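Both packages cite the same published leakage model, so it may help to see what that model computes. The sketch below is a simplified rendering in plain Python, not the IB Neuro or nordicICE code: it converts signal to ΔR2*(t) = -ln(S(t)/S0)/TE, fits each tissue curve as a scaled reference curve minus a scaled running integral of that reference, and adds the fitted leakage term back before integrating. The reference curve is assumed here to be an average ΔR2* curve from nonenhancing brain voxels, and all function names are hypothetical. Baseline detection, brain masking, γ-variate fitting, and noise thresholding are omitted; those vendor-specific steps are plausible places for implementations to diverge.

```python
import math

# Simplified sketch of the Boxerman et al leakage-correction model.
# NOT the IB Neuro or nordicICE implementation; names are hypothetical.

def delta_r2star(signal, n_baseline, te_s):
    """Convert a T2*-weighted signal curve to dR2*(t) = -ln(S(t)/S0) / TE."""
    s0 = sum(signal[:n_baseline]) / n_baseline   # mean prebolus baseline
    return [-math.log(s / s0) / te_s for s in signal]

def running_integral(curve, dt):
    """Cumulative integral of a sampled curve (rectangle rule)."""
    total, out = 0.0, []
    for value in curve:
        total += value * dt
        out.append(total)
    return out

def leakage_correct(tissue, reference, dt):
    """Fit tissue(t) ~ K1*ref(t) - K2*int(ref); return (K1, K2, corrected curve)."""
    cum = running_integral(reference, dt)
    # 2x2 least-squares normal equations for regressors x1 = ref, x2 = -int(ref)
    a11 = sum(r * r for r in reference)
    a12 = -sum(r * c for r, c in zip(reference, cum))
    a22 = sum(c * c for c in cum)
    b1 = sum(t * r for t, r in zip(tissue, reference))
    b2 = -sum(t * c for t, c in zip(tissue, cum))
    det = a11 * a22 - a12 * a12
    k1 = (b1 * a22 - a12 * b2) / det
    k2 = (a11 * b2 - a12 * b1) / det
    # Adding K2*int(ref) back removes the modeled leakage term
    # (K2 > 0 corresponds to T1-dominant leakage in this model).
    corrected = [t + k2 * c for t, c in zip(tissue, cum)]
    return k1, k2, corrected

def cbv_area(curve, dt):
    """Uncalibrated CBV: area under the corrected dR2* curve."""
    return sum(curve) * dt
```

In use, each voxel's ΔR2* curve would be passed to `leakage_correct` with the shared reference curve, and `cbv_area` of the corrected curve would then be normalized to white matter, as the text describes.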
We calculated multiple previously published rCBV metrics, including the following: 1) volume fraction of tumor voxels above the rCBV threshold of 1.75 (percentage >1.75); 2) volume fraction of tumor voxels above the rCBV threshold of 1.0, also known as perfusion MR imaging fractional tumor burden (FTB); 3) histogram mean rCBV; and 4) histogram mode rCBV for all tumor voxels. We chose the thresholds of 1.0 and 1.75 because previous studies have reported the biologic significance of these values. 6,8,30 On the basis of the rCBV maps generated from the NICE and IBN packages, we calculated volume fraction metrics by using the IB Suite and histogram metrics by using custom code written in MATLAB (Version R2012a; MathWorks, Natick, Massachusetts). To reduce variability, we also used identical segmented enhancing tumor volumes for both software analyses and all rCBV metrics (as described below).
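The four metrics listed above can be expressed compactly once voxel CBV values inside the tumor mask are available. This is a hypothetical sketch, not the study's IB Suite or MATLAB code: it normalizes to the mean of the white matter ROI values, counts voxels strictly above each threshold, and takes the mode as the center of the most populated histogram bin. The 0.1 bin width is an assumption, and the mode of continuous data depends on that choice.

```python
import math
from collections import Counter

# Hypothetical helpers mirroring the four rCBV metrics; the 0.1 histogram
# bin width is an assumption, not a value reported in the study.

def normalize_to_nawm(cbv_values, nawm_values):
    """Divide voxel CBV by the mean CBV of normal-appearing white matter ROIs."""
    reference = sum(nawm_values) / len(nawm_values)
    return [v / reference for v in cbv_values]

def fraction_above(rcbv, threshold):
    """Volume fraction of tumor voxels with rCBV strictly above threshold."""
    return sum(1 for v in rcbv if v > threshold) / len(rcbv)

def histogram_mode(rcbv, bin_width=0.1):
    """Center of the most populated histogram bin (ties go to the lowest bin)."""
    counts = Counter(math.floor(v / bin_width) for v in rcbv)
    best_bin = min(counts, key=lambda b: (-counts[b], b))
    return (best_bin + 0.5) * bin_width

def rcbv_metrics(rcbv):
    """FTB and the >1.75 fraction are returned as fractions (multiply by 100 for %)."""
    return {
        "FTB": fraction_above(rcbv, 1.0),
        "frac_above_1.75": fraction_above(rcbv, 1.75),
        "mean": sum(rcbv) / len(rcbv),
        "mode": histogram_mode(rcbv),
    }
```

Using the same tumor mask and the same white matter ROI values for both packages, as the study did, means any difference in these outputs traces back to the CBV maps themselves.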

Stereotactic Biopsy, Image Coregistration, and Histologic Microvessel Analysis
Our cohort included a subset of patients in whom neurosurgeons collected an average of 2-3 tissue specimens from each tumor by using stereotactic surgical localization, through the smallest possible craniotomies to minimize brain shift. Biopsies were performed without knowledge of rCBV analyses. Similar to previous studies, biopsy locations and neuronavigational coordinates were recorded and coregistered with MR imaging to enable localized rCBV measurement (3 × 3 voxel-sized ROIs) at corresponding biopsy sites. 11,31 Multiple biopsy targets in the same patient were separated by a minimum of 2 cm. The neurosurgeon visually validated stereotactic imaging locations with corresponding intracranial anatomic landmarks, such as vascular structures. Stereotactic biopsy samples were sectioned (10-µm thickness), CD34-stained, and submitted for quantification of total microvessel area (MVA) by using previously published methods. [31][32][33][34] Corresponding sections were also stained with hematoxylin-eosin per standard protocol. For each CD34-stained slide, we measured total microvessel area as previously described. 31,32,35 Raw data from 7 of these patients were studied previously. 31 The current study differs in the following ways: 1) we used commercial software packages and modeling correction to measure rCBV, 2) we determined test performance differences between packages, and 3) we compared PLD against non-PLD conditions.
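Two rules in the paragraph above are easy to state precisely: localized rCBV is the mean of a 3 × 3 voxel ROI centered on the biopsy site, and targets in one patient lie at least 2 cm apart. The hypothetical helpers below assume the biopsy coordinate has already been coregistered and mapped to (row, col) indices on a single rCBV slice, and that target positions are given in millimeters; coregistration itself is outside this sketch.

```python
import math

# Hypothetical helpers; assume coregistration has already mapped each biopsy
# coordinate to (row, col) voxel indices on a 2D rCBV slice.

def roi_mean_3x3(rcbv_slice, row, col):
    """Mean rCBV in the 3 x 3 voxel neighborhood centered on (row, col)."""
    if not (1 <= row < len(rcbv_slice) - 1 and 1 <= col < len(rcbv_slice[0]) - 1):
        raise ValueError("3 x 3 ROI would extend past the slice edge")
    values = [rcbv_slice[r][c]
              for r in range(row - 1, row + 2)
              for c in range(col - 1, col + 2)]
    return sum(values) / len(values)

def targets_separated(points_mm, min_distance_mm=20.0):
    """True if every pair of biopsy targets is at least min_distance_mm apart."""
    return all(math.dist(p, q) >= min_distance_mm
               for i, p in enumerate(points_mm)
               for q in points_mm[i + 1:])
```

Averaging over a small ROI rather than reading a single voxel makes the localized value less sensitive to residual registration error at the biopsy site.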

Quantification of Histologic Tumor Fraction in Recurrent Glioblastoma Multiforme
Our cohort included a subset of 25 patients with recurrent glioblastoma multiforme, previously treated with the protocol of Stupp et al. 36 We enrolled each of these patients at the time of recurrence, at which time they underwent preoperative MR imaging (including pMRI) for surgical debulking of newly developed or enlarging lesions suspicious for recurrence identified on surveillance contrast-enhanced MR imaging.
Following debulking, we fixed all surgical tissue specimens in 10% formalin, embedded them in paraffin, sectioned them (10 µm), and stained them with hematoxylin-eosin per standard diagnostic protocol at our institution. Two neuropathologists, blinded to the DSC-MR imaging results, quantified glioblastoma multiforme and/or posttreatment effect elements for all specimens by simultaneously estimating the histologic fractional volume of tumor relative to nonneoplastic treatment-related features, as previously described. 8,30,37,38 Features of tumor recurrence 38 and posttreatment effect 37,39 were quantified and used to determine the histologic tumor fraction from surgical resection material to diagnose either tumor progression (histologic tumor fraction of ≥50%) or posttreatment effect (histologic tumor fraction of <50%) on the basis of group median values. Raw data from these 25 patients have been studied previously. 8 Like the prior study, the current study measures FTB, but with several important differences in experimental design: 1) we used and compared 2 separate modeling algorithm implementations to calculate FTB, 2) we assessed performance differences between methods by comparing test accuracies (with receiver operating characteristic analysis), and 3) we used a simplified classification system to establish the clinical presence/absence of tumor progression.
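The dichotomization above is simple, but its boundary matters: a specimen at exactly 50% tumor fraction counts as recurrence. A minimal hypothetical sketch of that labeling rule, assuming tumor fraction is expressed on a 0-1 scale:

```python
# Hypothetical labeling rule restating the 50% cutoff; the boundary is
# inclusive, so a tumor fraction of exactly 0.50 is labeled recurrence.
TUMOR_FRACTION_CUTOFF = 0.50

def histologic_diagnosis(tumor_fraction: float) -> str:
    """Map a histologic tumor fraction (0-1) to the study's two categories."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor fraction must lie between 0 and 1")
    if tumor_fraction >= TUMOR_FRACTION_CUTOFF:
        return "tumor recurrence"
    return "posttreatment effect"
```

These labels serve as the reference standard against which FTB is scored in the receiver operating characteristic analysis.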

Statistical Analysis
A biostatistician performed all analyses. We first determined Pearson and Spearman correlations between non-PLD- and PLD-corrected conditions for all rCBV metrics as calculated by IBN and NICE. Second, we used receiver operating characteristic analysis to determine the accuracy of FTB (as measured by IBN and NICE) to diagnose tumor versus posttreatment effect. Finally, we determined Pearson and Spearman correlations between localized rCBV and MVA from corresponding stereotactic biopsies. P < .05 was considered statistically significant.
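Pearson r measures linear agreement between paired values (for example, each patient's non-PLD and PLD FTB), whereas Spearman ρ is the Pearson correlation of ranks and so captures any monotonic agreement. The sketch below is a hypothetical, dependency-free illustration of both, with tied values assigned their average rank; it is not the biostatistician's software, and P values are not computed.

```python
import math

# Hypothetical helpers; significance testing (P values) is not shown.

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Reporting both statistics, as the study does, helps flag cases where a few extreme values inflate or deflate r while the rank ordering tells a different story.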

Subjects and Tumor Types
We enrolled 52 patients (17 women, 35 men; mean age, 53 years), of whom 87% (45/52) had high-grade gliomas with 78% (35/45) presenting at recurrence after standard multimodal therapy. On-line Table 2 summarizes the tumor types for primary and recurrent cases.

Comparing rCBV Measurements in the Presence and Absence of Preload Dose
Comparing rCBV between PLD and non-PLD conditions gives an indication of how well modeling implementation corrects T1 leakage errors. We acquired both PLD- and non-PLD-corrected rCBV values in a subset of patients (n = 36), for whom we calculated 4 separate rCBV metrics (mean, mode, percentage >1.75, and FTB) by using both the IB Neuro and nordicICE software packages. When we used IBN (Fig 1), rCBV thresholding metrics correlated very highly between non-PLD- and PLD-corrected conditions (FTB: r = 0.96, ρ = 0.94; percentage >1.75: r = 0.93, ρ = 0.91); correlations were also high for mean rCBV (r = 0.87, ρ = 0.86) and mode rCBV (r = 0.78, ρ = 0.76). With NICE modeling, these correlations dropped substantially (Fig 1) for thresholding metrics (FTB: r = 0.70, ρ = 0.71; percentage >1.75: r = 0.59, ρ = 0.60), mean rCBV (r = 0.43, ρ = 0.62), and mode rCBV (r = 0.51, ρ = 0.65). When we added γ-variate fitting, correlations for mean rCBV by using NICE decreased, though the other metrics remained largely unchanged (Table 1). On visual inspection of thresholding maps, non-PLD- and PLD-corrected voxels showed greater spatial correspondence when using IBN compared with NICE (Fig 2). Table 1 summarizes correlations for all conditions. Of these 36 patients, 10 received PLD via 2 separate half-dose injections. To assess the potential effects of heterogeneity in PLD administration, we performed a subanalysis (n = 26) excluding these 10 subjects, which showed correlations consistent with the original analysis (On-line Table 1).

The Type of Modeling Implementation Impacts the Accuracy of rCBV to Diagnose Tumor versus Pseudoprogression/Radiation Necrosis
In a subset of patients with recurrent glioblastoma multiforme (n = 25) undergoing surgical debulking for suspected tumor recurrence, we used receiver operating characteristic analysis to determine the accuracy of FTB, as measured by IBN or NICE, to diagnose tumor versus posttreatment effect (ie, pseudoprogression, radiation necrosis). We used histologic tumor fraction from surgical resection to categorize each subject's diagnosis as either tumor recurrence (histologic tumor fraction of ≥50%) or posttreatment effect (histologic tumor fraction of <50%). We used PLD correction for all cases. The area under the curve for FTB, as measured by IBN (0.85), was significantly higher than that measured by NICE (0.67; P < .01) (Fig 3).
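The area under the receiver operating characteristic curve has a direct interpretation: the probability that a randomly chosen recurrence case has a higher FTB than a randomly chosen posttreatment-effect case, with ties counted as one half (the Mann-Whitney formulation). The hypothetical sketch below computes that quantity from paired scores and histologic labels. The source does not state which test was used to compare the IBN and NICE AUCs (correlated AUCs from the same patients are often compared with DeLong's method), and this sketch does not perform that comparison.

```python
# Hypothetical helper: AUC via the Mann-Whitney pairwise formulation.
# labels: True for tumor recurrence, False for posttreatment effect.

def roc_auc(scores, labels):
    """Probability a positive case outscores a negative case (ties count 0.5)."""
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Because the same 25 patients are scored by both packages, the two AUCs are paired, which is why a paired comparison method is needed rather than treating them as independent estimates.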

Influence of PLD and Modeling Correction on the Correlation of rCBV with Stereotactic Microvessel Area Quantification
We measured localized rCBV values corresponding to coregistered stereotactic biopsy samples (n = 29) in a subset of patients (n = 12). We determined Spearman and Pearson correlations between matched rCBV and histologic microvessel area measurements under multiple conditions, which varied by method of modeling correction or by the presence/absence of PLD correction (Table 2). Both PLD correction and IBN modeling were needed to maximize rCBV correlations with MVA (r = 0.64, ρ = 0.58, P = .001).

DISCUSSION
Relative CBV represents the most robust and widely used perfusion MR imaging metric in neuro-oncology. [40][41][42][43][44][45][46] Of the techniques that measure rCBV, DSC is the most commonly used method because of wide availability, straightforward postprocessing, and easy-to-use software programs. 40,41 DSC applies indicator dilution theory to the susceptibility effect (T2-/T2*-weighted signal drop) produced by first-pass transit of a single GBCA bolus injection. DSC assumes an intact BBB with no extravascular GBCA leakage or recirculation and thus requires correction methods when these factors occur (discussed below). Dynamic contrast-enhanced MRI and arterial spin-labeling offer alternative approaches to DSC for calculation of rCBV. The theory and limitations of these techniques have been described previously. 23,24,[40][41][42] Correctly performing DSC requires several technical considerations, based on comparison data from prior studies that validated optimal conditions for best practice. First, DSC-pMRI generally necessitates both PLD and mathematic modeling to achieve the highest degree of T1 leakage correction and rCBV accuracy. [17][18][19] Results from our study support this requirement (Table 2). Regarding PLD amount, most groups use a single dose (0.1 mmol/kg) of GBCA, 8,9,[14][15][16][17][18][19][20]22,25,26,42 particularly at 1.5T, though adequate PLD correction can be achieved with a GBCA dose as low as 0.05 mmol/kg at 3T. 19 Second, gradient-echo T2*-weighted DSC represents the most preferred and widely published method for DSC. While spin-echo T2-weighted DSC offers a higher signal-to-noise ratio and fewer susceptibility artifacts, 45 double or triple GBCA injection doses (0.2-0.3 mmol/kg) are typically needed during the acquisition of spin-echo DSC 2,7,30 to overcome its lower contrast-to-noise ratio (ie, smaller signal drop in response to the GBCA first-pass bolus).
Compared with spin-echo, gradient-echo DSC offers advantages such as the following: 1) superior contrast-to-noise ratio (ie, greater signal drop during GBCA first pass), which allows lower contrast dosage during DSC acquisition (0.05-0.1 mmol/kg) and improves the quality of rCBV data, minimizing the need for signal denoising; 2) greater sensitivity to microvessels of all sizes (including larger tortuous glomeruloid-type vessels commonly observed in high-grade gliomas); and 3) the ability to use flip angles of <90° to minimize T1 leakage effects. 11,19,31,42,44,45 Finally, in regard to mathematic modeling, the algorithm published by Boxerman et al 17 remains the most highly cited and validated method to date and has been implemented commercially for widespread use.
Table 1 note: gvf indicates γ-variate fitting. NICE calculations were performed with and without gvf. IBN modeling shows substantially higher correlation between PLD and non-PLD metrics (compared with NICE), suggesting higher rCBV accuracy in the absence of PLD correction. Statistical significance is P < .05.
The study results here underscore the importance of how software programs implement a particular modeling algorithm for rCBV calculation. In this study, we tested 2 widely published FDA-approved commercial packages that offer separate implementations of the Boxerman method, 8,9,[14][15][16][17][18][19][20][21]24,[26][27][28] and we evaluated how the modeling implementation of each software program would impact T1 leakage correction and rCBV correlation with histologic measures. We minimized potential confounding factors by using identical segmented tumor volumes and identical normal-appearing white matter regions for normalization with each implementation method, and we used default settings and leakage correction for both software packages. For NICE, these included automated selection of the prebolus baseline and subsequent noise thresholding to maximize brain tissue for calculation of CBV. The rCBV metrics from IBN (compared with NICE) demonstrated greater consistency between PLD and non-PLD conditions, most notably with mean rCBV (IBN: r = 0.87; NICE: r = 0.43) and percentage >1.75 (IBN: r = 0.93; NICE: r = 0.59). This suggests that IBN modeling provides more effective correction of T1 errors, which are most prominent under non-PLD conditions. While we observed strong correlations between non-PLD and PLD measures (when using IBN), further studies are likely needed to determine the following: 1) whether PLD can or should be omitted, 2) what the optimal conditions would be to allow PLD omission (ie, modeling implementation, 3T field strength), and 3) whether this omission would significantly impact prognostic and diagnostic accuracy. Under PLD conditions, separate experiments confirmed significantly higher FTB accuracy with IBN (area under the curve = 0.85) compared with NICE (area under the curve = 0.67, P < .01) in diagnosing histopathologically confirmed tumor versus posttreatment effect (ie, pseudoprogression, radiation necrosis).
IBN also provided the highest degree of correlation between localized rCBV and tissue microvessel area (Table 2).
In this study, we chose to validate rCBV measurements against histopathology rather than outcomes. Imaging measurements such as rCBV relate most directly to histologic correlates such as microvessel volume and histologic identity (eg, tumor grade, tumor versus posttreatment effect). How these histologic features (and their imaging correlates) predict survival may be confounded by a number of factors, such as age, molecular markers (eg, isocitrate dehydrogenase), methylation status (eg, O6-methylguanine-DNA methyltransferase), extent of resection, and salvage therapy at the time of recurrence. [6][7][8][9][10][11][12][13][14][15][16]47,48 While clinical outcomes are desirable end points, they must be correlated with imaging and histologic features together in a controlled trial with a larger patient cohort, which is beyond the scope of this article. Our purpose in this study was simply to determine which method of rCBV measurement (ie, which software package) most closely reflected underlying tissue features. We believe that this context justifies validating rCBV against histopathologic benchmarks.
We recognize potential study limitations. First, we limited the scope of the evaluation to 2 specific software packages, though many commercial options exist. We simplified the project to maximize the potential clinical impact because we evaluated the most published and validated modeling algorithm to date (Box-