Article Text
Abstract
Background Towards the translation of computational fluid dynamics (CFD) techniques into the clinical workflow, performance increases achieved with parallel multi-central processing unit (CPU) pulsatile CFD simulations in a patient-derived model of a bilobed posterior communicating artery aneurysm were evaluated while simultaneously monitoring changes in the accuracy of the solution.
Methods Simulations were performed using 2, 4, 6, 8, 10 and 12 processors. In addition, a baseline simulation was obtained with a dual-core dual CPU computer of similar computational power to clinical imaging workstations. Parallel performance indices including computation speed-up, efficiency (speed-up divided by number of processors), computational cost (computation time × number of processors) and accuracy (velocity at four distinct locations: proximal and distal to the aneurysm, in the aneurysm ostium and aneurysm dome) were determined from the simulations and compared.
Results Total computation time decreased from 9 h 10 min (baseline) to 2 h 34 min (10 CPU). Speed-up relative to baseline increased from 1.35 (2 CPU) to 3.57 (maximum at 10 CPU) while efficiency decreased from 0.65 to 0.35 with increasing cost (33.013 to 92.535). Relative velocity component deviations were less than 0.0073% and larger for 12 CPU than for 2 CPU (0.004±0.002%, not statistically significant, p=0.07).
Conclusions Without compromising accuracy, parallel multi-CPU simulation reduces computing time for the simulation of hemodynamics in a model of a cerebral aneurysm by up to a factor of 3.57 (10 CPUs) to 2 h 34 min compared with a workstation with computational power similar to clinical imaging workstations.
- Aneurysm
- MRI
- Technique
Statistics from Altmetric.com
Introduction
Computational fluid dynamics (CFD) techniques have evolved to an established modality for the computation of hemodynamics in patient-derived models of cerebral aneurysms.1 An increasing number of studies have reported correlations of CFD results with aneurysm formation,2 rupture risk3–6 and thrombus formation.7 Other approaches have attempted to predict and to quantify hemodynamic changes after endovascular treatment,8 ,9 including placement of flow-diverting stents.10 Parallel to these efforts, results obtained with CFD have been validated using a variety of techniques based on ex vivo11 ,12 or in vivo studies by comparison with flow information obtained from clinical images.13–15
A number of obstacles have to be overcome to integrate fluid dynamics modeling into clinical workflow. These requirements are automated segmentation of the aneurysm lumen and adjacent arteries, automated meshing of the computational domain including identification and definition of inflow and outflow zones, reduction of calculation times to clinically acceptable values and integration of the simulation results into the clinical workflow either as images or in another format suitable for use on existing radiological equipment such as a picture archiving and communication system (PACS) system.
A possible approach for accelerating the simulations is the use of parallel multi-central processing unit (CPU) software code. Decreasing cost of hardware and easy access to workstations integrating multiple dual or quad core processors as well as the availability of commercial parallel CFD solvers which automatically perform the additional set-up, such as subdivision of the computational domain and management of inter-process communications, make this approach particularly appealing.
In this study we evaluated the commercially available (Fluent, ANSYS) multi-CPU parallel CFD solver with a model system of a double-lobed aneurysm of the posterior communicating artery (PComA). Pulsatile simulations were performed with varying numbers of CPUs (equal to the number of computational processes executable in parallel) as part of a distributed memory multicomputer of the ANSYS Remote Simulation Facility (RSF, Lebanon, New Hampshire, USA). In addition, a baseline simulation was performed on a workstation with computational power similar to clinical imaging workstations (one single process). The geometry of the aneurysm and the parent artery was derived from clinical images and inflow was modeled using physiologically measured flow data to ensure the evaluation of the multi-CPU solver with patient-derived boundary conditions. In addition, the validity of the CFD simulation results was assessed by comparing measured and simulated intra-aneurysmal flow patterns (IFPs).14
A number of parallel performance indices including average and total computation time, message count and data transfer were calculated from these simulations. Relative speed-up, efficiency, computational cost as well as accuracy of the computed velocity field (expressed in relative differences in the velocity vector components at four distinct points in the computational model) were quantified for the multi-CPU solutions relative to the baseline. This comparison enabled us to obtain a benchmark for performance increase achieved with the multi-CPU configurations compared with single process execution achievable on standard clinical equipment.
Materials and methods
Aneurysm characterization
Aneurysm morphology
Three-dimensional (3D) digital subtraction angiographic (DSA) image data (Siemens C-Arm System, Axiom Artis, dBA, Siemens AG Healthcare Sector, Forchheim, Germany) acquired during a diagnostic examination of an unruptured cerebral aneurysm of the PComA was retrospectively collected for one patient (a woman aged 68 years). The aneurysm was successfully treated by balloon-assisted coiling with the preservation of the PComA which originated from the neck of the aneurysm.
Inflow and intra-aneurysmal flow patterns
Prior to endovascular treatment, time-resolved two-dimensional (2D) phase contrast magnetic resonance images (2D pcMRI) were acquired from a location in the internal carotid artery (ICA) proximal to the aneurysm and at three approximately perpendicular cross-sections within the aneurysm (field-of-view 160 mm×160 mm, acquisition matrix 256×192, slice thickness 5 mm, interpolated in-plane resolution 0.652 mm, peripheral retrospective gating). From the phase difference images of the 2D pcMRI dataset of the ICA (12 time points), the velocity inflow waveform of the computational model was interpolated by spline interpolation as reported previously.14 From the 2D pcMRI cross-sectional image sets, IFPs were derived for comparison with CFD results16 (velocity encoding (VENC) for oblique axial and sagittal oriented IFP: 60 cm/s and for oblique coronal IFP: 100 cm/s).
Computational fluid dynamics
Computational mesh
3D DSA image data were smoothed by convolution with a 3D Gaussian kernel (Visualization Tool Kit, V.5.4, Kitware) and converted into a stereolithographic file (Paraview V.3.8.0). From this surface mesh a 3D polyhedral-based mesh with boundary layer was created (curvature size function: maximum angle difference of adjacent surface elements 5°, growth rate 1.5, maximum element size 0.5, minimum element size 0.05; boundary layer: first percent 0.6, growth rate 2, rows 7, internal continuity, transition pattern 1:1, GAMBIT, V.2.4.6, ANSYS). The final mesh size was based on mesh-independence convergence checks and consisted of 550 370 cells (932 458 nodes).
Simulation parameters
Pulsatile CFD simulations (137 time steps, 5 ms per time step) were performed for six configurations (2, 4, 6, 8, 10, 12 CPUs, number of processes corresponding to number of CPUs) at the ANSYS remote simulation facility (RSF, Dell PowerEdge SC1435 AMD Opteron 2.6 GHz, 64-bit processor with 64 GB total memory, InfiniBand interconnect) and for one configuration at a Dell Precision PWS 690 Dual Core Dual Processor 64-bit workstation (Intel Xeon CPU 5160 @ 2.8 GHz, 1 process). The latter served as the baseline calculation relative to which the multi-CPU parallel code was evaluated as it represented a typical configuration for a clinical imaging workstation. To ensure the decay of initial transients, five cardiac cycles were simulated for each configuration; the results are reported for the fifth cardiac cycle. Simulations were performed with the ANSYS Fluent parallel unsteady second-order pressure-based solver (PISO pressure-velocity coupling and PRESTO second order upwind discretization). From the baseline simulation, IFPs were calculated at cross-sections at approximately the same locations as the IFPs obtained with 2D pcMRI. Contours obtained from the magnitude of blood flow velocity components normal to each cross-section were scaled in the same fashion as the IFP obtained with 2D pcMRI for comparison. Volumetric inflow was modeled after the volumetric flow waveform using spline interpolation (Matlab, Mathworks) obtained from the2D pcMRI image data measured at this location.14 No stenoses or obstructions were found in the distal arteries, so outflow boundary conditions were modeled as zero pressure outlets according to low vascular resistance present in the cerebral vasculature.
Parallel simulation statistics
Performance indices
Parameters (accessible from the log files created by the ANSYS software) chosen for characterizing parallel performance for the different configurations are given in table 1. With these parameters, speed-up, efficiency and the cost of the simulations were derived.
Speed-up parameter
Arguably the most obvious benefit of using a parallel computer is the reduction in the total computation time. In this respect, a measure of parallel performance is the speed-up parameter, defined as the ratio of the execution time for the baseline calculation to the execution time using multiple processors. It can be considered the scalability of the system as the number of processors is increased. Maximum speed-up is given by Amdahl's law where, in the limit of a large number of processors, the speed-up parameter approaches the value of 1/F with F being the fraction of calculations that cannot benefit from parallelization. Speed-up parameters were calculated for average wall clock time per iteration (SPave) and for total wall clock time (SPtotal).
Efficiency parameter
The efficiency parameter E is defined as the ratio of the speed-up parameter to the number of processors and is expressed as a percentage. It is a relative quantity describing the percentage of time the processors are being used to perform the actual computation. E was calculated using the average wall clock time per iteration.
Cost parameter
The cost of a computation C in the parallel environment is defined as the product of the number of processors × the time needed to perform the simulations (total wall clock time).
Accuracy
To evaluate a potential loss in accuracy when parallelizing the computation, differences in the velocity vector components relative to the baseline configuration were quantified at four representative locations within the computational model: (1) in the ICA proximal to the aneurysm; (2) in the ICA distal to the aneurysm, both within a mostly ordered laminar flow field; (3) at the aneurysm ostium where regions of alternating flow directions (in and out of the aneurysm) are present; and (4) inside the aneurysm where slow circulating flow patterns are found. Relative deviations (as a percentage) averaged over the cardiac cycle from the baseline simulation were calculated to provide a measure for the accuracy of the parallel calculations at each location.
Simulation convergence
For each of the four locations described in the previous section, the maximum errors in the velocity magnitude over the first four cardiac cycles were calculated relative to the fifth cardiac cycle. In this way, uncertainty in the simulation results due to convergence issues could be quantified.
Results
Aneurysm characterization
Aneurysm morphology
The diameters of adjacent arteries in the final 3D computational model were compared with DSA projection images (figure 1A, B) to ensure accurate representation of aneurysm dimensions in the CFD simulations. The aneurysm was classified as saccular, exhibiting two pronounced lobes. The largest diameter was 12 mm, height of the aneurysm dome was 7 mm and the diameter of the aneurysm ostium was 6 mm. The aneurysm neck originated from the ICA and an additional outflow was present in the form of the PComA which originated at the base of the aneurysm neck at an angle of approximately 90° to the neck and approximately 180° relative to the distal section of the ICA (figure 1). The computational model included an extended section of the cavernous ICA to allow for full development of the velocity profile prior to entering the aneurysm and extended distal sections of the ICA (including its bifurcation into the anterior and middle cerebral artery) and the PComA for minimizing potential outflow errors in the intra-aneurysmal flow field.
Inflow and intra-aneurysmal flow patterns
Volumetric flow waveform in the proximal ICA ranged from 188 ml/min to 467 ml/min (average flow 304 ml/min). The average duration of the cardiac cycles used to perform the 2D pcMRI flow measurements was 690 min (figure 1C).
Computational fluid dynamics
Maximum total computed pressure ranged from 0 (at the outflows of the anterior cerebral, middle cerebral and ophthalmic artery) to 1310 Pa at the inflow (ICA, figure 1A). The pressure at the aneurysmal wall was smaller than in the proximal section of the ICA but larger than in the distal segment. Maximum wall shear stress magnitudes (WSS) were lower than 10 Pa at the aneurysmal wall. Larger WSS values were observed in the parent artery with only three small focal locations exceeding 18 Pa (figure 1B).
Flow entered the aneurysm through the superior aspect of the ostium creating vortex-like flow patterns in each lobe (figure 1D). Blood flow velocity was highest in the superior portion of the aneurysm model.
Direct comparison of the major features in the IFPs measured with 2D pcMRI and calculated with CFD showed very good agreement for each orientation (axial, coronal, sagittal), in particular with respect to the major flow pattern (figure 2). Axial and sagittal oriented IFPs were oriented approximately perpendicular to the inflow jet into the aneurysms which could be appreciated in both orientations as a well-delineated dark feature in the IFPs (2D pcMRI and CFD). The coronal IFP intersected the intra-aneurysmal vortex-like flow returning from the wall distal to the ostium. In this view, the inflow jet is fanned out along the distal wall visible as an elongated dark structure in the IFPs. Flow patterns remained stable during the cardiac cycle with only small observable changes. The relative contrast of IFP features to each other increased at times of maximum inflow relative to times of minimum inflow.
Parallel performance
Performance indices
With respect to the main intent of the study (ie, evaluation of computation time reduction), of particular interest are average wall clock times (13.317 s at baseline vs 4.054 s for 12 CPU) and total wall clock times (33 013.527 s vs 10 097.731 s, table 2). The gain in computation speed is contrasted by the increase in message count (for inter-process communication) and the increase in data transfer (for process synchronizing).
Speed-up parameter
SPave and SPtotal exhibited similar behavior, ranging from 1.31 to 3.54 and from 1.35 and 3.57, respectively (table 3), indicating that the increase in speed-up per iteration governs total gain of computational speed (actual time). Speed-up was not monotonic (table 3 and figure 3), and drops for the configurations with 8 and 12 CPUs were noted. No convergence was found for this parameter within the configurations used, so no estimate for the maximum achievable speed or F could be provided.
Efficiency parameter
Efficiency E (calculated from SPave) ranged from 0.65 (2 CPU) to 0.24 (8 CPU) (table 3), asymptotically approaching approximately 0.3 with an increasing number of CPUs (figure 3).
Cost parameter
Cost was highest for 6 CPUs (131.456) and lowest for 1 CPU (33.013), corresponding to an increase of about a factor of 4 (table 3).
Accuracy
Absolute relative differences in all velocity components were found to be relatively small with a maximum of 0.0073% in the aneurysm dome for 12 CPUs (table 4). Differences averaged over all configurations were found to be statistically higher for the distal than for the proximal ICA segment (p<0.0002), the distal segment than the aneurysm ostium (p<0.01), the distal segment than the aneurysm dome (p<0.02), the proximal segment than the aneurysm ostium (p<0.001) and the proximal segment than the aneurysm dome (p<0.003), but not between the aneurysm ostium and aneurysm dome location (p=0.39). Differences were higher averaged over all points for 12 CPUs (0.004±0.002%, figure 4A) than for 2 CPUs (0.002±0.0009), but this difference did not reach statistical significance (p=0.07).
Simulation convergence
The relative maximum error in the velocity magnitude decreased approximately linearly at all four locations (figure 4B). In the first cardiac cycle, this error varied from 21.3% (in the dome) to 12.4% (in the proximal parent artery). At the fourth cardiac cycle, for all locations this error was <5% (4% in the dome and 3% in the proximal parent artery). The relative maximum error was lowest in the proximal parent artery and highest in the dome. Ordered laminar flow in the parent artery compared with complicated flow patterns present in the aneurysm dome most likely lead to better convergence behavior of the simulation.
Discussion
Parallel multi-CPU computations based on commercially available CFD software were explored for accelerating simulations of pulsatile flow through a bilobed PComA aneurysm. Investigating configurations with a number of CPUs ranging from 1 to 12, a maximum reduction in computation time of a factor of 3.57 was achieved. CFD results compared well with measured data. In general, IFPs calculated with CFD and measured with 2D pcMRI compared favorably in this aneurysm with complex (bilobed) geometry. Comparisons of CFD results with MRI in cerebral aneurysms have been reported previously,14 ,15 ,17 ,18 and the good agreement in the main features found for this study further strengthens the validity of CFD simulations to characterize the hemodynamic environment in cerebral aneurysms.
Use of CFD simulations for understanding hemodynamics in cerebral aneurysms is an active field of research.1 ,16 ,19–23 While some studies are being conducted offline (ie, outside the clinical environment), the potential of CFD for assisting in therapeutic decision-making for the individual patient has been recognized.1 ,4 ,5 Efforts have been reported for transitioning CFD into a clinical technique for characterizing aneurysmal hemodynamics—for instance, by developing a sophisticated computational framework for better understanding of the influence of WSS on endothelial cells19—by exploring influences of mesh characteristics on simulation results26 and by validating CFD results with data measured from other clinical imaging modalities.13 ,25 ,26
Depending on the clinical presentation, the time from discovery of a cerebral aneurysm to its treatment may either be of the order of weeks or days, leaving sufficient time for conducting CFD simulations, or of the order of hours or minutes, which may be the case for a ruptured cerebral aneurysm that requires immediate embolization or clipping. An increase in computational speed to provide simulation results for the latter case may eventually be achieved either with novel algorithms27 and/or by improvements in hardware. Commercially available CFD solvers are advantageous if in-house code development or maintenance are not feasible.
Other reports are available that comment on the efficacy and feasibility of parallel simulations for patient-specific modeling. Fisher et al28 used the Pittsburgh Supercomputing Center in a CFD simulation of the carotid artery employing 1024 nodes. Dong et al29 took advantage of the TeraGrid consortium which combines the resources of multiple long distance clusters internationally. For routine clinical applications, these approaches provide challenges. By simulating a carotid artery system, Yue et al reported parallel performance and scalability on a Linux cluster with up to 16 processors.30 Wang et al investigated parallel performance with Total Cavoplumonary Connection model systems of complex geometry on a Beowulf computer cluster with each node having two 1.6 GHz processors using the ANSYS Fluent parallel solver.31 Performance was evaluated on up to 36 nodes. Efficiency was reported to drop substantially beyond 16 processors (suspected to be because of suboptimal automated decomposition of the computational domain). For 10 nodes, a speed-up of a factor of 10 was reported which is higher than the speed-up factor of 3.57 found in our study. Variations in the decomposition of the computational domain and also the computer cluster may have contributed to this difference. In addition, variations in the hardware between the single node workstation and the hardware used in the RSF may have had an influence (processors in the RSF may have been slightly slower than the processor used for the single run). Also, the kind of network interconnection between the processors in the ASF may have influenced total speed-up. It can be noted from table 2 that message count and data transfer per iteration are increased with a larger number of processors. Unfortunately, no information about the time spent in communication versus the time spent in computation was available from the log files.
With the achieved speed-up, no compromise in computational accuracy was noted. The maximum relative difference of 7.3×10−3% has to be contrasted with the inherent uncertainties in the boundary conditions of the studied model. Volumetric flow measured with 2D pcMRI has been reported to have a reproducibility of the order of 7.9% for a pulsatile flow model intended to replicate conditions in cerebral vessels.32 Wall motion of cerebral aneurysms has been reported to be region-dependent (potentially larger at sites of a daughter aneurysms), with maximum wall extensions of the order of tenth of millimeters up to millimeters.25 ,33 While relative differences in the velocity vector components increased for the configuration with 12 CPUs compared with baseline (figure 3), this increase was not found to be statistically significant. However, location-dependent statistical differences in the velocity values were observed with higher differences in the ostium and dome compared with the proximal and distal artery segments. Flow patterns were found to be more complex in the former location than in the latter location.
The observed speed-up (3.57) is encouraging, effectively reducing computation time from an unacceptable waiting period (almost 10 h) to a potentially manageable delay (about 2.5 h) for clinical use, as often a few hours may pass between an imaging study and an actual clinical intervention, particularly of unruptured cerebral aneurysms. For ruptured aneurysms, further reduction in computation time is desirable as immediate treatment after angiographic examination of the patient is often indicated. Five cardiac cycles were used in the present simulation to reduce the effects of initial transients. As demonstrated by the relative maximum error calculated with the fifth simulation as baseline, a reduction in the convergence error of about a factor of 4 was achieved in the fourth cardiac cycle compared with the first cardiac cycle. If higher convergence errors are acceptable, using a smaller number of cardiac cycles in the simulations may be a possibility to further reduce overall computation time. Another option may be to conduct steady state simulations (constant inflow) only to obtain a first impression of the flow field. As flow in the parent arteries of cerebral aneurysms has a strong antegrade component which is modulated by the pulsatility of the cardiac output34 (in contrast to the aorta where retrograde flow is also observed), this may be an acceptable approach. However, time-dependent quantities such as oscillatory shear index or temporal variations of the WSS (which have been reported to be of influence in aneurysm formation or growth2 ,35) or the formation of blebs23 cannot be derived from these simulations. Another approach may be the reduction of mesh elements or elimination of boundary layers (as used in our case). A recent comparison of CFD simulations of cerebral aneurysms employing only meshes of different sizes with tetrahedral elements and meshes including polyhedral elements and boundary layers only revealed relatively small differences in velocity components and WSS magnitudes ranging from 2.9% to 5.4%.36
The ease of use of the commercial multi-CPU code (no additional set-up or user interaction required) makes its application appealing for patient-derived modeling as there was no appreciable disadvantage in terms of accuracy. Additional simulations with other aneurysms of varying flow complexity may therefore be warranted, particularly with respect to the automated domain decomposition and to a further increase in the number of nodes to obtain a more comprehensive picture of its performance.
Conclusion
Parallel multi-CPU simulations employing up to 12 CPUs were found to reduce computing time for the simulation of hemodynamics in a cerebral aneurysm model from clinically unacceptable values of about 10 h to potentially acceptable values of about 2.5 h without compromising accuracy of the calculated vector field. Further reduction may be achieved with a larger number of processors, thereby providing a means to integrate CFD simulations into the clinical workflow for therapeutic decision-making.
Footnotes
-
Contributors CK: conception and design. CY, SP: analysis and interpretation. EG: critical revision for intellectual content. MH: conception and design. YJZ, RPK, OD, RGG: final approval.
-
Competing interests MH is an employee of ANSYS Inc. There are no other conflicts of interests associated with this manuscript.
-
Ethics approval Ethical approval was obtained from The Methodist Hospital Institutional Review Board.
-
Provenance and peer review Not commissioned; externally peer reviewed.