Diagnostic Accuracy of 4 Commercially Available Semiautomatic Packages for Carotid Artery Stenosis Measurement on CTA

BACKGROUND AND PURPOSE: Semiautomatic measurement of ICA stenosis potentially increases observer reproducibility. In this study, we assessed the diagnostic accuracy and interobserver reproducibility of commercially available software packages for semiautomatic ICA stenosis measurement on CTA and estimated the agreement among these packages.

MATERIALS AND METHODS: We analyzed 141 arteries from 90 patients with TIA or ischemic stroke. Manual stenosis measurements were performed by 2 neuroradiologists. Semiautomatic measurements with 4 software packages (3mensio and comparable software from Philips, TeraRecon, and Siemens) were performed by 2 observers per package. Diagnostic accuracy was estimated by comparing semiautomatic with manual measurements. Interobserver reproducibility and agreement among packages were assessed by calculating the intraclass correlation coefficient and Bland-Altman 95% limits of agreement. False-negative classifications were retrospectively inspected by a neuroradiologist.

RESULTS: There was no significant difference in diagnostic performance among the 4 semiautomatic methods. Sensitivity for detecting ≥50% and ≥70% stenosis ranged from 76% to 82% and from 46% to 62%, respectively. Specificity and overall diagnostic accuracy ranged from 92% to 97% and from 85% to 90%, respectively. The interobserver intraclass correlation coefficient was between 0.83 and 0.96 for semiautomatic measurements and 0.81 for manual measurement. The limits of agreement between pairs of semiautomatic packages ranged from −18% to 24% for the narrowest and from −33% to 31% for the widest. False-negative classifications were caused by ulcerative plaques and by observer variation in stenosis and reference measurements.

CONCLUSIONS: Semiautomatic methods have low-to-good sensitivity and good specificity and overall diagnostic accuracy. The high interobserver reproducibility makes semiautomatic stenosis measurement valuable for clinical practice, but semiautomatic measurements should be checked by an experienced radiologist.

Carotid endarterectomy in neurologically symptomatic patients with a 70%-99% stenosis results in a 16% absolute reduction in the 5-year risk of ipsilateral stroke. However, endarterectomy is only marginally beneficial for patients with a 50%-69% stenosis and has no benefit for patients with a <50% stenosis. 1 Therefore, the degree of carotid stenosis is crucial in clinical decision-making, and precise and accurate measurement of the degree of stenosis is mandatory. The stenosis measurements on which these thresholds are based were obtained with conventional angiography, which is considered the original criterion standard. 2 Because of the neurologic complications related to DSA 3 and the good diagnostic accuracy of noninvasive tests, carotid stenosis measurement on CTA or MRA has become the standard in clinical practice. 4,5 However, manual measurement of the degree of stenosis on CTA according to the NASCET method requires experience and is prone to low interobserver reproducibility. 6,7 Semiautomatic methods increase interobserver reproducibility and speed up measurement. 8,9 Furthermore, semiautomatic methods require less observer experience than manual measurement. 10 Multiple semiautomatic packages are currently available and used in clinical practice. Because different vendors may use different algorithms, 11 the reliability of measurements with different software packages is unclear, and their diagnostic accuracy must be investigated further before semiautomatic measurement can become a valuable clinical tool. The goal of this study was to assess the agreement and diagnostic accuracy of 4 commercially available software packages for semiautomatic stenosis measurement compared with manual measurement on CTA and to estimate the interobserver reproducibility and the agreement among different semiautomatic packages.

MATERIALS AND METHODS
Patient Selection
Patients with a recent TIA or stroke who were suspected of having ICA stenosis were evaluated with duplex sonography. According to local guidelines, CTA was performed to estimate the degree of stenosis more precisely when the stenosis on duplex sonography was ≥30% in men or ≥50% in women. All consecutive patients (n = 110) who underwent 64-section CTA with a 0.9-mm section thickness for carotid stenosis evaluation between April 2006 and December 2008 were retrospectively included in this analysis. This complete population was previously investigated to assess the performance of semiautomatic ICA stenosis measurement on CTA with Vitrea 2 version 4.1.2.0 (Vital Images, Plymouth, Minnesota). 8 In the current study, we report the diagnostic accuracy and reproducibility of semiautomatic carotid stenosis measurement on CTA with 4 other commercially available software packages and estimate the agreement among them. The same population was also used to investigate the relation of calcium volume with carotid artery disease, 12 the prevalence of intracranial carotid artery disease and the degree of intracranial stenosis, 13,14 and the relation between intracranial carotid artery stenosis and poor outcome. 15 Patients with a previous carotid intervention (n = 16) and those with CTA of insufficient quality (n = 4) were excluded, leaving 90 patients for further analysis. The mean age was 66.8 years (range, 35-89 years), and 54 were men. The final diagnosis was ischemic stroke in 40 patients (44%), TIA in 32 (36%), and amaurosis fugax in 14 (16%); 3 patients (3%) were asymptomatic, and 1 patient (1%) had ocular ischemic syndrome.
Because CTA was performed in the clinical setting, informed consent was waived by the local medical ethics committee.

CTA Protocol
CTA was performed as previously described 8 with a 64-section scanner (Brilliance 64; Philips Healthcare, Best, the Netherlands). Eighty milliliters of contrast medium (iodixanol, Visipaque 320; GE Healthcare, Piscataway, New Jersey) was infused at 4 mL/s. Acquisition and reconstruction parameters were as follows: 120-kV tube voltage, 265 mAs, pitch of 0.765, and reconstructed section thickness of 0.9 mm with an increment of 0.45 mm. The scan ranged from the aortic arch up to 3 cm above the sella turcica. The in-plane matrix was 512 × 512 pixels, with an FOV ranging from 128 × 128 mm² to 217 × 217 mm² (average, 155 × 155 mm²).
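As a worked calculation (ours, not reported in the study), the in-plane pixel spacing follows from dividing the FOV by the 512-pixel matrix. It suggests that a residual lumen of 1-2 mm spans only a few pixels, so small segmentation or caliper differences can shift the degree of stenosis noticeably.

```latex
\Delta x = \frac{\mathrm{FOV}}{N}, \qquad N = 512:
\quad \frac{128\ \mathrm{mm}}{512} = 0.25\ \mathrm{mm}, \quad
\frac{155\ \mathrm{mm}}{512} \approx 0.30\ \mathrm{mm}, \quad
\frac{217\ \mathrm{mm}}{512} \approx 0.42\ \mathrm{mm}
```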

Stenosis Measurement
For both the manual and semiautomatic measurements, the observers were blinded to patient information and to each other's findings. The degree of stenosis was defined according to the NASCET criteria 2 by using the minimal diameter at the stenosis and the maximum reference diameter at a healthy part of the artery well beyond (>30 mm) the stenosis. 8 Because the cross-section of an artery is not perfectly circular, it has no single true diameter. Therefore, we defined the "minimal diameter" as the minimal wall-to-wall cross-sectional distance of the artery and the "maximum diameter" as the maximum wall-to-wall cross-sectional distance. The observers determined the minimal diameter of the stenosis within 3 cm proximal and distal to the bifurcation. 16 Arteries with near-occlusion (collapsed or small distal artery) were identified according to the criteria described by Bartlett et al. 17 Occlusions were reported for both the manual and semiautomatic measurements, and the processing time was recorded for all measurements.
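The NASCET definition above reduces to a one-line ratio. The following is a minimal sketch, assuming both diameters are already available in millimeters; the function names `nascet_stenosis` and `classify` are hypothetical and do not come from any of the evaluated packages. For example, a 1.5-mm residual lumen with a 6.0-mm distal reference diameter gives a 75% stenosis.

```python
def nascet_stenosis(min_diameter_mm, reference_diameter_mm):
    """Degree of stenosis (%) per NASCET: (1 - minimal / distal reference diameter) x 100.

    min_diameter_mm: minimal wall-to-wall distance at the stenosis.
    reference_diameter_mm: maximum wall-to-wall distance at a healthy distal ICA segment.
    """
    if reference_diameter_mm <= 0:
        raise ValueError("reference diameter must be positive")
    return (1.0 - min_diameter_mm / reference_diameter_mm) * 100.0


def classify(stenosis_pct):
    """Bin a degree of stenosis using the 50% and 70% cutoffs of the study."""
    if stenosis_pct >= 70.0:
        return ">=70%"
    if stenosis_pct >= 50.0:
        return "50%-69%"
    return "<50%"
```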

Manual Stenosis Measurements
Manual measurements were performed on CTA by 2 neuroradiologists, both with >10 years of experience, according to the method described by Bartlett et al 17 on a workstation with MPR functionality (Impax, Version 5.2; Agfa-Gevaert, Mortsel, Belgium). Measurements were performed on a plane perpendicular to the centerline of the artery. The first observer measured all arteries, and these measurements were used as the reference; this observer also measured a subset of 50 arteries a second time after a delay of 2 months. The second observer measured a subset of 48 arteries.

Semiautomatic Stenosis Measurements
Semiautomatic stenosis grading was performed with software packages from Pie Medical Imaging (3mensio Vascular 6.1; Pie Medical Imaging, Maastricht, the Netherlands), Philips (Extended Brilliance Workspace, Version 4.1, Advanced Vessel Analysis), TeraRecon (Vessel Analysis 4.4.6.85; TeraRecon, San Mateo, California), and Siemens (syngo INspace4D Advanced Vessel Analysis 2009-2013; Siemens, Erlangen, Germany). One trained observer (2 years of experience) performed stenosis measurements with all software packages, with >2 months between measurements with different packages. A second trained observer (6 months of experience) performed the measurements with the Philips and 3mensio software, and a third trained observer (6 months of experience) performed them with the Siemens and TeraRecon software, both also with >2 months between measurements with different packages. To prevent recall of measurements from previous studies on this population, we selected observers who had not been involved in those studies. 8,12-15 With the software from Philips, TeraRecon, and Siemens, ≥2 seed points were placed on the axial images: the first in the ICA close to the skull base and the last in the common carotid artery >5 cm below the bifurcation. The software packages then automatically segmented the ICA and determined its centerline. Minimal and maximal lumen diameters were calculated automatically and displayed together with curved planar reformations of the artery. By dragging a slider along the curved planar reformation, observers selected the minimal stenosis diameter. With the 3mensio software, the ICA of interest was automatically segmented after placement of a single seed point.
Subsequently, seed points were placed in the ICA, bifurcation, external carotid artery, and common carotid artery on a 3D representation, and the centerline was determined automatically. The 3mensio software fitted an ellipse to the segmented cross-sectional lumen area, reported the minimal and maximal diameters of the ellipse as the lumen diameters, and displayed them together with curved planar reformations of the artery. The observer selected the region of the ICA containing the stenosis, and the software automatically determined the smallest diameter of the stenosis. For all software packages, the reference location was selected by dragging a slider on the curved planar reformation along the distal ICA well beyond the site of stenosis, at a vertically running segment with the largest diameter and the least variation in diameter. The minimal and maximal diameters at the selected reference location were recorded. For all software packages, erroneous or incomplete segmentations and erroneous centerlines were corrected manually.
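The study does not describe how 3mensio fits its ellipse. The sketch below assumes a moment-based fit (an ellipse with the same second moments as the segmented lumen pixels), which is our own illustrative choice and not the vendor's documented algorithm, and contrasts it with the narrowest wall-to-wall width (minimum caliper diameter) of the same binary mask. The function names are hypothetical.

```python
import numpy as np


def ellipse_diameters(mask, pixel_mm=1.0):
    """Minor/major diameters of the moment-equivalent ellipse of a binary lumen mask.

    For a uniformly filled ellipse with semi-axis a, the coordinate variance along
    that axis is a**2 / 4, so the full axis length is 4 * sqrt(variance).
    """
    rows, cols = np.nonzero(mask)
    coords = np.stack([cols, rows], axis=1).astype(float)
    cov = np.cov(coords, rowvar=False, bias=True)  # population covariance of pixel centers
    eigvals = np.linalg.eigvalsh(cov)  # ascending: minor axis first
    minor, major = 4.0 * np.sqrt(eigvals) * pixel_mm
    return float(minor), float(major)


def min_caliper_diameter(mask, pixel_mm=1.0, n_angles=180):
    """Smallest wall-to-wall width over sampled directions (minimum Feret diameter)."""
    rows, cols = np.nonzero(mask)
    coords = np.stack([cols, rows], axis=1).astype(float)
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    projections = coords @ directions.T  # shape (n_pixels, n_angles)
    # +1 pixel approximates the physical extent of the outermost pixels.
    widths = projections.max(axis=0) - projections.min(axis=0) + 1.0
    return float(widths.min() * pixel_mm)
```

For a slit-like residual lumen of 7 × 41 pixels, this moment ellipse reports a minor diameter of 8 pixels, whereas the narrowest wall-to-wall width is 7 pixels. Under these assumptions, a non-elliptical lumen can make an ellipse-based minimum exceed the true minimal distance.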
To evaluate whether interobserver reproducibility could be improved, we performed an additional measurement with a single software package (Siemens) using a second, standardized reference location exactly 30 mm above the location of the minimal stenosis diameter, and we calculated the degree of stenosis.
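The standardized reference rule can be expressed on a diameter profile along the centerline. This is a minimal sketch, assuming a hypothetical export of lumen diameters sampled at increasing positions toward the skull base; in the study, the Siemens package was operated interactively, and the function name is ours.

```python
import numpy as np


def stenosis_with_fixed_reference(arc_mm, diam_mm, offset_mm=30.0):
    """NASCET stenosis (%) with the reference taken a fixed distance above the narrowest point.

    arc_mm: increasing positions along the centerline (mm), toward the skull base.
    diam_mm: lumen diameter (mm) at each position.
    In practice the narrowest-point search would be limited to the bifurcation region;
    this sketch searches the whole profile for simplicity.
    """
    arc = np.asarray(arc_mm, dtype=float)
    diam = np.asarray(diam_mm, dtype=float)
    i_min = int(np.argmin(diam))
    ref_pos = arc[i_min] + offset_mm
    if ref_pos > arc[-1]:
        raise ValueError("centerline too short for the requested reference offset")
    reference = float(np.interp(ref_pos, arc, diam))  # linear interpolation between samples
    return (1.0 - diam[i_min] / reference) * 100.0
```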

Statistical Analysis
Diagnostic Accuracy. To determine the diagnostic accuracy of semiautomatic stenosis measurement, we used the manual stenosis measurements by the first observer as the reference. The agreement of the semiautomatic stenosis measurements with the manual reference was assessed with scatterplots, Bland-Altman analysis with 95% limits of agreement, and calculation of the intraclass correlation coefficient (ICC) (absolute agreement, 2-way mixed, single measure). Diagnostic accuracy was determined for the diagnosis of ≥50% and ≥70% stenosis. Sensitivity, specificity, positive predictive value, negative predictive value, and overall diagnostic accuracy were calculated. The extended McNemar test was used to compare the sensitivity, specificity, and overall diagnostic accuracy among the software packages. P values < .05 were considered statistically significant. Statistical analyses were performed with SPSS, Version 21 (IBM, Armonk, New York).
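The dichotomized accuracy measures and the Bland-Altman limits described above can be sketched as follows, assuming paired arrays of manual (reference) and semiautomatic percentages for the same arteries. This is our illustration, not the SPSS procedure used in the study, and the extended McNemar comparison is not reproduced; ratios with an empty denominator return NaN.

```python
import numpy as np


def _ratio(num, den):
    return num / den if den else float("nan")


def diagnostic_metrics(reference_pct, index_pct, cutoff):
    """Sensitivity, specificity, PPV, NPV, and accuracy for 'stenosis >= cutoff'."""
    ref = np.asarray(reference_pct, dtype=float) >= cutoff
    test = np.asarray(index_pct, dtype=float) >= cutoff
    tp = int(np.sum(ref & test))
    fn = int(np.sum(ref & ~test))
    fp = int(np.sum(~ref & test))
    tn = int(np.sum(~ref & ~test))
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, ref.size),
    }


def bland_altman(a, b):
    """Mean paired difference (a - b) and 95% limits of agreement (bias +/- 1.96 SD)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```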
Inter- and Intraobserver Reproducibility. Inter- and intraobserver reproducibility of the manual measurements and interobserver reproducibility of the semiautomatic measurements were assessed with scatterplots, Bland-Altman analysis, and calculation of the ICC. A paired t test was used to determine whether the interobserver bias for semiautomatic measurements differed significantly from that for manual measurements. The Fisher Z-test was used to determine whether the interobserver ICC for semiautomatic measurements differed significantly from that for manual measurements. Agreement between observers in classifying a stenosis as equal to or higher than the cutoffs of 50% and 70% was assessed with κ statistics.
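For readers who want to recompute the reproducibility measures, the sketch below implements the single-measure, absolute-agreement ICC from 2-way ANOVA mean squares (the ICC(A,1) formula of McGraw and Wong, which is the same computation for 2-way mixed and 2-way random models) and Cohen's κ for the cutoff classifications. That the study's κ was Cohen's 2-observer κ is our assumption, and the paired t and Fisher Z tests are not shown. Function names are hypothetical.

```python
import numpy as np


def icc_a1(ratings):
    """ICC(A,1): 2-way model, absolute agreement, single measure.

    ratings: array of shape (n_arteries, k_observers).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between arteries
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between observers
    ss_error = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def cohen_kappa(pct_a, pct_b, cutoff):
    """Cohen's kappa for 2 observers classifying 'stenosis >= cutoff'."""
    a = np.asarray(pct_a, dtype=float) >= cutoff
    b = np.asarray(pct_b, dtype=float) >= cutoff
    p_observed = np.mean(a == b)
    p_expected = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    return (p_observed - p_expected) / (1 - p_expected)  # undefined if p_expected == 1
```

Because this ICC measures absolute agreement, a constant offset between observers lowers it; with ratings (1, 2), (2, 3), (3, 4), (4, 5) it equals 10/13, whereas a consistency-type ICC would be 1.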
Agreement between Different Semiautomatic Software Packages. The agreement between measurements with different semiautomatic software packages was assessed by Bland-Altman analysis and calculating the ICC. Instead of choosing a fixed observer per software package, we randomly selected 1 of the 2 observers for each measurement to avoid observer dependence. Thus, we aimed to simulate a clinical setting in which multiple users may use the software package.
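The random-observer design can be sketched as below, assuming 2 readings per artery for each package; the helper names and the use of NumPy's `default_rng` are our assumptions, not part of the study's analysis. Typical use would be to draw `pkg_a = random_observer_readings(a_obs1, a_obs2, rng)` and `pkg_b` in the same way, then call `limits_of_agreement(pkg_a, pkg_b)`.

```python
import numpy as np


def random_observer_readings(obs1, obs2, rng):
    """For each artery, keep the reading of one randomly chosen observer (p = 0.5 each)."""
    obs1 = np.asarray(obs1, dtype=float)
    obs2 = np.asarray(obs2, dtype=float)
    pick_first = rng.random(obs1.shape[0]) < 0.5
    return np.where(pick_first, obs1, obs2)


def limits_of_agreement(a, b):
    """Bland-Altman bias (a - b) and 95% limits of agreement between two packages."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Drawing the observer per artery, rather than fixing one observer per package, keeps observer-to-observer variation inside the between-package differences, as it would be in routine use.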

Retrospective Error Analysis
Semiautomatic measurements classified as false-negative were retrospectively reviewed by a neuroradiologist (10 years of experience) and a trained observer (2 years of experience) to determine whether the measurement had been performed correctly and whether erroneous centerlines or erroneous lumen segmentations were present. A measurement was classified as false-negative if the degree of stenosis was at or above the cutoff (50% or 70%) according to manual measurement but below the cutoff according to semiautomatic measurement.
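The false-negative definition translates directly into a filter over paired measurements. In this minimal sketch, the `borderline_margin` of 5 percentage points is a hypothetical parameter that flags cases where manual and semiautomatic values straddle the cutoff by a small amount; it is not a rule defined in the study.

```python
def false_negatives(manual_pct, semi_pct, cutoff, borderline_margin=5.0):
    """Return (index, is_borderline) for each false-negative semiautomatic measurement.

    False-negative: manual >= cutoff but semiautomatic < cutoff.
    is_borderline: manual - semiautomatic <= borderline_margin (hypothetical threshold).
    """
    results = []
    for i, (manual, semi) in enumerate(zip(manual_pct, semi_pct)):
        if manual >= cutoff and semi < cutoff:
            results.append((i, manual - semi <= borderline_margin))
    return results
```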

RESULTS
As Table 1 shows, the average processing time of all semiautomatic measurements was shorter than that of manual measurements. See On-line Fig 1 for examples of semiautomatic ICA stenosis measurement.

Diagnostic Accuracy
The agreement of semiautomatic with manual measurements is illustrated by the scatterplots in Fig 1. The ICC and limits of agreement are shown in Table 2. All software packages showed a high correlation, with ICCs between 0.86 and 0.88. The mean paired difference between manual and semiautomatic measurements was small, ranging from 2.1% ± 13% to 3.8% ± 14% (Fig 2). However, the Bland-Altman limits of agreement were wide, ranging from −23% to 27% for the narrowest and from −24% to 31% for the widest (Fig 2). The diagnostic performance is presented in Table 3. The semiautomatic measurements had low sensitivity for detecting a ≥70% stenosis, with values between 46% and 62%. The specificity and overall diagnostic accuracy for detecting a ≥70% stenosis were good, ranging from 96% to 97% and from 87% to 90%, respectively. The semiautomatic measurements showed moderate-to-good sensitivity for detecting a ≥50% stenosis, with values between 68% and 82%. The specificity and overall diagnostic accuracy for detecting a ≥50% stenosis were good, ranging from 93% to 95% and from 85% to 88%, respectively. No statistically significant differences in diagnostic performance were found among the software packages. All occluded arteries were detected by the observers regardless of the semiautomatic method used.

Inter- and Intraobserver Reproducibility
Observer reproducibility is illustrated in Figs 3-5, and the results are summarized in Tables 4 and 5. The Bland-Altman plots showed a small inter- and intraobserver bias with wide limits of agreement for manual stenosis measurements (Fig 3). The manual measurements had reasonable-to-good inter- and intraobserver reliability, with ICCs of 0.81 and 0.88, respectively. The Bland-Altman plots show that the interobserver bias was smallest for 3mensio and Philips (Fig 5). The semiautomatic measurements had reasonable-to-excellent interobserver reproducibility, with ICCs between 0.83 and 0.96. For 3mensio and Philips, interobserver reproducibility was significantly better than that of the manual measurements. With the Siemens software and a fixed reference location 3 cm above the minimal stenosis diameter, the mean difference in the degree of stenosis was 3.5% ± 15%, compared with 6.5% ± 12% for the standard reference location (P < .001), and interobserver reproducibility was slightly lower, with an ICC of 0.84 compared with 0.86, a difference that was not statistically significant (P = .55).
For detecting a stenosis of ≥50%, the κ statistics for interobserver agreement were good for manual measurement and, depending on the software package, fair to excellent for the semiautomatic measurements (Tables 4 and 5). For detecting a stenosis of ≥70%, the κ statistics for interobserver agreement were fair for manual measurement and, depending on the software package, poor to excellent for the semiautomatic measurements.

Agreement among Semiautomatic Measurements
The agreement among measurements with different semiautomatic software packages is shown in the On-line Table. The correlation between measurements with different semiautomatic packages is high, with ICCs ranging from 0.92 to 0.98. The mean paired differences between semiautomatic packages range from 0.49% to 5.7%, and the Bland-Altman limits of agreement are wide, ranging from −17% to 18% for the narrowest and from −33% to 31% for the widest.

Fig 2. Bland-Altman plots of the degree of stenosis determined by manual and semiautomatic assessment. The black lines represent the mean paired difference and 95% limits of agreement. The characteristic V-shape in the Bland-Altman plot is caused by 1 of the 2 measurements being zero while the other is nonzero, which occurred particularly when the degree of stenosis was small (<30%).
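The V-shape noted in the Bland-Altman figure caption follows from the plot coordinates. Taking the difference as d = A − B and the mean as m = (A + B)/2 (the sign convention of the figure is our assumption; the V appears either way), a pair in which one measurement is zero falls on one of the lines d = ±2m:

```latex
m = \frac{A + B}{2}, \qquad d = A - B
\\[4pt]
A = 0,\; B = s > 0 \;\Rightarrow\; m = \tfrac{s}{2},\; d = -s = -2m
\\[4pt]
B = 0,\; A = s > 0 \;\Rightarrow\; m = \tfrac{s}{2},\; d = s = 2m
```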

Retrospective Error Analysis
Most false-negative classifications occurred because the semiautomatic method measured a larger stenosis diameter and/or a smaller reference diameter than the manual measurement by the neuroradiologist (78% [36/46] for a stenosis of ≥70% and 89% [51/57] for a stenosis of ≥50%). There were no apparent errors in the centerline, and only 4.3% (2/46) of the false-negatives for a stenosis of ≥70% and 5.3% (3/57) for a stenosis of ≥50% were caused by erroneous lumen segmentation due to calcium. An ulcerative plaque hampered semiautomatic measurement in 17.4% (8/46) of the false-negatives for a stenosis of ≥70% and 5.3% (3/57) for a stenosis of ≥50%, resulting in severe overestimation of the stenosis diameter compared with manual measurement (Fig 6). Because 3mensio fits an ellipse to the segmented lumen and uses the smallest diameter of the ellipse as the minimal stenosis diameter, it can report a minimal stenosis diameter larger than that measured by a radiologist (Fig 6). This behavior caused 40% (4/10) of the 3mensio false-negatives for a stenosis of ≥70% and 20.0% (3/15) for a stenosis of ≥50%. For 8.7% (4/46) of the false-negatives for a stenosis of ≥70% and 12.3% (7/57) for a stenosis of ≥50%, the difference in the degree of stenosis from manual measurement was only 5%, with the manual measurement just above and the semiautomatic measurement just below the cutoff.

DISCUSSION
In this study, we investigated the diagnostic performance of 4 commercially available semiautomatic software packages with manual measurement as the reference. All semiautomatic methods had moderate-to-good sensitivity for detecting a stenosis of ≥50% and low sensitivity for detecting a stenosis of ≥70%. All semiautomatic methods had good specificity and overall diagnostic accuracy for detecting stenoses of ≥70% and ≥50%. All semiautomatic stenosis measurement methods were 40% faster than manual measurement. For 3mensio, we found much higher interobserver reproducibility than for manual measurement. All semiautomatic methods correlated well with manual measurement.
Our results are in line with the previously reported sensitivity and specificity of 75% and 98% for detecting a stenosis of ≥70% and 78% and 93% for detecting a stenosis of ≥50%. 8 They are also similar to the previously reported sensitivity and specificity of 44.2% and 97.7% for detecting a stenosis of ≥70% and 86.2% and 93.1% for detecting a stenosis of ≥50% in 46 patients with known cerebrovascular disease. 18 The interobserver agreement for semiautomatic measurements is in line with previously reported κ statistics of 0.55 for detecting a stenosis of ≥50% and 0.59 for detecting a stenosis of ≥70% 19 and with Pearson correlation coefficients of 0.89 and 0.90. 9,19 As in previous studies, 8,9 we found that semiautomatic stenosis measurement can increase observer reproducibility.
This study has a number of limitations. For pragmatic reasons, we used different observers for different software packages, which makes it more difficult to compare the packages. We used manual measurements on CTA as the reference, whereas the original NASCET classification is based on conventional catheter angiography. Given the risks associated with conventional catheter angiography, 3 it would have been unethical to perform DSA solely for this study. Bucek et al 18 showed that the median difference between semiautomatic CTA and manual DSA stenosis measurements was smaller than that between manual measurements on CTA and DSA (−2% versus 11%, respectively). This finding may imply that manual stenosis measurement on CTA tends to overestimate the degree of stenosis compared with measurement on DSA, and this overestimation may have contributed to the low sensitivity found in this study. Given the low observer reproducibility of manual stenosis measurement, one could question its value as a reference standard for determining the diagnostic accuracy of semiautomatic measurements. However, because manual stenosis measurement is standard in clinical practice, we believed that it was the best choice for evaluating the accuracy of the automated methods.
Retrospective error analysis showed that most false-negatives were due to the semiautomatic method measuring a larger stenosis diameter and/or a smaller reference diameter than the manual measurement by a radiologist. One-tenth of the false-negatives were caused by an ulcerative plaque that hampered correct semiautomatic measurement and was difficult for a nonradiologist to detect. To determine the agreement between pairs of software packages, we randomly selected 1 of the 2 observers for each measurement instead of using the mean of the 2 observers. Averaging the measurements diminishes outliers and might therefore result in overly optimistic agreement between the semiautomatic software packages. 20 Furthermore, in this way we aimed to simulate a clinical setting in which multiple users may use the software package. Creating MPRs perpendicular to the artery, which is needed for manual stenosis measurement and manual measurement of the lumen diameter, is prone to observer variation and requires experience. 7,8 This variation resulted in the lower interobserver reproducibility of manual stenosis measurement compared with semiautomatic measurement.
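Why averaging would flatter the between-package agreement can be made explicit with a simple error model. This is a sketch under our own assumptions (each reading equals an artery- and package-specific value plus independent observer error with equal variance σ²), not a model fitted in the study. Because the limits of agreement are bias ± 1.96 SD of the differences, the smaller variance under averaging narrows them.

```latex
x^{(o)}_{p,i} = \mu_{p,i} + \varepsilon^{(o)}_{p,i}, \qquad
\operatorname{Var}\bigl(\varepsilon^{(o)}_{p,i}\bigr) = \sigma^2
\\[4pt]
\text{one random observer per package:}\quad
\operatorname{Var}\bigl(x_{A,i} - x_{B,i}\bigr) = \operatorname{Var}\bigl(\mu_{A,i} - \mu_{B,i}\bigr) + 2\sigma^2
\\[4pt]
\text{mean of 2 observers per package:}\quad
\operatorname{Var}\bigl(\bar{x}_{A,i} - \bar{x}_{B,i}\bigr) = \operatorname{Var}\bigl(\mu_{A,i} - \mu_{B,i}\bigr) + \sigma^2
```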
Semiautomatic methods facilitate stenosis measurement and can have higher observer reproducibility than manual measurement because manual creation of MPRs and manual lumen measurement are not needed. All 4 semiautomatic software packages are comparable in ease of use and required observer skills. 3mensio was the only package that determined the minimal diameter of the stenosis automatically; this higher level of automation may have resulted in its superior observer agreement. Although interobserver reproducibility can be higher for semiautomatic measurements, manual selection of the minimal stenosis diameter and reference diameter is still needed and therefore remains a source of observer variability. Furthermore, manual correction of the centerline and lumen segmentation is often needed to ensure accurate measurement, especially when the artery is very tortuous or the plaque is calcified. 8,17 The manual selection of the minimal stenosis and reference diameters and the manual corrections may have caused the wide Bland-Altman limits of agreement for the semiautomatic methods and the low observer reproducibility of some packages.
Endarterectomy is beneficial for patients with a stenosis of ≥50% in men and ≥70% in women. 1 Therefore, accurate and reproducible measurement of the degree of stenosis is crucial for selecting patients for endarterectomy.

CONCLUSIONS
Most semiautomatic software packages have higher observer reproducibility than manual measurement, which results in more consistent stenosis measurement and less observer dependency in treatment selection. Because semiautomatic measurements require manual corrections, observer training and awareness of erroneous centerlines and lumen segmentations remain crucial. All 4 semiautomatic methods have a high positive predictive value and good overall diagnostic accuracy for detecting an ICA stenosis of ≥50% and ≥70%. The potentially excellent observer reproducibility of semiautomatic measurement makes it suitable for clinical practice, but the poor sensitivity for a stenosis of ≥70% should be taken into account, and measurements should be checked by a radiologist.

Fig 6. Example of ulcerative plaque. Left: 3mensio segmentation of the artery. Upper right: 3mensio segmentation of the lumen and the ulcerative plaque; the turquoise line is the minimal stenosis diameter as determined by 3mensio (3.5 mm), and the white-with-red line is a measurement of the true lumen (1.2 mm). Lower right: sagittal view of the ulceration. The image also shows 3mensio fitting an ellipse (yellow) to the segmented lumen of the artery (turquoise).