Table 2: Performance of binary classification across all models and experts in the test set

| Method | AUC | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) |
|---|---|---|---|---|
| Radiomics (by TPOT) | | | | |
| MB vs non-MB | 0.94 | 0.85 (0.74–0.92) | 0.91 (0.72–0.99) | 0.81 (0.67–0.90) |
| EP vs non-EP | 0.84 | 0.80 (0.69–0.88) | 0.52 (0.32–0.71) | 0.93 (0.81–0.98) |
| PA vs non-PA | 0.94 | 0.88 (0.78–0.94) | 0.95 (0.76–1.00) | 0.84 (0.70–0.92) |
| Radiomics (by manual optimized pipeline) | | | | |
| MB vs non-MB | 0.98 | 0.91 (0.81–0.96) | 0.96 (0.78–1.00) | 0.88 (0.75–0.95) |
| EP vs non-EP | 0.70 | 0.71 (0.59–0.81) | 0.19 (0.07–0.40) | 0.95 (0.83–0.99) |
| PA vs non-PA | 0.93 | 0.86 (0.75–0.93) | 0.77 (0.56–0.90) | 0.91 (0.78–0.97) |
| Expert 1 | | | | |
| MB vs non-MB | NA | 0.67 (0.55–0.77) | 0.65 (0.45–0.81) | 0.67 (0.52–0.79) |
| EP vs non-EP | NA | 0.74 (0.60–0.82) | 0.57 (0.36–0.75) | 0.82 (0.68–0.91) |
| PA vs non-PA | NA | 0.74 (0.62–0.83) | 0.50 (0.31–0.69) | 0.86 (0.72–0.94) |
| Expert 2 | | | | |
| MB vs non-MB | NA | 0.64 (0.52–0.75) | 0.57 (0.37–0.75) | 0.66 (0.51–0.77) |
| EP vs non-EP | NA | 0.68 (0.54–0.79) | 0.43 (0.25–0.64) | 0.80 (0.66–0.89) |
| PA vs non-PA | NA | 0.68 (0.56–0.78) | 0.50 (0.31–0.69) | 0.77 (0.63–0.87) |
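
For readers who want to see how one-vs-rest figures such as the accuracy, sensitivity, and specificity in Table 2 are typically derived, the following is a minimal sketch that computes them from confusion-matrix counts and attaches a 95% confidence interval to each. It rests on several assumptions: the table does not state which interval method was used, so the Wilson score interval here is only one common choice; the function names `wilson_interval` and `binary_metrics` are hypothetical; and the counts in the usage lines are illustrative, not taken from the study.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion.

    Assumption: the source does not say which CI method it used;
    Wilson is shown here only as one common option.
    The default z corresponds to a two-sided 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


def binary_metrics(tp, fn, tn, fp):
    """Return {metric: (estimate, (ci_low, ci_high))} for one-vs-rest counts."""
    n = tp + fn + tn + fp
    return {
        # Fraction of all cases classified correctly.
        "accuracy": ((tp + tn) / n, wilson_interval(tp + tn, n)),
        # Fraction of true positives among cases that have the target class.
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        # Fraction of true negatives among cases that lack the target class.
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = binary_metrics(tp=18, fn=2, tn=40, fp=10)
for name, (estimate, (low, high)) in metrics.items():
    print(f"{name}: {estimate:.2f} ({low:.2f}-{high:.2f})")
```

Because sensitivity is estimated only from the cases that belong to the target class, its interval is usually wider than the one for accuracy, which is consistent with the broad sensitivity intervals reported in Table 2.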