Table 1:

Performance of multiclass classification across all models and experts in the test set

| Method | Micro-Averaged AUC | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) |
|---|---|---|---|---|
| Radiomics (by TPOT) | 0.91 | 0.83 (0.72–0.90) | MB: 0.87 (0.67–0.96) | MB: 0.91 (0.78–0.97) |
| | | | EP: 0.67 (0.46–0.83) | EP: 0.98 (0.88–1.00) |
| | | | PA: 0.95 (0.76–1.00) | PA: 0.86 (0.72–0.94) |
| Radiomics (by CHSQ and GLM) | 0.92 | 0.74 (0.62–0.83) | MB: 0.96 (0.77–1.00) | MB: 0.84 (0.70–0.92) |
| | | | EP: 0.33 (0.17–0.55) | EP: 0.93 (0.81–0.98) |
| | | | PA: 0.91 (0.71–0.99) | PA: 0.84 (0.70–0.92) |
| Expert 1 | NA | 0.58 (0.46–0.69) | MB: 0.65 (0.45–0.81) | MB: 0.67 (0.52–0.79) |
| | | | EP: 0.57 (0.36–0.75) | EP: 0.82 (0.68–0.91) |
| | | | PA: 0.50 (0.31–0.69) | PA: 0.86 (0.72–0.94) |
| Expert 2 | NA | 0.50 (0.38–0.62) | MB: 0.57 (0.37–0.75) | MB: 0.66 (0.51–0.77) |
| | | | EP: 0.43 (0.25–0.64) | EP: 0.80 (0.66–0.89) |
| | | | PA: 0.50 (0.31–0.69) | PA: 0.77 (0.63–0.87) |
Note: NA indicates not applicable.