Table 2:

Comparison of performance metrics of segmentations for different CNN modelsᵃ

Model              Dice (%)            Precision (%)       Sensitivity (%)
LOWB               6.5 (0.3–20.9)      5.7 (0.3–32.7)      8.5 (0.3–28.5)
ADCᵇ               56.4 (27.1–75.4)    59.4 (22.3–78.4)    58.2 (32.7–78.9)
DWI                72.3 (46.2–82.5)    73.0 (38.3–88.1)    84.0 (62.4–90.8)
ADC+LOWB           76.5 (51.9–86.1)    78.1 (47.2–88.8)    79.2 (66.6–89.7)
DWI+LOWB           76.7 (58.4–85.4)    79.4 (52.0–89.8)    83.0 (64.8–90.6)
DWI+ADC            79.0 (57.1–86.4)    79.0 (62.1–90.5)    82.6 (68.4–91.4)
DWI+ADC+LOWB       78.9 (56.2–86.2)    77.4 (55.0–89.8)    83.4 (71.3–91.8)
E2 (DWI+ADC)       82.0 (62.9–88.1)    82.0 (65.1–92.6)ᵇ   84.1 (71.0–92.6)
E3 (DWI+ADC+LOWB)  82.2 (64.9–88.9)    83.2 (67.7–93.3)    83.9 (71.9–92.4)
  • a All metrics are reported in percent as median (interquartile range, IQR). Dice, precision, and sensitivity differed significantly among the nonensemble models (P < .001). The ensemble models, E2 and E3, were superior to all other models (P < .001).

  • b Excludes 1 subject whose automatically segmented lesion volume was zero, because precision is undefined in this case.
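The footnotes imply per-subject overlap metrics summarized as median (IQR), with precision left undefined when the predicted lesion volume is zero. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: masks are taken to be flattened 0/1 voxel sequences, the metrics use the standard definitions (Dice = 2TP/(2TP + FP + FN), precision = TP/(TP + FP), sensitivity = TP/(TP + FN)), and quartiles come from `statistics.quantiles` with `method="inclusive"`. The function names are hypothetical, and the quartile convention used in the study is not stated in the table.

```python
from statistics import median, quantiles
from typing import Optional, Sequence, Tuple

# Hypothetical helpers: per-subject overlap metrics for binary lesion masks.
# Masks are assumed to be flattened sequences of 0/1 voxel labels.


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int]:
    """Return (true positive, false positive, false negative) voxel counts."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def dice(pred: Sequence[int], truth: Sequence[int]) -> Optional[float]:
    """Dice = 2TP / (2TP + FP + FN); None if both masks are empty."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return None if denom == 0 else 2 * tp / denom


def precision(pred: Sequence[int], truth: Sequence[int]) -> Optional[float]:
    """Precision = TP / (TP + FP); None when the predicted lesion volume is zero."""
    tp, fp, _ = confusion_counts(pred, truth)
    return None if tp + fp == 0 else tp / (tp + fp)


def sensitivity(pred: Sequence[int], truth: Sequence[int]) -> Optional[float]:
    """Sensitivity = TP / (TP + FN); None when the reference mask is empty."""
    tp, _, fn = confusion_counts(pred, truth)
    return None if tp + fn == 0 else tp / (tp + fn)


def median_iqr_percent(values: Sequence[Optional[float]]) -> str:
    """Format per-subject scores as 'median (Q1–Q3)' in percent.

    Undefined scores (None) are dropped, mirroring the exclusion in footnote b.
    Assumes at least two defined values remain.
    """
    defined = [v for v in values if v is not None]
    q1, _, q3 = quantiles(defined, n=4, method="inclusive")
    return f"{100 * median(defined):.1f} ({100 * q1:.1f}–{100 * q3:.1f})"


if __name__ == "__main__":
    truth = [1, 0, 1, 0]
    print(dice([1, 1, 0, 0], truth))        # 0.5
    print(precision([0, 0, 0, 0], truth))   # None: empty prediction
    print(median_iqr_percent([0.5, None, 0.7, 0.9]))  # 70.0 (60.0–80.0)
```

Returning `None` rather than 0 for an empty prediction keeps an undefined precision out of the summary statistics instead of silently counting it as a failure, which is one way to reproduce the single-subject exclusion described in footnote b.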