
Benchmark

Warning: This is a work in progress. The content is subject to change as the project evolves. MEDS-DEV results, while in principle reproducible, are not yet finalized.

The MEDS-DEV benchmark evaluates machine learning models across shared clinical prediction tasks on MEDS-formatted datasets. Use the views below to explore results:

  • Leaderboard — compare model scores across all tasks at a glance
  • Heatmap — visualize performance patterns across the task × model matrix
  • Model Rankings — see which models win most often across tasks
  • Per-Task Detail — drill into a specific task to compare all metrics
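To make the views above concrete, here is a minimal sketch of how a leaderboard ranking by task wins could be derived from a task × model score table. The table values and model names are invented for illustration and are not actual MEDS-DEV results.

```python
from collections import Counter

# Illustrative task x model score table (higher is better); not real results.
scores = {
    "mortality":   {"model_a": 0.81, "model_b": 0.78},
    "readmission": {"model_a": 0.66, "model_b": 0.70},
    "length_of_stay": {"model_a": 0.74, "model_b": 0.73},
}

# Per-task winner: the model with the highest score on each task.
winners = {task: max(per_model, key=per_model.get)
           for task, per_model in scores.items()}

# Rank models by number of task wins, as in the Model Rankings view.
ranking = Counter(winners.values()).most_common()
print(ranking)  # -> [('model_a', 2), ('model_b', 1)]
```

A real evaluation would also need to handle ties, tasks a model was not run on, and metrics where lower is better.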