Benchmark
Warning: This is a work in progress. The content is subject to change as the project evolves. MEDS-DEV results, while in principle reproducible, are not yet finalized.
The MEDS-DEV benchmark evaluates machine learning models across shared clinical prediction tasks on MEDS-formatted datasets. Use the views below to explore results:
- Leaderboard — compare model scores across all tasks at a glance
- Heatmap — visualize performance patterns across the task × model matrix
- Model Rankings — see which models win most often across tasks
- Per-Task Detail — drill into a specific task to compare all metrics
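As a rough illustration of what "win most often across tasks" can mean, the sketch below counts, for each model, the number of tasks on which it has the top score. It is not the MEDS-DEV implementation: the task and model names, the score values, the assumption that a higher score is better, and the tie rule (every tied model gets a win) are all hypothetical.

```python
from collections import Counter

# Hypothetical scores (task -> model -> metric value); NOT real MEDS-DEV results.
scores = {
    "task_a": {"model_x": 0.81, "model_y": 0.78},
    "task_b": {"model_x": 0.66, "model_y": 0.71},
    "task_c": {"model_x": 0.90, "model_y": 0.85},
}


def count_wins(scores):
    """Count, per model, the number of tasks on which it has the top score.

    Assumes a higher score is better (an assumption of this sketch).
    """
    wins = Counter()
    for by_model in scores.values():
        if not by_model:
            continue  # skip tasks with no reported results
        best = max(by_model.values())
        # Ties: every model sharing the top score is credited with a win.
        for model, value in by_model.items():
            if value == best:
                wins[model] += 1
    return wins


# Models ordered from most to fewest wins.
ranking = count_wins(scores).most_common()
print(ranking)  # [('model_x', 2), ('model_y', 1)]
```

For a metric where lower is better, `max` would need to be replaced with `min`.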