
Benchmark

Warning: This is a work in progress. The content is subject to change as the project evolves. MEDS-DEV results, while in principle reproducible, are not yet finalized.

The MEDS-DEV benchmark evaluates machine learning models across shared clinical prediction tasks on MEDS-formatted datasets. Use the views below to explore results:

  • Leaderboard — compare model scores across all tasks at a glance
  • Heatmap — visualize performance patterns across the task × model matrix
  • Model Rankings — see which models win most often across tasks
  • Per-Task Detail — drill into a specific task to compare all metrics
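To make the views above concrete, here is a minimal sketch of how a leaderboard ranking by task wins could be derived from a task × model score table. The table values and model names are invented for illustration and are not actual MEDS-DEV results.

```python
from collections import Counter

# Illustrative task x model score table (higher is better); not real results.
scores = {
    "mortality":   {"model_a": 0.81, "model_b": 0.78},
    "readmission": {"model_a": 0.66, "model_b": 0.70},
    "length_of_stay": {"model_a": 0.74, "model_b": 0.73},
}

# Per-task winner: the model with the highest score on each task.
winners = {task: max(per_model, key=per_model.get)
           for task, per_model in scores.items()}

# Rank models by number of task wins, as in the Model Rankings view.
ranking = Counter(winners.values()).most_common()
print(ranking)  # -> [('model_a', 2), ('model_b', 1)]
```

A real evaluation would also need to handle ties, tasks a model was not run on, and metrics where lower is better.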