Extracting a Prediction Task Cohort from MEDS Data
To easily extract prediction tasks from MEDS data, we will use the ACES package. This package lets you define simple configuration files that specify the inclusion/exclusion criteria for the tasks you want, and then extract those tasks automatically from MEDS data via a command-line interface. See the ACES documentation for more information.
In this tutorial, we'll run an end-to-end extraction of a prediction task from the MIMIC-IV Demo dataset.
Tutorial Set-up
[10]:
import os
from pathlib import Path
DEMO_DIR = Path(os.getenv("MEDS_DEMO_DIR", "./demo_output"))
MIMIC-IV Demo Dataset
You can use the MIMIC_IV_MEDS package to easily download and automatically transform the MIMIC-IV-Demo dataset into MEDS:
[22]:
OUTPUT_DIR = DEMO_DIR / "meds/"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
!MEDS_extract-MIMIC_IV root_output_dir=$OUTPUT_DIR do_demo=True do_copy=True hydra/job_logging=disabled
Extracting the Prediction Task
In-ICU Mortality
For this tutorial, we'll extract a cohort for a basic in-ICU mortality prediction task. Let's first define our task parameters.
Our goal is to predict the mortality outcome for a patient's entire ICU admission using historical patient data plus the initial 24 hours of data after the patient was first admitted into the ICU.
Suppose we only want to consider patients whose ICU admission was longer than 48 hours. Patients who die or are discharged from the ICU within 48 hours of admission are therefore excluded.
Note: This task is distinct from in-hospital mortality, 30-day mortality, or imminent mortality.
We can visualize this task as a series of inter-related windows using the timeline below:
- The "blue" region represents the "input" data window. All historical patient data up to and including the first 24 hours of a patient's ICU admission will serve as input into a downstream model.
- The "red" bar represents the ICU admission, and would "trigger" the start of our prediction task.
- The "yellow" region represents the first 48 hours of a patient's ICU admission, and we stipulate a "gap" whereby the patient must not have died or been discharged during that period.
- The "magenta" region represents our prediction "target" window which can be of varying length, as it ends whenever a patient dies or is discharged.
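The window boundaries described above can be sketched with plain date arithmetic. This is only an illustration: the admission time is made up, and `is_eligible` is a hypothetical helper, not part of ACES.

```python
from datetime import datetime, timedelta

# Hypothetical ICU admission time, chosen only for illustration.
icu_admission = datetime(2150, 1, 1, 8, 0)

# The input window ends 24 hours after admission; the prediction is made here.
input_end = icu_admission + timedelta(hours=24)
# The gap ends 48 hours after admission; the target window starts here.
gap_end = icu_admission + timedelta(hours=48)


def is_eligible(stay_end: datetime) -> bool:
    """A stay qualifies only if it ends (discharge or death) after the gap."""
    return stay_end > gap_end


short_stay = is_eligible(icu_admission + timedelta(hours=30))
long_stay = is_eligible(icu_admission + timedelta(hours=72))
print(input_end, gap_end, short_stay, long_stay)
```

A stay that ends 30 hours after admission is excluded, while one that ends after 72 hours is kept and contributes a label.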
ACES Configuration File
ACES uses a configuration language to capture cohort and task definitions. Details about this configuration language are available in the ACES Documentation.
Let's walk through the construction of an ACES configuration file. At a minimum, a configuration file contains the predicates, trigger, and windows sections.
Predicates
To capture our task definition, we must first define at least three simple concepts, or ACES plain predicates, for our demo dataset.
For starters, as we are specifically interested in mortality "in the ICU", an ICU admission predicate and an ICU discharge predicate are needed to represent events where patients are officially admitted to the ICU and where they are officially discharged. We also need a death predicate to capture death events, so that we can accurately capture the mortality component.
To define predicates, we would need to find how these concepts are represented in our dataset. For MIMIC-IV-Demo in MEDS format, these concepts can be found using simple regular expressions:
predicates:
  icu_admission:
    code: { regex: "^ICU_ADMISSION//.*" }
  icu_discharge:
    code: { regex: "^ICU_DISCHARGE//.*" }
  death:
    code: { regex: "MEDS_DEATH.*" }
Since patients can either die or be discharged from the ICU, we may also create a more complex concept, or an ACES derived predicate, by joining the above simple concepts using an OR relationship:
predicates:
  discharge_or_death:
    expr: or(icu_discharge, death)
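To see what these predicates select, here is a minimal sketch that applies the same regular expressions to a few event codes with Python's `re` module. It assumes search-style matching (a pattern may match anywhere unless anchored with `^`); the `evaluate` helper and the sample code strings are illustrative only, not ACES internals or guaranteed dataset values.

```python
import re

# Patterns copied from the plain predicates above.
PLAIN_PREDICATES = {
    "icu_admission": re.compile(r"^ICU_ADMISSION//.*"),
    "icu_discharge": re.compile(r"^ICU_DISCHARGE//.*"),
    "death": re.compile(r"MEDS_DEATH.*"),
}


def evaluate(code: str) -> dict:
    """Return which predicates fire for a single event code."""
    flags = {name: bool(p.search(code)) for name, p in PLAIN_PREDICATES.items()}
    # Derived predicate: or(icu_discharge, death)
    flags["discharge_or_death"] = flags["icu_discharge"] or flags["death"]
    return flags


# Illustrative code strings (assumed formats, not taken from the dataset).
for code in ["ICU_ADMISSION//MICU", "ICU_DISCHARGE//MICU", "MEDS_DEATH", "LAB//12345"]:
    print(code, evaluate(code))
```

Both a discharge code and a death code set `discharge_or_death`, which is what lets a single predicate terminate the target window later on.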
Trigger
As mentioned above, a patient's admission into the ICU triggers our prediction task. A designated field defines this ACES trigger, and its value must always be one of the specified predicates. For our task, this predicate would be icu_admission:
trigger: icu_admission
Windows
The windows section contains the three windows we defined previously: input, gap, and target.
For details on the configuration language syntax for windows, please see the documentation. Briefly, certain fields are present in all windows:
windows:
  window_name:
    start:
    end:
    start_inclusive:
    end_inclusive:
However, some windows also have optional parameters, such as the has field, which captures predicate count criteria for that particular window:
windows:
  window_name:
    ...
    has:
      predicate_a: (min, max)
      predicate_b: (min, max)
      ...
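As a rough illustration of how a `(min, max)` pair can be read, the sketch below treats `None` as "no bound on this side" and both bounds as inclusive, which mirrors constraint messages such as `None <= icu_admission <= 0` printed by the CLI. The `satisfies` function is a hypothetical helper, not the ACES implementation.

```python
from typing import Optional, Tuple

Bounds = Tuple[Optional[int], Optional[int]]


def satisfies(count: int, bounds: Bounds) -> bool:
    """Check an inclusive (min, max) count constraint; None means unbounded."""
    lower, upper = bounds
    if lower is not None and count < lower:
        return False
    if upper is not None and count > upper:
        return False
    return True


# (None, 0): the predicate must not occur in the window at all.
print(satisfies(0, (None, 0)), satisfies(1, (None, 0)))
# (1, None): the predicate must occur at least once.
print(satisfies(3, (1, None)), satisfies(0, (1, None)))
```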
For our in-ICU mortality prediction task, we can define:
- input, which begins at the start of a patient's record (i.e., null) and ends 24 hours past trigger (i.e., icu_admission). As we'd like to include the events at both the start and end of input, if present, we set both start_inclusive and end_inclusive to True.
  Note: Since we'd like our model to make a prediction at the end of input, we set index_timestamp to end, which corresponds to the timestamp of trigger + 24h.
    windows:
      input:
        start: null
        end: trigger + 24h
        start_inclusive: True
        end_inclusive: True
        index_timestamp: end
- gap, which begins at trigger and ends 48 hours later. As the left boundary event is the trigger itself (i.e., icu_admission), it should not be counted again in gap, so we set start_inclusive to False. As we'd like our ICU admission to be at least 48 hours long, we place constraints specifying that there cannot be any additional icu_admission, icu_discharge, or death events in gap.
    windows:
      gap:
        start: trigger
        end: start + 48h
        start_inclusive: False
        end_inclusive: True
        has:
          icu_admission: (None, 0)
          discharge_or_death: (None, 0)
- target, which begins at the end of gap and ends at the next discharge or death event (i.e., the discharge_or_death predicate). ACES recognizes arrow notation (i.e., -> and <-; see Time Range Fields) as event references, so here we end target at the next discharge_or_death after its start. Similarly, as the event at the end of gap, if any, is already included in gap, we set start_inclusive to False.
  Note: Since we'd like to make a binary mortality prediction, we can extract the death predicate as a label from target by setting the label field to death.
    windows:
      target:
        start: gap.end
        end: start -> discharge_or_death
        start_inclusive: False
        end_inclusive: True
        label: death
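The target window logic above can be sketched in plain Python: find the first discharge or death after the end of gap, then label the stay by whether a death falls inside that window. This is a simplified illustration with made-up timestamps; `label_target` is a hypothetical helper and does not reproduce how ACES evaluates windows internally.

```python
from datetime import datetime


def label_target(events, gap_end):
    """events: list of (timestamp, predicate_name) tuples for one ICU stay."""
    ends = [
        ts for ts, name in events
        if ts > gap_end and name in ("icu_discharge", "death")
    ]
    if not ends:
        return None  # no terminating event, so the window cannot be closed
    target_end = min(ends)  # start -> discharge_or_death
    # start_inclusive: False, end_inclusive: True
    died = any(
        name == "death" and gap_end < ts <= target_end for ts, name in events
    )
    return {"target_end": target_end, "label": int(died)}


# Made-up events for illustration only.
gap_end = datetime(2150, 1, 3, 8, 0)
discharged = [(datetime(2150, 1, 6, 12, 0), "icu_discharge")]
died_in_icu = [(datetime(2150, 1, 5, 9, 0), "death")]
print(label_target(discharged, gap_end))
print(label_target(died_in_icu, gap_end))
```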
Now, we can put all the components together to form a complete ACES configuration file that captures everything we need for our cohort and task:
predicates:
  icu_admission:
    code: { regex: "^ICU_ADMISSION//.*" }
  icu_discharge:
    code: { regex: "^ICU_DISCHARGE//.*" }
  death:
    code: { regex: "MEDS_DEATH.*" }
  discharge_or_death:
    expr: or(icu_discharge, death)
trigger: icu_admission
windows:
  input:
    start: null
    end: trigger + 24h
    start_inclusive: True
    end_inclusive: True
    index_timestamp: end
  gap:
    start: trigger
    end: start + 48h
    start_inclusive: False
    end_inclusive: True
    has:
      icu_admission: (None, 0)
      discharge_or_death: (None, 0)
  target:
    start: gap.end
    end: start -> discharge_or_death
    start_inclusive: False
    end_inclusive: True
    label: death
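Before running an extraction, it can help to confirm that every predicate a configuration refers to is actually defined. The sketch below does this by hand on a simplified dictionary that mirrors only the predicate names, trigger, has constraints, and label of the configuration above. `undefined_predicates` is a hypothetical helper written for this tutorial, not an ACES feature.

```python
# Simplified mirror of the configuration above (only names, no time fields).
config = {
    "predicates": ["icu_admission", "icu_discharge", "death", "discharge_or_death"],
    "trigger": "icu_admission",
    "windows": {
        "input": {},
        "gap": {"has": {"icu_admission": (None, 0), "discharge_or_death": (None, 0)}},
        "target": {"label": "death"},
    },
}


def undefined_predicates(cfg: dict) -> list:
    """Return predicate names that are used but never defined."""
    known = set(cfg["predicates"])
    used = {cfg["trigger"]}
    for window in cfg["windows"].values():
        used.update(window.get("has", {}))
        if "label" in window:
            used.add(window["label"])
    return sorted(used - known)


print(undefined_predicates(config))

# A misspelled trigger is reported as undefined.
typo_config = dict(config, trigger="icu_admision")
print(undefined_predicates(typo_config))
```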
End-to-End Extraction using the ACES CLI
With the configuration file ready, extracting the cohort from our demo dataset is extremely straightforward. All we need to do is run a simple command-line tool.
[3]:
in_icu = """
predicates:
  icu_admission:
    code: { regex: "^ICU_ADMISSION//.*" }
  icu_discharge:
    code: { regex: "^ICU_DISCHARGE//.*" }
  death:
    code: { regex: "MEDS_DEATH.*" }
  discharge_or_death:
    expr: or(icu_discharge, death)
trigger: icu_admission
windows:
  input:
    start: null
    end: trigger + 24h
    start_inclusive: True
    end_inclusive: True
    index_timestamp: end
  gap:
    start: trigger
    end: start + 48h
    start_inclusive: False
    end_inclusive: True
    has:
      icu_admission: (None, 0)
      discharge_or_death: (None, 0)
  target:
    start: gap.end
    end: start -> discharge_or_death
    start_inclusive: False
    end_inclusive: True
    label: death
"""
Let's save the final configuration file in a YAML file in our demo directory:
[4]:
COHORT_NAME = "in_icu"
COHORT_DIR = DEMO_DIR / "cohorts"
COHORT_DIR.mkdir(parents=True, exist_ok=True)
with open(COHORT_DIR / f"{COHORT_NAME}.yaml", "w") as f:
    f.write(in_icu)
We can now set some variables for CLI parameters. For more information on CLI arguments, please see the documentation, including instructions for using expand_shards for simultaneous extraction of cohorts over multiple MEDS shards.
[5]:
DATA_STANDARD = "meds"
DATA_ROOT = OUTPUT_DIR / "MEDS_cohort/data/"
DATA_SHARD = "$(expand_shards train/1 tuning/1 held_out/1)"
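To make the shard argument less opaque, here is a small Python sketch of what the `expand_shards` shell helper is assumed to produce: each `prefix/N` argument expands to shard names `prefix/0` through `prefix/N-1`, joined with commas so that Hydra's multirun mode (`-m`) launches one job per shard. The function name `expand_shards_sketch` and the comma-joined output format are assumptions for illustration; consult the ACES documentation for the helper's actual behavior.

```python
def expand_shards_sketch(*specs: str) -> str:
    """Expand 'prefix/N' specs into a comma-separated list of shard names."""
    shards = []
    for spec in specs:
        prefix, count = spec.rsplit("/", 1)
        shards.extend(f"{prefix}/{i}" for i in range(int(count)))
    return ",".join(shards)


expanded = expand_shards_sketch("train/1", "tuning/1", "held_out/1")
print(expanded)
```

With one shard per split, as in this demo, the expansion names the same three shards (`train/0`, `tuning/0`, `held_out/0`) that appear in the job list printed by the CLI.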
[6]:
!aces-cli \
cohort_name=$COHORT_NAME \
cohort_dir=$COHORT_DIR \
data=sharded \
data.standard=$DATA_STANDARD \
data.root=$DATA_ROOT \
data.shard=$DATA_SHARD -m
[2025-03-25 02:15:42,893][HYDRA] Launching 3 jobs locally
[2025-03-25 02:15:42,893][HYDRA] #0 : cohort_name=in_icu cohort_dir=demo_output/cohorts data=sharded data.standard=meds data.root=demo_output/meds/MEDS_cohort/data data.shard=train/0
2025-03-25 02:15:43.313 | INFO | aces.__main__:main:149 - Loading config from 'demo_output/cohorts/in_icu.yaml'
2025-03-25 02:15:43.318 | INFO | aces.config:load:1341 - Parsing windows...
2025-03-25 02:15:43.318 | INFO | aces.config:load:1350 - Parsing trigger event...
2025-03-25 02:15:43.318 | INFO | aces.config:load:1392 - Parsing predicates...
2025-03-25 02:15:43.323 | INFO | aces.__main__:main:159 - Attempting to get predicates dataframe given:
standard: meds
ts_format: '%m/%d/%Y %H:%M'
root: demo_output/meds/MEDS_cohort/data
shard: train/0
path: ${data.root}/${data.shard}.parquet
_prefix: /${data.shard}
2025-03-25 02:15:43.324 | INFO | aces.predicates:generate_plain_predicates_from_meds:269 - Loading MEDS data...
2025-03-25 02:15:43.367 | INFO | aces.predicates:generate_plain_predicates_from_meds:273 - Generating plain predicate columns...
2025-03-25 02:15:43.395 | INFO | aces.predicates:generate_plain_predicates_from_meds:277 - Added predicate column 'icu_admission'.
2025-03-25 02:15:43.424 | INFO | aces.predicates:generate_plain_predicates_from_meds:277 - Added predicate column 'icu_discharge'.
2025-03-25 02:15:43.464 | INFO | aces.predicates:generate_plain_predicates_from_meds:277 - Added predicate column 'death'.
2025-03-25 02:15:43.464 | INFO | aces.predicates:generate_plain_predicates_from_meds:280 - Cleaning up predicates dataframe...
2025-03-25 02:15:43.505 | INFO | aces.predicates:get_predicates_df:703 - Loaded plain predicates. Generating derived predicate columns...
2025-03-25 02:15:43.506 | INFO | aces.predicates:get_predicates_df:717 - Added predicate column 'discharge_or_death'.
2025-03-25 02:15:43.506 | INFO | aces.predicates:get_predicates_df:724 - Generating special predicate columns...
2025-03-25 02:15:43.506 | INFO | aces.query:query:76 - Checking if '(subject_id, timestamp)' columns are unique...
2025-03-25 02:15:43.509 | INFO | aces.utils:log_tree:67 -
trigger
┣━━ input.end
┃   ┗━━ input.start
┗━━ gap.end
    ┗━━ target.end
2025-03-25 02:15:43.510 | INFO | aces.query:query:85 - Beginning query...
2025-03-25 02:15:43.510 | INFO | aces.query:query:92 - No static variable criteria specified, removing all rows with null timestamps...
2025-03-25 02:15:43.511 | INFO | aces.query:query:99 - Identifying possible trigger nodes based on the specified trigger event...
2025-03-25 02:15:43.511 | INFO | aces.constraints:check_constraints:110 - Excluding 72,774 rows as they failed to satisfy '1 <= icu_admission <= None'.
2025-03-25 02:15:43.512 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'input.end'...
2025-03-25 02:15:43.596 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'input.start'...
2025-03-25 02:15:43.707 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'gap.end'...
2025-03-25 02:15:43.770 | INFO | aces.constraints:check_constraints:110 - Excluding 2 rows as they failed to satisfy 'None <= icu_admission <= 0'.
2025-03-25 02:15:43.771 | INFO | aces.constraints:check_constraints:110 - Excluding 53 rows as they failed to satisfy 'None <= discharge_or_death <= 0'.
2025-03-25 02:15:43.772 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'target.end'...
2025-03-25 02:15:43.858 | INFO | aces.query:query:114 - Done. 60 valid rows returned corresponding to 47 subjects.
2025-03-25 02:15:43.858 | INFO | aces.query:query:129 - Extracting label 'death' from window 'target'...
2025-03-25 02:15:43.858 | INFO | aces.query:query:150 - Setting index timestamp as 'end' of window 'input'...
2025-03-25 02:15:43.862 | WARNING | aces.__main__:get_and_validate_label_schema:114 - Output contains columns that are not valid MEDS label columns. For now, we are dropping them.
If you need these columns, please comment on https://github.com/justin13601/ACES/issues/97
Columns:
  - trigger
  - input.end_summary
  - input.start_summary
  - gap.end_summary
  - target.end_summary
2025-03-25 02:15:43.870 | INFO | aces.__main__:main:191 - Completed in 0:00:00.555944. Results saved to 'demo_output/cohorts/in_icu/train/0.parquet'.
[2025-03-25 02:15:43,871][HYDRA] #1 : cohort_name=in_icu cohort_dir=demo_output/cohorts data=sharded data.standard=meds data.root=demo_output/meds/MEDS_cohort/data data.shard=tuning/0
2025-03-25 02:15:44.054 | INFO | aces.__main__:main:149 - Loading config from 'demo_output/cohorts/in_icu.yaml'
2025-03-25 02:15:44.058 | INFO | aces.config:load:1341 - Parsing windows...
2025-03-25 02:15:44.059 | INFO | aces.config:load:1350 - Parsing trigger event...
2025-03-25 02:15:44.059 | INFO | aces.config:load:1392 - Parsing predicates...
2025-03-25 02:15:44.060 | INFO | aces.__main__:main:159 - Attempting to get predicates dataframe given:
standard: meds
ts_format: '%m/%d/%Y %H:%M'
root: demo_output/meds/MEDS_cohort/data
shard: tuning/0
path: ${data.root}/${data.shard}.parquet
_prefix: /${data.shard}
2025-03-25 02:15:44.060 | INFO | aces.predicates:generate_plain_predicates_from_meds:269 - Loading MEDS data...
2025-03-25 02:15:44.064 | INFO | aces.predicates:generate_plain_predicates_from_meds:273 - Generating plain predicate columns...
2025-03-25 02:15:44.067 | INFO | aces.predicates:generate_plain_predicates_from_meds:277 - Added predicate column 'icu_admission'.
2025-03-25 02:15:44.070 | INFO | aces.predicates:generate_plain_predicates_from_meds:277 - Added predicate column 'icu_discharge'.
2025-03-25 02:15:44.074 | INFO | aces.predicates:generate_plain_predicates_from_meds:277 - Added predicate column 'death'.
2025-03-25 02:15:44.074 | INFO | aces.predicates:generate_plain_predicates_from_meds:280 - Cleaning up predicates dataframe...
2025-03-25 02:15:44.080 | INFO | aces.predicates:get_predicates_df:703 - Loaded plain predicates. Generating derived predicate columns...
2025-03-25 02:15:44.081 | INFO | aces.predicates:get_predicates_df:717 - Added predicate column 'discharge_or_death'.
2025-03-25 02:15:44.081 | INFO | aces.predicates:get_predicates_df:724 - Generating special predicate columns...
2025-03-25 02:15:44.081 | INFO | aces.query:query:76 - Checking if '(subject_id, timestamp)' columns are unique...
2025-03-25 02:15:44.082 | INFO | aces.utils:log_tree:67 -
trigger
┣━━ input.end
┃   ┗━━ input.start
┗━━ gap.end
    ┗━━ target.end
2025-03-25 02:15:44.082 | INFO | aces.query:query:85 - Beginning query...
2025-03-25 02:15:44.083 | INFO | aces.query:query:92 - No static variable criteria specified, removing all rows with null timestamps...
2025-03-25 02:15:44.083 | INFO | aces.query:query:99 - Identifying possible trigger nodes based on the specified trigger event...
2025-03-25 02:15:44.083 | INFO | aces.constraints:check_constraints:110 - Excluding 6,242 rows as they failed to satisfy '1 <= icu_admission <= None'.
2025-03-25 02:15:44.084 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'input.end'...
2025-03-25 02:15:44.097 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'input.start'...
2025-03-25 02:15:44.123 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'gap.end'...
2025-03-25 02:15:44.136 | INFO | aces.constraints:check_constraints:110 - Excluding 0 rows as they failed to satisfy 'None <= icu_admission <= 0'.
2025-03-25 02:15:44.136 | INFO | aces.constraints:check_constraints:110 - Excluding 7 rows as they failed to satisfy 'None <= discharge_or_death <= 0'.
2025-03-25 02:15:44.137 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'target.end'...
2025-03-25 02:15:44.165 | INFO | aces.query:query:114 - Done. 8 valid rows returned corresponding to 5 subjects.
2025-03-25 02:15:44.165 | INFO | aces.query:query:129 - Extracting label 'death' from window 'target'...
2025-03-25 02:15:44.166 | INFO | aces.query:query:150 - Setting index timestamp as 'end' of window 'input'...
2025-03-25 02:15:44.168 | WARNING | aces.__main__:get_and_validate_label_schema:114 - Output contains columns that are not valid MEDS label columns. For now, we are dropping them.
If you need these columns, please comment on https://github.com/justin13601/ACES/issues/97
Columns:
  - trigger
  - input.end_summary
  - input.start_summary
  - gap.end_summary
  - target.end_summary
2025-03-25 02:15:44.173 | INFO | aces.__main__:main:191 - Completed in 0:00:00.117831. Results saved to 'demo_output/cohorts/in_icu/tuning/0.parquet'.
[2025-03-25 02:15:44,174][HYDRA] #2 : cohort_name=in_icu cohort_dir=demo_output/cohorts data=sharded data.standard=meds data.root=demo_output/meds/MEDS_cohort/data data.shard=held_out/0
2025-03-25 02:15:44.267 | INFO | aces.__main__:main:149 - Loading config from 'demo_output/cohorts/in_icu.yaml'
2025-03-25 02:15:44.271 | INFO | aces.config:load:1341 - Parsing windows...
2025-03-25 02:15:44.271 | INFO | aces.config:load:1350 - Parsing trigger event...
2025-03-25 02:15:44.271 | INFO | aces.config:load:1392 - Parsing predicates...
2025-03-25 02:15:44.272 | INFO | aces.__main__:main:159 - Attempting to get predicates dataframe given:
standard: meds
ts_format: '%m/%d/%Y %H:%M'
root: demo_output/meds/MEDS_cohort/data
shard: held_out/0
path: ${data.root}/${data.shard}.parquet
_prefix: /${data.shard}
2025-03-25 02:15:44.273 | INFO | aces.predicates:generate_plain_predicates_from_meds:269 - Loading MEDS data...
2025-03-25 02:15:44.275 | INFO | aces.predicates:generate_plain_predicates_from_meds:273 - Generating plain predicate columns...
2025-03-25 02:15:44.277 | INFO | aces.predicates:generate_plain_predicates_from_meds:277 - Added predicate column 'icu_admission'.
2025-03-25 02:15:44.279 | INFO | aces.predicates:generate_plain_predicates_from_meds:277 - Added predicate column 'icu_discharge'.
2025-03-25 02:15:44.281 | INFO | aces.predicates:generate_plain_predicates_from_meds:277 - Added predicate column 'death'.
2025-03-25 02:15:44.282 | INFO | aces.predicates:generate_plain_predicates_from_meds:280 - Cleaning up predicates dataframe...
2025-03-25 02:15:44.285 | INFO | aces.predicates:get_predicates_df:703 - Loaded plain predicates. Generating derived predicate columns...
2025-03-25 02:15:44.286 | INFO | aces.predicates:get_predicates_df:717 - Added predicate column 'discharge_or_death'.
2025-03-25 02:15:44.286 | INFO | aces.predicates:get_predicates_df:724 - Generating special predicate columns...
2025-03-25 02:15:44.286 | INFO | aces.query:query:76 - Checking if '(subject_id, timestamp)' columns are unique...
2025-03-25 02:15:44.287 | INFO | aces.utils:log_tree:67 -
trigger
┣━━ input.end
┃   ┗━━ input.start
┗━━ gap.end
    ┗━━ target.end
2025-03-25 02:15:44.287 | INFO | aces.query:query:85 - Beginning query...
2025-03-25 02:15:44.288 | INFO | aces.query:query:92 - No static variable criteria specified, removing all rows with null timestamps...
2025-03-25 02:15:44.288 | INFO | aces.query:query:99 - Identifying possible trigger nodes based on the specified trigger event...
2025-03-25 02:15:44.288 | INFO | aces.constraints:check_constraints:110 - Excluding 4,163 rows as they failed to satisfy '1 <= icu_admission <= None'.
2025-03-25 02:15:44.289 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'input.end'...
2025-03-25 02:15:44.298 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'input.start'...
2025-03-25 02:15:44.322 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'gap.end'...
2025-03-25 02:15:44.333 | INFO | aces.constraints:check_constraints:110 - Excluding 0 rows as they failed to satisfy 'None <= icu_admission <= 0'.
2025-03-25 02:15:44.333 | INFO | aces.constraints:check_constraints:110 - Excluding 6 rows as they failed to satisfy 'None <= discharge_or_death <= 0'.
2025-03-25 02:15:44.334 | INFO | aces.extract_subtree:extract_subtree:252 - Summarizing subtree rooted at 'target.end'...
2025-03-25 02:15:44.359 | INFO | aces.query:query:114 - Done. 6 valid rows returned corresponding to 4 subjects.
2025-03-25 02:15:44.359 | INFO | aces.query:query:129 - Extracting label 'death' from window 'target'...
2025-03-25 02:15:44.360 | WARNING | aces.query:query:142 - All labels in the extracted cohort are the same: '0'. This may indicate an issue with the task logic. Please double-check your configuration file if this is not expected.
2025-03-25 02:15:44.360 | INFO | aces.query:query:150 - Setting index timestamp as 'end' of window 'input'...
2025-03-25 02:15:44.363 | WARNING | aces.__main__:get_and_validate_label_schema:114 - Output contains columns that are not valid MEDS label columns. For now, we are dropping them.
If you need these columns, please comment on https://github.com/justin13601/ACES/issues/97
Columns:
  - trigger
  - input.end_summary
  - input.start_summary
  - gap.end_summary
  - target.end_summary
2025-03-25 02:15:44.367 | INFO | aces.__main__:main:191 - Completed in 0:00:00.099429. Results saved to 'demo_output/cohorts/in_icu/held_out/0.parquet'.
Inspecting Results
The CLI writes one parquet file for every shard in each data split. Let's examine the output for the first shard of the train split (this demo dataset only has one shard in the train split):
[7]:
import pandas as pd
results = pd.read_parquet(COHORT_DIR / COHORT_NAME / "train" / "0.parquet")
Here is the label distribution of the ICU admissions that met our cohort criteria:
[8]:
results['boolean_value'].value_counts()
[8]:
boolean_value
False    52
True      8
Name: count, dtype: int64
Recall that our label predicate was set to death in our configuration file. Each row corresponds to one qualifying ICU admission rather than one patient (the log above reports 60 rows for 47 subjects), so we can interpret this as 8 ICU admissions that ended in death and 52 that ended in discharge. The full results provide the subject_id for every admission that meets our cohort criteria, and prediction_time corresponds to the index_timestamp defined in our configuration file.
[9]:
results
[9]:
 | subject_id | prediction_time | boolean_value | integer_value | float_value | categorical_value |
---|---|---|---|---|---|---|
0 | 10002428 | 2156-04-13 16:24:18 | False | NaN | NaN | None |
1 | 10002428 | 2156-04-20 18:11:19 | False | NaN | NaN | None |
2 | 10002428 | 2156-05-01 21:53:00 | False | NaN | NaN | None |
3 | 10002428 | 2156-05-12 14:49:34 | False | NaN | NaN | None |
4 | 10002495 | 2141-05-23 20:18:01 | False | NaN | NaN | None |
5 | 10003400 | 2137-02-26 23:37:19 | False | NaN | NaN | None |
6 | 10003400 | 2137-08-11 19:54:51 | False | NaN | NaN | None |
7 | 10003400 | 2137-08-18 17:36:37 | True | NaN | NaN | None |
8 | 10004235 | 2196-02-25 17:07:00 | False | NaN | NaN | None |
9 | 10004422 | 2111-01-18 09:44:50 | False | NaN | NaN | None |
10 | 10004720 | 2186-11-13 19:55:00 | True | NaN | NaN | None |
11 | 10004733 | 2174-12-05 11:28:24 | False | NaN | NaN | None |
12 | 10005817 | 2132-12-16 09:29:01 | False | NaN | NaN | None |
13 | 10005817 | 2135-01-04 21:55:32 | True | NaN | NaN | None |
14 | 10005866 | 2149-10-03 12:48:08 | False | NaN | NaN | None |
15 | 10005909 | 2144-10-30 23:09:03 | False | NaN | NaN | None |
16 | 10007058 | 2167-11-08 20:22:00 | False | NaN | NaN | None |
17 | 10007818 | 2146-06-23 11:46:29 | True | NaN | NaN | None |
18 | 10007928 | 2129-04-07 00:25:00 | False | NaN | NaN | None |
19 | 10008454 | 2110-12-01 17:11:36 | False | NaN | NaN | None |
20 | 10009628 | 2153-09-20 09:54:49 | False | NaN | NaN | None |
21 | 10010471 | 2155-12-03 20:33:00 | True | NaN | NaN | None |
22 | 10010867 | 2147-12-31 09:33:00 | False | NaN | NaN | None |
23 | 10012552 | 2140-03-26 14:37:26 | False | NaN | NaN | None |
24 | 10014078 | 2166-08-23 00:36:00 | False | NaN | NaN | None |
25 | 10014354 | 2148-07-08 15:48:09 | False | NaN | NaN | None |
26 | 10015272 | 2137-06-13 18:37:22 | False | NaN | NaN | None |
27 | 10015931 | 2177-03-25 21:48:07 | True | NaN | NaN | None |
28 | 10018081 | 2133-12-19 17:10:00 | False | NaN | NaN | None |
29 | 10018081 | 2134-08-06 14:53:33 | False | NaN | NaN | None |
30 | 10018328 | 2154-04-25 23:03:44 | False | NaN | NaN | None |
31 | 10019003 | 2153-03-29 02:21:00 | False | NaN | NaN | None |
32 | 10019003 | 2153-04-14 19:45:30 | False | NaN | NaN | None |
33 | 10019003 | 2155-07-11 17:48:57 | False | NaN | NaN | None |
34 | 10020187 | 2169-01-16 04:56:00 | False | NaN | NaN | None |
35 | 10020306 | 2135-01-22 17:01:57 | False | NaN | NaN | None |
36 | 10020640 | 2153-02-14 01:38:00 | False | NaN | NaN | None |
37 | 10020944 | 2131-02-28 16:40:00 | False | NaN | NaN | None |
38 | 10021487 | 2116-12-04 01:02:00 | False | NaN | NaN | None |
39 | 10022017 | 2189-09-11 10:05:24 | False | NaN | NaN | None |
40 | 10023117 | 2171-11-15 10:06:41 | False | NaN | NaN | None |
41 | 10023117 | 2175-03-22 03:20:53 | False | NaN | NaN | None |
42 | 10023117 | 2175-07-07 17:41:00 | False | NaN | NaN | None |
43 | 10023239 | 2137-06-20 19:09:00 | False | NaN | NaN | None |
44 | 10023239 | 2140-10-04 09:07:56 | False | NaN | NaN | None |
45 | 10024043 | 2117-04-12 22:05:00 | False | NaN | NaN | None |
46 | 10025612 | 2125-09-26 13:23:24 | False | NaN | NaN | None |
47 | 10027445 | 2142-08-01 01:41:00 | False | NaN | NaN | None |
48 | 10027602 | 2201-10-31 12:25:00 | False | NaN | NaN | None |
49 | 10029291 | 2123-02-21 04:13:00 | False | NaN | NaN | None |
50 | 10029291 | 2123-02-27 12:12:32 | False | NaN | NaN | None |
51 | 10031757 | 2137-10-16 17:29:21 | False | NaN | NaN | None |
52 | 10032725 | 2143-03-23 06:42:00 | False | NaN | NaN | None |
53 | 10035631 | 2116-02-29 18:43:20 | False | NaN | NaN | None |
54 | 10037861 | 2117-03-15 16:34:58 | True | NaN | NaN | None |
55 | 10037975 | 2185-01-18 19:12:12 | True | NaN | NaN | None |
56 | 10038933 | 2148-09-11 13:19:00 | False | NaN | NaN | None |
57 | 10038999 | 2131-05-23 21:50:33 | False | NaN | NaN | None |
58 | 10039708 | 2140-01-24 18:08:00 | False | NaN | NaN | None |
59 | 10040025 | 2148-01-25 04:50:17 | False | NaN | NaN | None |
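Because a subject can contribute several rows, it is worth separating admission-level counts from subject-level counts. The sketch below does this with the standard library on five rows copied from the table above (indices 5 to 7, 9, and 10). The variable names are illustrative only.

```python
from collections import Counter

# (subject_id, boolean_value) pairs copied from rows 5-7, 9 and 10 above.
rows = [
    (10003400, False),
    (10003400, False),
    (10003400, True),
    (10004422, False),
    (10004720, True),
]

stays_per_subject = Counter(subject_id for subject_id, _ in rows)
n_subjects = len(stays_per_subject)
n_positive_stays = sum(label for _, label in rows)

print(stays_per_subject)
print(n_subjects, n_positive_stays)
```

On the full DataFrame, the same subject-level count can be obtained with `results["subject_id"].nunique()`, which should agree with the number of subjects reported in the CLI log for the train shard.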