CORR-Variables

The CORR-Variables package is a Python package for extracting and analyzing data from the Charité Outcomes Research Repository (CORR). It acts as a connector on top of the Hadoop-based Health Data Lake (HDL) and preprocesses the data into clinically meaningful, quality-checked variables to streamline research with real-world data at our institution.

Installation

The CORR-Vars package is pre-installed and regularly updated on the IMI server (s-c01-imi-app01.charite.de).

On the server, activate the correct Python environment before using the package:

conda activate /data02/projects/icurepo/.pkg/env10

To install the package on your local machine, run the following command. Note: this only works if you have access to the private GitHub repository.

pip install git+https://github.com/thielem/corr-vars.git
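
To confirm that the active environment actually provides the package, a small check like the following can help. This is a minimal sketch that relies only on the Python standard library; the helper name is_importable is hypothetical, and the module name corr_vars is taken from the Quick Start import below.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found in the active environment."""
    # find_spec returns None when no top-level module with this name is found.
    return importlib.util.find_spec(module_name) is not None


# "corr_vars" is the import name used in the Quick Start section.
status = "available" if is_importable("corr_vars") else "not installed"
print(f"corr_vars is {status} in this environment")
```

If the check reports "not installed" on the server, the wrong conda environment is most likely active.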

API Reference

Quick Start

# Will import all public classes (Cohort, Variable, etc.)
from corr_vars import *

# Initialize a cohort
cohort = Cohort(obs_level="hospital_stay")
cohort.load_default_vars()

# View the first 5 rows
print(cohort.obs.head())

Core Components

Cohort

Cohort is the main class for handling patient cohorts. It supports the following observation levels:

Observation Levels

Observation Level   Primary Key    tmin                 tmax
hospital_stay       case_id        hospital_admission   hospital_discharge
icu_stay            icu_stay_id    icu_admission        icu_discharge
procedure           procedure_id   op_start_dtime_any   op_end_dtime_any
[Figure: observation levels (_images/cv_obs_levels.png)]

cohort = Cohort(obs_level="hospital_stay")

# Save cohort to file
cohort.save("my_cohort.corr")

# Load cohort from file
cohort = Cohort.load("my_cohort.corr")

# Export to CSV
cohort.to_csv("output_folder")
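
One plausible reading of the tmin and tmax columns in the observation-level table is that they bound the time window of each observation. The sketch below illustrates that idea with plain pandas on toy data. It does not use the corr_vars API, and both the helper restrict_to_window and the interpretation of tmin/tmax as inclusive window bounds are assumptions made for illustration.

```python
import pandas as pd

# Toy observation table: one row per ICU stay with its assumed time window.
obs = pd.DataFrame({
    "icu_stay_id": [1, 2],
    "icu_admission": pd.to_datetime(["2024-01-01 06:00", "2024-01-02 10:00"]),
    "icu_discharge": pd.to_datetime(["2024-01-01 14:00", "2024-01-03 10:00"]),
})

# Toy time series with records inside and outside those windows.
records = pd.DataFrame({
    "icu_stay_id": [1, 1, 1, 2],
    "recordtime": pd.to_datetime([
        "2024-01-01 05:00", "2024-01-01 08:00",
        "2024-01-01 16:00", "2024-01-02 12:00",
    ]),
    "value": [139, 140, 142, 137],
})


def restrict_to_window(records, obs, key, tmin, tmax):
    """Keep only records whose recordtime lies within [tmin, tmax] of their observation."""
    merged = records.merge(obs[[key, tmin, tmax]], on=key, how="inner")
    in_window = merged["recordtime"].between(merged[tmin], merged[tmax])
    return merged.loc[in_window, list(records.columns)].reset_index(drop=True)


windowed = restrict_to_window(
    records, obs, key="icu_stay_id", tmin="icu_admission", tmax="icu_discharge"
)
print(windowed)
```

Here the 05:00 and 16:00 records of stay 1 fall outside the admission/discharge window and are dropped.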

Variables

Different types of variables are supported:

  • NativeDynamic: Time-series variables extracted from the database

  • NativeStatic: Static variables from the database or simple aggregations based on NativeDynamic variables

  • DerivedStatic: Computed static variables

  • DerivedDynamic: Computed time-series variables

  • Complex: Custom variables defined by a user-provided Python function.

[Figure: variable type hierarchy (_images/cv_var_hierarchy.png)]
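
To make the relationship between dynamic and static variables more concrete, the following sketch shows how a time series could be collapsed into one value per observation, in the spirit of the "simple aggregations" mentioned for NativeStatic. It uses plain pandas on toy data and is not the package's implementation; the output column names first_value, max_value and n_records are hypothetical.

```python
import pandas as pd

# Toy dynamic variable: several timestamped values per ICU stay.
dynamic = pd.DataFrame({
    "icu_stay_id": [12345, 12345, 12345, 12346],
    "recordtime": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 12:00",
        "2024-01-01 16:00", "2024-01-01 09:00",
    ]),
    "value": [140, 138, 142, 135],
})

# Collapse the time series into one row per stay (a static representation).
static = (
    dynamic.sort_values("recordtime")      # so that "first" means earliest
    .groupby("icu_stay_id")["value"]
    .agg(first_value="first", max_value="max", n_records="count")
    .reset_index()
)
print(static)
```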

To view all available variables, we recommend using the Graphical Variable Explorer.

# Initialize cohort
cohort = Cohort(obs_level="icu_stay")

# Add static variables
# These are added to cohort.obs DataFrame
cohort.add_variable('any_proning_icu')

>>> cohort.obs.head()
    icu_stay_id  any_proning_icu ...
0   12345        True ...
1   12346        False ...
2   12347        True ...
...

# Add dynamic (time-series) variables
# These are added to cohort.obsm dictionary
cohort.add_variable('blood_sodium')

>>> list(cohort.obsm.keys())
['blood_sodium']

# Access time-series data
>>> cohort.obsm['blood_sodium'].head()
    icu_stay_id    recordtime        value
0   12345          2024-01-01 08:00  140
1   12345          2024-01-01 12:00  138
2   12345          2024-01-01 16:00  142
...
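
Static and dynamic variables share the primary key of the observation level, so they can be combined with ordinary DataFrame operations. The sketch below assumes that cohort.obs and the entries of cohort.obsm behave like pandas DataFrames; it uses toy stand-ins shaped like the outputs above (the record for stay 12346 is invented for illustration), and the column name mean_sodium is hypothetical.

```python
import pandas as pd

# Toy stand-in for cohort.obs (static variables, one row per ICU stay).
obs = pd.DataFrame({
    "icu_stay_id": [12345, 12346, 12347],
    "any_proning_icu": [True, False, True],
})

# Toy stand-in for cohort.obsm["blood_sodium"] (time-series records).
sodium = pd.DataFrame({
    "icu_stay_id": [12345, 12345, 12345, 12346],
    "recordtime": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 12:00",
        "2024-01-01 16:00", "2024-01-01 09:00",
    ]),
    "value": [140, 138, 142, 136],
})

# Select the sodium records of stays with a positive static flag.
proned_ids = obs.loc[obs["any_proning_icu"], "icu_stay_id"]
proned_sodium = sodium[sodium["icu_stay_id"].isin(proned_ids)]

# Summarize the time series per stay and attach it to the static table.
mean_sodium = (
    sodium.groupby("icu_stay_id")["value"].mean().rename("mean_sodium").reset_index()
)
obs_with_mean = obs.merge(mean_sodium, on="icu_stay_id", how="left")
print(obs_with_mean)
```

The left merge keeps stays without any sodium record (here 12347) and leaves their summary value missing.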

Development

The source code is available on GitHub: https://github.com/thielem/corr-vars

Authors

  • Moritz Thiele

  • Noel Kronenberg

  • Dario von Wedel

Version

Current version: 0.2.0