.. corr_vars documentation master file, created by
   sphinx-quickstart on Fri Nov 1 14:22:49 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

CORR-Variables
==============

The CORR-Variables package is a Python package for extracting and analyzing data from the Charité Outcomes Research Repository (CORR). It acts as a connector on top of the Hadoop-based Health Data Lake (HDL) and preprocesses the data into clinically meaningful, quality-checked variables to streamline research with real-world data at our institution.

Installation
------------

The CORR-Vars package is pre-installed and regularly updated on the IMI server (``s-c01-imi-app01.charite.de``). Make sure to activate the correct Python environment:

.. code-block:: bash

   conda activate /data02/projects/icurepo/.pkg/env10

To install the package on your local machine, run the following command. CAVE: This only works if you have access to the private GitHub repository.

.. code-block:: bash

   pip install git+https://github.com/thielem/corr-vars.git

API Reference
-------------

.. toctree::
   :maxdepth: 2

   api/cohort
   api/variables
   api/utils

Quick Start
-----------

.. code-block:: python

   # Import all public classes (Cohort, Variable, etc.)
   from corr_vars import *

   # Initialize a cohort at the hospital-stay level
   cohort = Cohort(obs_level="hospital_stay")
   cohort.load_default_vars()

   # View the first 5 rows
   print(cohort.obs.head())

Core Components
---------------

Cohort
^^^^^^

The main class for handling patient cohorts. It supports the following observation levels:

.. list-table:: Observation Levels
   :header-rows: 1

   * - Observation Level
     - Primary Key
     - tmin
     - tmax
   * - hospital_stay
     - case_id
     - hospital_admission
     - hospital_discharge
   * - icu_stay
     - icu_stay_id
     - icu_admission
     - icu_discharge
   * - procedure
     - procedure_id
     - op_start_dtime_any
     - op_end_dtime_any

.. image:: _static/cv_obs_levels.png
   :width: 30%
   :align: center

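
Each observation level bounds its rows by a ``tmin`` and ``tmax`` timestamp. As a rough illustration of what such a window means for time-series data, the following pandas sketch keeps only measurements that fall inside each ICU stay's ``[tmin, tmax]`` window and aggregates them to one value per stay. It is a sketch under stated assumptions, not the CORR-Vars implementation: the long-format columns ``icu_stay_id``, ``recordtime`` and ``value``, the toy data, and the helper ``aggregate_in_window`` are all hypothetical.

.. code-block:: python

   import pandas as pd

   # Hypothetical per-stay windows (one row per icu_stay_id), standing in for
   # icu_admission / icu_discharge as tmin / tmax.
   stays = pd.DataFrame(
       {
           "icu_stay_id": [1, 2],
           "tmin": pd.to_datetime(["2024-01-01 08:00", "2024-01-02 00:00"]),
           "tmax": pd.to_datetime(["2024-01-01 20:00", "2024-01-03 00:00"]),
       }
   )

   # Hypothetical long-format time series: one row per measurement.
   measurements = pd.DataFrame(
       {
           "icu_stay_id": [1, 1, 1, 2, 2],
           "recordtime": pd.to_datetime(
               [
                   "2024-01-01 06:00",  # before tmin of stay 1 -> excluded
                   "2024-01-01 10:00",
                   "2024-01-01 16:00",
                   "2024-01-02 12:00",
                   "2024-01-03 06:00",  # after tmax of stay 2 -> excluded
               ]
           ),
           "value": [150, 140, 142, 135, 160],
       }
   )


   def aggregate_in_window(stays, measurements, func="max"):
       """Hypothetical helper: aggregate each stay's values within [tmin, tmax]."""
       # Attach each measurement's stay window to the measurement row.
       merged = measurements.merge(stays, on="icu_stay_id", how="inner")
       # Element-wise check: tmin <= recordtime <= tmax (bounds inclusive).
       in_window = merged["recordtime"].between(merged["tmin"], merged["tmax"])
       agg = merged[in_window].groupby("icu_stay_id")["value"].agg(func)
       # Reindex so stays without in-window measurements appear as NaN.
       return agg.reindex(stays["icu_stay_id"]).rename(f"value_{func}")


   static = aggregate_in_window(stays, measurements, func="max")
   print(static)

The inner merge followed by a reindex is a deliberate choice in this sketch: stays without any in-window measurement show up as missing values instead of silently disappearing from the result. Other pandas aggregation names such as ``"mean"`` or ``"min"`` can be passed as ``func``.
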
.. code-block:: python

   from corr_vars import Cohort

   cohort = Cohort(obs_level="hospital_stay")

   # Save cohort to file
   cohort.save("my_cohort.corr")

   # Load cohort from file
   cohort = Cohort.load("my_cohort.corr")

   # Export to CSV
   cohort.to_csv("output_folder")

Variables
^^^^^^^^^

The following variable types are supported:

- NativeDynamic: Time-series variables extracted from the database
- NativeStatic: Static variables from the database, or simple aggregations of NativeDynamic variables
- DerivedStatic: Computed static variables
- DerivedDynamic: Computed time-series variables
- Complex: Custom variables; these can be anything defined by a Python function provided by the user

.. image:: _static/cv_var_hierarchy.png
   :width: 30%
   :align: center

To view all available variables, we recommend using the `Graphical Variable Explorer `_.

.. code-block:: python

   from corr_vars import Cohort

   # Initialize cohort
   cohort = Cohort(obs_level="icu_stay")

   # Add static variables
   # These are added to the cohort.obs DataFrame
   cohort.add_variable('any_proning_icu')
   print(cohort.obs.head())
   #    icu_stay_id  any_proning_icu  ...
   # 0        12345             True  ...
   # 1        12346            False  ...
   # 2        12347             True  ...
   # ...

   # Add dynamic (time-series) variables
   # These are added to the cohort.obsm dictionary
   cohort.add_variable('blood_sodium')
   print(list(cohort.obsm.keys()))
   # ['blood_sodium']

   # Access time-series data
   print(cohort.obsm['blood_sodium'].head())
   #    icu_stay_id        recordtime  value
   # 0        12345  2024-01-01 08:00    140
   # 1        12345  2024-01-01 12:00    138
   # 2        12345  2024-01-01 16:00    142
   # ...

Development
-----------

The source code is available on GitHub: https://github.com/thielem/corr-vars

Authors
-------

* Moritz Thiele
* Noel Kronenberg
* Dario von Wedel

Version
-------

Current version: 0.2.0