EIR Tutorial: Genetic and Multimodal Survival Prediction
This tutorial demonstrates how to use EIR for survival prediction using genetic data, both alone and in combination with clinical variables. We’ll cover two scenarios:
Training a survival model using only genetic data
Training a multimodal model using both genetic and clinical data
For a more detailed introduction to EIR’s genetic prediction capabilities, see the Genotype Tutorial. For more details about survival analysis in EIR, see the Survival Analysis Tutorial.
Project Setup
First, create a directory structure for your project:
project_directory/
├── conf/
│   ├── global_config.yaml
│   ├── input_genotype_config.yaml
│   ├── input_tabular_config.yaml
│   ├── fusion_config.yaml
│   └── output_config.yaml
└── data/
    ├── arrays/          # Your genetic data arrays
    ├── genotype.bim     # SNP information file
    └── phenotype.csv    # Clinical and survival data
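The layout above can be created with a few shell commands. The following is a minimal sketch that assumes the project root is named project_directory, as in the tree; the configuration files it creates are empty and still have to be filled in.

```shell
# Create the folder skeleton shown in the tree above.
mkdir -p project_directory/conf project_directory/data/arrays

# Create the configuration files if they do not exist yet.
# touch does not change the contents of files that are already there.
for name in global_config input_genotype_config input_tabular_config fusion_config output_config; do
    touch "project_directory/conf/${name}.yaml"
done
```

The data files (the arrays, genotype.bim and phenotype.csv) are not created by this sketch and must be supplied separately.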
To prepare the array folder, you can use the plink-pipelines software:
plink_pipelines --raw_data_path <folder with plink bed/fam/bim fileset> --output_folder data/arrays
Configuration Files
Only excerpts of each configuration file are shown below; the complete files are provided in the supplementary data.
Global Configuration (global_config.yaml)
basic_experiment:
  n_epochs: 5000
  output_folder: FILL  # Path where you want to save your experiment
  batch_size: 64
  valid_size: 2000
lr_schedule:
  lr_schedule: plateau
  lr_plateau_factor: 0.2
  lr_plateau_patience: 6
optimization:
  optimizer: adabelief
  lr: 5.0e-05
  wd: 0.0001
[rest of configs follow same pattern…]
Training Models
Genetics-Only Model
To train a model using only genetic data:
eirtrain \
--global_configs conf/global_config.yaml \
--input_configs conf/input_genotype_config.yaml \
--fusion_configs conf/fusion_config.yaml \
--output_configs conf/output_config.yaml
Multimodal Model (Genetics + Clinical)
To train a model using both genetic and clinical data:
eirtrain \
--global_configs conf/global_config.yaml \
--input_configs conf/input_genotype_config.yaml conf/input_tabular_config.yaml \
--fusion_configs conf/fusion_config.yaml \
--output_configs conf/output_config.yaml
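The two training commands differ only in the list passed to --input_configs. A small wrapper can make that explicit. This is a sketch only: the function name build_train_cmd and the scenario names genetics and multimodal are illustrative and not part of EIR, and the function prints the command rather than running it.

```shell
# Hypothetical helper (not part of EIR): print the eirtrain command for a scenario.
# Usage: build_train_cmd genetics
#        build_train_cmd multimodal
build_train_cmd() {
    local input_configs="conf/input_genotype_config.yaml"
    if [ "$1" = "multimodal" ]; then
        # The multimodal model adds the tabular (clinical) input config.
        input_configs="$input_configs conf/input_tabular_config.yaml"
    fi
    printf '%s\n' "eirtrain --global_configs conf/global_config.yaml --input_configs $input_configs --fusion_configs conf/fusion_config.yaml --output_configs conf/output_config.yaml"
}

# Print both variants for inspection.
build_train_cmd genetics
build_train_cmd multimodal
```

Printing first makes it easy to check the arguments before starting a long training run.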
Model Evaluation
The training will generate several outputs in your specified output folder:
Training curves showing loss and performance metrics
Survival curves for validation samples
Feature importance analysis for both genetic and clinical variables (if used; see compute_attributions in the global_config.yaml supplementary file)
Saved model checkpoints
To evaluate a trained model on new data (for the multimodal model, also pass conf/input_tabular_config.yaml to --input_configs):
eirpredict \
--global_configs conf/global_config.yaml \
--input_configs conf/input_genotype_config.yaml \
--output_configs conf/output_config.yaml \
--model_path path/to/saved/model.pt \
--evaluate \
--output_folder path/to/prediction/output
Note that for evaluation, relevant filepaths in the configurations must be updated to point to the test data / data to predict on.
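One way to keep the training configurations intact is to write a second set of configuration files for evaluation. The sketch below copies every YAML file from one folder to another and swaps one path prefix for another. The helper name make_test_configs and the folder names conf_test and data_test/ are assumptions made for illustration; they are not part of EIR or of the original setup.

```shell
# Hypothetical helper (not part of EIR): copy every YAML config from one folder
# to another, replacing one path prefix with another along the way.
# Usage: make_test_configs <src_dir> <dst_dir> <old_path> <new_path>
make_test_configs() {
    local src_dir="$1"
    local dst_dir="$2"
    local old_path="$3"
    local new_path="$4"
    mkdir -p "$dst_dir"
    for config in "$src_dir"/*.yaml; do
        # '|' is used as the sed delimiter so slashes in paths need no escaping.
        sed "s|$old_path|$new_path|g" "$config" > "$dst_dir/$(basename "$config")"
    done
}

# Example (folder names are illustrative):
# make_test_configs conf conf_test data/ data_test/
```

Because this is a plain text substitution, it also rewrites any other occurrence of the old prefix, so the generated files should be reviewed before they are passed to eirpredict.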
Notes
Make sure to replace all FILL placeholders with appropriate paths and values.
For more detailed information about configuration options and advanced features, refer to the EIR documentation.
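The check for leftover FILL placeholders mentioned in the notes can be automated before launching a run. The following is a minimal sketch that assumes the configuration files live in a single folder and end in .yaml; the function name check_fill_placeholders is illustrative.

```shell
# Hypothetical helper: report any line in the config folder that still
# contains a FILL placeholder. Returns 0 when none remain, 1 otherwise.
# Usage: check_fill_placeholders [conf_dir]   (defaults to ./conf)
check_fill_placeholders() {
    local conf_dir="${1:-conf}"
    # /dev/null is passed as an extra file so grep always prints file names.
    if grep -n 'FILL' "$conf_dir"/*.yaml /dev/null; then
        echo "Unresolved FILL placeholders in $conf_dir" >&2
        return 1
    fi
    echo "No FILL placeholders left in $conf_dir"
}
```

Running check_fill_placeholders conf && eirtrain ... starts training only when every placeholder has been replaced.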