Global Configurations
Core configuration classes for experiment setup, training, and optimization.
The root object is the eir.setup.schemas.GlobalConfig,
which contains all other configurations split by “theme” / functionality.
While there are a lot of options available, one can start with a minimal configuration
such as the following:
Quick Example
    basic_experiment:
      output_folder: my/experiment/folder
    evaluation_checkpoint:
      checkpoint_interval: 200
      sample_interval: 200
Below is a detailed overview of all the global configuration options available in EIR.
- class eir.setup.schemas.GlobalConfig(
- basic_experiment: BasicExperimentConfig,
- model: GlobalModelConfig,
- optimization: OptimizationConfig,
- lr_schedule: LRScheduleConfig,
- training_control: TrainingControlConfig,
- evaluation_checkpoint: EvaluationCheckpointConfig,
- attribution_analysis: AttributionAnalysisConfig,
- metrics: SupervisedMetricsConfig,
- visualization_logging: VisualizationLoggingConfig,
- data_preparation: DataPreparationConfig,
- accelerator: AcceleratorConfig,
- latent_sampling: LatentSamplingConfig | None = None,
- adversarial_training: AdversarialTrainingConfig | None = None,
Global configurations that are common / relevant for the whole experiment to run.
Basic Setup
- class eir.setup.schemas.BasicExperimentConfig(
- output_folder: str,
- n_epochs: int = 10,
- batch_size: int = 64,
- valid_size: float | int = 0.1,
- manual_valid_ids_file: str | None = None,
- dataloader_workers: int = 0,
- device: str = 'cpu',
- memory_dataset: bool = False,
- Parameters:
output_folder – What to name the experiment and output folder where results are saved.
n_epochs – Number of epochs for training.
batch_size – Size of batches during training.
valid_size – Size of the validation set. If a float, it is used as a fraction of the samples (e.g. 0.1 for 10%). If an int, it is used as an absolute number of samples.
manual_valid_ids_file – File with IDs of those samples to manually use as the validation set. Should be one ID per line in the file.
dataloader_workers – Number of workers for multiprocess training and validation data loading.
device – DEPRECATED: Use AcceleratorConfig.hardware instead.
memory_dataset – Whether to load all samples into memory during training.
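As a minimal sketch of how these options might be combined, the fragment below sets a fixed-count validation set and loads the data into memory. It assumes the same nesting as the quick example above (options placed under the basic_experiment key); all values are illustrative, not recommendations.

```yaml
basic_experiment:
  output_folder: my/experiment/folder
  n_epochs: 20
  batch_size: 32
  # An int is read as a raw count: 500 samples are held out for validation.
  # A float such as 0.1 would instead hold out 10% of the samples.
  valid_size: 500
  dataloader_workers: 4
  memory_dataset: true
```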
Model and Training
- class eir.setup.schemas.GlobalModelConfig(
- compile_model: bool = False,
- n_iter_before_swa: None | int = None,
- pretrained_checkpoint: None | str = None,
- strict_pretrained_loading: bool = True,
- Parameters:
compile_model – Whether to compile the model before training. This can be useful to speed up training, but may not work for all models.
n_iter_before_swa – Number of iterations to run before activating Stochastic Weight Averaging (SWA).
pretrained_checkpoint – Path to a pretrained checkpoint model file (under saved_models/ in the experiment output folder) to load and use as a starting point for training.
strict_pretrained_loading – Whether to enforce that the loaded pretrained model has exactly the same architecture as the current model. If False, will only load the layers that match between the two models.
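The following sketch shows how a run might start from an earlier checkpoint while tolerating architecture differences. It assumes the options sit under a model key, mirroring the field name in GlobalConfig; the checkpoint path and file name are hypothetical placeholders.

```yaml
model:
  # Hypothetical path: point this at a real file under saved_models/
  # in the output folder of the earlier experiment.
  pretrained_checkpoint: my/previous_experiment/saved_models/checkpoint.pt
  # Only layers that match between the two models are loaded.
  strict_pretrained_loading: false
  compile_model: false
```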
- class eir.setup.schemas.OptimizationConfig(
- optimizer: Literal['sgdm', 'adam', 'adamw', 'adahessian', 'adabelief', 'adabeliefw'] = 'adamw',
- lr: float = 0.0003,
- lr_lb: float = 0.0,
- b1: float = 0.9,
- b2: float = 0.999,
- wd: float = 0.0001,
- gradient_clipping: float = 1.0,
- gradient_accumulation_steps: None | int = None,
- gradient_noise: float = 0.0,
- Parameters:
optimizer – What optimizer to use.
lr – Base learning rate for optimizer.
lr_lb – Lower bound for the learning rate when using LR scheduling.
b1 – Decay of first order momentum of gradient for relevant optimizers.
b2 – Decay of second order momentum of gradient for relevant optimizers.
wd – Weight decay.
gradient_clipping – Max norm used for gradient clipping, with p=2.
gradient_accumulation_steps – Number of steps to use for gradient accumulation.
gradient_noise – Gradient noise to inject during training.
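A possible optimizer setup using these parameters is sketched below. It assumes the options are nested under an optimization key, following the field name in GlobalConfig; the values are illustrative, with lr, wd and gradient_clipping left at their documented defaults.

```yaml
optimization:
  optimizer: adamw
  lr: 0.0003
  wd: 0.0001
  # Gradients are clipped to a maximum L2 norm of 1.0.
  gradient_clipping: 1.0
  # Accumulate gradients over 4 steps before each optimizer update.
  gradient_accumulation_steps: 4
```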
- class eir.setup.schemas.LRScheduleConfig(
- lr_schedule: Literal['cycle', 'plateau', 'same', 'cosine'] = 'plateau',
- lr_plateau_patience: int = 10,
- lr_plateau_factor: float = 0.2,
- warmup_steps: Literal['auto'] | int = 'auto',
- plot_lr_schedule: bool = False,
- Parameters:
lr_schedule – Whether to use a “same”, “cyclical”, “cosine” or “reduce on plateau” learning rate schedule. The “reduce on plateau” schedule will reduce the learning rate when the validation performance does not improve for a number of steps.
lr_plateau_patience – Number of validation performance steps without improvement over the best performance before reducing LR (only relevant when lr_schedule is "plateau").
lr_plateau_factor – Factor to reduce LR when running with a plateau schedule.
warmup_steps – How many steps to use in warmup. If left at 'auto' (the default), will automatically compute the number of steps if using an adaptive optimizer, otherwise use 2000.
plot_lr_schedule – Whether to plot the learning rate schedule expected during training, useful to e.g. visualize the effect of warmup and “cosine” / “cyclical” schedules. Not relevant when using the “plateau” schedule, since the LR is not fixed beforehand.
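As an illustration, the fragment below switches from the default plateau schedule to a cosine schedule with an explicit warmup and asks for the expected schedule to be plotted. It assumes the options are nested under an lr_schedule key, following the field name in GlobalConfig; the warmup length is an arbitrary example value.

```yaml
lr_schedule:
  lr_schedule: cosine
  # Explicit warmup length instead of the default 'auto'.
  warmup_steps: 1000
  # Plotting applies here because the cosine schedule is known in advance.
  plot_lr_schedule: true
```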
Training Control
- class eir.setup.schemas.TrainingControlConfig(
- early_stopping_patience: int = 10,
- early_stopping_buffer: None | int = None,
- weighted_sampling_columns: None | Sequence[str] = None,
- mixing_alpha: float = 0.0,
- manifold_mixup_layer_groups: None | dict[str, Sequence[str]] = None,
- Parameters:
early_stopping_patience – Number of validation performance steps without improvement over the best performance before terminating the run.
early_stopping_buffer – Number of iterations to run before activating early stopping checks, useful if networks take a while to ‘kick into gear’.
weighted_sampling_columns – Target column to apply weighted sampling on, relevant for supervised (tabular) targets. Only applies to categorical columns. Passing in 'all' (note this is still passed in as an array, just containing a single string) here will use an average of all the categorical target columns to compute the weights.
mixing_alpha – Alpha parameter used for mixing (higher means more mixing). See Mixup: Beyond Empirical Risk Minimization for details.
manifold_mixup_layer_groups – Groups of layer paths for manifold mixup. Each group is a named set of layers. At each training step, one group is randomly selected and mixup is applied to the output of all layers in that group. The lambda value is sampled from Beta(mixing_alpha, mixing_alpha). Layer paths follow PyTorch’s named_modules() convention, e.g. "input_modules.genotype.encoder.layer_0".
Example:
    manifold_mixup_layer_groups:
      early:
        - "input_modules.genotype.encoder.layer_0"
      late:
        - "fusion_modules.default.fc_1"
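The remaining training control options could be combined roughly as follows. This sketch assumes the options are nested under a training_control key, following the field name in GlobalConfig; the numeric values are illustrative only.

```yaml
training_control:
  early_stopping_patience: 15
  # Skip early stopping checks for the first 2000 iterations.
  early_stopping_buffer: 2000
  # 'all' is still passed as a list containing a single string.
  weighted_sampling_columns:
    - all
  mixing_alpha: 0.2
```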
- class eir.setup.schemas.EvaluationCheckpointConfig(
- sample_interval: int = 200,
- saved_result_detail_level: int = 5,
- checkpoint_interval: None | int = None,
- n_saved_models: int = 1,
- Parameters:
sample_interval – Iteration interval to perform validation and possibly attribution analysis if set.
saved_result_detail_level – Level of detail to save in the results file. Higher levels will save more information, but will take up more space and might be slow, especially when using many tabular targets. The levels are as follows (each lower level skips more output in addition to what the levels above it skip):
    - 5: The default, save all metrics and plots.
    - 4: Skip validation plots and generated samples under the /samples folder.
    - 3: Skip plots for individual targets.
    - 2: Skip individual target plots (e.g. R2 training curve for regression targets).
    - 1: Skip individual target metrics (including loss).
checkpoint_interval – Iteration interval to checkpoint (i.e. save) model.
n_saved_models – Number of top N models to save during training.
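Extending the quick example, a setup that saves less output per evaluation but keeps more models might look like the sketch below. The nesting under evaluation_checkpoint follows the quick example; the interval and count values are illustrative.

```yaml
evaluation_checkpoint:
  sample_interval: 500
  checkpoint_interval: 500
  # Level 4: skip validation plots and generated samples.
  saved_result_detail_level: 4
  # Keep the 3 best models instead of only the best one.
  n_saved_models: 3
```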
Analysis and Logging
- class eir.setup.schemas.AttributionAnalysisConfig(
- compute_attributions: bool = False,
- max_attributions_per_class: None | int = None,
- attributions_every_sample_factor: int = 1,
- attribution_background_samples: int = 256,
- Parameters:
compute_attributions – Whether to compute attributions / feature importance scores (using integrated gradients) assigned by the model with respect to the input features.
max_attributions_per_class – Maximum number of samples per class to gather for attribution / feature importance analysis. Good to use when modelling on imbalanced data.
attributions_every_sample_factor – Controls whether the attributions / feature importance values are computed at every sample interval (set to 1), every other sample interval (set to 2), etc. Useful when computing the attributions takes a long time, and we don’t want to do it every time we evaluate.
attribution_background_samples – Number of samples to use for the background in attribution / feature importance computations.
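A sketch of enabling attributions while limiting their cost is shown below. It assumes the options are nested under an attribution_analysis key, following the field name in GlobalConfig; the values are illustrative.

```yaml
attribution_analysis:
  compute_attributions: true
  # Cap the samples gathered per class, which helps with imbalanced data.
  max_attributions_per_class: 1000
  # Compute attributions only at every fourth sample interval.
  attributions_every_sample_factor: 4
  attribution_background_samples: 128
```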
- class eir.setup.schemas.SupervisedMetricsConfig(
- cat_metrics: Sequence[Literal['mcc', 'acc', 'roc-auc-macro', 'ap-macro', 'f1-macro', 'precision-macro', 'recall-macro', 'cohen-kappa']] = ('mcc', 'acc', 'roc-auc-macro', 'ap-macro'),
- con_metrics: Sequence[Literal['r2', 'pcc', 'loss', 'rmse', 'mae', 'mape', 'explained-variance']] = ('r2', 'pcc', 'loss'),
- cat_averaging_metrics: Sequence[Literal['mcc', 'acc', 'roc-auc-macro', 'ap-macro', 'f1-macro', 'precision-macro', 'recall-macro', 'cohen-kappa']] | None = None,
- con_averaging_metrics: Sequence[Literal['r2', 'pcc', 'loss', 'rmse', 'mae', 'mape', 'explained-variance']] | None = None,
- Parameters:
cat_metrics – Which metrics to calculate for categorical targets.
con_metrics – Which metrics to calculate for continuous targets.
cat_averaging_metrics – Which metrics to use for averaging categorical targets. If not set, will use the average of MCC, ROC-AUC and AP.
con_averaging_metrics – Which metrics to use for averaging continuous targets. If not set, will use the average of 1.0-LOSS, PCC and R2.
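For illustration, the fragment below selects a custom set of metrics and averages over a single metric per target type. It assumes the options are nested under a metrics key, following the field name in GlobalConfig. Keeping the averaging metrics a subset of the computed metrics is a cautious choice made for this sketch, not a documented requirement.

```yaml
metrics:
  cat_metrics: [mcc, acc, f1-macro]
  con_metrics: [r2, rmse, mae]
  # Override the default averages (MCC, ROC-AUC, AP and 1.0-LOSS, PCC, R2).
  cat_averaging_metrics: [mcc]
  con_averaging_metrics: [r2]
```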
- class eir.setup.schemas.VisualizationLoggingConfig(
- plot_skip_steps: int = 200,
- no_pbar: bool = False,
- log_level: Literal['debug', 'info', 'warning', 'error', 'critical'] = 'info',
- save_model_diagram: bool = False,
- Parameters:
plot_skip_steps – How many iterations to skip in plots. Useful to get a zoomed-in view of the training process after the initial ‘burn-in’ period.
no_pbar – Whether to not use progress bars. Useful when stdout/stderr is written to files.
log_level – Logging level to use. Can be one of 'debug', 'info', 'warning', 'error', 'critical'.
save_model_diagram – Whether to save a diagram of the model architecture to a file in the experiment output folder.
Infrastructure
- class eir.setup.schemas.DataPreparationConfig(
- streaming_setup_samples: int = 10000,
- streaming_batch_size: int | None = None,
- Parameters:
streaming_setup_samples – Number of samples to use during streaming setup (e.g., for training tokenizers). If None, uses all available samples.
streaming_batch_size – Batch size to use during streaming setup. If None, uses the training batch size.
- class eir.setup.schemas.AcceleratorConfig(
- hardware: str = 'auto',
- precision: Literal['64', '32', '16', 'bf16', '64-true', '32-true', '16-true', 'bf16-true', '16-mixed', 'bf16-mixed', 'transformer-engine', 'transformer-engine-float16'] = '32-true',
- strategy: str = 'auto',
- devices: int | list[int] | str = 'auto',
- num_nodes: int = 1,
Configuration for distributed and accelerated training setup.
- Parameters:
hardware – The hardware accelerator to use. Options include 'cpu', 'cuda', 'mps', 'tpu', etc.
precision – Numerical precision for training. Options include '32-true', '16-mixed', 'bf16-mixed', etc. Using mixed precision can significantly speed up training on supported hardware.
strategy – The parallelization strategy to use. Common options:
    - 'dp': Data Parallelism
    - 'ddp': Distributed Data Parallelism
    - 'ddp_spawn': Distributed Data Parallelism with spawn
    - 'deepspeed': DeepSpeed
    - 'fsdp': Fully Sharded Data Parallelism
    - 'auto': Automatically choose the best strategy based on the hardware available.
devices – Number of devices to use or specific device indices. Can be an integer (e.g., 4 for 4 GPUs) or a list of indices (e.g., [0, 1]).
num_nodes – Number of compute nodes to use for distributed training.
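A single-node, two-GPU setup with mixed precision might be configured roughly as follows. This sketch assumes the options are nested under an accelerator key, following the field name in GlobalConfig, and that the machine has CUDA devices at indices 0 and 1.

```yaml
accelerator:
  hardware: cuda
  precision: bf16-mixed
  strategy: ddp
  # Use the two GPUs with indices 0 and 1.
  devices: [0, 1]
  num_nodes: 1
```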