Configuration API

Global Configurations 

class eir.setup.schemas.GlobalConfig(output_folder: str, n_epochs: int = 10, batch_size: int = 64, valid_size: float | int = 0.1, manual_valid_ids_file: str | None = None, dataloader_workers: int = 0, device: str = 'cpu', n_iter_before_swa: None | int = None, amp: bool = False, compile_model: bool = False, weighted_sampling_columns: None | Sequence[str] = None, lr: float = 0.001, lr_lb: float = 0.0, find_lr: bool = False, lr_schedule: Literal['cycle', 'plateau', 'same', 'cosine'] = 'plateau', lr_plateau_patience: int = 10, lr_plateau_factor: float = 0.2, gradient_clipping: float = 1.0, gradient_accumulation_steps: None | int = None, gradient_noise: float = 0.0, cat_averaging_metrics: al_cat_averaging_metric_choices | None = None, con_averaging_metrics: al_con_averaging_metric_choices | None = None, early_stopping_patience: int = 10, early_stopping_buffer: None | int = None, warmup_steps: Literal['auto'] | int = 'auto', optimizer: Literal['accsgd'], Literal['adabelief'], Literal['adabeliefw'], Literal['adabound'], Literal['adahessian'], Literal['adam'], Literal['adamod'], Literal['adamp'], Literal['adamw'], Literal['aggmo'], Literal['diffgrad'], Literal['lamb'], Literal['lars'], Literal['lookahead'], Literal['madgrad'], Literal['novograd'], Literal['pid'], Literal['qhadam'], Literal['qhm'], Literal['radam'], Literal['ranger'], Literal['rangerqh'], Literal['rangerva'], Literal['sgdm'], Literal['sgdp'], Literal['sgdw'], Literal['shampoo'], Literal['swats'], Literal['yogi'] = 'adam', b1: float = 0.9, b2: float = 0.999, wd: float = 0.0001, memory_dataset: bool = False, sample_interval: int = 200, save_evaluation_sample_results: bool = True, checkpoint_interval: None | int = None, n_saved_models: int = 1, compute_attributions: bool = False, max_attributions_per_class: None | int = None, attributions_every_sample_factor: int = 1, attribution_background_samples: int = 256, plot_lr_schedule: bool = False, no_pbar: bool = False, log_level: Literal['debug', 'info', 'warning', 
'error', 'critical'] = 'info', mixing_alpha: float = 0.0, plot_skip_steps: int = 200, pretrained_checkpoint: None | str = None, strict_pretrained_loading: bool = True, latent_sampling: LatentSamplingConfig | None = None)

Global configurations that are common / relevant for the whole experiment to run.

Parameters:

output_folder – What to name the experiment and output folder where results are saved.
n_epochs – Number of epochs for training.
batch_size – Size of batches during training.
valid_size – Size of the validation set. If a float, it is used as a fraction of the samples; if an int, as a raw count.
manual_valid_ids_file – File with IDs of those samples to manually use as the validation set. Should be one ID per line in the file.
dataloader_workers – Number of workers for multiprocess training and validation data loading.
device – Device to run the training on (e.g. ‘cuda:0’ / ‘cpu’ / ‘mps’). ‘mps’ is currently experimental, and might not work for all models.
n_iter_before_swa – Number of iterations to run before activating Stochastic Weight Averaging (SWA).
amp – Whether to use Automatic Mixed Precision. Currently only supported when training on GPUs.
compile_model – Whether to compile the model before training. This can be useful to speed up training, but may not work for all models.
weighted_sampling_columns – Target column to apply weighted sampling on. Only applies to categorical columns. Passing in ‘all’ here will use an average of all the target columns.
lr – Base learning rate for optimizer.
lr_lb – Lower bound for learning rate when using LR scheduling.
find_lr – Whether to perform a range test of different learning rates, with the lower limit being what is passed in for the --lr flag. Produces a plot and exits with status 0 before training if this flag is active.
lr_schedule – Whether to use cyclical, cosine or reduce on plateau learning rate schedule. Otherwise keeps same learning rate.
lr_plateau_patience – Number of validation performance steps without improvement over best performance before reducing LR (only relevant when --lr_schedule is ‘plateau’).
lr_plateau_factor – Factor to reduce LR when running with plateau schedule.
gradient_clipping – Max norm used for gradient clipping, with p=2.
gradient_accumulation_steps – Number of steps to use for gradient accumulation.
gradient_noise – Gradient noise to inject during training.
cat_averaging_metrics – Which metrics to use for averaging categorical targets. If not set, will use the default metrics for the task type.
con_averaging_metrics – Which metrics to use for averaging continuous targets. If not set, will use the default metrics for the task type.
early_stopping_patience – Number of validation performance steps without improvement over best performance before terminating run.
early_stopping_buffer – Number of iterations to run before activating early stopping checks, useful if networks take a while to ‘kick into gear’.
warmup_steps – How many steps to use in warmup. If not set, will automatically compute the number of steps if using an adaptive optimizer, otherwise use 2000.
optimizer – What optimizer to use.
b1 – Decay of first order momentum of gradient for relevant optimizers.
b2 – Decay of second order momentum of gradient for relevant optimizers.
wd – Weight decay.
memory_dataset – Whether to load all samples into memory during training.
sample_interval – Iteration interval to perform validation and possibly attribution analysis if set.
save_evaluation_sample_results – Whether to save evaluation results (e.g. confusion matrix for classification tasks, regression plot and predictions for regression tasks). Setting to False can be useful to save space during large scale experiments.
checkpoint_interval – Iteration interval to checkpoint (i.e. save) model.
n_saved_models – Number of top N models to save during training.
compute_attributions – Whether to compute attributions / feature importance scores (using integrated gradients) assigned by the model with respect to the input features.
max_attributions_per_class – Maximum number of samples per class to gather for attribution / feature importance analysis. Good to use when modelling on imbalanced data.
attributions_every_sample_factor – Controls whether the attributions / feature importance values are computed at every sample interval (=1), every other sample interval (=2), etc. Useful when computing the attributions takes a long time and we don’t want to do it every time we evaluate.
attribution_background_samples – Number of samples to use for the background in attribution / feature importance computations.
plot_lr_schedule – Whether to run LR search, plot the results and exit with status 0.
no_pbar – Whether to not use progress bars. Useful when stdout/stderr is written to files.
log_level – Logging level to use. Can be one of ‘debug’, ‘info’, ‘warning’, ‘error’, ‘critical’.
mixing_alpha – Alpha parameter used for mixing (higher means more mixing).
plot_skip_steps – How many iterations to skip in plots.
pretrained_checkpoint – Path to a pretrained checkpoint model file (under saved_models/ in the experiment output folder) to load and use as a starting point for training.
strict_pretrained_loading – Whether to enforce that the loaded pretrained model has exactly the same architecture as the current model. If False, will only load the layers that match between the two models.
latent_sampling – Configuration to use for latent sampling.
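
As a rough illustration of two of the parameters above, the sketch below shows one way to read valid_size (fraction versus raw count) and which evaluation iterations would also compute attributions for a given attributions_every_sample_factor. The helper names resolve_valid_size and attribution_iterations are hypothetical and not part of EIR, and the rounding of fractional sizes is an assumption.

```python
from typing import List, Union


def resolve_valid_size(n_samples: int, valid_size: Union[float, int]) -> int:
    """Interpret valid_size as a fraction (float) or a raw count (int)."""
    if isinstance(valid_size, float):
        # Assumption: fractional sizes are rounded to the nearest sample.
        return round(n_samples * valid_size)
    return valid_size


def attribution_iterations(
    n_iterations: int, sample_interval: int, attributions_every_sample_factor: int
) -> List[int]:
    """Iterations where validation runs and attributions are also computed."""
    evaluation_iterations = range(sample_interval, n_iterations + 1, sample_interval)
    return [
        iteration
        for count, iteration in enumerate(evaluation_iterations, start=1)
        if count % attributions_every_sample_factor == 0
    ]


print(resolve_valid_size(n_samples=1000, valid_size=0.1))  # 100
print(resolve_valid_size(n_samples=1000, valid_size=50))   # 50
print(attribution_iterations(1000, 200, 2))                # [400, 800]
```

With sample_interval=200 and a factor of 2, validation would run at iterations 200, 400, 600, 800 and 1000, while attributions would only be computed at 400 and 800.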

Input Configurations 

Parameters:

input_info – Information about the input source, name and type.
input_type_info – Information specific to the input type, e.g. some augmentations are only relevant for omics input. Another example is the type of model to apply to the input.
model_config – Configuration for the chosen model (i.e. feature extractor) for this input.
pretrained_config – Configuration for leveraging pretraining from a previous experiment.
interpretation_config – Configuration for interpretation analysis when applicable.
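
To make the nesting of these fields more concrete, here is a sketch of one input configuration written as a Python dictionary. In practice this would live in a .yaml file with the same structure. Only the field names and allowed values come from this documentation; the paths and the input name are placeholders chosen for illustration.

```python
# Hypothetical example values; the keys mirror the documented fields.
input_config = {
    "input_info": {
        "input_source": "path/to/genotype_arrays",  # placeholder path
        "input_name": "genotype",                   # placeholder name
        "input_type": "omics",
    },
    "input_type_info": {
        "snp_file": "path/to/data.bim",             # placeholder path
        "na_augment_alpha": 1.0,
        "na_augment_beta": 5.0,
    },
    "model_config": {
        "model_type": "genome-local-net",
        "model_init_config": {"kernel_width": 16, "channel_exp_base": 2},
    },
}

# Print which fields are set in each section.
for section, fields in input_config.items():
    print(section, "->", sorted(fields))
```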

Input Data Configuration 

class eir.setup.schemas.InputDataConfig(input_source: str, input_name: str, input_type: Literal['omics', 'tabular', 'sequence', 'image', 'bytes', 'array'], input_inner_key: None | str = None)

Parameters:

input_source – Where on the filesystem to locate the input.
input_name – Name to identify the input.
input_type – Type of the input.
input_inner_key – Inner key to use for the input. Only used when input_source is a deeplake dataset.

Input Type Configurations 

class eir.setup.schemas.OmicsInputDataConfig(snp_file: str | None = None, subset_snps_file: str | None = None, na_augment_alpha: float = 1.0, na_augment_beta: float = 5.0, shuffle_augment_alpha: float = 0.0, shuffle_augment_beta: float = 0.0, omics_format: Literal['one-hot'] = 'one-hot', mixing_subtype: Literal['mixup', 'cutmix-block', 'cutmix-uniform'] = 'mixup', modality_dropout_rate: float = 0.0)

Parameters:

snp_file – Path to the relevant .bim file, used for attribution analysis.
subset_snps_file – Path to a file with corresponding SNP IDs to subset from the main arrays for the modelling. Requires the snp_file parameter to be passed in.
na_augment_alpha –
Used to control the extent of missing data augmentation in the omics data. A value is sampled from a beta distribution, and the sampled value is used to set a percentage of the SNPs to be ‘missing’.

The alpha (α) parameter of the beta distribution, influencing the shape of the distribution towards 1. Higher values of alpha (compared to beta) bias the distribution to sample larger percentages of SNPs to be set as ‘missing’, leading to a higher likelihood of missingness. Conversely, lower values of alpha (compared to beta) result in sampling lower percentages, thus reducing the probability and extent of missingness. For example, setting alpha to 1.0 and beta to 5.0 will skew the distribution towards lower percentages of missingness, since beta is significantly larger. Setting alpha to 5.0 and beta to 1.0 will skew the distribution towards higher percentages of missingness, since alpha is significantly larger. Examples:
- alpha = 1.0, beta = 19.0: μ=E(X)=0.05, σ=SD(X)=0.0476 (avg 5% missing)
- alpha = 1.0, beta = 4.0: μ=E(X)=0.2, σ=SD(X)=0.1633 (avg 20% missing)
na_augment_beta –
Used to control the extent of missing data augmentation in the omics data. A value is sampled from a beta distribution, and the sampled value is used to set a percentage of the SNPs to be ‘missing’.

Beta (β) parameter of the beta distribution, influencing the shape of the distribution towards 0. Higher values of beta (compared to alpha) bias the distribution to sample smaller percentages of SNPs to be set as ‘missing’, leading to a lower likelihood and extent of missingness. Conversely, lower values of beta (compared to alpha) result in sampling larger percentages, thus increasing the probability and extent of missingness.
shuffle_augment_alpha –
Used to control the extent of shuffling data augmentation in the omics data. A value is sampled from a beta distribution, and the sampled value is used to determine the percentage of the SNPs to be shuffled.

The alpha (α) parameter of the beta distribution, influencing the shape of the distribution towards 1. Higher values of alpha (compared to beta) bias the distribution to sample larger percentages of SNPs to be shuffled, leading to a higher likelihood of extensive shuffling. Conversely, lower values of alpha (compared to beta) result in sampling lower percentages, thus reducing the extent of shuffling. Setting alpha to a significantly larger value than beta will skew the distribution towards higher percentages of shuffling. Examples:
- alpha = 1.0, beta = 19.0: μ=E(X)=0.05, σ=SD(X)=0.0476 (avg 5% shuffled)
- alpha = 1.0, beta = 4.0: μ=E(X)=0.2, σ=SD(X)=0.1633 (avg 20% shuffled)
shuffle_augment_beta –
Used to control the extent of shuffling data augmentation in the omics data. A value is sampled from a beta distribution, and the sampled value is used to determine the percentage of the SNPs to be shuffled.

Beta (β) parameter of the beta distribution, influencing the shape of the distribution towards 0. Higher values of beta (compared to alpha) bias the distribution to sample smaller percentages of SNPs to be shuffled, leading to a lower likelihood and extent of shuffling. Conversely, lower values of beta (compared to alpha) result in sampling larger percentages, thus increasing the likelihood and extent of shuffling.
omics_format – Currently unsupported (i.e. does nothing), which format the omics data is in.
mixing_subtype – Which type of mixing to use on the omics data given that mixing_alpha is set >0.0 in the global configuration.
modality_dropout_rate – Dropout rate to apply to the modality, e.g. 0.2 means that 20% of the time, this modality will be dropped out during training.
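
The effect of the alpha and beta parameters can be checked with the closed-form mean and standard deviation of the beta distribution. The sketch below uses only the Python standard library; beta_mean_sd and sample_missing_indices are hypothetical helper names, and the way a sampled fraction is turned into a set of ‘missing’ SNP positions is an assumption made for illustration rather than the library's implementation.

```python
import math
import random
from typing import List, Tuple


def beta_mean_sd(alpha: float, beta: float) -> Tuple[float, float]:
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total**2 * (total + 1))
    return mean, math.sqrt(variance)


def sample_missing_indices(
    n_snps: int, alpha: float, beta: float, rng: random.Random
) -> List[int]:
    """Sample a fraction from Beta(alpha, beta) and pick that many SNP positions."""
    fraction = rng.betavariate(alpha, beta)
    n_missing = int(fraction * n_snps)
    return sorted(rng.sample(range(n_snps), n_missing))


mean, sd = beta_mean_sd(alpha=1.0, beta=4.0)
print(f"{mean:.4f} {sd:.4f}")  # 0.2000 0.1633

# Default na_augment_alpha / na_augment_beta values from the signature above.
mean, sd = beta_mean_sd(alpha=1.0, beta=5.0)
print(f"average fraction missing with defaults: {mean:.3f}")

rng = random.Random(0)
print(len(sample_missing_indices(n_snps=1000, alpha=1.0, beta=5.0, rng=rng)))
```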

class eir.setup.schemas.TabularInputDataConfig(input_cat_columns: ~typing.Sequence[str] = <factory>, input_con_columns: ~typing.Sequence[str] = <factory>, label_parsing_chunk_size: None | int = None, mixing_subtype: ~typing.Literal['mixup'] = 'mixup', modality_dropout_rate: float = 0.0)

Parameters:

input_cat_columns – Which columns to use as categorical inputs from the input_source specified in the input_info field of the relevant .yaml.
input_con_columns – Which columns to use as continuous inputs from the input_source specified in the input_info field of the relevant .yaml.
label_parsing_chunk_size – Number of rows to process at a time when loading in the input_source. Useful when RAM is limited.
mixing_subtype – Which type of mixing to use on the tabular data given that mixing_alpha is set >0.0 in the global configuration.
modality_dropout_rate – Dropout rate to apply to the modality, e.g. 0.2 means that 20% of the time, this modality will be dropped out during training.

class eir.setup.schemas.SequenceInputDataConfig(vocab_file: None | str = None, max_length: int | Literal['max', 'average'] = 'average', sampling_strategy_if_longer: Literal['from_start', 'uniform'] = 'uniform', min_freq: int = 10, split_on: str | None = ' ', tokenizer: Union[Literal['basic_english'], Literal['spacy'], Literal['moses'], Literal['toktok'], Literal['revtok'], Literal['subword'], Literal['bpe'], NoneType] = None, tokenizer_language: str | None = None, adaptive_tokenizer_max_vocab_size: int | None = None, mixing_subtype: Literal['mixup'] = 'mixup', modality_dropout_rate: float = 0.0)

Parameters:

vocab_file – An optional text file containing pre-defined vocabulary to use for the training. If this is not passed in, the framework will automatically build the vocabulary from the training data. Passing in a vocabulary file is therefore useful if (a) you want to manually specify / limit the vocabulary used and/or (b) you want to save time by pre-computing the vocabulary.
max_length – Maximum length to truncate/pad sequences to. This can be an integer or the values ‘max’ or ‘average’. The ‘max’ keyword will use the maximum sequence length found in the training data, while the ‘average’ will use the average length across all training samples.
sampling_strategy_if_longer – Controls how sequences are truncated if they are longer than the specified max_length parameter. Using ‘from_start’ will always truncate from the beginning of the sequence, ensuring that the samples will always be the same during training. Setting this parameter to ‘uniform’ will uniformly sample a slice of a given sample sequence during training. Note that for consistency, the validation/test set samples always use the from_start setting when truncating.
min_freq – Minimum number of times a token must appear in the total training data to be included in the vocabulary. Note that this setting will not do anything if passing in vocab_file.
split_on – Which token to split the sequence on to generate separate tokens for the vocabulary.
tokenizer – Which tokenizer to use. Relevant if modelling on language, but not as much when doing it on other arbitrary sequences.
tokenizer_language – Which language rules the tokenizer should apply when tokenizing the raw data.
adaptive_tokenizer_max_vocab_size – If using an adaptive tokenizer (“bpe”), this parameter controls the maximum size of the vocabulary.
mixing_subtype – Which type of mixing to use on the sequence data given that mixing_alpha is set >0.0 in the global configuration.
modality_dropout_rate – Dropout rate to apply to the modality, e.g. 0.2 means that 20% of the time, this modality will be dropped out during training.
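
The interaction between max_length and sampling_strategy_if_longer can be sketched as follows. The function names are hypothetical, ‘from_start’ is read here as keeping the first max_length tokens, and truncating the average length to an integer is an assumption.

```python
import random
import statistics
from typing import List, Sequence, Union


def resolve_max_length(lengths: Sequence[int], max_length: Union[int, str]) -> int:
    """Turn 'max' / 'average' / an integer into a concrete sequence length."""
    if max_length == "max":
        return max(lengths)
    if max_length == "average":
        # Assumption: the average length is truncated to an integer.
        return int(statistics.mean(lengths))
    return int(max_length)


def truncate(
    tokens: Sequence[str], max_length: int, strategy: str, rng: random.Random
) -> List[str]:
    """Keep a contiguous slice of at most max_length tokens."""
    if len(tokens) <= max_length:
        return list(tokens)
    if strategy == "from_start":
        start = 0
    elif strategy == "uniform":
        # Any valid start position is equally likely.
        start = rng.randint(0, len(tokens) - max_length)
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return list(tokens[start : start + max_length])


tokens = "the quick brown fox jumps over the lazy dog".split(" ")
rng = random.Random(0)
print(truncate(tokens, 4, "from_start", rng))  # ['the', 'quick', 'brown', 'fox']
print(truncate(tokens, 4, "uniform", rng))     # a random 4-token slice
print(resolve_max_length([2, 4, 9], "average"))  # 5
```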

class eir.setup.schemas.ByteInputDataConfig(max_length: int = 256, byte_encoding: Literal['uint8'] = 'uint8', sampling_strategy_if_longer: Literal['from_start', 'uniform'] = 'uniform', mixing_subtype: Literal['mixup'] = 'mixup', modality_dropout_rate: float = 0.0)

Parameters:

byte_encoding – Which byte encoding to use when reading the binary data. Currently only uint8 is supported.
max_length – Maximum length to truncate/pad sequences to. While in sequence models this generally refers to words, here we are referring to number of bytes.
sampling_strategy_if_longer – Controls how sequences are truncated if they are longer than the specified max_length parameter. Using ‘from_start’ will always truncate from the beginning of the byte sequence, ensuring that the samples will always be the same during training. Setting this parameter to ‘uniform’ will uniformly sample a slice of a given sample sequence during training. Note that for consistency, the validation/test set samples always use the from_start setting when truncating.
mixing_subtype – Which type of mixing to use on the bytes data given that mixing_alpha is set >0.0 in the global configuration.
modality_dropout_rate – Dropout rate to apply to the modality, e.g. 0.2 means that 20% of the time, this modality will be dropped out during training.
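
As a minimal sketch of what a uint8 byte encoding with a fixed max_length could look like, the hypothetical helper below maps raw bytes to integers in the range 0 to 255, keeps the first max_length values (the ‘from_start’ behaviour) and pads shorter inputs. The padding value of 0 is an assumption; the documentation does not state which value is used.

```python
from typing import List


def encode_bytes(data: bytes, max_length: int = 256, pad_value: int = 0) -> List[int]:
    """Encode raw bytes as uint8 values, truncated or padded to max_length."""
    values = list(data[:max_length])  # each byte becomes an int in [0, 255]
    values.extend([pad_value] * (max_length - len(values)))
    return values


print(encode_bytes(b"EIR", max_length=5))      # [69, 73, 82, 0, 0]
print(encode_bytes(b"abcdefgh", max_length=4))  # [97, 98, 99, 100]
```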

class eir.setup.schemas.ImageInputDataConfig(auto_augment: bool = True, size: Sequence[int] = (64,), resize_approach: Literal['resize', 'randomcrop', 'centercrop'] = 'resize', mean_normalization_values: None | Sequence[float] = None, stds_normalization_values: None | Sequence[float] = None, num_channels: int | None = None, mixing_subtype: Literal['mixup'] | Literal['cutmix'] = 'mixup', modality_dropout_rate: float = 0.0)

Parameters:

auto_augment – Setting this to True will use TrivialAugment Wide augmentation.
size – Target size of the images for training. If size is a sequence like (h, w), output size will be matched to this. If size is an int, the image will be resized to (size, size).
resize_approach – The method used for resizing the images. Options are:
- “resize”: Directly resize the image to the target size.
- “randomcrop”: Resize the image to a larger size than the target and then apply a random crop to the target size.
- “centercrop”: Resize the image to a larger size than the target and then apply a center crop to the target size.
mean_normalization_values – Average channel values to normalize images with. This can be a sequence matching the number of channels, or None. If None and using a pretrained model, the values used for the model pretraining will be used. If None and training from scratch, will iterate over training data and compute the running average per channel.
stds_normalization_values – Standard deviation channel values to normalize images with. This can be a sequence matching the number of channels, or None. If None and using a pretrained model, the values used for the model pretraining will be used. If None and training from scratch, will iterate over training data and compute the running standard deviation per channel.
num_channels – Number of channels in the images. If None, will try to infer the number of channels from a random image in the training data.
mixing_subtype – Which type of mixing to use on the image data given that mixing_alpha is set >0.0 in the global configuration.
modality_dropout_rate – Dropout rate to apply to the modality, e.g. 0.2 means that 20% of the time, this modality will be dropped out during training.
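
The size parameter accepts either a single value or an (h, w) pair. The hypothetical helper below shows one way to normalise both forms to an explicit (height, width) tuple. Treating a one-element sequence such as the default (64,) like a single int is an assumption.

```python
from typing import Sequence, Tuple, Union


def resolve_size(size: Union[int, Sequence[int]]) -> Tuple[int, int]:
    """Return the target (height, width) for a given size setting."""
    if isinstance(size, int):
        return size, size
    if len(size) == 1:
        # Assumption: a one-element sequence behaves like a single int.
        return size[0], size[0]
    if len(size) == 2:
        height, width = size
        return height, width
    raise ValueError(f"Expected an int or a sequence of 1 or 2 ints, got {size!r}")


print(resolve_size((64,)))      # (64, 64)
print(resolve_size(128))        # (128, 128)
print(resolve_size((96, 160)))  # (96, 160)
```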

class eir.setup.schemas.ArrayInputDataConfig(mixing_subtype: Literal['mixup'] = 'mixup', modality_dropout_rate: float = 0.0, normalization: Literal['element', 'channel'] | None = 'channel', adaptive_normalization_max_samples: int | None = None)

Parameters:

mixing_subtype – Which type of mixing to use on the array data given that mixing_alpha is set >0.0 in the global configuration.
modality_dropout_rate – Dropout rate to apply to the modality, e.g. 0.2 means that 20% of the time, this modality will be dropped out during training.
normalization – Which type of normalization to apply to the array data. If element, will normalize each element in the array independently. If channel, will normalize each channel in the array independently. For ‘channel’, assumes PyTorch format where the channel dimension is the first dimension.
adaptive_normalization_max_samples – If using adaptive normalization (channel / element), how many samples to use to compute the normalization parameters. If None, will use all samples.
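
The difference between ‘channel’ and ‘element’ normalization lies in which values share a mean and standard deviation. The sketch below computes both kinds of statistics on small nested lists in channel-first layout. The function names are hypothetical, and the use of the population standard deviation and of the first max_samples samples are assumptions made for illustration.

```python
import statistics
from typing import List, Optional, Tuple

Sample = List[List[float]]  # one sample in channel-first layout: [channel][position]
Stats = Tuple[float, float]  # (mean, standard deviation)


def _mean_std(values: List[float]) -> Stats:
    return statistics.fmean(values), statistics.pstdev(values)


def channel_stats(samples: List[Sample], max_samples: Optional[int] = None) -> List[Stats]:
    """One (mean, std) pair per channel, pooled over samples and positions."""
    subset = samples[:max_samples]
    n_channels = len(subset[0])
    return [
        _mean_std([value for sample in subset for value in sample[channel]])
        for channel in range(n_channels)
    ]


def element_stats(samples: List[Sample], max_samples: Optional[int] = None) -> List[List[Stats]]:
    """One (mean, std) pair per element, pooled over samples only."""
    subset = samples[:max_samples]
    n_channels, length = len(subset[0]), len(subset[0][0])
    return [
        [_mean_std([sample[c][i] for sample in subset]) for i in range(length)]
        for c in range(n_channels)
    ]


samples = [
    [[1.0, 3.0], [10.0, 10.0]],
    [[5.0, 7.0], [20.0, 20.0]],
]
print(channel_stats(samples))  # two (mean, std) pairs, one per channel
print(element_stats(samples))  # a 2 x 2 grid of (mean, std) pairs
```

Each input value would then be normalized as (value - mean) / std using the statistics that apply to its channel or element.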

Input Model Configurations 

These configurations are used to specify the input feature extractor architecture, as well as parameters that can be common between different feature extractors. For a given feature extractor (specified with the model_type field), there are various configurations available through the model_init_config field. The documentation below contains more details about the different configurations available for each feature extractor.

class eir.models.input.omics.omics_models.OmicsModelConfig(model_type: Literal['cnn', 'linear', 'lcl-simple', 'genome-local-net'], model_init_config: CNNModelConfig | LinearModelConfig | SimpleLCLModelConfig | LCLModelConfig | IdentityModelConfig)

Parameters:

model_type – Which type of omics model to use.
model_init_config – Configuration used to initialise model.

class eir.models.input.tabular.tabular.TabularModelConfig(model_init_config: SimpleTabularModelConfig, model_type: Literal['tabular'] = 'tabular')

Parameters:

model_type – Which type of tabular model to use.
model_init_config – Configuration / arguments used to initialise model.

class eir.models.input.sequence.transformer_models.SequenceModelConfig(model_init_config: BasicTransformerFeatureExtractorModelConfig | Dict, model_type: Literal['sequence-default'] | str = 'sequence-default', embedding_dim: int = 64, position: Literal['encode', 'embed'] = 'encode', position_dropout: float = 0.1, window_size: int = 0, pool: Literal['avg'] | Literal['max'] | None = None, pretrained_model: bool = False, freeze_pretrained_model: bool = False)

Parameters:

model_init_config – Configuration / arguments used to initialise model.
model_type – Which type of sequence model to use.
embedding_dim – Which dimension to use for the embeddings. If None, will automatically set this value based on the number of tokens and attention heads.
position – Whether to encode the token position or use learnable position embeddings.
position_dropout – Dropout for the positional encoding / embedding.
window_size – If set to more than 0, will apply a sliding window of feature extraction over the input, meaning the model (e.g. transformer) will only see a part of the input at a time. Can be useful to avoid the O(n²) complexity of transformers, as it becomes O(window_size² * n_windows) instead.
pool – Whether and how to pool (max / avg) the final feature maps before being passed to the final fusion module / predictor. This means we pool over the sequence (i.e. time) dimension, so the resulting dimension is embedding_dim instead of sequence_length * embedding_dim. If using windowed / conv transformers, this becomes embedding_dim * number_of_chunks.
pretrained_model – Specify whether the model type is assumed to be a pretrained external sequence model.
freeze_pretrained_model – Whether to freeze the pretrained model weights.

See Sequence Models for more details about available external sequence models.
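
The effect of window_size and pool on cost and output size can be worked through with a little arithmetic. The helpers below are hypothetical, the costs are proportional units rather than actual operation counts, and rounding the number of windows up (i.e. padding the last window) is an assumption.

```python
import math
from typing import Optional


def attention_cost(sequence_length: int, window_size: int = 0) -> int:
    """Relative attention cost: n^2, or window_size^2 * n_windows when windowed."""
    if window_size <= 0:
        return sequence_length**2
    # Assumption: a partial final window is padded to a full window.
    n_windows = math.ceil(sequence_length / window_size)
    return window_size**2 * n_windows


def output_dim(
    sequence_length: int, embedding_dim: int, pool: Optional[str], window_size: int = 0
) -> int:
    """Size of the features passed on to the fusion module."""
    if pool is None:
        return sequence_length * embedding_dim
    if window_size > 0:
        return embedding_dim * math.ceil(sequence_length / window_size)
    return embedding_dim


print(attention_cost(1024))                       # 1048576
print(attention_cost(1024, window_size=128))      # 131072
print(output_dim(1024, 64, pool=None))            # 65536
print(output_dim(1024, 64, pool="avg"))           # 64
print(output_dim(1024, 64, "avg", window_size=128))  # 512
```

In this example a window of 128 over a sequence of 1024 tokens reduces the relative attention cost by a factor of 8.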

class eir.models.input.image.image_models.ImageModelConfig(model_type: Literal['cnn'] | str, model_init_config: CNNModelConfig | Dict[str, Any], num_output_features: int = 256, pretrained_model: bool = False, freeze_pretrained_model: bool = False)

Parameters:

model_type – Which type of image model to use.
model_init_config – Configuration / arguments used to initialise model.
num_output_features – Number of final output features from the image feature extractor, which get passed to the fusion module.
pretrained_model – Specify whether the model type is assumed to be pretrained and from the PyTorch Image Models repository.
freeze_pretrained_model – Whether to freeze the pretrained model weights.

See Image Models for more details about available external image models.

class eir.models.input.array.array_models.ArrayModelConfig(model_type: Literal['cnn', 'lcl'], model_init_config: CNNModelConfig | LCLModelConfig | ArrayTransformerConfig, pre_normalization: Literal['instancenorm', 'layernorm'] | None = None)

Parameters:

model_type – Which type of array model to use.
model_init_config – Configuration used to initialise model.

Interpretation Configurations 

Parameters to have basic control over how interpretation is done. Currently only supported for sequence and image data.

class eir.setup.schemas.BasicInterpretationConfig(interpretation_sampling_strategy: Literal['first_n', 'random_sample'] = 'first_n', num_samples_to_interpret: int = 10, manual_samples_to_interpret: Sequence[str] | None = None)

Parameters:

interpretation_sampling_strategy – How to sample sequences for attribution analysis. first_n always grabs the same first n values from the beginning of the dataset to interpret, while random_sample will sample uniformly from the whole dataset without replacement.
num_samples_to_interpret – How many samples to interpret.
manual_samples_to_interpret – IDs of samples to always interpret, irrespective of interpretation_sampling_strategy and num_samples_to_interpret. A caveat here is that they must be present in the dataset that is being interpreted (e.g. validation / test dataset), meaning that adding IDs here that happen to be in the training dataset will not work.
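
The selection logic described by these three parameters might be sketched as below. The function name is hypothetical, and two details are assumptions: manual IDs are added on top of the sampled ones, and an ID that is missing from the dataset raises an error.

```python
import random
from typing import List, Optional, Sequence


def select_ids_to_interpret(
    dataset_ids: Sequence[str],
    strategy: str = "first_n",
    num_samples: int = 10,
    manual_ids: Optional[Sequence[str]] = None,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick sample IDs for interpretation, always including manual IDs."""
    if strategy == "first_n":
        selected = list(dataset_ids[:num_samples])
    elif strategy == "random_sample":
        rng = rng or random.Random()
        # Uniform sampling without replacement.
        n_to_draw = min(num_samples, len(dataset_ids))
        selected = rng.sample(list(dataset_ids), n_to_draw)
    else:
        raise ValueError(f"Unknown strategy: {strategy}")

    present = set(dataset_ids)
    for sample_id in manual_ids or []:
        if sample_id not in present:
            # Assumption: IDs outside the interpreted dataset are an error.
            raise ValueError(f"{sample_id} is not in the dataset being interpreted")
        if sample_id not in selected:
            selected.append(sample_id)
    return selected


ids = [f"id{i}" for i in range(20)]
print(select_ids_to_interpret(ids, "first_n", 3, manual_ids=["id7"]))
# ['id0', 'id1', 'id2', 'id7']
```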

Feature Extractor Configurations 

The documentation below details the parameters passed to the respective models (through the model_init_config field in the --input_configs .yaml files).

Omics Feature Extractors 

class eir.models.input.array.models_cnn.CNNModelConfig(layers: None | List[int] = None, num_output_features: int = 256, channel_exp_base: int = 2, first_channel_expansion: int = 1, kernel_width: int = 12, first_kernel_expansion_width: int = 1, down_stride_width: int = 4, first_stride_expansion_width: int = 1, dilation_factor_width: int = 1, kernel_height: int = 4, first_kernel_expansion_height: int = 1, down_stride_height: int = 1, first_stride_expansion_height: int = 1, dilation_factor_height: int = 1, cutoff: int = 32, rb_do: float = 0.0, stochastic_depth_p: float = 0.0, attention_inclusion_cutoff: int = 0, l1: float = 0.0)

Parameters:

layers –
A list that controls the number of layers and channels in the model. Each element in the list represents a layer group with a specified number of layers and channels. Specifically,
- The first element in the list refers to the number of layers with the number of channels exactly as specified by the channel_exp_base parameter.
- The subsequent elements in the list correspond to an increased number of channels, doubling with each step. For instance, if channel_exp_base=3 (i.e., 2**3=8 channels), and the layers list is [5, 3, 2], the model would be constructed as follows,
  - First case: 5 layers with 8 channels
  - Second case: 3 layers with 16 channels (doubling from the previous case)
  - Third case: 2 layers with 32 channels (doubling from the previous case)
- The model currently supports a maximum of 4 elements in the list.
- If set to None, the model will automatically set up the number of layer groups until a certain width and height (stride * 8 for both) are met. In this automatic setup, channels will be increased as the input gets propagated through the network, while the width/height get reduced due to stride.
Future work includes adding a parameter to control the target width and height.
num_output_features – Output dimension of the last FC layer in the network which accepts the outputs from the convolutional layer.
channel_exp_base – Which power of 2 to use in order to set the number of channels in the network. For example, setting channel_exp_base=3 means that 2**3=8 channels will be used.
first_channel_expansion – Factor to extend the first layer channels.
kernel_width – Base kernel width of the convolutions.
first_kernel_expansion_width – Factor to extend the first kernel’s width.
down_stride_width – Down stride of the convolutional layers along the width.
first_stride_expansion_width – Factor to extend the first layer stride along the width.
dilation_factor_width – Base dilation factor of the convolutions along the width in the network.
kernel_height – Base kernel height of the convolutions.
first_kernel_expansion_height – Factor to extend the first kernel’s height.
down_stride_height – Down stride of the convolutional layers along the height.
first_stride_expansion_height – Factor to extend the first layer stride along the height.
dilation_factor_height – Base dilation factor of the convolutions along the height in the network.
cutoff – If the resulting dimension of width * height of adding a successive block is less than this value, will stop adding residual blocks to the model in the automated case (i.e., if the layers argument is not specified).
rb_do – Dropout in the convolutional residual blocks.
stochastic_depth_p – Probability of dropping input.
attention_inclusion_cutoff – If the dimension of width * height is less than this value, attention will be included in the model across channels and width * height as embedding dimension after that point (with the channels representing the length of the sequence).
l1 – L1 regularization to apply to the first layer.
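
The relationship between layers and channel_exp_base described above can be written out as a small calculation. The helper name layer_groups is hypothetical; it only reproduces the doubling rule and the limit of 4 elements stated in the layers description.

```python
from typing import List, Tuple


def layer_groups(layers: List[int], channel_exp_base: int) -> List[Tuple[int, int]]:
    """Return (number of layers, number of channels) for each layer group."""
    if len(layers) > 4:
        raise ValueError("A maximum of 4 elements is supported in the layers list.")
    base_channels = 2**channel_exp_base
    # Channels double with each successive group.
    return [
        (n_layers, base_channels * 2**index)
        for index, n_layers in enumerate(layers)
    ]


print(layer_groups([5, 3, 2], channel_exp_base=3))
# [(5, 8), (3, 16), (2, 32)]
```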

class eir.models.input.array.models_identity.IdentityModelConfig(flatten: bool = True, flatten_shape: Literal['c', 'fortran'] = 'c')

Parameters:

flatten – Whether to flatten the input.
flatten_shape – What column-row order to flatten the input in.
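
Assuming ‘c’ and ‘fortran’ follow the usual row-major and column-major conventions, the difference between the two flatten_shape options can be shown on a small 2D input. The function below is a hypothetical pure-Python illustration, not the library's implementation.

```python
from typing import List


def flatten(matrix: List[List[int]], flatten_shape: str = "c") -> List[int]:
    """Flatten a 2D list in row-major ('c') or column-major ('fortran') order."""
    if flatten_shape == "c":
        return [value for row in matrix for value in row]
    if flatten_shape == "fortran":
        n_columns = len(matrix[0])
        return [row[column] for column in range(n_columns) for row in matrix]
    raise ValueError(f"Unknown flatten_shape: {flatten_shape}")


matrix = [[1, 2, 3], [4, 5, 6]]
print(flatten(matrix, "c"))        # [1, 2, 3, 4, 5, 6]
print(flatten(matrix, "fortran"))  # [1, 4, 2, 5, 3, 6]
```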

class eir.models.input.array.models_locally_connected.SimpleLCLModelConfig(fc_repr_dim: int = 12, num_lcl_chunks: int = 64, l1: float = 0.0)

Parameters:

fc_repr_dim – Controls the number of output sets in the first and only split layer. Analogous to channels in CNNs.
num_lcl_chunks – Controls the number of splits applied to the input. E.g. with an input width of 800, using num_lcl_chunks=100 will result in a kernel width of 8, meaning 8 elements in the flattened input. If using SNP inputs with a one-hot encoding of 4 possible values, this will result in 8/4 = 2 SNPs per locally connected area.
l1 – L1 regularization applied to the first and only locally connected layer.
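
The arithmetic behind num_lcl_chunks can be checked with a short calculation. The helper names are hypothetical, and the sketch assumes the input width divides evenly by the number of chunks; how uneven splits are handled (e.g. padding) is not covered here.

```python
def lcl_kernel_width(input_width: int, num_lcl_chunks: int) -> int:
    """Kernel width implied by splitting the flattened input into chunks."""
    if input_width % num_lcl_chunks != 0:
        raise ValueError("This sketch only covers evenly divisible inputs.")
    return input_width // num_lcl_chunks


def snps_per_window(kernel_width: int, one_hot_size: int = 4) -> float:
    """Number of SNPs covered by one locally connected window."""
    return kernel_width / one_hot_size


kernel_width = lcl_kernel_width(input_width=800, num_lcl_chunks=100)
print(kernel_width)                   # 8
print(snps_per_window(kernel_width))  # 2.0
```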

class eir.models.input.array.models_locally_connected.LCLModelConfig(patch_size: tuple[int, int, int] | None = None, layers: None | List[int] = None, kernel_width: int | Literal['patch'] = 16, first_kernel_expansion: int = -2, channel_exp_base: int = 2, first_channel_expansion: int = 1, num_lcl_chunks: None | int = None, rb_do: float = 0.1, stochastic_depth_p: float = 0.0, l1: float = 0.0, cutoff: int | Literal['auto'] = 1024, direction: Literal['down', 'up'] = 'down', attention_inclusion_cutoff: int | None = None)

Note that when using the automatic network setup, kernel widths will get expanded to ensure that the feature representations become smaller as they are propagated through the network.

Parameters:

patch_size – Controls the size of the patches used in the first layer. If set to None, the input is flattened according to the torch flatten function. Note that when using this parameter, we generally want the kernel width to be set to the multiplication of the patch size. Order follows PyTorch convention, i.e., [channels, height, width].
layers – Controls the number of layers in the model. If set to None, the model will automatically set up the number of layers according to the cutoff parameter value.
kernel_width – Width of the locally connected kernels. Note that in the context of genomic inputs this refers to the flattened input, meaning that if we have a one-hot encoding of 4 values (e.g. SNPs), 12 refers to 12/4 = 3 SNPs per locally connected window. Can be set to None if the num_lcl_chunks parameter is set, which means that the kernel width will be set automatically according to
first_kernel_expansion – Factor to extend the first kernel. This value can be either positive or negative. For example, in the case of kernel_width=12, setting first_kernel_expansion=2 means that the first kernel will have a width of 24, whereas other kernels will have a width of 12. When using a negative value, the first kernel width is divided by the absolute value instead of being multiplied.
channel_exp_base – Which power of 2 to use in order to set the number of channels/weight sets in the network. For example, setting channel_exp_base=3 means that 2**3=8 weight sets will be used.
first_channel_expansion – Whether to expand / shrink the number of channels in the first layer as compared to other layers in the network. Works analogously to the first_kernel_expansion parameter.
num_lcl_chunks – Controls the number of splits applied to the input. E.g. with an input width of 800, using num_lcl_chunks=100 will result in a kernel width of 8, meaning 8 elements in the flattened input. If using SNP inputs with a one-hot encoding of 4 possible values, this will result in 8/4 = 2 SNPs per locally connected area.
rb_do – Dropout in the residual blocks.
stochastic_depth_p – Probability of dropping input.
l1 – L1 regularization applied to the first layer in the network.
cutoff – Feature dimension cutoff where the automatic network setup stops adding layers. The ‘auto’ option is only supported when using the model for array outputs, and will set the cutoff to roughly the number of output features.
direction – Whether to use a “down” or “up” network. “Down” means that the feature representation will get smaller as it is propagated through the network, whereas “up” means that the feature representation will get larger.
attention_inclusion_cutoff – Cutoff to start including attention blocks in the network. If set to None, no attention blocks will be included. The cutoff here refers to the “length” dimension of the input after reshaping according to the output_feature_sets in the preceding layer. For example, if we have 1024 output features and 4 output feature sets, the length dimension will be 1024/4 = 256. With an attention cutoff >= 256, the attention block will be included.
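
The effect of first_kernel_expansion on the first kernel can be sketched as below. The helper first_kernel_width is hypothetical, and the use of integer division for negative values is an assumption made for illustration; the library may round differently.

```python
def first_kernel_width(kernel_width: int, first_kernel_expansion: int) -> int:
    """Width of the first kernel given the base width and expansion factor."""
    if first_kernel_expansion == 0:
        raise ValueError("first_kernel_expansion must be non-zero")
    if first_kernel_expansion > 0:
        # Positive values multiply the base kernel width.
        return kernel_width * first_kernel_expansion
    # Negative values divide the base kernel width (assumed integer division).
    return kernel_width // abs(first_kernel_expansion)


expanded = first_kernel_width(kernel_width=12, first_kernel_expansion=2)
shrunk = first_kernel_width(kernel_width=16, first_kernel_expansion=-2)
print(expanded, shrunk)  # 24 8
```

The second call uses the default values from the signature above (kernel_width=16, first_kernel_expansion=-2), which under these assumptions gives a first kernel of width 8.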

class eir.models.input.array.models_linear.LinearModelConfig(fc_repr_dim: int = 32, l1: float = 0.0)

Parameters:

fc_repr_dim – Number of output nodes in the first and only hidden layer.
l1 – L1 regularization to apply to the first layer.

Tabular Feature Extractors 

class eir.models.input.tabular.tabular.SimpleTabularModelConfig(l1: float = 0.0, fc_layer: bool = False)

Parameters:

l1 – L1 regularization applied to the embeddings for categorical tabular inputs.
fc_layer – Whether to add a single fully-connected layer to the model, as an alternative to looking up and passing the inputs through directly.

Sequence and Binary Feature Extractors 

Built-in Sequence Feature Extractors

class eir.models.input.sequence.transformer_models.BasicTransformerFeatureExtractorModelConfig(num_heads: int = 8, num_layers: int = 2, dim_feedforward: int | Literal['auto'] = 'auto', dropout: float = 0.1)

Parameters:

num_heads – The number of heads in the multi-head attention layers.
num_layers – The number of encoder blocks in the transformer model.
dim_feedforward – The dimension of the feedforward layers in the transformer model.
dropout – Dropout value to use in the encoder layers.

External Sequence Feature Extractors

Please refer to Sequence Models for more details about the external sequence models.

Image Feature Extractors 

Built-in Image Feature Extractors

class eir.models.input.array.models_cnn.CNNModelConfig(layers: None | List[int] = None, num_output_features: int = 256, channel_exp_base: int = 2, first_channel_expansion: int = 1, kernel_width: int = 12, first_kernel_expansion_width: int = 1, down_stride_width: int = 4, first_stride_expansion_width: int = 1, dilation_factor_width: int = 1, kernel_height: int = 4, first_kernel_expansion_height: int = 1, down_stride_height: int = 1, first_stride_expansion_height: int = 1, dilation_factor_height: int = 1, cutoff: int = 32, rb_do: float = 0.0, stochastic_depth_p: float = 0.0, attention_inclusion_cutoff: int = 0, l1: float = 0.0)

Parameters:

layers –
A list that controls the number of layers and channels in the model. Each element in the list represents a layer group with a specified number of layers and channels. Specifically,
- The first element in the list refers to the number of layers with the number of channels exactly as specified by the channel_exp_base parameter.
- The subsequent elements in the list correspond to an increased number of channels, doubling with each step. For instance, if channel_exp_base=3 (i.e., 2**3=8 channels), and the layers list is [5, 3, 2], the model would be constructed as follows,
  - First case: 5 layers with 8 channels
  - Second case: 3 layers with 16 channels (doubling from the previous case)
  - Third case: 2 layers with 32 channels (doubling from the previous case)
- The model currently supports a maximum of 4 elements in the list.
- If set to None, the model will automatically set up the number of layer groups until a certain width and height (stride * 8 for both) are met. In this automatic setup, channels will be increased as the input gets propagated through the network, while the width/height get reduced due to stride.
Future work includes adding a parameter to control the target width and height.
num_output_features – Output dimension of the last FC layer in the network which accepts the outputs from the convolutional layer.
channel_exp_base – Which power of 2 to use in order to set the number of channels in the network. For example, setting channel_exp_base=3 means that 2**3=8 channels will be used.
first_channel_expansion – Factor to extend the first layer channels.
kernel_width – Base kernel width of the convolutions.
first_kernel_expansion_width – Factor to extend the first kernel’s width.
down_stride_width – Down stride of the convolutional layers along the width.
first_stride_expansion_width – Factor to extend the first layer stride along the width.
dilation_factor_width – Base dilation factor of the convolutions along the width in the network.
kernel_height – Base kernel height of the convolutions.
first_kernel_expansion_height – Factor to extend the first kernel’s height.
down_stride_height – Down stride of the convolutional layers along the height.
first_stride_expansion_height – Factor to extend the first layer stride along the height.
dilation_factor_height – Base dilation factor of the convolutions along the height in the network.
cutoff – In the automated case (i.e., if the layers argument is not specified), the model stops adding residual blocks once adding another block would result in a width * height dimension smaller than this value.
rb_do – Dropout in the convolutional residual blocks.
stochastic_depth_p – Probability of dropping input.
attention_inclusion_cutoff – If the dimension of width * height is less than this value, attention will be included in the model across channels and width * height as embedding dimension after that point (with the channels representing the length of the sequence).
l1 – L1 regularization to apply to the first layer.
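
The mapping from the layers list and channel_exp_base to channels per layer group, as described for the layers parameter above, can be sketched as follows. The helper channels_per_group is hypothetical and only reproduces the documented doubling rule.

```python
def channels_per_group(layers: list[int], channel_exp_base: int) -> list[tuple[int, int]]:
    """Return (number of layers, number of channels) for each layer group."""
    if len(layers) > 4:
        # The documentation states a maximum of 4 elements in the list.
        raise ValueError("At most 4 layer groups are supported")
    base_channels = 2 ** channel_exp_base
    # Each subsequent group doubles the number of channels.
    return [(n_layers, base_channels * 2 ** i) for i, n_layers in enumerate(layers)]


groups = channels_per_group(layers=[5, 3, 2], channel_exp_base=3)
print(groups)  # [(5, 8), (3, 16), (2, 32)]
```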

External Image Feature Extractors

Please refer to Image Models for more details about the external image models.

Array Feature Extractors 

class eir.models.input.array.models_cnn.CNNModelConfig(layers: None | List[int] = None, num_output_features: int = 256, channel_exp_base: int = 2, first_channel_expansion: int = 1, kernel_width: int = 12, first_kernel_expansion_width: int = 1, down_stride_width: int = 4, first_stride_expansion_width: int = 1, dilation_factor_width: int = 1, kernel_height: int = 4, first_kernel_expansion_height: int = 1, down_stride_height: int = 1, first_stride_expansion_height: int = 1, dilation_factor_height: int = 1, cutoff: int = 32, rb_do: float = 0.0, stochastic_depth_p: float = 0.0, attention_inclusion_cutoff: int = 0, l1: float = 0.0)

Parameters:

layers –
A list that controls the number of layers and channels in the model. Each element in the list represents a layer group with a specified number of layers and channels. Specifically,
- The first element in the list refers to the number of layers with the number of channels exactly as specified by the channel_exp_base parameter.
- The subsequent elements in the list correspond to an increased number of channels, doubling with each step. For instance, if channel_exp_base=3 (i.e., 2**3=8 channels), and the layers list is [5, 3, 2], the model would be constructed as follows,
  - First case: 5 layers with 8 channels
  - Second case: 3 layers with 16 channels (doubling from the previous case)
  - Third case: 2 layers with 32 channels (doubling from the previous case)
- The model currently supports a maximum of 4 elements in the list.
- If set to None, the model will automatically set up the number of layer groups until a certain width and height (stride * 8 for both) are met. In this automatic setup, channels will be increased as the input gets propagated through the network, while the width/height get reduced due to stride.
Future work includes adding a parameter to control the target width and height.
num_output_features – Output dimension of the last FC layer in the network which accepts the outputs from the convolutional layer.
channel_exp_base – Which power of 2 to use in order to set the number of channels in the network. For example, setting channel_exp_base=3 means that 2**3=8 channels will be used.
first_channel_expansion – Factor to extend the first layer channels.
kernel_width – Base kernel width of the convolutions.
first_kernel_expansion_width – Factor to extend the first kernel’s width.
down_stride_width – Down stride of the convolutional layers along the width.
first_stride_expansion_width – Factor to extend the first layer stride along the width.
dilation_factor_width – Base dilation factor of the convolutions along the width in the network.
kernel_height – Base kernel height of the convolutions.
first_kernel_expansion_height – Factor to extend the first kernel’s height.
down_stride_height – Down stride of the convolutional layers along the height.
first_stride_expansion_height – Factor to extend the first layer stride along the height.
dilation_factor_height – Base dilation factor of the convolutions along the height in the network.
cutoff – In the automated case (i.e., if the layers argument is not specified), the model stops adding residual blocks once adding another block would result in a width * height dimension smaller than this value.
rb_do – Dropout in the convolutional residual blocks.
stochastic_depth_p – Probability of dropping input.
attention_inclusion_cutoff – If the dimension of width * height is less than this value, attention will be included in the model across channels and width * height as embedding dimension after that point (with the channels representing the length of the sequence).
l1 – L1 regularization to apply to the first layer.

class eir.models.input.array.models_locally_connected.LCLModelConfig(patch_size: tuple[int, int, int] | None = None, layers: None | List[int] = None, kernel_width: int | Literal['patch'] = 16, first_kernel_expansion: int = -2, channel_exp_base: int = 2, first_channel_expansion: int = 1, num_lcl_chunks: None | int = None, rb_do: float = 0.1, stochastic_depth_p: float = 0.0, l1: float = 0.0, cutoff: int | Literal['auto'] = 1024, direction: Literal['down', 'up'] = 'down', attention_inclusion_cutoff: int | None = None)

Note that when using the automatic network setup, kernel widths will get expanded to ensure that the feature representations become smaller as they are propagated through the network.

Parameters:

patch_size – Controls the size of the patches used in the first layer. If set to None, the input is flattened according to the torch flatten function. Note that when using this parameter, we generally want the kernel width to be set to the product of the patch dimensions. Order follows PyTorch convention, i.e., [channels, height, width].
layers – Controls the number of layers in the model. If set to None, the model will automatically set up the number of layers according to the cutoff parameter value.
kernel_width – Width of the locally connected kernels. Note that in the context of genomic inputs this refers to the flattened input, meaning that if we have a one-hot encoding of 4 values (e.g. SNPs), 12 refers to 12/4 = 3 SNPs per locally connected window. Can be set to None if the num_lcl_chunks parameter is set, which means that the kernel width will be set automatically according to
first_kernel_expansion – Factor to extend the first kernel. This value can be either positive or negative. For example, in the case of kernel_width=12, setting first_kernel_expansion=2 means that the first kernel will have a width of 24, whereas other kernels will have a width of 12. When using a negative value, the first kernel width is divided by the absolute value instead of being multiplied.
channel_exp_base – Which power of 2 to use in order to set the number of channels/weight sets in the network. For example, setting channel_exp_base=3 means that 2**3=8 weight sets will be used.
first_channel_expansion – Whether to expand / shrink the number of channels in the first layer as compared to other layers in the network. Works analogously to the first_kernel_expansion parameter.
num_lcl_chunks – Controls the number of splits applied to the input. E.g. with an input width of 800, using num_lcl_chunks=100 will result in a kernel width of 8, meaning 8 elements in the flattened input. If using SNP inputs with a one-hot encoding of 4 possible values, this will result in 8/4 = 2 SNPs per locally connected area.
rb_do – Dropout in the residual blocks.
stochastic_depth_p – Probability of dropping input.
l1 – L1 regularization applied to the first layer in the network.
cutoff – Feature dimension cutoff where the automatic network setup stops adding layers. The ‘auto’ option is only supported when using the model for array outputs, and will set the cutoff to roughly the number of output features.
direction – Whether to use a “down” or “up” network. “Down” means that the feature representation will get smaller as it is propagated through the network, whereas “up” means that the feature representation will get larger.
attention_inclusion_cutoff – Cutoff to start including attention blocks in the network. If set to None, no attention blocks will be included. The cutoff here refers to the “length” dimension of the input after reshaping according to the output_feature_sets in the preceding layer. For example, if we have 1024 output features and 4 output feature sets, the length dimension will be 1024/4 = 256. With an attention cutoff >= 256, the attention block will be included.
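
The attention inclusion rule described for attention_inclusion_cutoff can be sketched as a simple check. The function include_attention and its argument names are hypothetical; the sketch only mirrors the documented example and is not taken from the library code.

```python
from typing import Optional


def include_attention(
    num_output_features: int,
    output_feature_sets: int,
    attention_inclusion_cutoff: Optional[int],
) -> bool:
    """Decide whether an attention block would be included after a layer."""
    if attention_inclusion_cutoff is None:
        # None disables attention blocks entirely.
        return False
    # "Length" dimension after reshaping by the number of feature sets.
    length = num_output_features // output_feature_sets
    return attention_inclusion_cutoff >= length


print(include_attention(1024, 4, 256))   # True, since 1024 / 4 = 256 <= 256
print(include_attention(1024, 4, 128))   # False, since 256 > 128
print(include_attention(1024, 4, None))  # False, attention is disabled
```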

class eir.models.input.array.models_transformers.ArrayTransformerConfig(patch_size: tuple[int, ...], embedding_dim: int, num_heads: int = 8, num_layers: int = 2, dim_feedforward: int | Literal['auto'] = 'auto', dropout: float = 0.1, position: Literal['encode', 'embed'] = 'encode', position_dropout: float = 0.1)

Parameters:

patch_size – Controls the size of the patches the input is split into in the first layer. Order follows PyTorch convention, i.e., [channels, height, width].
embedding_dim – The embedding dimension each patch is projected to. This is also the dimension of the transformer encoder layers.
num_heads – The number of heads in the multi-head attention layers.
num_layers – The number of transformer encoder layers.
dim_feedforward – The dimension of the feedforward layers in the transformer model.
dropout – The dropout rate to use in the transformer encoder layers.
position – Whether to encode the token position or use learnable position embeddings.
position_dropout – The dropout rate to use in the position encoding/embedding.
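
To get a feel for how patch_size relates to the sequence length seen by the transformer, the sketch below counts patches for a given input shape. It assumes non-overlapping patches and input dimensions that divide evenly by the patch dimensions; both are assumptions for illustration, and num_patches is a hypothetical helper.

```python
from math import prod


def num_patches(input_shape: tuple[int, ...], patch_size: tuple[int, ...]) -> int:
    """Number of non-overlapping patches an input is split into."""
    if len(input_shape) != len(patch_size):
        raise ValueError("input_shape and patch_size must have the same length")
    for dim, patch in zip(input_shape, patch_size):
        if dim % patch != 0:
            # Assumption: only evenly divisible inputs are covered here.
            raise ValueError("Each input dimension must be divisible by the patch")
    return prod(dim // patch for dim, patch in zip(input_shape, patch_size))


# Shapes follow the [channels, height, width] order.
n_tokens = num_patches(input_shape=(3, 32, 32), patch_size=(3, 8, 8))
elements_per_patch = prod((3, 8, 8))
print(n_tokens, elements_per_patch)  # 16 192
```

Under these assumptions, each of the 16 patches (192 elements each) would be projected to a vector of size embedding_dim, giving a sequence of 16 tokens.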

Fusion Configurations 

class eir.setup.schemas.FusionConfig(model_type: Literal['mlp-residual', 'identity', 'mgmoe', 'pass-through'], model_config: ResidualMLPConfig | IdentityConfig | MGMoEModelConfig)

Parameters:

model_type – Which type of fusion model to use.
model_config – Fusion model configuration.

Fusion Module Configuration 

class eir.models.fusion.fusion_default.ResidualMLPConfig(layers: ~typing.List[int] = <factory>, fc_task_dim: int = 256, rb_do: float = 0.1, fc_do: float = 0.1, stochastic_depth_p: float = 0.1)

Parameters:

layers – Number of residual MLP layers to use for each output predictor after fusing.
fc_task_dim – Number of hidden nodes in each MLP residual block.
rb_do – Dropout in each MLP residual block.
fc_do – Dropout before final layer.
stochastic_depth_p – Probability of dropping input.

class eir.models.fusion.fusion_mgmoe.MGMoEModelConfig(layers: ~typing.Sequence[int] = <factory>, fc_task_dim: int = 64, mg_num_experts: int = 8, rb_do: float = 0.0, fc_do: float = 0.0, stochastic_depth_p: float = 0.0)

Parameters:

layers – A sequence of two int values controlling the number of residual MLP blocks in the network. The first item (i.e. layers[0]) refers to the number of blocks in the expert branches. The second item (i.e. layers[1]) refers to the number of blocks in the predictor branches.
fc_task_dim – Number of hidden nodes in all residual blocks (both expert and predictor) of the network.
mg_num_experts – Number of multi gate experts to use.
rb_do – Dropout in all MLP residual blocks (both expert and predictor).
fc_do – Dropout before the last FC layer.
stochastic_depth_p – Probability of dropping input.

class eir.models.fusion.fusion_identity.IdentityConfig

Output Configurations 

class eir.setup.schemas.OutputConfig(output_info: OutputInfoConfig, output_type_info: TabularOutputTypeConfig | SequenceOutputTypeConfig | ArrayOutputTypeConfig, model_config: TabularOutputModuleConfig | SequenceOutputModuleConfig | ArrayOutputModuleConfig, sampling_config: SequenceOutputSamplingConfig | ArrayOutputSamplingConfig | dict | None = None)

Parameters:

output_info – Information about the output source, name and type.
output_type_info – Information specific to the output type, e.g. which columns to predict from a tabular file.
model_config – Configuration for the chosen model (i.e. output module after fusion) for this output.
sampling_config – Configuration for how to sample results from the output module.

Output Info Configuration 

class eir.setup.schemas.OutputInfoConfig(output_source: str, output_name: str, output_type: Literal['tabular', 'sequence', 'array'], output_inner_key: str | None = None)

Parameters:

output_source – Where on the filesystem to locate the output (if applicable).
output_name – Name to identify the output.
output_type – Type of the output.

Output Type Configuration 

class eir.setup.schemas.TabularOutputTypeConfig(target_cat_columns: ~typing.Sequence[str] = <factory>, target_con_columns: ~typing.Sequence[str] = <factory>, label_parsing_chunk_size: None | int = None, cat_label_smoothing: float = 0.0, cat_loss_name: ~typing.Literal['CrossEntropyLoss'] = 'CrossEntropyLoss', con_loss_name: ~typing.Literal['MSELoss', 'L1Loss', 'SmoothL1Loss', 'PoissonNLLLoss', 'HuberLoss'] = 'MSELoss', uncertainty_weighted_mt_loss: bool = True)

Parameters:

target_cat_columns – Which columns from label_file to use as categorical targets.
target_con_columns – Which columns from label_file to use as continuous targets.
label_parsing_chunk_size – Number of rows to process at a time when loading in the input_source. Useful when RAM is limited.
cat_label_smoothing – Label smoothing to apply to categorical targets.
uncertainty_weighted_mt_loss – Whether to use uncertainty weighted loss for multitask / multilabel learning.
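
As an illustration of what cat_label_smoothing does to a categorical target, the sketch below builds the smoothed target distribution. It assumes the same convention as the label_smoothing argument of torch.nn.CrossEntropyLoss, where the one-hot target is mixed with a uniform distribution over classes; smooth_one_hot is a hypothetical helper.

```python
def smooth_one_hot(target_index: int, num_classes: int, smoothing: float) -> list[float]:
    """Mix a one-hot target with a uniform distribution over all classes."""
    uniform_share = smoothing / num_classes
    return [
        (1.0 - smoothing) + uniform_share if i == target_index else uniform_share
        for i in range(num_classes)
    ]


smoothed = smooth_one_hot(target_index=0, num_classes=4, smoothing=0.1)
print(smoothed)  # approximately [0.925, 0.025, 0.025, 0.025]
```

With smoothing set to 0.0 (the default above), the target stays a plain one-hot vector.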

class eir.setup.schema_modules.output_schemas_sequence.SequenceOutputTypeConfig(vocab_file: None | str = None, max_length: al_max_sequence_length = 'average', sampling_strategy_if_longer: Literal['from_start', 'uniform'] = 'uniform', min_freq: int = 10, split_on: str | None = ' ', tokenizer: al_tokenizer_choices = None, tokenizer_language: str | None = None, adaptive_tokenizer_max_vocab_size: int | None = None, sequence_operation: Literal['autoregressive', 'mlm'] = 'autoregressive')

Parameters:

vocab_file – An optional text file containing pre-defined vocabulary to use for the training. If this is not passed in, the framework will automatically build the vocabulary from the training data. Passing in a vocabulary file is therefore useful if (a) you want to manually specify / limit the vocabulary used and/or (b) you want to save time by pre-computing the vocabulary.
max_length – Maximum length to truncate/pad sequences to. This can be an integer or the values ‘max’ or ‘average’. The ‘max’ keyword will use the maximum sequence length found in the training data, while the ‘average’ will use the average length across all training samples.
sampling_strategy_if_longer – Controls how sequences are truncated if they are longer than the specified max_length parameter. Using ‘from_start’ will always truncate from the beginning of the sequence, ensuring that the samples will always be the same during training. Setting this parameter to ‘uniform’ will uniformly sample a slice of a given sample sequence during training. Note that for consistency, the validation/test set samples always use the ‘from_start’ setting when truncating.
min_freq – Minimum number of times a token must appear in the total training data to be included in the vocabulary. Note that this setting will not do anything if passing in vocab_file.
split_on – Which token to split the sequence on to generate separate tokens for the vocabulary.
tokenizer – Which tokenizer to use. Mostly relevant when modelling natural language, and less so for other arbitrary sequences.
tokenizer_language – Which language rules the tokenizer should apply when tokenizing the raw data.
adaptive_tokenizer_max_vocab_size – If using an adaptive tokenizer (“bpe”), this parameter controls the maximum size of the vocabulary.
sequence_operation – Which operation to perform on the sequence. Currently only autoregressive is supported, which means that the model will be trained to predict the next token in the sequence given the previous tokens.
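
The interplay between max_length and sampling_strategy_if_longer can be sketched as below. The helpers resolve_max_length and truncate are hypothetical, and rounding the average down to an integer is an assumption made for this illustration.

```python
import random


def resolve_max_length(train_lengths: list[int], max_length) -> int:
    """Turn 'max', 'average' or an integer into a concrete length."""
    if max_length == "max":
        return max(train_lengths)
    if max_length == "average":
        # Assumption: the average is rounded down to an integer.
        return sum(train_lengths) // len(train_lengths)
    return int(max_length)


def truncate(tokens: list, max_length: int, strategy: str, rng=None) -> list:
    """Truncate a token list to max_length using the given strategy."""
    if len(tokens) <= max_length:
        return list(tokens)
    if strategy == "from_start":
        start = 0
    elif strategy == "uniform":
        rng = rng or random.Random()
        # Pick a random window start so that the slice fits in the sequence.
        start = rng.randint(0, len(tokens) - max_length)
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return list(tokens[start:start + max_length])


length = resolve_max_length([4, 6, 11], "average")
tokens = list(range(10))
head = truncate(tokens, length, "from_start")
window = truncate(tokens, length, "uniform", rng=random.Random(0))
print(length, head)  # 7 [0, 1, 2, 3, 4, 5, 6]
print(window)        # a contiguous slice of 7 tokens
```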

class eir.setup.schema_modules.output_schemas_array.ArrayOutputTypeConfig(normalization: Literal['element', 'channel'] | None = 'channel', adaptive_normalization_max_samples: int | None = None)

Parameters:

normalization – Which type of normalization to apply to the array data. If element, will normalize each element in the array independently. If channel, will normalize each channel in the array independently. For ‘channel’, assumes PyTorch format where the channel dimension is the first dimension.
adaptive_normalization_max_samples – If using adaptive normalization (channel / element), how many samples to use to compute the normalization parameters. If None, will use all samples.
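
The difference between the 'channel' and 'element' normalization options can be sketched in pure Python as below, using nested lists with shape (samples, channels, length). The helper names, the use of the population standard deviation, and taking the first max_samples samples are all assumptions for illustration and may not match the library's behavior.

```python
from statistics import mean, pstdev


def channel_stats(samples, max_samples=None):
    """(mean, std) per channel, pooled over samples and positions."""
    used = samples if max_samples is None else samples[:max_samples]
    stats = []
    for channel in range(len(used[0])):
        values = [v for sample in used for v in sample[channel]]
        stats.append((mean(values), pstdev(values)))
    return stats


def element_stats(samples, max_samples=None):
    """(mean, std) per individual element, computed across samples only."""
    used = samples if max_samples is None else samples[:max_samples]
    stats = []
    for channel in range(len(used[0])):
        row = []
        for position in range(len(used[0][0])):
            values = [sample[channel][position] for sample in used]
            row.append((mean(values), pstdev(values)))
        stats.append(row)
    return stats


# Two samples, each with 2 channels of length 2.
samples = [
    [[1.0, 3.0], [10.0, 10.0]],
    [[5.0, 7.0], [30.0, 30.0]],
]
per_channel = channel_stats(samples)
per_element = element_stats(samples)
print(per_channel[1])     # mean 20.0 and std 10.0 for the second channel
print(per_element[0][0])  # mean 3.0 and std 2.0 for the first element
```

Each value would then be normalized as (value - mean) / std using the statistics of its channel or element, respectively.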

Output Module Configuration 

Tabular Output Modules

class eir.models.output.tabular.tabular_output_modules.TabularOutputModuleConfig(model_init_config: ResidualMLPOutputModuleConfig | LinearOutputModuleConfig, model_type: Literal['mlp_residual', 'linear'] = 'mlp_residual')

Parameters:

model_init_config – Configuration / arguments used to initialise model.
model_type – Which type of tabular output module to use.

The documentation below details the parameters passed to the respective output heads of the tabular output model (through the model_init_config field in the --output_configs .yaml files).

class eir.models.output.tabular.mlp_residual.ResidualMLPOutputModuleConfig(layers: ~typing.List[int] = <factory>, fc_task_dim: int = 256, rb_do: float = 0.1, fc_do: float = 0.1, stochastic_depth_p: float = 0.1, final_layer_type: ~typing.Literal['linear'] | ~typing.Literal['mlp_residual'] = 'linear')

Parameters:

layers – Number of MLP residual blocks to use in the output module.
fc_task_dim – Number of hidden nodes in each MLP residual block.
rb_do – Dropout in each MLP residual block.
fc_do – Dropout before final layer.
stochastic_depth_p – Stochastic depth probability (probability of dropping input) for each residual block.
final_layer_type – Which type of final layer to use to construct tabular output prediction.

class eir.models.output.tabular.linear.LinearOutputModuleConfig

Sequence Output Modules

class eir.models.output.sequence.sequence_output_modules.SequenceOutputModuleConfig(model_init_config: TransformerSequenceOutputModuleConfig, model_type: Literal['sequence'] = 'sequence', embedding_dim: int = 64, position: Literal['encode', 'embed'] = 'encode', position_dropout: float = 0.1, projection_layer_type: Literal['auto', 'lcl', 'lcl_residual', 'linear'] = 'auto')

Parameters:

model_init_config – Configuration / arguments used to initialise model.
model_type – Which type of sequence output module to use.
embedding_dim – Which dimension to use for the embeddings. If None, will automatically set this value based on the number of tokens and attention heads.
position – Whether to encode the token position or use learnable position embeddings.
position_dropout – Dropout for the positional encoding / embedding.

Array Output Modules

class eir.models.output.array.array_output_modules.ArrayOutputModuleConfig(model_type: Literal['lcl', 'cnn'], model_init_config: LCLOutputModelConfig, pre_normalization: Literal['instancenorm', 'layernorm'] | None = None)

Parameters:

model_type – Which type of array output module to use.
model_init_config – Configuration used to initialise model.

Output Sampling Configuration 

class eir.setup.schema_modules.output_schemas_sequence.SequenceOutputSamplingConfig(manual_inputs: Sequence[Dict[str, str]] = (), n_eval_inputs: int = 10, generated_sequence_length: int = 64, top_k: int = 20, top_p: float = 0.9)

Parameters:

manual_inputs –
Manually specified inputs to use for sequence generation. This is useful if you want to generate sequences based on a specific input. Depending on the input type, different formats are expected:
- sequence: A string written directly in the .yaml file.
- omics: A file path to NumPy array of shape (4, n_SNPs) on disk.
- image: An image file path on disk.
- tabular: A mapping of (column key: value) written directly in the .yaml file.
- array: A file path to NumPy array on disk.
- bytes: A file path to a file on disk.
n_eval_inputs – The number of inputs automatically sampled from the validation set for sequence generation.
generated_sequence_length – The length of the output sequences that are generated.
top_k – The number of top candidates to consider when sampling the next token in an output sequence. By default, the model considers the top 20 candidates.
top_p – The cumulative probability of the top candidates to consider when sampling the next token in an output sequence. For example, if top_p is 0.9, the model will stop sampling candidates once the cumulative probability of the most likely candidates reaches 0.9.
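
How top_k and top_p narrow down the candidates for the next token can be sketched as below. This is a generic illustration of top-k followed by top-p (nucleus) filtering, not the library's code: the function filter_top_k_top_p is hypothetical, and both the order of the two filters and the renormalization step are assumptions.

```python
def filter_top_k_top_p(probs: list[float], top_k: int, top_p: float) -> dict[int, float]:
    """Return renormalized probabilities for the tokens kept after filtering."""
    # Rank token ids by probability, most likely first, and keep the top k.
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)
    ranked = ranked[:top_k]

    kept = []
    cumulative = 0.0
    for token_id, prob in ranked:
        kept.append((token_id, prob))
        cumulative += prob
        if cumulative >= top_p:
            # Stop once the kept candidates cover the top_p probability mass.
            break

    total = sum(prob for _, prob in kept)
    return {token_id: prob / total for token_id, prob in kept}


next_token_probs = [0.5, 0.3, 0.1, 0.05, 0.05]
candidates = filter_top_k_top_p(next_token_probs, top_k=4, top_p=0.75)
print(sorted(candidates))  # [0, 1]
```

The next token would then be sampled from the remaining candidates according to their renormalized probabilities.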

class eir.setup.schema_modules.output_schemas_array.ArrayOutputSamplingConfig(manual_inputs: Sequence[dict[str, str]] = (), n_eval_inputs: int = 10)

Parameters:

manual_inputs –
Manually specified inputs to use for array generation. This is useful if you want to generate arrays based on a specific input. Depending on the input type, different formats are expected:
- sequence: A string written directly in the .yaml file.
- omics: A file path to NumPy array of shape (4, n_SNPs) on disk.
- image: An image file path on disk.
- tabular: A mapping of (column key: value) written directly in the .yaml file.
- array: A file path to NumPy array on disk.
- bytes: A file path to a file on disk.
n_eval_inputs – The number of inputs automatically sampled from the validation set for array generation.