Array Data Configuration

Configuration guide for multi-dimensional array and tensor data. Array inputs are generally NumPy arrays stored on disk (or streamed, if using the streaming functionality), all with the same shape and data type.

Overview

Array data in EIR handles multi-dimensional numerical data:

  • Scientific data - Sensor readings, measurements

  • Signal processing - Audio spectrograms, time-frequency data

  • Multi-dimensional features - Engineered feature matrices

  • Tensor data - Any N-dimensional numerical array

Quick Example

input_info:
  input_source: "my/array/data/folder/"
  input_name: "sensor_data"
  input_type: "array"
model_config:
  model_type: "cnn"
  model_init_config:
    channel_exp_base: 3
    kernel_width: 5
    kernel_height: 1
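
As a rough illustration of the kind of folder the input_source above could point to, the following sketch writes a few NumPy arrays with identical shape and dtype to disk and checks that they are uniform. The file naming (sample_0.npy and so on), the array shape, and the helper names are assumptions made for this example, not requirements stated in this guide.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_example_arrays(folder, n_samples=4, shape=(1, 8, 32)):
    """Write n_samples float32 arrays of identical shape as .npy files."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    rng = np.random.default_rng(seed=0)
    paths = []
    for i in range(n_samples):
        array = rng.normal(size=shape).astype(np.float32)
        path = folder / f"sample_{i}.npy"  # hypothetical naming scheme
        np.save(path, array)
        paths.append(path)
    return paths


def all_same_shape_and_dtype(paths):
    """Return True if every stored array shares one shape and one dtype."""
    loaded = [np.load(p) for p in paths]
    shapes = {a.shape for a in loaded}
    dtypes = {a.dtype for a in loaded}
    return len(shapes) == 1 and len(dtypes) == 1


with tempfile.TemporaryDirectory() as tmp_dir:
    example_paths = write_example_arrays(Path(tmp_dir) / "sensor_data")
    n_written = len(example_paths)
    is_uniform = all_same_shape_and_dtype(example_paths)
    first_shape = np.load(example_paths[0]).shape
```

Running a check like all_same_shape_and_dtype before training is a cheap way to catch arrays that would violate the same-shape, same-type expectation.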

Input Data Configuration

Base Configuration

class eir.setup.schemas.ArrayInputDataConfig(
mixing_subtype: Literal['mixup'] = 'mixup',
modality_dropout_rate: float = 0.0,
normalization: Literal['element', 'channel'] | None = 'channel',
adaptive_normalization_max_samples: int | None = None,
)
Parameters:
  • mixing_subtype – Which type of mixing to use on the array data given that mixing_alpha is set >0.0 in the global configuration.

  • modality_dropout_rate – Dropout rate to apply to the modality, e.g., 0.2 means that 20% of the time, this modality will be dropped out during training.

  • normalization – Which type of normalization to apply to the array data. If element, will normalize each element in the array independently. If channel, will normalize each channel in the array independently. For ‘channel’, assumes PyTorch format where the channel dimension is the first dimension.

  • adaptive_normalization_max_samples – If using adaptive normalization (channel / element), how many samples to use to compute the normalization parameters. If None, uses all samples.
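
The difference between the element and channel options can be sketched with plain NumPy. This is a minimal illustration of the descriptions above, assuming arrays stacked as [samples, channels, height, width] and zero-mean, unit-variance scaling; the helper fit_stats is hypothetical and is not EIR's implementation.

```python
import numpy as np


def fit_stats(data, normalization, max_samples=None):
    """Compute mean/std for 'channel' or 'element' normalization.

    data has shape (n_samples, channels, ...). If max_samples is given,
    only the first max_samples samples are used to compute the statistics.
    """
    subset = data if max_samples is None else data[:max_samples]
    if normalization == "channel":
        # One statistic per channel: reduce over samples and all
        # non-channel dimensions (channel is the first array dimension).
        axes = (0,) + tuple(range(2, data.ndim))
    elif normalization == "element":
        # One statistic per array position: reduce over samples only.
        axes = (0,)
    else:
        raise ValueError(f"Unknown normalization: {normalization}")
    mean = subset.mean(axis=axes, keepdims=True)
    std = subset.std(axis=axes, keepdims=True)
    return mean, std


rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(100, 3, 4, 5))

channel_mean, channel_std = fit_stats(data, "channel")
element_mean, element_std = fit_stats(data, "element")
subset_mean, subset_std = fit_stats(data, "channel", max_samples=20)

channel_normed = (data - channel_mean) / channel_std
element_normed = (data - element_mean) / element_std
```

Here channel_mean has shape (1, 3, 1, 1) while element_mean has shape (1, 3, 4, 5), which is the practical difference between the two options.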

Model Selection

class eir.models.input.array.array_models.ArrayModelConfig(
model_type: Literal['cnn', 'lcl', 'lcl-informed-moe', 'transformer'],
model_init_config: CNNModelConfig | LCLModelConfig | LCLInformedMoEModelConfig | ArrayTransformerConfig,
pre_normalization: Literal['instancenorm', 'layernorm'] | None = None,
)
Parameters:
  • model_type – Which type of array model to use.

  • model_init_config – Configuration used to initialise model.

Available Feature Extractors

CNN Models

class eir.models.input.array.models_cnn.CNNModelConfig(
layers: None | list[int] = None,
num_output_features: int = 0,
channel_exp_base: int = 2,
first_channel_expansion: int = 1,
kernel_width: int = 12,
first_kernel_expansion_width: float = 1.0,
down_stride_width: int = 4,
first_stride_expansion_width: float = 1.0,
dilation_factor_width: int = 1,
kernel_height: int = 4,
first_kernel_expansion_height: float = 1.0,
down_stride_height: int = 1,
first_stride_expansion_height: float = 1.0,
dilation_factor_height: int = 1,
allow_first_conv_size_reduction: bool = True,
down_sample_every_n_blocks: int | None = 2,
cutoff: int = 32,
rb_do: float = 0.0,
stochastic_depth_p: float = 0.0,
attention_inclusion_cutoff: int = 256,
l1: float = 0.0,
)
Parameters:
  • layers

    A list that controls the number of layers and channels in the model. Each element in the list represents a layer group with a specified number of layers and channels. Specifically,

    • The first element in the list refers to the number of layers with the number of channels exactly as specified by the channel_exp_base parameter.

    • The subsequent elements in the list correspond to an increased number of channels, doubling with each step. For instance, if channel_exp_base=3 (i.e., 2**3=8 channels), and the layers list is [5, 3, 2], the model would be constructed as follows,

      • First case: 5 layers with 8 channels

      • Second case: 3 layers with 16 channels (doubling from the previous case)

      • Third case: 2 layers with 32 channels (doubling from the previous case)

    • The model currently supports a maximum of 4 elements in the list.

    • If set to None, the model will automatically set up the number of layer groups until a certain width and height (stride * 8 for both) are met. In this automatic setup, channels will be increased as the input gets propagated through the network, while the width/height get reduced due to stride.

    Future work includes adding a parameter to control the target width and height.

  • num_output_features – Output dimension of the last FC layer in the network which accepts the outputs from the convolutional layer. If set to 0, the output will be passed through directly to the fusion module.

  • channel_exp_base – Which power of 2 to use in order to set the number of channels in the network. For example, setting channel_exp_base=3 means that 2**3=8 channels will be used.

  • first_channel_expansion – Factor to extend the first layer channels.

  • kernel_width – Base kernel width of the convolutions.

  • first_kernel_expansion_width – Factor to extend the first kernel’s width. The result of the multiplication will be rounded to the nearest integer.

  • down_stride_width – Down stride of the convolutional layers along the width.

  • first_stride_expansion_width – Factor to extend the first layer stride along the width. The result of the multiplication will be rounded to the nearest integer.

  • dilation_factor_width – Base dilation factor of the convolutions along the width in the network.

  • kernel_height – Base kernel height of the convolutions.

  • first_kernel_expansion_height – Factor to extend the first kernel’s height. The result of the multiplication will be rounded to the nearest integer.

  • down_stride_height – Down stride of the convolutional layers along the height.

  • first_stride_expansion_height – Factor to extend the first layer stride along the height. The result of the multiplication will be rounded to the nearest integer.

  • dilation_factor_height – Base dilation factor of the convolutions along the height in the network.

  • allow_first_conv_size_reduction – If set to False, will not allow the first convolutional layer to reduce the size of the input. Set this to True if you want the first convolutional layer to be able to reduce the size of the input, for example when the input is very large and we want to compress it early.

  • cutoff – If adding a successive block would result in a width * height dimension smaller than this value, the model stops adding residual blocks in the automated case (i.e., if the layers argument is not specified).

  • rb_do – Dropout in the convolutional residual blocks.

  • stochastic_depth_p – Probability of dropping input.

  • attention_inclusion_cutoff – If the dimension of width * height is less than this value, attention will be included in the model across channels and width * height as embedding dimension after that point (with the channels representing the length of the sequence).

  • l1 – L1 regularization to apply to the first layer.
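
The relationship between layers and channel_exp_base described above can be sketched as a small helper. The function name channels_per_group is hypothetical, and the sketch ignores first_channel_expansion and all other parameters; it only reproduces the doubling rule from the layers description.

```python
def channels_per_group(layers, channel_exp_base):
    """Return (number of layers, channels) for each layer group.

    The first group uses 2 ** channel_exp_base channels and every
    subsequent group doubles the channel count of the previous one.
    """
    if len(layers) > 4:
        raise ValueError("At most 4 layer groups are supported.")
    base_channels = 2 ** channel_exp_base
    return [
        (n_layers, base_channels * 2 ** group_index)
        for group_index, n_layers in enumerate(layers)
    ]


groups = channels_per_group(layers=[5, 3, 2], channel_exp_base=3)
for n_layers, n_channels in groups:
    print(f"{n_layers} layers with {n_channels} channels")
```

For layers=[5, 3, 2] and channel_exp_base=3, this yields the three groups listed in the layers description: 5 layers with 8 channels, 3 with 16, and 2 with 32.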

Locally Connected Models

class eir.models.input.array.models_locally_connected.LCLModelConfig(
patch_size: tuple[int, int, int] | None = None,
layers: None | list[int] = None,
kernel_width: int | Literal['patch'] = 12,
first_kernel_expansion: int = -2,
channel_exp_base: int = 2,
first_channel_expansion: int = 1,
num_lcl_chunks: None | int = None,
rb_do: float = 0.1,
stochastic_depth_p: float = 0.0,
l1: float = 0.0,
cutoff: int | Literal['auto'] = 1024,
direction: Literal['down', 'up'] = 'down',
attention_inclusion_cutoff: int | None = None,
)

This is what the "genome-local-net" model refers to. See https://academic.oup.com/nar/article/51/12/e67/7177885 for more details on the model architecture.

Note that when using the automatic network setup, kernel widths will get expanded to ensure that the feature representations become smaller as they are propagated through the network.

Parameters:
  • patch_size – Controls the size of the patches used in the first layer. If set to None, the input is flattened according to the torch flatten function. Note that when using this parameter, we generally want the kernel width to be set to the product of the patch size dimensions. Order follows PyTorch convention, i.e., [channels, height, width].

  • layers – Controls the number of layers in the model. If set to None, the model will automatically set up the number of layers according to the cutoff parameter value.

  • kernel_width – Width of the locally connected kernels. Note that in the context of genomic inputs this refers to the flattened input, meaning that if we have a one-hot encoding of 4 values (e.g. SNPs), 12 refers to 12/4 = 3 SNPs per locally connected window. Can be set to None if the num_lcl_chunks parameter is set, which means that the kernel width will be set automatically according to the number of chunks.

  • first_kernel_expansion – Factor to extend the first kernel. This value can be either positive or negative. For example, in the case of kernel_width=12, setting first_kernel_expansion=2 means that the first kernel will have a width of 24, whereas other kernels will have a width of 12. When using a negative value, divides the first kernel by the value instead of multiplying.

  • channel_exp_base – Which power of 2 to use in order to set the number of channels/weight sets in the network. For example, setting channel_exp_base=3 means that 2**3=8 weight sets will be used.

  • first_channel_expansion – Whether to expand / shrink the number of channels in the first layer as compared to other layers in the network. Works analogously to the first_kernel_expansion parameter.

  • num_lcl_chunks – Controls the number of splits applied to the input. E.g. with an input width of 800, using num_lcl_chunks=100 will result in a kernel width of 8, meaning 8 elements in the flattened input. If using SNP inputs with a one-hot encoding of 4 possible values, this will result in 8/4 = 2 SNPs per locally connected area.

  • rb_do – Dropout in the residual blocks.

  • stochastic_depth_p – Probability of dropping input.

  • l1 – L1 regularization applied to the first layer in the network.

  • cutoff – Feature dimension cutoff where the automatic network setup stops adding layers. The ‘auto’ option is only supported when using the model for array outputs, and will set the cutoff to roughly the number of output features.

  • direction – Whether to use a “down” or “up” network. “Down” means that the feature representation will get smaller as it is propagated through the network, whereas “up” means that the feature representation will get larger.

  • attention_inclusion_cutoff – Cutoff to start including attention blocks in the network. If set to None, no attention blocks will be included. The cutoff here refers to the “length” dimension of the input after reshaping according to the output_feature_sets in the preceding layer. For example, if we have 1024 output features, and we have 4 output feature sets, the length dimension will be 1024/4 = 256. With an attention cutoff >= 256, the attention block will be included.
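
The arithmetic in the kernel_width, first_kernel_expansion, num_lcl_chunks and attention_inclusion_cutoff descriptions can be checked with a few small helpers. All function names below are hypothetical, and the integer division is an assumption for inputs that divide evenly; this is a sketch of the stated rules, not the library's code.

```python
def kernel_width_from_chunks(input_width, num_lcl_chunks):
    """Kernel width implied by splitting the flattened input into chunks."""
    return input_width // num_lcl_chunks


def first_kernel_width(kernel_width, first_kernel_expansion):
    """Positive factors multiply the first kernel, negative ones divide it."""
    if first_kernel_expansion == 0:
        raise ValueError("Expansion factor must be non-zero in this sketch.")
    if first_kernel_expansion > 0:
        return kernel_width * first_kernel_expansion
    return kernel_width // abs(first_kernel_expansion)


def includes_attention(n_output_features, n_feature_sets, cutoff):
    """Attention is included once the length dimension is within the cutoff."""
    if cutoff is None:
        return False
    length = n_output_features // n_feature_sets
    return cutoff >= length


one_hot_size = 4  # e.g. 4 possible values per SNP
width = kernel_width_from_chunks(input_width=800, num_lcl_chunks=100)
snps_per_window = width // one_hot_size

expanded = first_kernel_width(kernel_width=12, first_kernel_expansion=2)
shrunk = first_kernel_width(kernel_width=12, first_kernel_expansion=-2)

with_attention = includes_attention(1024, 4, cutoff=256)
without_attention = includes_attention(1024, 4, cutoff=128)
```

With these inputs, the kernel width is 8 (2 SNPs per window), the first kernel is 24 wide for a factor of 2 and 6 wide for a factor of -2, and attention is included for a cutoff of 256 but not for 128.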

Transformer Models

class eir.models.input.array.models_transformers.ArrayTransformerConfig(
patch_size: tuple[int, int, int],
embedding_dim: int,
num_heads: int = 8,
num_layers: int = 2,
dim_feedforward: int | Literal['auto'] = 'auto',
dropout: float = 0.1,
position: Literal['encode', 'embed'] = 'encode',
position_dropout: float = 0.1,
)
Parameters:
  • patch_size – Controls the size of the patches the input is split into before being passed to the transformer. Order follows PyTorch convention, i.e., [channels, height, width].

  • embedding_dim – The embedding dimension each patch is projected to. This is also the dimension of the transformer encoder layers.

  • num_heads – The number of heads in the multi-head attention layers.

  • num_layers – The number of transformer encoder layers.

  • dim_feedforward – The dimension of the feedforward layers in the transformer model.

  • dropout – The dropout rate to use in the transformer encoder layers.

  • position – Whether to encode the token position or use learnable position embeddings.

  • position_dropout – The dropout rate to use in the position encoding/embedding.
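
To get a feel for how patch_size and embedding_dim interact, the sketch below counts the patches (the sequence length seen by the transformer) and the number of elements per patch for a given input shape. It assumes non-overlapping patches that evenly divide the input, which this guide does not state explicitly; the helper name patch_grid and the example shapes are hypothetical. The divisibility check on num_heads reflects the requirement of standard multi-head attention implementations such as PyTorch's.

```python
import math


def patch_grid(input_shape, patch_size):
    """Return (number of patches, elements per patch).

    Both arguments follow the [channels, height, width] order.
    """
    for dim, patch_dim in zip(input_shape, patch_size):
        if dim % patch_dim != 0:
            raise ValueError(
                f"Input dimension {dim} is not divisible by patch dimension {patch_dim}."
            )
    n_patches = math.prod(d // p for d, p in zip(input_shape, patch_size))
    elements_per_patch = math.prod(patch_size)
    return n_patches, elements_per_patch


def heads_are_compatible(embedding_dim, num_heads):
    """Multi-head attention splits embedding_dim evenly across heads."""
    return embedding_dim % num_heads == 0


# Hypothetical input of 1 channel, height 16, width 64, split into 4 x 16 patches.
n_patches, elements_per_patch = patch_grid(
    input_shape=(1, 16, 64), patch_size=(1, 4, 16)
)
compatible = heads_are_compatible(embedding_dim=64, num_heads=8)
```

In this example each of the 16 patches holds 64 values, which would then be projected to embedding_dim before entering the encoder layers.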