Image Data Configuration

Complete configuration guide for image and visual data processing.

Quick Example 

input_info:
  input_source: "my/image/folder/"
  input_name: "chest_xray"
  input_type: "image"
input_type_info:
  size: [224, 224]
  num_channels: 1
model_config:
  model_type: "cnn"
  model_init_config:
    channel_exp_base: 4
    down_stride_average: True
    kernel_width: 3

class eir.setup.schemas.ImageInputDataConfig(
    auto_augment: bool = True,
    size: Sequence[int] = (64,),
    resize_approach: Literal['resize', 'randomcrop', 'centercrop'] = 'resize',
    adaptive_normalization_max_samples: int | None = None,
    mean_normalization_values: None | Sequence[float] = None,
    stds_normalization_values: None | Sequence[float] = None,
    mode: Literal['RGB', 'L', 'RGBA'] | None = None,
    num_channels: int | None = None,
    mixing_subtype: Literal['mixup', 'cutmix'] = 'mixup',
    modality_dropout_rate: float = 0.0,
)

Parameters:

auto_augment – Setting this to True will use TrivialAugment Wide augmentation.
size – Target size of the images for training. If size is a sequence like (h, w), output size will be matched to this. If size is an int, the image will be resized to (size, size).
resize_approach –
The method used for resizing the images. Options are:
- resize: Directly resize the image to the target size.
- randomcrop: Resize the image to a larger size than the target and then apply a random crop to the target size.
- centercrop: Resize the image to a larger size than the target and then apply a center crop to the target size.
adaptive_normalization_max_samples – If using adaptive normalization (channel), how many samples to use to compute the normalization parameters. If None, uses all samples.
mean_normalization_values – Average channel values to normalize images with. This can be a sequence matching the number of channels, or None. If None and using a pretrained model, the values used for the model pretraining will be used. If None and training from scratch, will iterate over training data and compute the running average per channel.
stds_normalization_values – Standard deviation channel values to normalize images with. This can be a sequence matching the number of channels, or None. If None and using a pretrained model, the values used for the model pretraining will be used. If None and training from scratch, will iterate over training data and compute the running average per channel.
mode –
An explicit mode to convert loaded images to. Useful when working with input data with a mixed number of channels, or when you want to convert images to a specific mode. Options are:
- RGB: Red, Green, Blue (channels=3)
- L: Grayscale (channels=1)
- RGBA: Red, Green, Blue, Alpha (channels=4)
num_channels – Number of channels in the images. If None, tries to infer the number of channels from a random image in the training data. Setting this is useful when the number of channels is known ahead of time; an error is raised if an image with a different number of channels is encountered.
mixing_subtype – Which type of mixing to use on the image data given that mixing_alpha is set >0.0 in the global configuration.
modality_dropout_rate – Dropout rate to apply to the modality, e.g., 0.2 means that 20% of the time, this modality will be dropped out during training.
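
The from-scratch case for mean_normalization_values and stds_normalization_values can be illustrated with a small, library-independent sketch. This is not EIR's implementation: the function name channel_stats, the nested-list image layout, the use of the population standard deviation, and the choice to take the first max_samples images are all assumptions made for illustration.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple

# Assumed layout: an image is a sequence of channels, and each channel is a
# flat sequence of pixel values. Real images would be arrays or tensors.
Image = Sequence[Sequence[float]]


def channel_stats(
    images: Iterable[Image],
    num_channels: int,
    max_samples: Optional[int] = None,
) -> Tuple[List[float], List[float]]:
    """Accumulate per-channel sums and return (means, stds)."""
    count = [0] * num_channels
    total = [0.0] * num_channels
    total_sq = [0.0] * num_channels

    for index, image in enumerate(images):
        # Hypothetical analogue of adaptive_normalization_max_samples:
        # stop after the first max_samples images (None means use all).
        if max_samples is not None and index >= max_samples:
            break
        if len(image) != num_channels:
            raise ValueError(
                f"Expected {num_channels} channels, got {len(image)}."
            )
        for c, channel in enumerate(image):
            count[c] += len(channel)
            total[c] += sum(channel)
            total_sq[c] += sum(value * value for value in channel)

    if any(n == 0 for n in count):
        raise ValueError("No pixel values were seen for at least one channel.")

    means = [total[c] / count[c] for c in range(num_channels)]
    # Population variance: E[x^2] - E[x]^2, clamped at 0 against rounding error.
    stds = [
        math.sqrt(max(total_sq[c] / count[c] - means[c] ** 2, 0.0))
        for c in range(num_channels)
    ]
    return means, stds


# Two single-channel "images" with two pixels each.
example_images = [[[0.0, 1.0]], [[0.5, 0.5]]]
example_means, example_stds = channel_stats(example_images, num_channels=1)
```

Values computed this way could then be passed explicitly as mean_normalization_values and stds_normalization_values, which keeps the normalization fixed across runs.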

Model Selection 

class eir.models.input.image.image_models.ImageModelConfig(
    model_type: Literal['cnn'] | str,
    model_init_config: CNNModelConfig | dict[str, Any],
    num_output_features: int = 0,
    pretrained_model: bool = False,
    freeze_pretrained_model: bool = False,
)

Parameters:

model_type – Which type of image model to use.
model_init_config – Configuration / arguments used to initialise model.
num_output_features – Number of final output features from the image feature extractor, projected with a linear layer, which get passed to the fusion module. If set to 0, the output from the feature extractor is passed directly, as is, to the fusion module.
pretrained_model – Specify whether the model type is assumed to be pretrained and from the PyTorch Image Models repository.
freeze_pretrained_model – Whether to freeze the pretrained model weights.

Available Feature Extractors 

Built-in Image Models 

CNN Architecture

class eir.models.input.array.models_cnn.CNNModelConfig(
    layers: None | list[int] = None,
    num_output_features: int = 0,
    channel_exp_base: int = 2,
    first_channel_expansion: int = 1,
    kernel_width: int = 12,
    first_kernel_expansion_width: float = 1.0,
    down_stride_width: int = 4,
    first_stride_expansion_width: float = 1.0,
    dilation_factor_width: int = 1,
    kernel_height: int = 4,
    first_kernel_expansion_height: float = 1.0,
    down_stride_height: int = 1,
    first_stride_expansion_height: float = 1.0,
    dilation_factor_height: int = 1,
    allow_first_conv_size_reduction: bool = True,
    down_sample_every_n_blocks: int | None = 2,
    cutoff: int = 32,
    rb_do: float = 0.0,
    stochastic_depth_p: float = 0.0,
    attention_inclusion_cutoff: int = 256,
    l1: float = 0.0,
)

Parameters:

layers –
A list that controls the number of layers and channels in the model. Each element in the list represents a layer group with a specified number of layers and channels. Specifically,
- The first element in the list refers to the number of layers with the number of channels exactly as specified by the channel_exp_base parameter.
- The subsequent elements in the list correspond to an increased number of channels, doubling with each step. For instance, if channel_exp_base=3 (i.e., 2**3=8 channels), and the layers list is [5, 3, 2], the model would be constructed as follows,
  - First case: 5 layers with 8 channels
  - Second case: 3 layers with 16 channels (doubling from the previous case)
  - Third case: 2 layers with 32 channels (doubling from the previous case)
- The model currently supports a maximum of 4 elements in the list.
- If set to None, the model will automatically set up the number of layer groups until a certain width and height (stride * 8 for both) are met. In this automatic setup, channels will be increased as the input gets propagated through the network, while the width/height get reduced due to stride.
Future work includes adding a parameter to control the target width and height.
num_output_features – Output dimension of the last FC layer in the network which accepts the outputs from the convolutional layer. If set to 0, the output will be passed through directly to the fusion module.
channel_exp_base – Which power of 2 to use in order to set the number of channels in the network. For example, setting channel_exp_base=3 means that 2**3=8 channels will be used.
first_channel_expansion – Factor to extend the first layer channels.
kernel_width – Base kernel width of the convolutions.
first_kernel_expansion_width – Factor to extend the first kernel’s width. The result of the multiplication will be rounded to the nearest integer.
down_stride_width – Down stride of the convolutional layers along the width.
first_stride_expansion_width – Factor to extend the first layer stride along the width. The result of the multiplication will be rounded to the nearest integer.
dilation_factor_width – Base dilation factor of the convolutions along the width in the network.
kernel_height – Base kernel height of the convolutions.
first_kernel_expansion_height – Factor to extend the first kernel’s height. The result of the multiplication will be rounded to the nearest integer.
down_stride_height – Down stride of the convolutional layers along the height.
first_stride_expansion_height – Factor to extend the first layer stride along the height. The result of the multiplication will be rounded to the nearest integer.
dilation_factor_height – Base dilation factor of the convolutions along the height in the network.
allow_first_conv_size_reduction – If set to False, the first convolutional layer is not allowed to reduce the size of the input. Set this to True if you want the first convolutional layer to be able to reduce the size of the input, for example when the input is very large and should be compressed early.
cutoff – If the resulting dimension of width * height of adding a successive block is less than this value, will stop adding residual blocks to the model in the automated case (i.e., if the layers argument is not specified).
rb_do – Dropout in the convolutional residual blocks.
stochastic_depth_p – Probability of dropping input.
attention_inclusion_cutoff – Once the dimension of width * height falls below this value, attention is included in the model from that point on, applied across channels with width * height as the embedding dimension (the channels represent the length of the sequence).
l1 – L1 regularization to apply to the first layer.
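
The relationship between layers and channel_exp_base described above can be checked with a short sketch. The helper name layer_group_channels is hypothetical and not part of EIR; it only reproduces the arithmetic stated in the layers description (a base of 2**channel_exp_base channels, doubling per list element, at most 4 elements) and ignores other parameters such as first_channel_expansion.

```python
from typing import List, Tuple


def layer_group_channels(
    layers: List[int], channel_exp_base: int
) -> List[Tuple[int, int]]:
    """Return (number_of_layers, number_of_channels) for each layer group."""
    if not 1 <= len(layers) <= 4:
        # The documentation states a maximum of 4 elements in the list.
        raise ValueError("layers must contain between 1 and 4 elements.")

    base_channels = 2 ** channel_exp_base
    # Channels double with each successive element in the list.
    return [
        (num_layers, base_channels * 2 ** group_index)
        for group_index, num_layers in enumerate(layers)
    ]


# The example from the parameter description:
# channel_exp_base=3 and layers=[5, 3, 2].
groups = layer_group_channels([5, 3, 2], channel_exp_base=3)
print(groups)  # [(5, 8), (3, 16), (2, 32)]
```

With the quick example's channel_exp_base of 4, the first group would start at 16 channels under the same arithmetic.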

External Image Models 

For pre-trained vision models (ResNet, Vision Transformers, etc.), please refer to Image Models for detailed configuration options.

Interpretation Support 

class eir.setup.schemas.BasicInterpretationConfig(
    interpretation_sampling_strategy: Literal['first_n', 'random_sample'] = 'first_n',
    num_samples_to_interpret: int = 10,
    manual_samples_to_interpret: Sequence[str] | None = None,
)

Parameters:

interpretation_sampling_strategy – How to select samples for attribution analysis. first_n always grabs the same first n samples from the beginning of the dataset to interpret, while random_sample will sample uniformly from the whole dataset without replacement.
num_samples_to_interpret – How many samples to interpret.
manual_samples_to_interpret – IDs of samples to always interpret, irrespective of interpretation_sampling_strategy and num_samples_to_interpret. A caveat here is that they must be present in the dataset that is being interpreted (e.g., validation / test dataset), meaning that adding IDs here that happen to be in the training dataset will not work.
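
The two sampling strategies and the manual override can be pictured with the following sketch. It is an illustration only, not EIR's code: the function select_ids_to_interpret and its seed argument are hypothetical, and it assumes that manual IDs are added on top of the sampled ones and that an ID missing from the dataset raises an error.

```python
import random
from typing import List, Optional, Sequence


def select_ids_to_interpret(
    dataset_ids: Sequence[str],
    strategy: str = "first_n",
    num_samples: int = 10,
    manual_ids: Optional[Sequence[str]] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick sample IDs following the described sampling strategies."""
    if strategy == "first_n":
        # Always the same first n IDs from the beginning of the dataset.
        chosen = list(dataset_ids[:num_samples])
    elif strategy == "random_sample":
        # Uniform sampling without replacement from the whole dataset.
        rng = random.Random(seed)
        k = min(num_samples, len(dataset_ids))
        chosen = rng.sample(list(dataset_ids), k)
    else:
        raise ValueError(f"Unknown strategy: {strategy}")

    present = set(dataset_ids)
    for sample_id in manual_ids or []:
        # Manual IDs must exist in the dataset being interpreted.
        if sample_id not in present:
            raise ValueError(f"ID {sample_id!r} is not in this dataset.")
        if sample_id not in chosen:
            chosen.append(sample_id)
    return chosen


validation_ids = ["a", "b", "c", "d", "e"]
first_two = select_ids_to_interpret(validation_ids, "first_n", 2)
with_manual = select_ids_to_interpret(
    validation_ids, "first_n", 2, manual_ids=["e"]
)
```

Here first_two holds the first two validation IDs, and with_manual additionally includes "e" regardless of the sampling strategy.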
