Image Data Configuration
Complete configuration guide for image and visual data processing.
Quick Example
input_info:
  input_source: "my/image/folder/"
  input_name: "chest_xray"
  input_type: "image"
input_type_info:
  size: [224, 224]
  num_channels: 1
model_config:
  model_type: "cnn"
  model_init_config:
    channel_exp_base: 4
    down_stride_average: True
    kernel_width: 3
Input Data Configuration
Base Configuration
- class eir.setup.schemas.ImageInputDataConfig(
- auto_augment: bool = True,
- size: Sequence[int] = (64,),
- resize_approach: Literal['resize', 'randomcrop', 'centercrop'] = 'resize',
- adaptive_normalization_max_samples: int | None = None,
- mean_normalization_values: None | Sequence[float] = None,
- stds_normalization_values: None | Sequence[float] = None,
- mode: Literal['RGB', 'L', 'RGBA'] | None = None,
- num_channels: int | None = None,
- mixing_subtype: Literal['mixup', 'cutmix'] = 'mixup',
- modality_dropout_rate: float = 0.0,
- Parameters:
auto_augment – Setting this to True will use TrivialAugment Wide augmentation.
size – Target size of the images for training. If size is a sequence like (h, w), the output size will be matched to this. If size is an int, the image will be resized to (size, size).
resize_approach – The method used for resizing the images. Options are:
  - resize: Directly resize the image to the target size.
  - randomcrop: Resize the image to a larger size than the target and then apply a random crop to the target size.
  - centercrop: Resize the image to a larger size than the target and then apply a center crop to the target size.
adaptive_normalization_max_samples – If using adaptive normalization (channel), how many samples to use to compute the normalization parameters. If None, uses all samples.
mean_normalization_values – Average channel values to normalize images with. This can be a sequence matching the number of channels, or None. If None and using a pretrained model, the values used for the model pretraining will be used. If None and training from scratch, will iterate over training data and compute the running average per channel.
stds_normalization_values – Standard deviation channel values to normalize images with. This can be a sequence matching the number of channels, or None. If None and using a pretrained model, the values used for the model pretraining will be used. If None and training from scratch, will iterate over training data and compute the running average per channel.
mode – An explicit mode to convert loaded images to. Useful when working with input data with a mixed number of channels, or when you want to convert images to a specific mode. Options are:
  - RGB: Red, Green, Blue (channels=3)
  - L: Grayscale (channels=1)
  - RGBA: Red, Green, Blue, Alpha (channels=4)
num_channels – Number of channels in the images. If None, tries to infer the number of channels from a random image in the training data. Useful when the number of channels is known ahead of time; an error is raised if an image with a different number of channels is encountered.
mixing_subtype – Which type of mixing to use on the image data given that mixing_alpha is set > 0.0 in the global configuration.
modality_dropout_rate – Dropout rate to apply to the modality, e.g., 0.2 means that 20% of the time, this modality will be dropped out during training.
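The normalization parameters above describe iterating over the training data to compute per-channel statistics when no values are supplied. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The function name running_channel_stats and the nested-list image layout (one flat list of pixel values per channel) are assumptions made for illustration; max_samples plays the role of adaptive_normalization_max_samples.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple


def running_channel_stats(
    images: Iterable[Sequence[Sequence[float]]],
    max_samples: Optional[int] = None,
) -> Tuple[List[float], List[float]]:
    """Return per-channel (means, stds) accumulated over the images.

    Hypothetical helper: each image is a sequence of channels, and each
    channel is a flat sequence of pixel values. max_samples=None uses
    every image, mirroring adaptive_normalization_max_samples=None.
    """
    counts: List[int] = []
    sums: List[float] = []
    sq_sums: List[float] = []

    for index, image in enumerate(images):
        if max_samples is not None and index >= max_samples:
            break
        if not counts:
            # Size the accumulators from the first image's channel count.
            counts = [0] * len(image)
            sums = [0.0] * len(image)
            sq_sums = [0.0] * len(image)
        for channel, pixels in enumerate(image):
            counts[channel] += len(pixels)
            sums[channel] += sum(pixels)
            sq_sums[channel] += sum(p * p for p in pixels)

    means = [s / n for s, n in zip(sums, counts)]
    # Population standard deviation from the running sums; clamp tiny
    # negative values caused by floating-point error.
    stds = [
        math.sqrt(max(sq / n - m * m, 0.0))
        for sq, n, m in zip(sq_sums, counts, means)
    ]
    return means, stds


images = [
    [[0.0, 1.0], [2.0, 2.0]],  # image 1: two channels, two pixels each
    [[1.0, 0.0], [4.0, 4.0]],  # image 2
]
means, stds = running_channel_stats(images)
print(means, stds)  # [0.5, 3.0] [0.5, 1.0]
```

The resulting lists have one entry per channel, which is the shape expected by mean_normalization_values and stds_normalization_values when they are given explicitly.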
Model Selection
- class eir.models.input.image.image_models.ImageModelConfig(
- model_type: Literal['cnn'] | str,
- model_init_config: CNNModelConfig | dict[str, Any],
- num_output_features: int = 0,
- pretrained_model: bool = False,
- freeze_pretrained_model: bool = False,
- Parameters:
model_type – Which type of image model to use.
model_init_config – Configuration / arguments used to initialise model.
num_output_features – Number of final output features from the image feature extractor, projected with a linear layer, which are passed to the fusion module. If set to 0, the output from the feature extractor is passed as-is to the fusion module.
pretrained_model – Specify whether the model type is assumed to be pretrained and from the PyTorch Image Models repository.
freeze_pretrained_model – Whether to freeze the pretrained model weights.
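As a small illustration of the num_output_features rule described above, the sketch below computes the feature dimension that would reach the fusion module. The helper fusion_input_dim and its extractor_output_dim argument are hypothetical names introduced here; they are not part of the library.

```python
def fusion_input_dim(extractor_output_dim: int, num_output_features: int = 0) -> int:
    """Return the feature dimension handed to the fusion module.

    Hypothetical helper: 0 means the feature extractor output is passed
    through unchanged; any positive value means a linear projection to
    that many features is applied first.
    """
    if num_output_features < 0:
        raise ValueError("num_output_features must be >= 0")
    if num_output_features == 0:
        return extractor_output_dim
    return num_output_features


print(fusion_input_dim(512))       # 512 (passed through as-is)
print(fusion_input_dim(512, 128))  # 128 (projected with a linear layer)
```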
Available Feature Extractors
Built-in Image Models
CNN Architecture
- class eir.models.input.array.models_cnn.CNNModelConfig(
- layers: None | list[int] = None,
- num_output_features: int = 0,
- channel_exp_base: int = 2,
- first_channel_expansion: int = 1,
- kernel_width: int = 12,
- first_kernel_expansion_width: float = 1.0,
- down_stride_width: int = 4,
- first_stride_expansion_width: float = 1.0,
- dilation_factor_width: int = 1,
- kernel_height: int = 4,
- first_kernel_expansion_height: float = 1.0,
- down_stride_height: int = 1,
- first_stride_expansion_height: float = 1.0,
- dilation_factor_height: int = 1,
- allow_first_conv_size_reduction: bool = True,
- down_sample_every_n_blocks: int | None = 2,
- cutoff: int = 32,
- rb_do: float = 0.0,
- stochastic_depth_p: float = 0.0,
- attention_inclusion_cutoff: int = 256,
- l1: float = 0.0,
- Parameters:
layers – A list that controls the number of layers and channels in the model. Each element in the list represents a layer group with a specified number of layers and channels. Specifically:
  - The first element in the list refers to the number of layers with the number of channels exactly as specified by the channel_exp_base parameter.
  - The subsequent elements in the list correspond to an increased number of channels, doubling with each step.
  For instance, if channel_exp_base=3 (i.e., 2**3=8 channels), and the layers list is [5, 3, 2], the model would be constructed as follows:
  - First case: 5 layers with 8 channels
  - Second case: 3 layers with 16 channels (doubling from the previous case)
  - Third case: 2 layers with 32 channels (doubling from the previous case)
  The model currently supports a maximum of 4 elements in the list.
  If set to None, the model will automatically set up the number of layer groups until a certain width and height (stride * 8 for both) are met. In this automatic setup, channels will be increased as the input gets propagated through the network, while the width/height get reduced due to stride.
  Future work includes adding a parameter to control the target width and height.
num_output_features – Output dimension of the last FC layer in the network which accepts the outputs from the convolutional layer. If set to 0, the output will be passed through directly to the fusion module.
channel_exp_base – Which power of 2 to use in order to set the number of channels in the network. For example, setting channel_exp_base=3 means that 2**3=8 channels will be used.
first_channel_expansion – Factor to extend the first layer channels.
kernel_width – Base kernel width of the convolutions.
first_kernel_expansion_width – Factor to extend the first kernel’s width. The result of the multiplication will be rounded to the nearest integer.
down_stride_width – Down stride of the convolutional layers along the width.
first_stride_expansion_width – Factor to extend the first layer stride along the width. The result of the multiplication will be rounded to the nearest integer.
dilation_factor_width – Base dilation factor of the convolutions along the width in the network.
kernel_height – Base kernel height of the convolutions.
first_kernel_expansion_height – Factor to extend the first kernel’s height. The result of the multiplication will be rounded to the nearest integer.
down_stride_height – Down stride of the convolutional layers along the height.
first_stride_expansion_height – Factor to extend the first layer stride along the height. The result of the multiplication will be rounded to the nearest integer.
dilation_factor_height – Base dilation factor of the convolutions along the height in the network.
allow_first_conv_size_reduction – If set to False, the first convolutional layer is not allowed to reduce the size of the input. Set this to True if you want the first convolutional layer to be able to reduce the size of the input, for example when the input is very large and should be compressed early.
cutoff – If the resulting dimension of width * height of adding a successive block is less than this value, will stop adding residual blocks to the model in the automated case (i.e., if the layers argument is not specified).
rb_do – Dropout in the convolutional residual blocks.
stochastic_depth_p – Probability of dropping input.
attention_inclusion_cutoff – If the dimension of width * height is less than this value, attention will be included in the model across channels and width * height as embedding dimension after that point (with the channels representing the length of the sequence).
l1 – L1 regularization to apply to the first layer.
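The relationship between layers and channel_exp_base described above can be sketched as follows. This only reproduces the channel arithmetic from the worked example ([5, 3, 2] with channel_exp_base=3). The function name layer_group_channels is hypothetical, and the sketch ignores first_channel_expansion and every other parameter.

```python
from typing import List, Tuple


def layer_group_channels(
    layers: List[int], channel_exp_base: int
) -> List[Tuple[int, int]]:
    """Return (number_of_layers, channels) for each layer group.

    Hypothetical helper: the first group uses 2 ** channel_exp_base
    channels and each later group doubles the previous group's channels.
    """
    if len(layers) > 4:
        raise ValueError("at most 4 layer groups are supported")
    base_channels = 2 ** channel_exp_base
    return [
        (n_layers, base_channels * 2 ** group_index)
        for group_index, n_layers in enumerate(layers)
    ]


print(layer_group_channels([5, 3, 2], channel_exp_base=3))
# [(5, 8), (3, 16), (2, 32)]
```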
External Image Models
For pre-trained vision models (ResNet, Vision Transformers, etc.), please refer to Image Models for detailed configuration options.
Interpretation Support
- class eir.setup.schemas.BasicInterpretationConfig(
- interpretation_sampling_strategy: Literal['first_n', 'random_sample'] = 'first_n',
- num_samples_to_interpret: int = 10,
- manual_samples_to_interpret: Sequence[str] | None = None,
- Parameters:
interpretation_sampling_strategy – How to sample inputs for attribution analysis. first_n always grabs the same first n values from the beginning of the dataset to interpret, while random_sample will sample uniformly from the whole dataset without replacement.
num_samples_to_interpret – How many samples to interpret.
manual_samples_to_interpret – IDs of samples to always interpret, irrespective of interpretation_sampling_strategy and num_samples_to_interpret. A caveat here is that they must be present in the dataset that is being interpreted (e.g., validation / test dataset), meaning that adding IDs here that happen to be in the training dataset will not work.
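The interaction between the three interpretation settings can be sketched roughly as below. This is an illustration under stated assumptions, not the library's code: the function select_ids_to_interpret and its seed argument are hypothetical, manual IDs are assumed to be added on top of the sampled ones, and raising an error for IDs missing from the dataset is one possible way to handle the caveat above.

```python
import random
from typing import List, Optional, Sequence


def select_ids_to_interpret(
    dataset_ids: Sequence[str],
    strategy: str = "first_n",
    num_samples: int = 10,
    manual_ids: Optional[Sequence[str]] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick sample IDs for attribution analysis (hypothetical helper)."""
    if strategy == "first_n":
        # Always the same first n IDs from the start of the dataset.
        sampled = list(dataset_ids[:num_samples])
    elif strategy == "random_sample":
        # Uniform sampling without replacement over the whole dataset.
        rng = random.Random(seed)
        k = min(num_samples, len(dataset_ids))
        sampled = rng.sample(list(dataset_ids), k=k)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    manual = list(manual_ids or [])
    known = set(dataset_ids)
    missing = [sample_id for sample_id in manual if sample_id not in known]
    if missing:
        # Assumption: fail loudly when a manual ID is not in this dataset.
        raise ValueError(f"IDs not in the interpreted dataset: {missing}")

    # Manual IDs are always included; skip duplicates from the sampled set.
    return manual + [sample_id for sample_id in sampled if sample_id not in manual]


ids = [f"id_{i}" for i in range(20)]
print(select_ids_to_interpret(ids, num_samples=2, manual_ids=["id_7"]))
# ['id_7', 'id_0', 'id_1']
```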