Tabular Data Configuration

Complete configuration guide for structured tabular data (CSV).

Overview 

Tabular input in EIR handles structured data organised in rows and columns. It supports both continuous (numerical) and categorical features, with automatic preprocessing and encoding.

Quick Example 

input_info:
  input_source: "my_csv_file.csv"
  input_name: "patient_data"
  input_type: "tabular"
input_type_info:
  input_cat_columns: ["gender", "diagnosis"]
  input_con_columns: ["age", "weight", "height"]
model_config:
  model_type: "tabular"
  model_init_config:
    l1: 1e-04
    fc_repr_dim: 64
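
Before training, it can help to confirm that every column named in input_cat_columns and input_con_columns is actually present in the CSV header. The following is a minimal sketch using only the Python standard library. The helper find_missing_columns is hypothetical and not part of EIR, and the in-memory CSV and its values are made up for illustration.

```python
import csv
import io

# Column lists copied from the Quick Example above.
CAT_COLUMNS = ["gender", "diagnosis"]
CON_COLUMNS = ["age", "weight", "height"]


def find_missing_columns(csv_file, cat_columns, con_columns):
    """Return configured column names that are absent from the CSV header.

    Hypothetical helper for illustration; not part of EIR.
    """
    header = next(csv.reader(csv_file), [])
    present = set(header)
    return [col for col in [*cat_columns, *con_columns] if col not in present]


# In-memory stand-in for my_csv_file.csv; the values are made up.
sample = io.StringIO(
    "gender,diagnosis,age,weight,height\n"
    "F,A,54,70.5,168\n"
)
missing = find_missing_columns(sample, CAT_COLUMNS, CON_COLUMNS)
print(missing)  # an empty list means every configured column was found
```

To check a real file, pass an open file handle (for example open("my_csv_file.csv", newline="")) instead of the in-memory sample.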

Input Data Configuration

Base Configuration

class eir.setup.schemas.TabularInputDataConfig(
    input_cat_columns: Sequence[str] = <factory>,
    input_con_columns: Sequence[str] = <factory>,
    label_parsing_chunk_size: None | int = None,
    mixing_subtype: Literal['mixup'] = 'mixup',
    modality_dropout_rate: float = 0.0,
)

Parameters:

input_cat_columns – Which columns to use as categorical inputs from the input_source specified in the input_info field of the relevant .yaml.
input_con_columns – Which columns to use as continuous inputs from the input_source specified in the input_info field of the relevant .yaml.
label_parsing_chunk_size – Number of rows to process at a time when loading the input_source. Useful when RAM is limited.
mixing_subtype – Which type of mixing to use on the tabular data when mixing_alpha is set above 0.0 in the global configuration.
modality_dropout_rate – Dropout rate to apply to the modality, e.g., 0.2 means that 20% of the time, this modality will be dropped out during training.
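
The idea behind label_parsing_chunk_size can be illustrated with a short sketch that reads a CSV a fixed number of rows at a time, so that only one chunk is held in memory at once. This is not EIR's loading code. The function iter_row_chunks is a hypothetical helper, the sample data is made up, and treating chunk_size=None as "read everything in one chunk" is an assumption about what the default means.

```python
import csv
import io
from itertools import islice


def iter_row_chunks(csv_file, chunk_size):
    """Yield lists of at most chunk_size rows (as dicts) from an open CSV file.

    Hypothetical helper for illustration; not part of EIR.
    With chunk_size=None, islice takes every remaining row, so the whole
    file is returned as a single chunk (assumed to mirror the default).
    """
    reader = csv.DictReader(csv_file)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Three made-up rows, read two at a time.
sample = io.StringIO("age,weight\n54,70.5\n61,82.0\n47,64.2\n")
chunk_sizes = [len(chunk) for chunk in iter_row_chunks(sample, chunk_size=2)]
print(chunk_sizes)  # [2, 1]
```

Smaller chunks lower peak memory use at the cost of more iterations over the file.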

Model Selection 

class eir.models.input.tabular.tabular.TabularModelConfig(
    model_init_config: SimpleTabularModelConfig,
    model_type: Literal['tabular'] = 'tabular',
)

Parameters:

model_type – Which type of tabular model to use. Currently, 'tabular' is the only accepted value.
model_init_config – Configuration / arguments used to initialise the model.

Available Feature Extractors 

Simple Tabular Model 

class eir.models.input.tabular.tabular.SimpleTabularModelConfig(
    l1: float = 0.0,
    fc_layer: bool = False,
    drop_prob: float = 0.0,
    layers: list[int] = <factory>,
    fc_do: float = 0.1,
    fc_dim: int | Literal['auto'] | None = None,
)

Parameters:

l1 – L1 regularization applied to the embeddings for categorical tabular inputs.
fc_layer – Whether to add a single fully-connected layer to the model, as an alternative to looking up the inputs and passing them through directly.
drop_prob – Probability of dropping entire branch output during training. Set to 1.0 to completely disable learning from this input (useful for testing). During eval mode, dropout is not applied.
layers – Number of MLP-residual blocks to add after the embedding/linear layer. List format for compatibility with fusion module configs (only first element used). Default is [0] (no additional blocks). Useful for learning complex tabular feature interactions before fusion.
fc_do – Dropout probability for MLP-residual blocks. Only used if layers[0] > 0.
fc_dim – Hidden dimension for MLP-residual blocks. Options:
  - None: uses input_dim (default behavior)
  - "auto": computes the closest power of 2 to 4x input_dim
  - int: explicit dimension
  When set, the first block projects from input_dim to fc_dim, and subsequent blocks maintain fc_dim.
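
The interaction between layers and fc_dim can be sketched in plain Python. This is an illustration of the rules described above, not EIR's implementation. The functions resolve_fc_dim and block_dims are hypothetical, and two details are assumptions: "closest" is measured as absolute distance, and ties go to the smaller power of 2.

```python
def resolve_fc_dim(input_dim, fc_dim):
    """Resolve the hidden dimension from an fc_dim setting.

    Hypothetical helper for illustration; not part of EIR.
    """
    if fc_dim is None:
        return input_dim  # default: keep the input dimension
    if fc_dim == "auto":
        target = 4 * input_dim
        lower = 1 << (target.bit_length() - 1)  # largest power of 2 <= target
        upper = lower * 2
        # Assumption: absolute distance decides, ties go to the smaller power.
        return lower if target - lower <= upper - target else upper
    return int(fc_dim)  # explicit dimension


def block_dims(input_dim, layers, fc_dim):
    """Return (in_dim, out_dim) for each MLP-residual block.

    Only layers[0] is used, matching the description of the layers parameter.
    """
    n_blocks = layers[0] if layers else 0
    hidden = resolve_fc_dim(input_dim, fc_dim)
    dims = []
    in_dim = input_dim
    for _ in range(n_blocks):
        dims.append((in_dim, hidden))  # first block projects, later ones keep hidden
        in_dim = hidden
    return dims


print(resolve_fc_dim(10, "auto"))  # 32
print(block_dims(10, [2], 64))  # [(10, 64), (64, 64)]
```

With layers set to [0], block_dims returns an empty list, which corresponds to the default of no additional blocks.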
