Tabular Data Configuration

Complete configuration guide for structured tabular data (CSV).

Overview

The tabular input type in EIR handles structured data arranged in rows and columns. It supports both numerical (continuous) and categorical features, with automatic preprocessing and encoding.

Quick Example

input_info:
  input_source: "my_csv_file.csv"
  input_name: "patient_data"
  input_type: "tabular"
input_type_info:
  input_cat_columns: ["gender", "diagnosis"]
  input_con_columns: ["age", "weight", "height"]
model_config:
  model_type: "tabular"
  model_init_config:
    l1: 1e-04
    fc_layer: true
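
To make the expected file layout concrete, the following is a minimal sketch of how a CSV matching the column names in the example above could be generated with the Python standard library. The row values are made up for illustration, and the "ID" column is an assumption about how rows are identified; check the EIR documentation for the exact requirement.

```python
import csv
import io

# Column names taken from the Quick Example configuration.
CAT_COLUMNS = ["gender", "diagnosis"]
CON_COLUMNS = ["age", "weight", "height"]

# Assumption: each row carries an identifier in a column named "ID".
ID_COLUMN = "ID"

# Made-up rows, for illustration only.
ROWS = [
    {"ID": "p001", "gender": "F", "diagnosis": "A", "age": 54, "weight": 68.5, "height": 165.0},
    {"ID": "p002", "gender": "M", "diagnosis": "B", "age": 61, "weight": 82.0, "height": 178.5},
]


def write_example_csv(handle) -> list[str]:
    """Write the example rows to an open text handle and return the header used."""
    fieldnames = [ID_COLUMN, *CAT_COLUMNS, *CON_COLUMNS]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(ROWS)
    return fieldnames


# Written to an in-memory buffer here so the sketch leaves no file behind.
buffer = io.StringIO()
header = write_example_csv(buffer)
print(buffer.getvalue())
```

To write the file referenced by input_source instead, open it with `open("my_csv_file.csv", "w", newline="")` and pass the handle to `write_example_csv`.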

Input Data Configuration

Base Configuration

class eir.setup.schemas.TabularInputDataConfig(
input_cat_columns: Sequence[str] = <factory>,
input_con_columns: Sequence[str] = <factory>,
label_parsing_chunk_size: None | int = None,
mixing_subtype: Literal['mixup'] = 'mixup',
modality_dropout_rate: float = 0.0,
)
Parameters:
  • input_cat_columns – Which columns to use as categorical inputs from the input_source specified in the input_info field of the relevant .yaml.

  • input_con_columns – Which columns to use as continuous inputs from the input_source specified in the input_info field of the relevant .yaml.

  • label_parsing_chunk_size – Number of rows to process at a time when loading the input_source. Useful when RAM is limited.

  • mixing_subtype – Which type of mixing to use on the tabular data when mixing_alpha is set above 0.0 in the global configuration.

  • modality_dropout_rate – Dropout rate to apply to the modality, e.g., 0.2 means that 20% of the time, this modality will be dropped out during training.
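
Two of the parameters above describe general techniques that can be sketched independently of EIR: reading a file in fixed-size row chunks (label_parsing_chunk_size) and mixup (mixing_subtype). The helpers below, `iter_row_chunks` and `mixup`, are hypothetical names used for illustration and are not part of the EIR API. The mixup helper only covers continuous values and assumes the standard formulation where the mixing weight is drawn from Beta(alpha, alpha); how EIR applies mixing internally may differ.

```python
import csv
import io
import random
from itertools import islice
from typing import Iterator


def iter_row_chunks(handle, chunk_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most chunk_size rows, so the full file is never held in memory."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def mixup(x_a: list[float], x_b: list[float], alpha: float, rng: random.Random) -> list[float]:
    """Blend two continuous feature vectors with a weight lam drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


# Three data rows read two at a time give one full chunk and one partial chunk.
data = io.StringIO("ID,age\na,30\nb,40\nc,50\n")
chunk_sizes = [len(chunk) for chunk in iter_row_chunks(data, chunk_size=2)]
print(chunk_sizes)  # [2, 1]

# Each mixed value lies between the two values it was blended from.
rng = random.Random(0)
mixed = mixup([0.0, 10.0], [1.0, 20.0], alpha=0.4, rng=rng)
print(mixed)
```

A smaller chunk size lowers peak memory use at the cost of more iterations. For mixup, alpha values well below 1.0 tend to produce weights close to 0 or 1, so most mixed samples stay near one of the two originals.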

Model Selection

class eir.models.input.tabular.tabular.TabularModelConfig(
model_init_config: SimpleTabularModelConfig,
model_type: Literal['tabular'] = 'tabular',
)
Parameters:
  • model_type – Which type of tabular model to use.

  • model_init_config – Configuration / arguments used to initialise model.

Available Feature Extractors

Simple Tabular Model

class eir.models.input.tabular.tabular.SimpleTabularModelConfig(
l1: float = 0.0,
fc_layer: bool = False,
drop_prob: float = 0.0,
layers: list[int] = <factory>,
fc_do: float = 0.1,
fc_dim: int | Literal['auto'] | None = None,
)
Parameters:
  • l1 – L1 regularization applied to the embeddings for categorical tabular inputs.

  • fc_layer – Whether to add a single fully-connected layer to the model, as an alternative to looking up the embeddings and passing the inputs through directly.

  • drop_prob – Probability of dropping entire branch output during training. Set to 1.0 to completely disable learning from this input (useful for testing). During eval mode, dropout is not applied.

  • layers – Number of MLP-residual blocks to add after the embedding/linear layer. List format for compatibility with fusion module configs (only first element used). Default is [0] (no additional blocks). Useful for learning complex tabular feature interactions before fusion.

  • fc_do – Dropout probability for MLP-residual blocks. Only used if layers[0] > 0.

  • fc_dim – Hidden dimension for MLP-residual blocks. Options:
      - None: uses input_dim (default behavior)
      - "auto": computes the closest power of 2 to 4x input_dim
      - int: explicit dimension
    When set, the first block projects from input_dim to fc_dim, and subsequent blocks maintain fc_dim.
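
The interaction between layers and fc_dim can be sketched as below. This is an illustration of the rules described in the parameter list, not EIR's implementation: the function names are hypothetical, "closest" is interpreted as nearest in absolute distance, and ties are resolved towards the larger power of 2, both of which are assumptions.

```python
from typing import Literal, Union


def closest_power_of_two(value: int) -> int:
    """Return the power of 2 nearest to value; ties go to the larger power (an assumption)."""
    if value < 1:
        raise ValueError("value must be >= 1")
    lower = 1 << (value.bit_length() - 1)  # largest power of 2 that is <= value
    upper = lower << 1
    return lower if (value - lower) < (upper - value) else upper


def resolve_fc_dim(input_dim: int, fc_dim: Union[int, Literal["auto"], None]) -> int:
    """Map the three documented fc_dim options to a concrete hidden dimension."""
    if fc_dim is None:
        return input_dim
    if fc_dim == "auto":
        return closest_power_of_two(4 * input_dim)
    return fc_dim


def block_dims(
    input_dim: int,
    layers: list[int],
    fc_dim: Union[int, Literal["auto"], None],
) -> list[tuple[int, int]]:
    """Return (in_dim, out_dim) for each MLP-residual block."""
    n_blocks = layers[0]  # only the first element of the list is used
    hidden = resolve_fc_dim(input_dim, fc_dim)
    dims = []
    in_dim = input_dim
    for _ in range(n_blocks):
        # The first block projects input_dim -> hidden; later blocks keep hidden.
        dims.append((in_dim, hidden))
        in_dim = hidden
    return dims


print(block_dims(10, [2], "auto"))  # [(10, 32), (32, 32)]
print(block_dims(10, [2], None))    # [(10, 10), (10, 10)]
print(block_dims(10, [0], 64))      # []
```

With layers left at its default of [0], no blocks are added, so fc_dim and fc_do have no effect.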