Tabular Data Configuration
Complete configuration guide for structured tabular data (CSV).
Overview
The tabular input type in EIR handles structured data organized in rows and columns. It supports both numerical (continuous) and categorical features and preprocesses and encodes them automatically.
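This page does not spell out the exact preprocessing EIR applies. As a rough illustration of what encoding a categorical column and scaling a continuous one commonly involve, the sketch below maps categories to integer indices (the kind of input an embedding lookup expects) and standardizes numbers to zero mean and unit variance. Both helpers, `encode_categorical` and `standardize`, are hypothetical and are not EIR functions. The library may order categories, handle missing values, or scale values differently.

```python
import statistics


def encode_categorical(values):
    """Map each distinct category to an integer index (sorted for determinism).

    Hypothetical helper: returns the encoded values and the vocabulary used.
    """
    vocab = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [vocab[v] for v in values], vocab


def standardize(values):
    """Scale numbers to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        std = 1.0  # avoid division by zero for constant columns
    return [(v - mean) / std for v in values]


genders, gender_vocab = encode_categorical(["male", "female", "male"])
print(genders, gender_vocab)  # → [1, 0, 1] {'female': 0, 'male': 1}
print(standardize([1.0, 2.0, 3.0]))
```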
Quick Example
input_info:
  input_source: "my_csv_file.csv"
  input_name: "patient_data"
  input_type: "tabular"
input_type_info:
  input_cat_columns: ["gender", "diagnosis"]
  input_con_columns: ["age", "weight", "height"]
model_config:
  model_type: "tabular"
  model_init_config:
    l1: 1e-04
    fc_repr_dim: 64
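Before training, it can help to confirm that every column named under input_type_info actually appears in the CSV header, because a typo in a column name is easy to miss. The sketch below does this with Python's standard csv module. `missing_columns` is a hypothetical helper, and the sample rows are invented to match the example configuration. In practice you would pass an open file handle instead of an in-memory string.

```python
import csv
import io

# Invented sample contents for "my_csv_file.csv", matching the example config.
SAMPLE_CSV = """gender,diagnosis,age,weight,height
female,healthy,34,61.5,168
male,asthma,52,88.0,181
"""


def missing_columns(handle, cat_columns, con_columns):
    """Return the configured columns that are absent from the CSV header."""
    header = next(csv.reader(handle))
    wanted = list(cat_columns) + list(con_columns)
    return [col for col in wanted if col not in header]


missing = missing_columns(
    io.StringIO(SAMPLE_CSV),
    cat_columns=["gender", "diagnosis"],
    con_columns=["age", "weight", "height"],
)
print(missing)  # → []
```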
Input Data Configuration
Base Configuration
class eir.setup.schemas.TabularInputDataConfig(
    input_cat_columns: Sequence[str] = <factory>,
    input_con_columns: Sequence[str] = <factory>,
    label_parsing_chunk_size: None | int = None,
    mixing_subtype: Literal['mixup'] = 'mixup',
    modality_dropout_rate: float = 0.0,
)
Parameters:
input_cat_columns – Which columns to use as categorical inputs from the input_source specified in the input_info field of the relevant .yaml file.
input_con_columns – Which columns to use as continuous inputs from the input_source specified in the input_info field of the relevant .yaml file.
label_parsing_chunk_size – Number of rows to process at a time when loading the input_source. Useful when RAM is limited.
mixing_subtype – Which type of mixing to apply to the tabular data when mixing_alpha is set > 0.0 in the global configuration.
modality_dropout_rate – Dropout rate to apply to the modality. For example, 0.2 means that this modality will be dropped out 20% of the time during training.
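The memory trade-off behind label_parsing_chunk_size comes from reading the input_source a limited number of rows at a time instead of all at once. A minimal standard-library sketch of that idea follows. `iter_row_chunks` is a hypothetical helper, not EIR's loader, and the sample rows are invented.

```python
import csv
import io
from itertools import islice


def iter_row_chunks(handle, columns, chunk_size):
    """Yield lists of at most `chunk_size` rows, keeping only `columns`.

    The generator builds one chunk at a time, so the rows held in memory
    at any moment scale with chunk_size rather than with the file length.
    """
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield [{col: row[col] for col in columns} for row in chunk]


rows = "age,weight,notes\n" + "\n".join(f"{30 + i},{60 + i},x" for i in range(5))
chunks = list(iter_row_chunks(io.StringIO(rows), ["age", "weight"], chunk_size=2))
print([len(c) for c in chunks])  # → [2, 2, 1]
```

Consuming the generator lazily, for example processing and discarding each chunk inside a `for` loop, is what actually keeps memory bounded. Collecting all chunks into a list, as the demo does, does not bound memory.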
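The following minimal sketch illustrates modality_dropout_rate. During training, the whole tabular input is dropped with the given probability. At evaluation time it is left alone. Representing a dropped modality as a vector of zeros is an assumption made for illustration (EIR may mask it differently), and `apply_modality_dropout` is a hypothetical name.

```python
import random


def apply_modality_dropout(features, dropout_rate, training, rng):
    """Zero out the whole feature vector with probability `dropout_rate`.

    Dropout is only applied when `training` is True; at evaluation time the
    features always pass through unchanged.
    """
    if training and rng.random() < dropout_rate:
        return [0.0] * len(features)  # assumption: dropped modality -> zeros
    return features


rng = random.Random(0)
x = [0.3, -1.2, 0.8]
trials = 10_000
dropped = sum(
    apply_modality_dropout(x, 0.2, training=True, rng=rng) == [0.0, 0.0, 0.0]
    for _ in range(trials)
)
print(dropped / trials)  # close to 0.2
```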
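Standard mixup blends two training samples using a weight lam drawn from Beta(alpha, alpha), and blends their targets with the same weight. The sketch below applies this only to continuous feature vectors. This page does not describe how EIR mixes categorical columns. Treating the global mixing_alpha as the alpha below is an assumption, and `mixup_pair` is a hypothetical helper.

```python
import random


def mixup_pair(x_a, x_b, y_a, y_b, alpha, rng):
    """Blend two samples and their targets with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = lam * y_a + (1.0 - lam) * y_b
    return x_mix, y_mix, lam


rng = random.Random(42)
x_mix, y_mix, lam = mixup_pair(
    [34.0, 61.5], [52.0, 88.0], y_a=1.0, y_b=0.0, alpha=0.2, rng=rng
)
print(lam, x_mix, y_mix)
```

Small alpha values concentrate lam near 0 or 1, so most mixed samples stay close to one of the two originals.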
Model Selection
class eir.models.input.tabular.tabular.TabularModelConfig(
    model_init_config: SimpleTabularModelConfig,
    model_type: Literal['tabular'] = 'tabular',
)
Parameters:
model_type – Which type of tabular model to use.
model_init_config – Configuration / arguments used to initialise model.
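The model_config block of the YAML maps onto this nesting, a model_type plus a model_init_config. The sketch below mirrors that structure with plain standard-library dataclasses to show how a parsed model_config dictionary could be turned into typed objects. The classes `TabularModelSketch` and `SimpleTabularInitSketch` and the function `build_model_config` are hypothetical stand-ins, not EIR's classes. The field defaults are copied from the signatures documented on this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal, Union


@dataclass
class SimpleTabularInitSketch:
    """Hypothetical mirror of SimpleTabularModelConfig's documented fields."""

    l1: float = 0.0
    fc_layer: bool = False
    drop_prob: float = 0.0
    layers: list[int] = field(default_factory=lambda: [0])
    fc_do: float = 0.1
    fc_dim: Union[int, Literal["auto"], None] = None


@dataclass
class TabularModelSketch:
    """Hypothetical mirror of TabularModelConfig."""

    model_init_config: SimpleTabularInitSketch
    model_type: Literal["tabular"] = "tabular"


def build_model_config(raw: dict) -> TabularModelSketch:
    """Turn a parsed `model_config` mapping into the dataclasses above.

    Unknown keys in model_init_config raise TypeError from the dataclass
    constructor; any model_type other than "tabular" raises ValueError.
    """
    model_type = raw.get("model_type", "tabular")
    if model_type != "tabular":
        raise ValueError(f"unsupported model_type: {model_type!r}")
    init = SimpleTabularInitSketch(**raw.get("model_init_config", {}))
    return TabularModelSketch(model_init_config=init, model_type=model_type)


cfg = build_model_config(
    {"model_type": "tabular", "model_init_config": {"l1": 1e-4, "layers": [2]}}
)
print(cfg.model_init_config)
```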
Available Feature Extractors
Simple Tabular Model
class eir.models.input.tabular.tabular.SimpleTabularModelConfig(
    l1: float = 0.0,
    fc_layer: bool = False,
    drop_prob: float = 0.0,
    layers: list[int] = <factory>,
    fc_do: float = 0.1,
    fc_dim: int | Literal['auto'] | None = None,
)
Parameters:
l1 – L1 regularization applied to the embeddings for categorical tabular inputs.
fc_layer – Whether to add a single fully-connected layer to the model, as an alternative to looking up the inputs and passing them through directly.
drop_prob – Probability of dropping the entire branch output during training. Set to 1.0 to completely disable learning from this input (useful for testing). Dropout is not applied in eval mode.
layers – Number of MLP-residual blocks to add after the embedding/linear layer. List format for compatibility with fusion module configs (only first element used). Default is [0] (no additional blocks). Useful for learning complex tabular feature interactions before fusion.
fc_do – Dropout probability for MLP-residual blocks. Only used if layers[0] > 0.
fc_dim – Hidden dimension for MLP-residual blocks. Options:
- None: uses input_dim (default behavior).
- "auto": uses the power of 2 closest to 4x input_dim.
- int: uses the given dimension explicitly.
When set, the first block projects from input_dim to fc_dim, and subsequent blocks keep fc_dim.
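The fc_dim options can be written as a small function, together with the block shapes they imply for a given layers value. In this sketch, "closest power of 2" is measured by absolute distance, and ties round up. That tie-breaking rule, the helper names `closest_power_of_two`, `resolve_fc_dim`, and `plan_block_shapes`, and the (in_dim, out_dim) shape tuples are assumptions for illustration, not EIR's implementation. This sketch also does not cover how input_dim itself is derived from the embeddings and continuous columns.

```python
def closest_power_of_two(x: int) -> int:
    """Power of 2 closest to x (x >= 1); ties go to the larger power (assumed)."""
    lower = 1 << (x.bit_length() - 1)  # largest power of 2 <= x
    upper = lower << 1
    return lower if x - lower < upper - x else upper


def resolve_fc_dim(fc_dim, input_dim: int) -> int:
    """Apply the documented fc_dim options: None, "auto", or an int."""
    if fc_dim is None:
        return input_dim
    if fc_dim == "auto":
        return closest_power_of_two(4 * input_dim)
    if isinstance(fc_dim, int) and fc_dim > 0:
        return fc_dim
    raise ValueError(f"invalid fc_dim: {fc_dim!r}")


def plan_block_shapes(input_dim: int, layers: list, fc_dim=None) -> list:
    """(in_dim, out_dim) per MLP-residual block; only layers[0] is used."""
    n_blocks = layers[0] if layers else 0
    hidden = resolve_fc_dim(fc_dim, input_dim)
    shapes, in_dim = [], input_dim
    for _ in range(n_blocks):
        shapes.append((in_dim, hidden))  # first block projects, rest keep hidden
        in_dim = hidden
    return shapes


print(plan_block_shapes(12, [3], "auto"))  # → [(12, 64), (64, 64), (64, 64)]
print(plan_block_shapes(12, [2]))  # → [(12, 12), (12, 12)]
```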
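The l1 parameter applies L1 regularization to the categorical embeddings. Conventionally, this means adding l1 times the sum of the absolute embedding weights to the training loss. The sketch below computes that term for embedding tables stored as nested lists. It is an assumption here whether EIR sums or averages over the weights and exactly how the term enters the loss. `l1_embedding_penalty` is a hypothetical helper, and the weights are invented.

```python
def l1_embedding_penalty(embedding_tables: dict, l1: float) -> float:
    """l1 * sum of |w| over every weight in every categorical embedding table."""
    total = sum(
        abs(w)
        for table in embedding_tables.values()  # one table per categorical column
        for row in table  # one embedding row per category
        for w in row
    )
    return l1 * total


tables = {
    "gender": [[0.5, -0.5], [1.0, 0.0]],
    "diagnosis": [[-0.25, 0.25], [0.0, 0.5], [0.0, -1.0]],
}
penalty = l1_embedding_penalty(tables, l1=1e-4)
print(penalty)  # total absolute weight 4.0, scaled by l1
```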