Frequently Asked Questions
This guide addresses common questions and issues users encounter when working with EIR, based on real user experiences.
Attribution Analysis
Q: How do I enable attribution analysis during training?
A: Add the following to your global configuration file:
attribution_analysis:
  compute_attributions: true
  max_attributions_per_class: 100  # Samples per class to analyze
  attributions_every_sample_factor: 4  # Compute every 4th evaluation
Note: Attribution calculations are computationally expensive, especially with many output targets. Consider:
Using higher attributions_every_sample_factor values (e.g., 4 or 8) to reduce computation
Running attributions only on your best model after training
Allocating more computational resources when using attributions
Q: What do the attribution values mean?
A: Attribution values represent the average influence of each feature on the model’s raw output.
Values are not normalized to sum to 1 by default
They show feature importance computed with the Integrated Gradients method
Higher absolute values indicate stronger influence
Can be positive (increases output) or negative (decreases output)
To convert to percentage contributions, you could for example:
Check the feature_importance.csv file in the attributions folder
Calculate the mean attribution for each feature
Normalize to sum to 1 for relative importance
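The normalization step above can be sketched as follows. This is a minimal sketch, assuming you have already extracted one mean attribution per feature; the feature names and values are hypothetical, and the column layout of feature_importance.csv is not assumed here. Because attributions can be negative, the sketch normalizes absolute values.

```python
def relative_importance(mean_attributions):
    """Scale absolute mean attributions so that they sum to 1."""
    total = sum(abs(value) for value in mean_attributions.values())
    if total == 0:
        raise ValueError("All attributions are zero; cannot normalize.")
    return {name: abs(value) / total for name, value in mean_attributions.items()}


# Hypothetical mean attributions per feature (illustrative values only).
mean_attributions = {"age": 0.30, "bmi": -0.15, "glucose": 0.05}
shares = relative_importance(mean_attributions)

# Print features from most to least important as percentages.
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.1%}")
```

Taking absolute values discards the direction of the effect, so keep the signed values alongside if you also need to know whether a feature raises or lowers the output.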
Model Performance
Q: My model starts overfitting very quickly. What can I do?
A: Try these strategies, organized by where they are configured:
Reduce batch size (in global configuration):
# In your global configuration file
basic_experiment:
  batch_size: 32  # Reduce from default 64
Add regularization via mixing (in global configuration):
# In your global configuration file
training_control:
  mixing_alpha: 0.2  # Mixup augmentation (0.0-1.0)
Adjust learning rate and weight decay (in global configuration):
# In your global configuration file
optimization:
  lr: 0.0001  # Reduce from default 0.0003
  wd: 0.001  # Increase weight decay from default 0.0001
Enable early stopping (in global configuration):
# In your global configuration file
training_control:
  early_stopping_patience: 10
  early_stopping_buffer: 2000  # Optional: wait before checking
Increase dropout in fusion module (in fusion configuration):
# In your fusion configuration file
model_type: mlp-residual
model_config:
  rb_do: 0.25  # Residual block dropout (default 0.1)
  fc_do: 0.25  # Final layer dropout (default 0.1)
  stochastic_depth_p: 0.2  # Stochastic depth (default 0.1)
Add dropout to output models (in output configuration):
# In your output configuration file (for tabular outputs)
output_type_info:
  # ... other settings ...
model_config:
  model_init_config:
    rb_do: 0.25
    fc_do: 0.25
    stochastic_depth_p: 0.2
For classification tasks, use label smoothing (in output configuration):
# In your output configuration file
output_type_info:
  target_cat_columns:
    - target_column
  cat_label_smoothing: 0.1  # Smooths one-hot labels
For tabular inputs, add L1 regularization (in input configuration):
# In your tabular input configuration file
model_config:
  model_type: tabular
  model_init_config:
    l1: 0.0001  # L1 penalty on embeddings
For image inputs, use data augmentation (in input configuration):
# In your image input configuration file
input_type_info:
  mixing_subtype: "cutmix"  # or "mixup"
  # Also uses standard augmentations by default
Reduce model complexity:
Reduce layers or hidden dimensions, fusion example shown below:
# In fusion configuration
model_config:
  layers: [1]  # Reduce from default [2]
  fc_task_dim: 128  # Reduce from default 256
Q: When should I stop training? How do I know my model is “good enough”?
A: Consider these indicators:
Check validation curves: Look for plateau or degradation in validation performance
Performance gap: Large gap between train/validation indicates overfitting
Task requirements: Compare performance to your domain-specific needs
Convergence: If performance is still improving at max epochs, increase n_epochs
Multiple metrics: Check ROC-AUC, MCC, and confusion matrices, not just loss
Look at files like training_curve_ROC-AUC-MACRO.pdf and
training_curve_PERF-AVERAGE.pdf in your results folder.
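The "plateau" indicator above can also be checked programmatically if you have the validation scores as a list. The following is a minimal sketch of a patience-based check, not EIR's own early stopping logic; the function name, the `patience` and `min_delta` parameters, and the scores are all hypothetical, and it assumes higher scores are better.

```python
def has_plateaued(validation_scores, patience=10, min_delta=0.0):
    """Return True if the last `patience` evaluations did not beat the earlier best.

    Assumes higher scores are better.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if len(validation_scores) <= patience:
        return False  # Not enough history to judge yet.
    best_before = max(validation_scores[:-patience])
    best_recent = max(validation_scores[-patience:])
    return best_recent <= best_before + min_delta


# Hypothetical validation scores, one per evaluation.
still_improving = [0.60, 0.70, 0.75, 0.80]
flat = [0.60, 0.70, 0.75, 0.76, 0.76, 0.75, 0.76]

print(has_plateaued(still_improving, patience=3))
print(has_plateaued(flat, patience=3))
```

A small positive `min_delta` makes the check ignore improvements that are within noise.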
Q: What is this “average performance” metric?
A: EIR uses a single “average performance” metric to track overall model performance across potentially multiple tasks and output types. This metric is crucial as it controls:
Early stopping: Training stops when this metric doesn’t improve
Model checkpointing: Models are saved based on this metric (visible in filenames like model_1000_perf-average=0.8547.pt)
Learning rate scheduling: LR reduction on plateau uses this metric
How it’s calculated:
For categorical outputs (classification):
Default: Average of MCC, ROC-AUC-MACRO, and AP-MACRO
Each metric contributes equally to the average
Higher is better (typically between 0 and 1, although MCC itself can be negative)
For continuous outputs (regression):
Default: Average of (1.0 - LOSS), PCC, and R²
Note: Loss is inverted so higher is better
Each metric contributes equally to the average
For other outputs (sequence/image generation):
Uses 1.0 - LOSS by default
Higher is better
For multi-task learning:
Averages across all tasks
Each task contributes equally regardless of output type
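As a worked example of the defaults described above, the sketch below reproduces the arithmetic for one categorical and one continuous task. It only illustrates the averaging and is not EIR's implementation; the task names and metric values are made up.

```python
from statistics import mean


def categorical_average(mcc, roc_auc_macro, ap_macro):
    """Default for categorical outputs: equal-weight mean of the three metrics."""
    return mean([mcc, roc_auc_macro, ap_macro])


def continuous_average(loss, pcc, r2):
    """Default for continuous outputs: loss is inverted so higher is better."""
    return mean([1.0 - loss, pcc, r2])


# Hypothetical validation metrics for two tasks.
task_scores = {
    "disease_status": categorical_average(mcc=0.60, roc_auc_macro=0.90, ap_macro=0.75),
    "height": continuous_average(loss=0.40, pcc=0.70, r2=0.50),
}

# Multi-task: every task contributes equally, regardless of output type.
perf_average = mean(list(task_scores.values()))
print(task_scores, perf_average)
```

Here the categorical task averages to 0.75, the continuous task to 0.60, and the overall value to 0.675.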
Customizing the metric:
You can choose which metrics to include in the average:
# In your global configuration file
metrics:
  cat_averaging_metrics:
    - mcc
    - roc-auc-macro
    # Omit 'ap-macro' to exclude it
  con_averaging_metrics:
    - r2
    - pcc
    # Omit 'loss' to exclude it
Example interpretation:
model_950_perf-average=0.3114.pt: Model at iteration 950 with average performance of 0.3114
model_2000_perf-average=0.8547.pt: Model at iteration 2000 with average performance of 0.8547 (better)
Important notes:
This metric is computed on the validation set
A higher value always indicates better performance
For imbalanced datasets, this averaging might mask poor performance on rare classes
Missing Data Handling
Q: How does EIR handle missing data?
A: EIR handles several kinds of missing data, in both inputs and outputs:
Input Data - Missing Values Within a Tabular Modality:
For partially missing data within a modality (e.g., some NaN values in tabular columns):
Continuous columns: Imputed with the mean from the training set (this is 0 if the data was already mean-centered before being passed to EIR)
Categorical columns: Encoded as a special __NULL__ category
No manual imputation needed - EIR handles this automatically, but you can preprocess if desired
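To make the within-modality strategy concrete, here is a minimal sketch of the same idea in plain Python. It is not EIR's code: the helper names are hypothetical, and only the `__NULL__` label and the use of the training-set mean come from the description above.

```python
import math

NULL_CATEGORY = "__NULL__"


def fit_training_mean(train_values):
    """Mean of the observed (non-NaN) training values."""
    observed = [value for value in train_values if not math.isnan(value)]
    return sum(observed) / len(observed)


def impute_continuous(values, fill_value):
    """Replace NaN with a mean that was computed on the training set only."""
    return [fill_value if math.isnan(value) else value for value in values]


def encode_missing_category(values):
    """Map missing categorical entries (None here) to the special null category."""
    return [NULL_CATEGORY if value is None else value for value in values]


train_feature = [1.5, 2.3, float("nan")]
valid_feature = [float("nan"), 4.0]

train_mean = fit_training_mean(train_feature)
print(impute_continuous(valid_feature, train_mean))
print(encode_missing_category(["A", "B", None]))
```

The important detail is that the fill value comes from the training split and is reused unchanged for validation and test data.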
Input Data - Completely Missing Modalities:
When an entire modality is missing for a sample (e.g., no image for a specific ID):
Tabular: Uses the within-modality strategy above
Images: Filled with random noise (Gaussian distribution)
Sequences/Text: Filled with padding tokens
Omics: Filled with zeros (0 values for the one-hot encoding)
Arrays: Filled with random noise (Gaussian distribution)
Output Data - Missing Target Values:
EIR excludes NaN values from loss computation - they don’t contribute to backpropagation
Supports partial outputs: Can have some target columns missing for specific samples
No imputation needed: The model learns only from available labels
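The idea of excluding missing targets from the loss can be sketched as a masked mean squared error. This is a plain-Python illustration under the assumption that missing targets are stored as NaN; it is not EIR's loss implementation, and the function name is hypothetical.

```python
import math


def masked_mse(predictions, targets):
    """Mean squared error computed only over samples whose target is not NaN."""
    pairs = [(p, t) for p, t in zip(predictions, targets) if not math.isnan(t)]
    if not pairs:
        return 0.0  # No labels in this batch, so nothing contributes to the loss.
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


predictions = [1.0, 2.0, 3.0]
targets = [1.5, float("nan"), 2.0]  # The second sample has no label.

loss = masked_mse(predictions, targets)
print(loss)
```

Only the first and third samples contribute, so the second prediction has no effect on the loss.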
Best Practices:
Preprocessing: You may still want to filter features/samples with excessive missing values
Example: Multi-modal with Missing Data
| ID | Feature1 | Feature2 | Feature3 |
|---|---|---|---|
| sample1 | 1.5 | NaN | A |
| sample2 | 2.3 | 0.8 | B |
| sample3 | NaN | 1.2 | NaN |
# images folder, note that sample2 is missing
sample1.jpg
sample3.jpg
EIR will automatically handle the NaN values in Feature1, Feature2, and Feature3, as well as the missing image for sample2.
Model Architecture
Q: What’s the difference between mlp-residual and regular MLP?
A: The mlp-residual model uses residual blocks with:
Skip connections
Layer normalization via RMSNorm
GELU activation
Stochastic depth option
LayerScale for better training stability
Q: How do I interpret the model architecture?
A: Check model_info.txt in your experiment folder.
Prediction and Configuration
Q: Why does eirpredict require the global_configs file?
A: The global configuration contains settings needed for prediction:
Batch size (might want to increase for faster inference)
Attribution settings (if computing on test set)
Dataloader workers
Other runtime parameters
These aren’t just training parameters - they affect how predictions are computed.
Q: How do I predict on data without labels?
A: Set output_source: null in your output configuration:
output_info:
  output_name: my_output
  output_source: null  # Instead of path to labels
  output_type: tabular
output_type_info:
  target_cat_columns:
    - target_column
Q: Which model checkpoint should I use for predictions?
A: Generally use the model with best validation performance:
Check the filename: model_950_perf-average=0.3114.pt
The number (950) is the iteration
perf-average shows the validation performance
Higher is better
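If you have many checkpoints, the filename convention above can be parsed to pick the best one. This is a small sketch, assuming all checkpoints follow the `model_<iteration>_perf-average=<value>.pt` pattern shown in this guide; the helper names are hypothetical and not part of EIR.

```python
import re

# Matches e.g. "model_950_perf-average=0.3114.pt".
CHECKPOINT_PATTERN = re.compile(r"model_(\d+)_perf-average=(-?\d+(?:\.\d+)?)\.pt$")


def parse_checkpoint(filename):
    """Return (iteration, perf_average) parsed from a checkpoint filename."""
    match = CHECKPOINT_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"Unexpected checkpoint name: {filename}")
    return int(match.group(1)), float(match.group(2))


def best_checkpoint(filenames):
    """Pick the filename with the highest validation perf-average."""
    return max(filenames, key=lambda name: parse_checkpoint(name)[1])


filenames = [
    "model_950_perf-average=0.3114.pt",
    "model_2000_perf-average=0.8547.pt",
]
best = best_checkpoint(filenames)
print(best)
```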
Data Handling
Q: How should I format time series data?
A: Time series data can for example be formatted as sequences:
| ID | Sequence |
|---|---|
| sample1 | val1 val2 val3 val4 val5 |
| sample2 | val1 val2 val3 val4 val5 |
Configuration example:
input_type_info:
  max_length: 48
  split_on: " "
  sampling_strategy_if_longer: "from_start"
Note: you can also store them as separate .txt files, where the filename is the
sample ID and the content is the sequence values.
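As an illustration of the tabular layout above, the sketch below turns per-sample lists of values into an ID/Sequence CSV with space-separated values, matching `split_on: " "`. The sample values are made up, and whether raw values should first be rounded or binned into a smaller vocabulary is a modelling choice this sketch does not address.

```python
import csv
import io


def to_sequence(values, sep=" "):
    """Join the values of one sample into a single separator-delimited string."""
    return sep.join(str(value) for value in values)


# Hypothetical time series, keyed by sample ID.
series = {
    "sample1": [0.1, 0.4, 0.3],
    "sample2": [1.2, 1.1, 0.9],
}

# Written to an in-memory buffer here; use open("sequences.csv", "w", newline="")
# to write a real file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["ID", "Sequence"])
for sample_id, values in series.items():
    writer.writerow([sample_id, to_sequence(values)])

print(buffer.getvalue())
```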
Validation and Testing
Q: How do I ensemble multiple model runs?
A: For better stability, train multiple models with different seeds:
EIR_SEED=0 eirtrain ...
EIR_SEED=1 eirtrain ...
EIR_SEED=2 eirtrain ...
Then average predictions across models.
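The averaging step can be sketched as below. The sketch assumes you have already loaded each run's predicted class probabilities into nested lists with samples in the same order across runs; the layout of EIR's prediction files is not assumed, and the numbers are invented.

```python
def average_predictions(per_model_predictions):
    """Average class probabilities element-wise across models.

    per_model_predictions[m][i][c] is model m's probability of class c for sample i.
    """
    n_models = len(per_model_predictions)
    n_samples = len(per_model_predictions[0])
    averaged = []
    for i in range(n_samples):
        rows = [predictions[i] for predictions in per_model_predictions]
        averaged.append([sum(column) / n_models for column in zip(*rows)])
    return averaged


# Hypothetical probabilities from three runs (two samples, two classes).
seed_0 = [[0.9, 0.1], [0.4, 0.6]]
seed_1 = [[0.8, 0.2], [0.2, 0.8]]
seed_2 = [[0.7, 0.3], [0.3, 0.7]]

ensemble = average_predictions([seed_0, seed_1, seed_2])
print(ensemble)
```

Averaging probabilities rather than hard class labels keeps each model's confidence in the final prediction.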
Technical Issues and Performance
Q: Attribution analysis makes training very slow. What can I do?
A: Several strategies:
Increase sampling interval:
attributions_every_sample_factor: 8 # or higher
Reduce samples analyzed:
max_attributions_per_class: 50 # instead of 100+
Run post-training: Train without attributions, then run eirpredict with attributions enabled
Allocate more resources: Increase CPU/RAM allocation on your cluster
Q: How do I reduce training time?
A: Try these optimizations:
Enable model compilation on GPU/CUDA devices:
# In your global configuration file
model:
  compile_model: true
Use mixed precision training (especially on modern GPUs):
# In your global configuration file
accelerator:
  precision: "16-mixed"  # or "bf16-mixed" for newer GPUs
Load data into memory (if you have enough RAM):
# In your global configuration file
basic_experiment:
  memory_dataset: true
Increase dataloader workers (for CPU-bound data loading):
# In your global configuration file
basic_experiment:
  dataloader_workers: 8  # Adjust based on CPU cores
Use gradient accumulation (simulate larger batches without more memory):
# In your global configuration file
optimization:
  gradient_accumulation_steps: 4  # Effective batch = batch_size * 4
Reduce evaluation frequency:
# In your global configuration file
evaluation_checkpoint:
  sample_interval: 500  # instead of 200
  checkpoint_interval: 500
Feature selection: Use fewer input features based on prior knowledge or attributions
Smaller models: Reduce layers or hidden dimensions in fusion/output configs
Early stopping: Stop when validation performance plateaus
Quick wins for GPU training:
Set compile_model: true and precision: "16-mixed"
Use memory_dataset: true if your dataset fits in RAM
Increase dataloader_workers to 2-4
Note: Model compilation may not work with all architectures. Mixed precision can slightly affect model accuracy but usually provides significant speedup with minimal impact.
Need More Help?
Check the official documentation
Review tutorials for specific use cases
For genomics-specific tasks, consider EIR-auto-GP
Examine the generated model_info.txt for architecture details