.. _e-pretrained-checkpoint-tutorial:

.. role:: raw-html(raw)
    :format: html

01 – Pretraining, Checkpointing and Continued Training
======================================================

In this tutorial,
we will look at how to use EIR to create pretrained models,
and then use them for continued training on the same data,
as well as how to partially load matching layers
when the model architecture changes.

.. note::
    This tutorial assumes you are familiar with the basics of EIR.
    Going through the previous tutorials is not required, but recommended.

A - Data
--------

We will be using the same dataset
we used in the :ref:`03-sequence-tutorial`:
the IMDB reviews dataset,
and we will be repeating the same task as before,
i.e., sentiment classification.

See `here `__ for more information about the data.
To download the data, `use this link. `__

After downloading the data,
the folder structure should look like this:

.. literalinclude:: ../tutorial_files/e_pretraining/01_checkpointing/commands/tutorial_folder.txt
    :language: console

B - Training a Model From Scratch
---------------------------------

Training follows the same approach as in the other tutorials,
starting with the configurations.

The global config sets the universal parameters for training:

.. literalinclude:: ../tutorial_files/e_pretraining/01_checkpointing/imdb_globals.yaml
    :language: yaml
    :caption: imdb_globals.yaml

The input config outlines the specific structure of the IMDB dataset:

.. literalinclude:: ../tutorial_files/e_pretraining/01_checkpointing/imdb_input.yaml
    :language: yaml
    :caption: imdb_input.yaml

The output config describes the prediction target:

.. literalinclude:: ../tutorial_files/e_pretraining/01_checkpointing/imdb_output.yaml
    :language: yaml
    :caption: imdb_output.yaml

Here is the command for training:

.. literalinclude:: ../tutorial_files/e_pretraining/01_checkpointing/commands/1_CHECKPOINT_PRETRAIN_IMDB_FROM_SCRATCH.txt
    :language: console

Training Results:

.. image:: ../tutorial_files/e_pretraining/01_checkpointing/figures/training_curve_LOSS_1_text_from_scratch.png
    :width: 100%
    :align: center

These training results are nothing out of the ordinary:
both the training and the validation loss decrease
as training progresses.

C - Continuing Training from a Checkpoint
-----------------------------------------

Often, you might want to resume training
from a previously saved checkpoint.
This is especially useful when fine-tuning the model on a different dataset,
or when resuming a long-running training process after an interruption.
For this, we can use the ``pretrained_checkpoint`` argument
in the global config.

Here is how we can do that:

.. literalinclude:: ../tutorial_files/e_pretraining/01_checkpointing/commands/2_CHECKPOINTING_IMDB_FROM_PRETRAINED_GLOBAL.txt
    :language: console

.. important::
    The argument points to a saved model file from a previous experiment,
    and the loading process relies on other data saved
    alongside it by that experiment.
    Therefore, loading will likely fail if the checkpoint
    has been moved from the relative path it was saved in.

Training Results After Continued Training:

.. image:: ../tutorial_files/e_pretraining/01_checkpointing/figures/training_curve_LOSS_2_text_from_global_pretrained.png
    :width: 100%
    :align: center

The training curve shows that the model essentially picks up
where it left off:
the training loss is already quite low from the start,
compared to the previous training from scratch.

D - Partial Loading of Matching Layers
---------------------------------------

There are scenarios where you might change the architecture of your model
but still want to use the pretrained weights for the layers that match.
This can be achieved by setting the ``strict_pretrained_loading`` argument
to ``False`` in the global config.
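
The general idea behind non-strict loading can be sketched in plain Python.
This is only an illustration of the concept, not EIR's actual implementation:
the helper ``select_matching_layers`` and all layer names and shapes
are hypothetical, and a model is reduced to a mapping
from layer name to parameter shape.

.. code-block:: python

    def select_matching_layers(pretrained, target):
        """Split pretrained entries into loadable and skipped ones.

        Both arguments map a layer name to its parameter shape (a tuple).
        An entry is loadable only if the target has a layer with the
        same name and the same shape.
        """
        loadable = {}
        skipped = []
        for name, shape in pretrained.items():
            if name in target and target[name] == shape:
                loadable[name] = shape
            else:
                skipped.append(name)
        return loadable, skipped


    # Hypothetical layer names and shapes, for illustration only.
    pretrained_shapes = {
        "text_encoder.embedding.weight": (1000, 64),
        "fusion.fc_1.weight": (128, 64),
        "output.head.weight": (2, 128),
    }
    new_model_shapes = {
        "text_encoder.embedding.weight": (1000, 64),
        "fusion.fc_1.weight": (256, 64),  # fusion width changed
        "output.head.weight": (2, 256),  # input size follows the fusion width
    }

    loadable, skipped = select_matching_layers(pretrained_shapes, new_model_shapes)
    print("loaded from checkpoint:", sorted(loadable))
    print("left at fresh initialization:", skipped)

In this toy setup, only the text encoder weights are taken from the checkpoint,
while the resized fusion layer and the layer that depends on its width
start from a fresh initialization.
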
Below, we change the dimension of the fully connected layers
in the fusion module, but keep the rest of the model the same.

.. literalinclude:: ../tutorial_files/e_pretraining/01_checkpointing/commands/3_CHECKPOINTING_IMDB_FROM_PRETRAINED_GLOBAL_NON_STRICT.txt
    :language: console

Results After Partial Loading and Continued Training:

.. image:: ../tutorial_files/e_pretraining/01_checkpointing/figures/training_curve_LOSS_3_text_from_global_pretrained_non_strict.png
    :width: 100%
    :align: center

Notice how the training loss starts at a value similar to
that seen when training from scratch,
but then decreases more quickly to a lower value,
indicating that the model still benefits
from the pretrained weights in the unchanged layers.

Thank you for reading!