05 – Image Tutorial: Hot Dog or Not?

In this tutorial, we will be using EIR to train deep learning models for image classification. Specifically, we will be training our models on the important task of classifying whether an image contains a hot dog or not. We will be using a subset of the Food-101 dataset, originally introduced here. To download the data and configurations for this part of the tutorial, use this link.

Note that this tutorial assumes that you are already familiar with the basic functionality of the framework (see 01 – Genotype Tutorial: Ancestry Prediction). If you have not already, it can also be useful to go over the sequence tutorial (see 03 – Sequence Tutorial: Movie Reviews and Peptides).

A - Baseline

eir_tutorials/a_using_eir/05_image_tutorial/
├── conf
│   ├── globals.yaml
│   ├── inputs.yaml
│   ├── inputs_efficientnet_b0.yaml
│   ├── inputs_resnet18.yaml
│   └── output.yaml
└── data
    └── hot_dog_not_hot_dog
        ├── food_images
        └── labels.csv
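
Before training, it can be worth checking how balanced the two classes are. The snippet below is a minimal sketch of that check, assuming that labels.csv is a plain CSV file with a header row containing a CLASS column (the column used as target later in this tutorial). The ID column, the sample IDs, and the helper name count_classes are made up for illustration.

```python
import csv
from collections import Counter
from typing import Iterable


def count_classes(lines: Iterable[str], class_column: str = "CLASS") -> Counter:
    """Count how often each label occurs in CSV content with a header row."""
    reader = csv.DictReader(lines)
    return Counter(row[class_column] for row in reader)


# Tiny made-up sample in the assumed format (the IDs are illustrative only).
sample = [
    "ID,CLASS",
    "img_a,Hot Dog",
    "img_b,Not Hot Dog",
    "img_c,Not Hot Dog",
]
print(count_classes(sample))
```

For the real file, the same function can be given an open file handle, for example `with open(path, newline="") as handle: count_classes(handle)`.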

Looking at the data we are working with, we can indeed see that it contains images of hot dogs and all kinds of other food:

I did not know drinking coffee/cacao with hot dogs was a thing. Anyway, now we will train a simple residual network from scratch to get a little baseline. The image models we will be using come from the excellent timm library, which includes those used in this tutorial and many more!

To the configuration!

globals.yaml

output_folder: eir_tutorials/tutorial_runs/a_using_eir/tutorial_05_is_it_a_hot_dog
valid_size: 0.10
device: "mps"
batch_size: 32
n_saved_models: 1
dataloader_workers: 0
checkpoint_interval: 100
sample_interval: 100
n_epochs: 200
memory_dataset: True
max_attributions_per_class: 10
compute_attributions: True
mixing_alpha: 0.5
plot_skip_steps: 0

inputs.yaml

input_info:
  input_source: eir_tutorials/a_using_eir/05_image_tutorial/data/hot_dog_not_hot_dog/food_images
  input_name: hot_dog
  input_type: image

input_type_info:
  mixing_subtype: "cutmix"
  size:
    - 64

model_config:
  model_type: "ResNet"
  model_init_config:
    layers: [1, 1, 1, 1]
    block: "BasicBlock"

interpretation_config:
    num_samples_to_interpret: 30

output.yaml

output_info:
  output_source: eir_tutorials/a_using_eir/05_image_tutorial/data/hot_dog_not_hot_dog/labels.csv
  output_name: hot_dog_output
  output_type: tabular

output_type_info:
  target_cat_columns:
    - CLASS
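
The configurations above set mixing_subtype to "cutmix" for the image input and mixing_alpha to 0.5 globally. To give a rough idea of what CutMix-style augmentation does, here is a minimal, dependency-free sketch of the general technique. It is not EIR's actual implementation, and the function names and the nested-list image representation are assumptions made for illustration only.

```python
import math
import random


def cutmix_box(height: int, width: int, lam: float, rng: random.Random):
    """Pick a box covering roughly (1 - lam) of the image area."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h = int(height * cut_ratio)
    cut_w = int(width * cut_ratio)
    center_y = rng.randrange(height)
    center_x = rng.randrange(width)
    # Clip the box to the image borders.
    top = max(center_y - cut_h // 2, 0)
    bottom = min(center_y + cut_h // 2, height)
    left = max(center_x - cut_w // 2, 0)
    right = min(center_x + cut_w // 2, width)
    return top, bottom, left, right


def cutmix(image_a, image_b, alpha: float, rng: random.Random):
    """Paste a random patch of image_b into a copy of image_a.

    Returns the mixed image and the fraction of pixels still coming
    from image_a, which is used to weight the two labels.
    """
    height, width = len(image_a), len(image_a[0])
    lam = rng.betavariate(alpha, alpha)
    top, bottom, left, right = cutmix_box(height, width, lam, rng)
    mixed = [row[:] for row in image_a]
    for y in range(top, bottom):
        mixed[y][left:right] = image_b[y][left:right]
    # Recompute the weight from the area that was actually pasted.
    lam_adjusted = 1.0 - (bottom - top) * (right - left) / (height * width)
    return mixed, lam_adjusted


# Single-channel 64x64 toy "images": all zeros and all ones.
image_a = [[0] * 64 for _ in range(64)]
image_b = [[1] * 64 for _ in range(64)]
mixed, lam = cutmix(image_a, image_b, alpha=0.5, rng=random.Random(0))
print(f"fraction of pixels kept from image_a: {lam:.3f}")
```

In the usual CutMix formulation, the training loss is then weighted the same way: lam times the loss against the label of the first image plus (1 - lam) times the loss against the label of the second.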

As usual, we do our training with the following command:

eirtrain \
--global_configs eir_tutorials/a_using_eir/05_image_tutorial/conf/globals.yaml \
--input_configs eir_tutorials/a_using_eir/05_image_tutorial/conf/inputs.yaml \
--output_configs eir_tutorials/a_using_eir/05_image_tutorial/conf/output.yaml

Note

Training these deep image models can take quite some time if one is using a laptop. If possible, try using a system with a GPU available!

Now for the results, we see the following:

../../_images/05_image_training_curve_ACC_resnet_1.png

That looks kind of ok, but far from great. Our validation performance is all over the place (a contributing factor could be that our validation set here is very small), and we don’t get a validation accuracy better than around 76%. Certainly not good enough for an actual app!
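
To get a feeling for how much a small validation set alone can make the accuracy jump around, one can look at the binomial standard error of an accuracy estimate. The sketch below uses the roughly 76% accuracy mentioned above. The validation set sizes of 100 and 1000 are hypothetical values chosen for illustration, not the actual size used in this tutorial.

```python
import math


def accuracy_standard_error(accuracy: float, n_samples: int) -> float:
    """Binomial standard error of an accuracy measured on n_samples examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


# Hypothetical validation set sizes, for illustration only.
for n_valid in (100, 1000):
    standard_error = accuracy_standard_error(0.76, n_valid)
    margin = 1.96 * standard_error  # approximate 95% interval
    print(f"n={n_valid}: 0.76 +/- {margin:.3f}")
```

With only 100 validation samples, the approximate 95% interval is about 8 percentage points wide in each direction, so swings of several points between evaluations are not surprising.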

B - Pretrained Image Model

Now we will take advantage of the fact that there exist pretrained models that have been trained on a bunch of data (not just a few pictures of hot dogs and other food) and see whether that helps our performance.

Now our input configuration looks like this:

inputs_resnet18.yaml

input_info:
  input_source: eir_tutorials/a_using_eir/05_image_tutorial/data/hot_dog_not_hot_dog/food_images
  input_name: hot_dog_resnet18
  input_type: image

input_type_info:
  mixing_subtype: "cutmix"
  size:
    - 64

model_config:
  model_type: "resnet18"
  pretrained_model: True

interpretation_config:
    num_samples_to_interpret: 30

To train, we run:

eirtrain \
--global_configs eir_tutorials/a_using_eir/05_image_tutorial/conf/globals.yaml \
--input_configs eir_tutorials/a_using_eir/05_image_tutorial/conf/inputs_resnet18.yaml \
--output_configs eir_tutorials/a_using_eir/05_image_tutorial/conf/output.yaml \
--globals.output_folder=eir_tutorials/tutorial_runs/a_using_eir/tutorial_05_is_it_a_hot_dog_pretrained_resnet

Looking at our performance, we see:

../../_images/05_image_training_curve_ACC_resnet_pretrained_1.png

Definitely better! One limiting factor here could be that we are training on a different image size than the original model was trained on. In any case, let’s have a look at what our model is focusing on when deciding that something is not a hot dog (perhaps you already noticed that we set the compute_attributions value to True in the global configuration):

../../_images/pretrained_resnet_not_hot_dog_attributions.png

That is not a hot dog alright, and our model seems to agree.

C - Combining pretrained image models

For the last part of this tutorial, we will be combining two pretrained models. We will keep the ResNet18 model as it is, feeding it 64 pixel images. We will also add an EfficientNet-B0 feature extractor, but feed it 224 pixel images.

The configuration for the EfficientNet part looks like this:

inputs_efficientnet_b0.yaml

input_info:
  input_source: eir_tutorials/a_using_eir/05_image_tutorial/data/hot_dog_not_hot_dog/food_images
  input_name: hot_dog_efficientnet
  input_type: image

input_type_info:
  mixing_subtype: "cutmix"
  size:
    - 224

model_config:
  model_type: "efficientnet_b0"
  pretrained_model: True

interpretation_config:
    num_samples_to_interpret: 30

We train as usual, but notice that we are now passing both input configurations to the --input_configs flag:

eirtrain \
--global_configs eir_tutorials/a_using_eir/05_image_tutorial/conf/globals.yaml \
--input_configs eir_tutorials/a_using_eir/05_image_tutorial/conf/inputs_efficientnet_b0.yaml eir_tutorials/a_using_eir/05_image_tutorial/conf/inputs_resnet18.yaml \
--output_configs eir_tutorials/a_using_eir/05_image_tutorial/conf/output.yaml \
--globals.output_folder=eir_tutorials/tutorial_runs/a_using_eir/tutorial_05_is_it_a_hot_dog_pretrained_combined
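
Since the three runs in this tutorial differ only in their input configurations and output folder, it can be convenient to assemble the commands programmatically. The following is a small sketch of that idea. The helper name build_eirtrain_command is hypothetical, and only the flags already shown in the commands above are used.

```python
import shlex
from typing import List, Optional, Sequence

CONF = "eir_tutorials/a_using_eir/05_image_tutorial/conf"
RUNS = "eir_tutorials/tutorial_runs/a_using_eir"


def build_eirtrain_command(
    global_config: str,
    input_configs: Sequence[str],
    output_config: str,
    output_folder: Optional[str] = None,
) -> List[str]:
    """Assemble an eirtrain call as an argument list (hypothetical helper)."""
    command = [
        "eirtrain",
        "--global_configs", global_config,
        "--input_configs", *input_configs,
        "--output_configs", output_config,
    ]
    if output_folder is not None:
        # Same override style as in the commands shown in this tutorial.
        command.append(f"--globals.output_folder={output_folder}")
    return command


command = build_eirtrain_command(
    global_config=f"{CONF}/globals.yaml",
    input_configs=[
        f"{CONF}/inputs_efficientnet_b0.yaml",
        f"{CONF}/inputs_resnet18.yaml",
    ],
    output_config=f"{CONF}/output.yaml",
    output_folder=f"{RUNS}/tutorial_05_is_it_a_hot_dog_pretrained_combined",
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to start the run from Python.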

Note

Here we are maybe getting ahead of ourselves a little and going straight into combining models. Perhaps only using EfficientNet performs even better. I will leave that task to you, dear reader.

The training and validation curves I got look like so (I got a bit impatient and stopped the run early):

../../_images/05_image_training_curve_ACC_combined_pretrained_1.png

This definitely looks more stable, with better performance than before. As mentioned earlier, we should be careful about trusting these results too much as we have a tiny validation set, but since we are doing a tutorial, we’ll allow it!

For the last part of this tutorial, let’s have a look at what our feature extractors focus on for an example image.

First the ResNet18 feature extractor:

../../_images/pretrained_combined_resnet_not_hot_dog_attributions.png

And then the EfficientNet-B0 feature extractor:

../../_images/pretrained_combined_efficientnet_not_hot_dog_attributions.png

While it’s definitely clearer to the human eye in the ResNet18 case, both feature extractors seem to be focusing on the french fries when deciding that this is, indeed, not a hot dog.

D - Serving

In this final section, we demonstrate serving our trained image classification model as a web service and interacting with it using HTTP requests.

Starting the Web Service

To serve the model, use the following command:

eirserve --model-path [MODEL_PATH]

Replace [MODEL_PATH] with the actual path to your trained model. This command initiates a web service that listens for incoming requests.

Here is an example of the command:

eirserve \
--model-path eir_tutorials/tutorial_runs/a_using_eir/tutorial_05_is_it_a_hot_dog_pretrained_combined/saved_models/tutorial_05_is_it_a_hot_dog_pretrained_combined_model_400_perf-average=0.9857.pt

Sending Requests

With the server running, we can now send image-based requests. For this model, each request contains the base64-encoded image once per input name, that is, once for each of the two feature extractors.

Here’s an example Python function demonstrating this process:

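import base64
from io import BytesIO

import requests
from PIL import Image


def encode_image_to_base64(file_path: str) -> str:
    """Load an image, re-encode it as JPEG, and return it as a base64 string."""
    with Image.open(file_path) as image:
        buffered = BytesIO()
        # JPEG has no alpha channel, so convert e.g. RGBA or palette images first.
        image.convert("RGB").save(buffered, format="JPEG")
        return base64.b64encode(buffered.getvalue()).decode("utf-8")


def send_request(url: str, payload: dict) -> dict:
    """POST the JSON payload to the served model and return the decoded response."""
    response = requests.post(url, json=payload)
    # Fail loudly on HTTP errors instead of trying to parse an error body.
    response.raise_for_status()
    return response.json()


# The keys are the input_name values from the two input configurations.
encoded_image = encode_image_to_base64("path/to/image1.jpg")
payload = {
    "hot_dog_efficientnet": encoded_image,
    "hot_dog_resnet18": encoded_image,
}

response = send_request("http://localhost:8000/predict", payload)
print(response)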

Additionally, you can send requests using bash. Note that this requires preparing the base64-encoded image content in advance:

curl -X 'POST' \
  'http://localhost:8000/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
      "hot_dog_efficientnet": "[BASE64_ENCODED_IMAGE]",
      "hot_dog_resnet18": "[BASE64_ENCODED_IMAGE]"
  }'
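
One possible way to prepare the base64-encoded content in advance is to write the whole JSON payload to a file with a few lines of Python. This is a sketch under the assumption that the image file is already a JPEG, so its raw bytes are encoded without re-saving. The helper names build_payload and write_payload and the file name payload.json are made up for illustration.

```python
import base64
import json
from pathlib import Path
from typing import Dict, Sequence

# Input names taken from the two input configurations in this tutorial.
INPUT_NAMES = ("hot_dog_efficientnet", "hot_dog_resnet18")


def build_payload(
    image_bytes: bytes, input_names: Sequence[str] = INPUT_NAMES
) -> Dict[str, str]:
    """Base64-encode the image once and reuse it for every input name."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {name: encoded for name in input_names}


def write_payload(image_path: str, payload_path: str) -> None:
    """Read an image file and store the JSON payload next to it."""
    payload = build_payload(Path(image_path).read_bytes())
    Path(payload_path).write_text(json.dumps(payload))


# Demonstration with placeholder bytes instead of a real image.
print(build_payload(b"abc"))
```

After calling `write_payload("path/to/image1.jpg", "payload.json")`, the file can be passed to curl by replacing the inline JSON with `-d @payload.json`.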

Analyzing Responses

Before we go into the responses, let’s view the images that were used for predictions:

1040579.jpg

108743.jpg

After sending requests to the served model, the responses can be analyzed. These responses provide insights into the model’s predictions based on the input images.

predictions.json

[
    {
        "request": {
            "hot_dog_efficientnet": "eir_tutorials/a_using_eir/05_image_tutorial/data/hot_dog_not_hot_dog/food_images/1040579.jpg",
            "hot_dog_resnet18": "eir_tutorials/a_using_eir/05_image_tutorial/data/hot_dog_not_hot_dog/food_images/1040579.jpg"
        },
        "response": {
            "result": {
                "hot_dog_output": {
                    "CLASS": {
                        "Hot Dog": 0.8565942049026489,
                        "Not Hot Dog": 0.14340578019618988
                    }
                }
            }
        }
    },
    {
        "request": {
            "hot_dog_efficientnet": "eir_tutorials/a_using_eir/05_image_tutorial/data/hot_dog_not_hot_dog/food_images/108743.jpg",
            "hot_dog_resnet18": "eir_tutorials/a_using_eir/05_image_tutorial/data/hot_dog_not_hot_dog/food_images/108743.jpg"
        },
        "response": {
            "result": {
                "hot_dog_output": {
                    "CLASS": {
                        "Hot Dog": 0.07436760514974594,
                        "Not Hot Dog": 0.9256323575973511
                    }
                }
            }
        }
    }
]
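
To turn these class probabilities into a single predicted label, one can simply pick the class with the highest probability. The snippet below is a minimal sketch operating on one entry with the same structure as in predictions.json above. The helper name top_prediction is hypothetical, and the request part of the entry is omitted for brevity.

```python
from typing import Tuple


def top_prediction(
    entry: dict, output_name: str = "hot_dog_output", target: str = "CLASS"
) -> Tuple[str, float]:
    """Return the most probable class and its probability for one entry."""
    probabilities = entry["response"]["result"][output_name][target]
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# First entry from predictions.json, without the request part.
example_entry = {
    "response": {
        "result": {
            "hot_dog_output": {
                "CLASS": {
                    "Hot Dog": 0.8565942049026489,
                    "Not Hot Dog": 0.14340578019618988,
                }
            }
        }
    }
}

label, probability = top_prediction(example_entry)
print(f"{label} ({probability:.2f})")
```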

With that, we conclude this image tutorial. Thank you for reading!