In this tutorial, we will use EIR for image-to-sequence tasks.
Image-to-sequence (img-to-seq) models are a type of model that converts an
input image into a sequence of words. This is useful for tasks like
image captioning, where the model generates a description of the contents of an image.
Here, we will generate captions for images using the
COCO 2017 dataset.
When running the command above,
I got the following training curve:
The fact that the validation loss is lower
indicates that the model is likely able to
use the image to improve the quality of the captions.
After training, we can look at some of the generated captions:
While the captions seem to be somewhat related to the images,
they are far from perfect. As the validation loss
is still decreasing, we could train the model
for longer, try a larger model, use larger images,
or use a larger dataset.
Before analyzing the responses, let’s view the images that were used for generating captions:
000000000009.jpg
000000000034.jpg
000000581929.jpg
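As a sketch of how such requests might be constructed, the helper below mirrors the request structure visible in the recorded responses: an image reference plus an optional caption prefix to condition the generation on. The endpoint URL and the `requests.post` call are assumptions about the deployment, not part of the tutorial's recorded output, so adjust them to match your served model.

```python
import json

# `requests` is a third-party HTTP library; uncomment once the model is served.
# import requests


def build_payload(image_path: str, caption_prompt: str = "") -> dict:
    """Build a request payload matching the recorded request structure:
    an image reference and an optional caption prefix ("" for unconditional)."""
    return {"image_captioning": image_path, "captions": caption_prompt}


payload = build_payload(
    "eir_tutorials/c_sequence_output/03_image_captioning/data/"
    "image_captioning/images/000000581929.jpg",
    caption_prompt="A horse",
)
print(json.dumps(payload))

# Hypothetical endpoint; replace with the actual serving address:
# response = requests.post("http://localhost:8000/predict", json=payload)
```

Passing a non-empty `captions` value, as in the last request below, lets the model complete a caption from a given prefix rather than generating one from scratch.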
After sending requests to the served model, we can analyze the responses,
which show how well the model generates captions for the input images.
[
  {
    "request": {
      "image_captioning": "eir_tutorials/c_sequence_output/03_image_captioning/data/image_captioning/images/000000000009.jpg",
      "captions": ""
    },
    "response": {
      "result": {
        "captions": "A bowl of broccoli and a is on a plate."
      }
    }
  },
  {
    "request": {
      "image_captioning": "eir_tutorials/c_sequence_output/03_image_captioning/data/image_captioning/images/000000000034.jpg",
      "captions": ""
    },
    "response": {
      "result": {
        "captions": "Two zebras standing side by side in a grassy field."
      }
    }
  },
  {
    "request": {
      "image_captioning": "eir_tutorials/c_sequence_output/03_image_captioning/data/image_captioning/images/000000581929.jpg",
      "captions": "A horse"
    },
    "response": {
      "result": {
        "captions": "A horse and a goat grazing in the grass"
      }
    }
  }
]
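To work with these responses programmatically, a short sketch like the following pulls out each image reference and its generated caption, assuming the request/response structure shown above (the single-element JSON string here is just an illustrative stand-in for the full payload):

```python
import json

# A minimal stand-in with the same structure as the recorded responses.
responses_json = """
[
  {
    "request": {"image_captioning": "images/000000000034.jpg", "captions": ""},
    "response": {"result": {"captions": "Two zebras standing side by side in a grassy field."}}
  }
]
"""

records = json.loads(responses_json)
for record in records:
    image = record["request"]["image_captioning"]
    caption = record["response"]["result"]["captions"]
    print(f"{image}: {caption}")
```

Pairing each caption with its source image this way makes it easy to spot-check outputs against the pictures shown earlier.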