06 – Training on binary data
In this tutorial, we will train deep learning models on raw binary data. In general, it is a good idea to use inductive bias and domain expertise when training our models, but sometimes we might not have a good idea of how to present our data, or we simply want to turn off our brains for a bit and throw raw compute at the problem. We will use the familiar IMDB reviews dataset; see here for more information about the data. To download the data and configurations for this part of the tutorial, use this link.
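To make "raw binary data" concrete: any file can be read as a sequence of bytes, each an integer from 0 to 255. The model therefore works with a fixed 256-symbol vocabulary and needs no tokenizer. The sketch below uses only the Python standard library and a hypothetical helper, file_to_byte_tokens, to show what a short review looks like in this form. It illustrates the idea and is not EIR's internal implementation.

```python
import tempfile
from pathlib import Path


def file_to_byte_tokens(path: str) -> list[int]:
    """Hypothetical helper: read a file and return its bytes as integers (0-255)."""
    return list(Path(path).read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    review = Path(tmp) / "review.txt"
    review.write_text("Great movie!", encoding="utf-8")
    tokens = file_to_byte_tokens(str(review))

print(tokens)  # [71, 114, 101, 97, 116, 32, 109, 111, 118, 105, 101, 33]
print(len(tokens))  # 12
```

For plain ASCII text, each character maps to exactly one byte. Non-ASCII characters such as "é" take two or more bytes in UTF-8, so byte sequences are at least as long as character sequences.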
A - Local Transformer
After downloading the data, the folder structure should look like this:
eir_tutorials/a_using_eir/06_raw_bytes_tutorial/
├── conf
│   ├── globals.yaml
│   ├── input.yaml
│   └── output.yaml
└── data
    └── IMDB
        ├── IMDB_Reviews
        ├── conf
        ├── imdb.vocab
        └── imdb_labels.csv
We will use the built-in local transformer model in EIR for this tutorial. If you have done the previous tutorials, you might be used to this, but here are the configurations:
globals.yaml
output_folder: eir_tutorials/tutorial_runs/a_using_eir/tutorial_06_imdb_sentiment_binary
valid_size: 0.10
n_saved_models: 1
device: "mps"
checkpoint_interval: 1000
sample_interval: 1000
dataloader_workers: 0
memory_dataset: true
n_epochs: 50
mixing_alpha: 0.5
input.yaml
input_info:
  input_source: eir_tutorials/a_using_eir/03_sequence_tutorial/data/IMDB/IMDB_Reviews
  input_name: imdb_reviews_bytes_base_transformer
  input_type: bytes
input_type_info:
  sampling_strategy_if_longer: "uniform"
  max_length: 1024
model_config:
  model_type: sequence-default
  window_size: 128
  embedding_dim: 64
  pool: avg
  position: "embed"
  model_init_config:
    num_layers: 4
    num_heads: 8
output.yaml
output_info:
  output_source: eir_tutorials/a_using_eir/03_sequence_tutorial/data/IMDB/imdb_labels.csv
  output_name: imdb_output
  output_type: tabular
output_type_info:
  target_cat_columns:
    - Sentiment
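The input.yaml above caps each review at max_length: 1024 bytes and sets sampling_strategy_if_longer: "uniform". Our assumption is that, for longer inputs, this keeps a contiguous window starting at a uniformly random position instead of always keeping the beginning. The hypothetical sample_window function below sketches that assumed behaviour. It is not EIR's code, and it leaves out how shorter inputs are handled (for example, padding).

```python
import random


def sample_window(tokens: list[int], max_length: int, rng: random.Random) -> list[int]:
    """Return at most max_length tokens.

    Assumed "uniform" behaviour: if the input is too long, pick a start
    index uniformly at random and keep a contiguous slice of max_length.
    Shorter inputs are returned unchanged (no padding in this sketch).
    """
    if len(tokens) <= max_length:
        return tokens
    # randint's upper bound is inclusive, so the last valid start is len - max_length.
    start = rng.randint(0, len(tokens) - max_length)
    return tokens[start : start + max_length]


rng = random.Random(0)
review_bytes = list(range(256)) * 10  # stand-in for a 2,560-byte review
window = sample_window(review_bytes, max_length=1024, rng=rng)
print(len(window))  # 1024
```

Under this assumption, a long review yields a different 1024-byte chunk each time it is sampled, which acts as a mild form of data augmentation during training.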
Note
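The model we are training here is relatively deep, so you will probably need a GPU to train it in a reasonable amount of time. If you do not have access to a GPU, try reducing the number of layers (num_layers) and the sequence length (max_length) in input.yaml. Also note that device in globals.yaml is set to "mps" (Apple silicon GPUs); change it to match your hardware.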
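A rough count of attention scores shows why sequence length and depth matter. Full self-attention over n positions computes n² scores per head and layer. Attention restricted to non-overlapping windows of size w computes (n / w) · w² = n · w. The sketch below plugs in max_length: 1024, window_size: 128, num_heads: 8 and num_layers: 4 from the configuration. Treating EIR's local transformer as non-overlapping windows is a simplifying assumption, and the counts ignore embedding and feed-forward costs.

```python
def full_attention_scores(n: int) -> int:
    # Every position attends to every position: n * n scores.
    return n * n


def local_attention_scores(n: int, window: int) -> int:
    # Assumed non-overlapping windows: (n // window) windows of window**2 scores each.
    assert n % window == 0, "sketch assumes n is a multiple of window"
    return (n // window) * window * window


n, window, heads, layers = 1024, 128, 8, 4
full = full_attention_scores(n) * heads * layers
local = local_attention_scores(n, window) * heads * layers
print(full, local, full // local)  # 33554432 4194304 8
```

Under this assumption, the local attention cost grows linearly with max_length, so halving it roughly halves the attention work. With num_layers: 4, each layer removed saves about a quarter.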
As usual, we can run the following command to train:
eirtrain \
--global_configs eir_tutorials/a_using_eir/06_raw_bytes_tutorial/conf/globals.yaml \
--input_configs eir_tutorials/a_using_eir/06_raw_bytes_tutorial/conf/input.yaml \
--output_configs eir_tutorials/a_using_eir/06_raw_bytes_tutorial/conf/output.yaml
When training, I got the following training curves:
Not so great, but not a complete failure either! Compared with our previous modelling on this task (see 03 – Sequence Tutorial: Movie Reviews and Peptides), word-level modelling clearly performed better than modelling the raw bytes as we do here. It may well be that we need to configure our model better or train it on more data, but for now we will conclude that adapting the training to the task (in this case NLP) seems to work better than training on raw binary data.
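Counting how many tokens the same text becomes under each setup helps illustrate the difference. The sketch below uses a naive whitespace split as a stand-in for word-level tokenization. This is an assumption: tutorial 03 may tokenize differently. It compares the word count with the byte count for the opening sentence of one of the reviews shown later in this tutorial.

```python
text = "The worst movie I have seen since Tera Jadoo Chal Gaya."

word_tokens = text.split()  # naive whitespace tokenization (stand-in for word level)
byte_tokens = list(text.encode("utf-8"))  # one token per byte

print(len(word_tokens), len(byte_tokens))  # 11 55
print(len(byte_tokens) / len(word_tokens))  # 5.0
```

Here the byte sequence is five times longer than the word sequence. A byte-level model therefore sees fewer words within the same sequence length and has to learn word structure from individual bytes.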
Tip
Here we are training on natural language data, but in theory the same approach can be applied to any type of file on disk (e.g., images, videos, or more obscure formats). As we saw above, however, good results are not guaranteed!
B - Serving
In this section, we'll guide you through serving our trained IMDB Reviews Bytes Classification model as a web service and show you how to interact with it using HTTP requests.
Starting the Web Service
To serve the model, execute the following command:
eirserve --model-path [MODEL_PATH]
Replace [MODEL_PATH] with the actual path to your trained model. This command initiates a web service that listens for incoming HTTP requests.
Here is an example of the command used:
eirserve \
--model-path eir_tutorials/tutorial_runs/a_using_eir/tutorial_06_imdb_sentiment_binary/saved_models/tutorial_06_imdb_sentiment_binary_model_15000_perf-average=0.5741.pt
Sending Requests
Once the server is up and running, you can send requests to it. For this bytes model, we read the raw bytes of a text file, base64-encode them, and send them to the model's endpoint.
Here’s an example Python function to demonstrate how to send a request:
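import base64

import numpy as np
import requests


def load_and_encode_data(data_pointer: str) -> str:
    # Read the file as raw bytes (uint8) and base64-encode them,
    # since JSON cannot carry raw binary data directly.
    arr = np.fromfile(data_pointer, dtype="uint8")
    arr_bytes = arr.tobytes()
    return base64.b64encode(arr_bytes).decode("utf-8")


def send_request(url: str, payload: dict) -> dict:
    response = requests.post(url, json=payload)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


# The payload is keyed by the input_name from input.yaml,
# matching the request examples shown in the next section.
payload = {
    "imdb_reviews_bytes_base_transformer": load_and_encode_data("path/to/textfile.txt")
}
response = send_request("http://localhost:8000/predict", payload)
print(response)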
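Base64 is used because JSON can only carry text, not arbitrary bytes. As a quick check that the encoding is lossless, the sketch below encodes every possible byte value with the same steps as load_and_encode_data, then decodes the result and compares it with the original. It works on an in-memory byte string rather than a file, so it runs without a server.

```python
import base64

import numpy as np

raw = bytes(range(256))  # every possible byte value, including non-printable ones

# Same encoding steps as load_and_encode_data, applied to in-memory bytes.
arr = np.frombuffer(raw, dtype="uint8")
encoded = base64.b64encode(arr.tobytes()).decode("utf-8")

# Decoding on the receiving side should give back exactly the same bytes.
decoded = base64.b64decode(encoded)
print(decoded == raw)  # True
print(len(encoded))  # 344: base64 turns every 3 bytes into 4 characters
```

The roughly 33% size overhead of base64 is usually acceptable for short inputs such as reviews of a few kilobytes.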
Analyzing Responses
After sending requests to the served model, you will receive responses containing the model's predicted probability for each sentiment class.
Let’s take a look at some of the text data used for predictions:
The worst movie I have seen since Tera Jadoo Chal Gaya. There is no story, no humor, no nothing! The action sequences seem more like a series of haphazard Akshay Kumar Thumbs-Up advertisements stitched together. Heavily influenced from The Matrix and Kung-Fu Hustle but very poorly executed.<br /><br />I did not go a lot of expectations, but watching this movie is an exasperating experience which makes you wonder "What were these guys thinking??!!".<br /><br />The only thing you might remember after watching it is an anorexic Kareena in a bikini.<br /><br />The reason why I did not give a rating of '1' is that every time I think I have seen the worst, Bollywood proves me wrong.
In this first episode of Friends, we are introduced to the 6 main characters of the series: Monica Geller,Phoebe Buffay,Chandler Bing,Ross Geller, Joey Tribbiani and eventually Rachel Green .<br /><br />We discover that Rachel, a rich girl that is Monica's friend from high school times, left her fiancé, Barry, at the altar, since she discovered she didn't love him. She also decides to live with Monica and become independent from her father,getting a new job as a waitress in Central Perk.<br /><br />Ross, for the other hand,discovered his wife is a lesbian and lost her for Susan, her partner. (We see him moving to a new apartment during the episode)<br /><br />Monica, in this episode, makes out (and eventually sleeps) with Paul "the wine guy", who gave her the excuse of being impotent since he divorced his wife. But in reality, he was just deceiving her.<br /><br />Ps: I just loooove Joey's and Chandler's haircuts in this first season! =)
Here are examples of the model’s predictions:
[
    {
        "request": {
            "imdb_reviews_bytes_base_transformer": "eir_tutorials/a_using_eir/03_sequence_tutorial/data/IMDB/IMDB_Reviews/10021_2.txt"
        },
        "response": {
            "result": {
                "imdb_output": {
                    "Sentiment": {
                        "Negative": 0.7403308749198914,
                        "Positive": 0.25966906547546387
                    }
                }
            }
        }
    },
    {
        "request": {
            "imdb_reviews_bytes_base_transformer": "eir_tutorials/a_using_eir/03_sequence_tutorial/data/IMDB/IMDB_Reviews/10132_9.txt"
        },
        "response": {
            "result": {
                "imdb_output": {
                    "Sentiment": {
                        "Negative": 0.22369135916233063,
                        "Positive": 0.7763086557388306
                    }
                }
            }
        }
    }
]
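To turn a response into a single label, take the class with the highest probability. The sketch below uses a hypothetical helper, predicted_label, on the first response shown above, copied in as a Python dict. The nested keys (result → imdb_output → Sentiment) follow that example output.

```python
def predicted_label(response: dict, output_name: str, target: str) -> tuple[str, float]:
    """Hypothetical helper: return the most probable class and its probability."""
    probabilities = response["result"][output_name][target]
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


example_response = {
    "result": {
        "imdb_output": {
            "Sentiment": {
                "Negative": 0.7403308749198914,
                "Positive": 0.25966906547546387,
            }
        }
    }
}

label, prob = predicted_label(example_response, "imdb_output", "Sentiment")
print(label, round(prob, 3))  # Negative 0.74
```

In the original IMDB dataset, the number after the underscore in a file name is the review's star rating. The model's outputs agree with those ratings: 10021_2.txt (rating 2) is classified as Negative and 10132_9.txt (rating 9) as Positive.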
This concludes our tutorial, thank you for following along!