How to create and use Transformer models with HuggingFace’s AutoModel
Natural Language Processing, PTIT, Department of Computer Science, 2024
This section explains how to create and use Transformer models with HuggingFace’s AutoModel and other related classes. The AutoModel class is a convenient wrapper that can automatically determine and load the appropriate model architecture based on a checkpoint. However, if you already know which model type you need, such as BERT, you can instantiate it directly with its dedicated class, such as BertModel.
- Creating a Transformer Model: You can initialize a model from a configuration such as BertConfig. The configuration holds model attributes such as the hidden size, the number of layers, and the number of attention heads. A model created this way is randomly initialized and needs to be trained.
- Loading Pretrained Models: To avoid training from scratch, which is time-consuming and resource-intensive, you can load a pretrained model with the from_pretrained() method. This lets you reuse models trained by others, such as the popular BERT checkpoint (bert-base-cased). Using AutoModel instead of a specific model class (like BertModel) makes your code more checkpoint-agnostic, meaning it can adapt to different architectures.
- Saving Models: After using or fine-tuning a model, you can save it with the save_pretrained() method, which saves both the configuration (config.json) and the model weights (pytorch_model.bin). These files can be used to reload the model later.
- Using Models for Inference: Models require input in the form of tokenized numbers (input IDs). A tokenizer converts sequences like “Hello!” into a list of integers that can be fed into the model as tensors. Once the input is tokenized and converted into tensors, you can pass it to the model for predictions.
Below are code examples demonstrating how to create, load, save, and use Transformer models with HuggingFace.
Example 1: Creating a Transformer Model from Configuration
In this example, we create a BERT model using the BertConfig and BertModel classes.
from transformers import BertConfig, BertModel
# Create a BERT configuration
config = BertConfig()
# Initialize a BERT model using the configuration
model = BertModel(config)
# Print the configuration details
print(config)
Output:
BertConfig {
  "hidden_size": 768,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  ...
}
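The configuration can also be customized before the model is built. As an illustrative sketch (the smaller values below are arbitrary and not part of this section's examples), you can override attributes such as the hidden size and the number of layers when constructing BertConfig:
from transformers import BertConfig, BertModel
# Override a few attributes to build a smaller, randomly initialized BERT
small_config = BertConfig(hidden_size=256, num_hidden_layers=4, num_attention_heads=4)
small_model = BertModel(small_config)
Like the model in Example 1, this model starts with random weights and would need to be trained.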
Example 2: Loading a Pretrained Model
Instead of training from scratch, we load a pretrained BERT model using the from_pretrained() method.
from transformers import BertModel
# Load a pretrained BERT model
model = BertModel.from_pretrained("bert-base-cased")
# Print the model architecture
print(model)
Output:
BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(28996, 768, padding_idx=0)
    ...
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0): BertLayer(...)
      ...
    )
  )
  (pooler): BertPooler(...)
)
Example 3: Using AutoModel for Flexible Loading
To make the code more flexible and adaptable to different checkpoints and architectures, use AutoModel.
from transformers import AutoModel
# Load a model using AutoModel (architecture-agnostic)
model = AutoModel.from_pretrained("bert-base-cased")
# Print the model details
print(model)
This ensures that the model is automatically loaded based on the checkpoint you specify.
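For instance, the same loading code works unchanged with a different checkpoint. The snippet below is a minimal sketch (distilbert-base-uncased is just an example checkpoint, not one used elsewhere in this section); AutoModel resolves it to the matching architecture automatically.
from transformers import AutoModel
# The same call loads a different architecture when pointed at a different checkpoint
model = AutoModel.from_pretrained("distilbert-base-uncased")
# AutoModel resolved this checkpoint to a DistilBERT model class
print(type(model).__name__)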
Example 4: Saving a Model
Once a model is loaded or fine-tuned, you can save it to a specific directory using the save_pretrained() method.
# Save the model to a directory
model.save_pretrained("my_model_directory")
# Check the contents of the directory
!ls my_model_directory
Output:
config.json pytorch_model.bin
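To reload the saved model later, pass the directory path to from_pretrained(), which accepts a local path as well as a Hub checkpoint name. A minimal sketch, assuming the directory from Example 4:
from transformers import AutoModel
# Reload the model from the local directory written by save_pretrained()
model = AutoModel.from_pretrained("my_model_directory")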
Example 5: Using a Model for Inference
Here’s an example of how to use a BERT model for making predictions by converting text into tokens (numbers) that the model can process.
from transformers import BertTokenizer, BertModel
import torch
# Load the tokenizer and model
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")
# Example input text
sequences = ["Hello!", "Cool.", "Nice!"]
# Convert text into token IDs
encoded_sequences = tokenizer(sequences, return_tensors="pt", padding=True, truncation=True)
# Pass the encoded inputs to the model
output = model(**encoded_sequences)
# Print the model output
print(output)
Output:
BaseModelOutputWithPoolingAndCrossAttentions(
last_hidden_state=tensor([[[-0.0505, 0.0244, 0.0584, ..., -0.1566, 0.0499, 0.0146],
[-0.0901, -0.0601, 0.0582, ..., -0.2255, -0.0472, 0.1543],
[-0.0577, 0.0153, 0.1073, ..., -0.1121, 0.0168, 0.1146]],
...
pooler_output=tensor([[-0.6588, -0.5088, 0.6350, ..., -0.0093, 0.3687, 0.4415]]),
)
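As a quick sanity check, you can inspect the shapes of the returned tensors. This is a minimal sketch; the exact sequence length depends on how the tokenizer splits and pads the inputs.
# Batch of 3 sequences, each padded to the same length, with 768 hidden dimensions
print(output.last_hidden_state.shape)  # e.g. torch.Size([3, 4, 768])
# One pooled vector per input sequence
print(output.pooler_output.shape)  # torch.Size([3, 768])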
Example 6: Converting Text to Tokens
The tokenizer is responsible for converting text into token IDs. Here’s an example of how to tokenize text for inference.
# Example text input
sequences = ["Hello!", "Cool.", "Nice!"]
# Tokenize the sequences into input IDs
encoded_sequences = tokenizer(sequences)
# Print the tokenized input (IDs)
print(encoded_sequences["input_ids"])
Output:
[[101, 7592, 999, 102], [101, 4658, 1012, 102], [101, 3835, 999, 102]]
These are the token IDs for the input text, which the model will use for inference.
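Before these raw ID lists can be passed to the model, they must be wrapped in a tensor. A minimal sketch, reusing the model and the token IDs from above (this works here because all three sequences have the same length):
import torch
# Convert the lists of token IDs into a single tensor
model_inputs = torch.tensor(encoded_sequences["input_ids"])
# Feed the tensor to the model as input IDs
output = model(model_inputs)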
Summary
- Creating Models: You can instantiate a model using its configuration or load a pretrained one.
- Loading Models: Pretrained models can be loaded with the from_pretrained() method.
- Saving Models: Save models to a directory using save_pretrained().
- Inference: Convert text to tokens using a tokenizer and pass them to the model for predictions.
- Using AutoModel: Make your code flexible and adaptable to various architectures by using AutoModel.