
What Are Transformers? An Introduction with Hugging Face

ai · transformers · hugging-face · nlp · machine-learning

AI Engineering has a lot of buzzwords that get thrown around. And if you're like me, you usually have an ongoing note somewhere with an ever-growing list of them. Two that I hear often are Transformers and Hugging Face Transformers.

In this post, I'll demystify both: what the Transformer architecture is, and what the Hugging Face Transformers library is.

What Are Transformers?

At a high level, Transformers refers to a neural network architecture that has become the foundation of most modern AI language models.

So what makes it significant?

The architecture was introduced in the 2017 paper "Attention Is All You Need" by researchers at Google. Its key contribution was replacing the recurrence in recurrent neural networks (RNNs) with an "attention mechanism" that lets the model attend to all parts of the input sequence simultaneously instead of processing it token by token.

The architecture has two main components:

  • Encoder: Processes the input sequence and creates representations
  • Decoder: Generates the output sequence based on those representations

The encoder-decoder structure in Transformer models is fundamental for tasks like translation, where one sequence (e.g., English) is converted into another (e.g., French).

For example, it can translate the English sentence "I want to buy a car" to "Je veux acheter une voiture."

At the heart of the attention mechanism is the ability to weigh the importance of different words, or tokens, across the whole sequence at once rather than processing them one by one, a sharp break from the sequential models that came before.

How Attention Works

The attention mechanism can be broken down into three sub-processes:

  1. Weight Assignment: The attention mechanism first assigns importance scores or weights to the various parts of the input.

  2. Contextual Focus: The model then builds a "context vector" from those scores: a weighted sum of the input elements that highlights the most important pieces of information.

  3. Enhanced Output: The context vector is then used by the model to produce its next output.
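The three steps above can be sketched in plain Python as a toy scaled dot-product attention for a single query. The vectors and dimensions here are invented purely for illustration; real models do this with large learned matrices.

```python
import math

def softmax(xs):
    # Turn raw scores into positive weights that sum to 1
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Toy scaled dot-product attention for one query vector."""
    d = len(query)
    # 1. Weight assignment: score each key against the query
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # 2. Contextual focus: context vector = weighted sum of the values
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    # 3. Enhanced output: the context vector feeds the next layer
    return weights, context

# Three input tokens, each with a 2-d key and value
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attention([1.0, 0.0], keys, values)
print(weights, context)
```

Notice that the query [1.0, 0.0] lines up with the first key more than the second, so the first value contributes more to the context vector. That is the "focus" in attention.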

Hugging Face in Practice

Now that we have some background on what the Transformer architecture is, let's get into how we can leverage Hugging Face Transformers to build on it.

First, for those who don't know: Hugging Face is a community-driven platform offering pre-trained AI models, datasets, and tools for NLP, computer vision, and audio.

Alongside the platform, they maintain an open-source Python library named Hugging Face Transformers that gives easy access to pre-trained models built on the Transformer architecture. The library offers:

  • Pre-trained models: A collection of Transformer models (BERT, GPT, RoBERTa, etc.)
  • Unified API: Access to an API for interacting with models across various deep learning frameworks like PyTorch and TensorFlow
  • Fine-Tuning Tools: Functionality to adapt pre-trained models to specific tasks or datasets with relatively small amounts of data
  • Tokenizers: Tools to prepare raw text data for input into Transformer models

The Transformer architecture is the blueprint, and the Hugging Face Transformers library is the comprehensive toolkit that allows researchers and developers to easily build, train, and deploy models following that blueprint.
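To build intuition for the tokenizer bullet above, here is a toy word-level tokenizer. The vocabulary is invented for illustration; real Hugging Face tokenizers learn tens of thousands of subword units with schemes like WordPiece or BPE, but the core idea of mapping text to integer ids is the same.

```python
# Invented toy vocabulary; id 0 is reserved for unknown words
vocab = {"[UNK]": 0, "i": 1, "want": 2, "to": 3, "buy": 4, "a": 5, "car": 6}

def toy_tokenize(text):
    """Map each lowercased word to its vocabulary id, falling back to [UNK]."""
    return [vocab.get(word, vocab["[UNK]"]) for word in text.lower().split()]

print(toy_tokenize("I want to buy a boat"))  # → [1, 2, 3, 4, 5, 0]
```

"boat" is not in the vocabulary, so it maps to the [UNK] id; subword tokenizers avoid this problem by splitting unseen words into known fragments.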

Code Example: Sentiment Analysis

Now that we have the background of what Hugging Face Transformers is, let's see it in action.

Before you get started, ensure that you have Hugging Face Transformers installed. You can install it via pip:

pip install transformers

Once the library is installed, we can use the code below to load a pre-trained pipeline and pass any text into it to get a sentiment score:

from transformers import pipeline

# Load a pre-trained sentiment-analysis pipeline
sentiment_analyzer = pipeline("sentiment-analysis")

# Try it out
text = "Transformers have completely changed the AI landscape!"
result = sentiment_analyzer(text)[0]

print(f"Label: {result['label']}, Confidence: {round(result['score'], 4)}")

# Example output:
# Label: POSITIVE, Confidence: 0.9997

The key takeaway is how simple it is: a working sentiment analyzer in a handful of lines of code.

The pipeline automatically picks a suitable pre-trained model for sentiment analysis and can be customized if needed. The library covers text, vision, and audio models, with support for PyTorch, TensorFlow, and JAX.

This could just be the base of what you can build. Once this is in place, you could simply add a FastAPI endpoint in front of it and have a REST endpoint that handles sentiment analysis for you.

Gone are the days when it would take months of gathering, cleaning, and annotating data to train a model. Now you can get strong results with a few lines of code using the Transformers library.

Wrapping Up

Transformers have fundamentally shifted how we build AI, and Hugging Face has made this power widely accessible. Whether you're exploring NLP, computer vision, or audio, you now have the building blocks at your fingertips.