In this article, we’ll explore how to use the Hugging Face 🤗 Transformers library, and in particular its pipelines.
With over 1 million hosted models, Hugging Face is THE platform bringing Artificial Intelligence practitioners together.
Its 🤗 Transformers library provides simplified access to transformer models – trained by experts.
Thus, beginners, professionals and researchers can easily use cutting-edge models in their projects.
In a previous article, you learned more about Hugging Face and its 🤗 Transformers library. We explored the company’s purpose and the added value it brings to the field of AI.
Today, you’re going to find out how to use the 🤗 Transformers library concretely, using pipelines.
Pipelines – Hugging Face 🤗 Transformers
Definition
A pipeline in 🤗 Transformers refers to a process where several steps are followed in a precise order to obtain a prediction from a model.
These steps can include data preparation, feature extraction and normalization.
You can think of it as a toolbox 🛠️ that automates complex tasks for you.
The main advantage of the pipeline is its ease of use: it takes just one line of code to load a model and another to use it.
🤗 Transformers offers a clear and intuitive API. So even if you’re not a Machine Learning expert, you can use these cutting-edge models.
You’ll be able to solve a variety of tasks such as Named Entity Recognition (NER), sentiment analysis, object detection, and many more.
The pipeline is an abstraction: even if on the surface you only use one line of code, under the hood many operations are triggered.
Depending on the parameters you select, the pipeline will use PyTorch or TensorFlow to build and provide you with a model.
I recommend this great article if you’d like to delve deeper into the subject.
Speaking of code and parameters, let’s take a closer look at pipelines.
Import – Hugging Face 🤗 Transformers
To install the 🤗 Transformers library, simply use the following command in your terminal:
pip install transformers
Note: if you’re working directly in a notebook, you can use !pip install transformers to install the library from your environment.
Once the library is installed, here’s how to import a pipeline into Python:
from transformers import pipeline
To use it, simply call pipeline(), specifying the required parameters in parentheses.
The first thing to note is that you can specify the task you wish to perform using the task parameter.
You can simply choose the task you’re interested in, and the pipeline will do the rest for you.
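For example, here is a minimal sketch that builds a pipeline from a task name alone (summarization is purely an illustrative choice):

from transformers import pipeline

# only the task is given; the model and its preprocessing are picked automatically
summarizer = pipeline(task="summarization")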
Note: we’ll go into more detail about the different tasks the pipeline can perform in the following sections of this article.
In addition to task, other parameters can be adjusted to tailor the pipeline to your needs.
Parameter selection – Hugging Face 🤗 Transformers
🤗 Transformers gives you the flexibility to choose the model you want to use.
If you have a preference, you can specify it using the model parameter. If you don’t specify one, the pipeline will automatically use the default model for the selected task.
The config parameter lets you customize the model configuration. For example, for text generation models, you can set the maximum number of tokens generated.
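As a hedged sketch (gpt2 is just an illustrative choice), the generated length can also be capped at call time, since the pipeline forwards generation arguments such as max_new_tokens to the model:

from transformers import pipeline

generator = pipeline(task="text-generation", model="gpt2")
# limit the continuation to 20 newly generated tokens
generator("Once upon a time", max_new_tokens=20)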
The tokenizer parameter, used for NLP tasks, and feature_extractor, used for computer vision and multimodal tasks, manage data encoding. Again, if you don’t specify anything, the pipeline will automatically choose the appropriate default values.
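For instance, here is a minimal sketch passing an explicit tokenizer (reusing the sentiment model that appears later in this article):

from transformers import AutoTokenizer, pipeline

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
# load the tokenizer ourselves instead of letting the pipeline pick it
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(task="text-classification", model=model_name, tokenizer=tokenizer)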
The choice of framework, whether PyTorch (framework="pt") or TensorFlow (framework="tf"), is also supported.
In addition to these key parameters, the 🤗 Transformers pipeline offers several additional options to customize your usage.
For example, the device parameter lets you define the processor on which the pipeline will run: CPU or GPU.
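A minimal sketch combining both options; by convention, device=0 targets the first GPU and device=-1 the CPU:

from transformers import pipeline

# run the PyTorch version of the model on the first GPU
pipe = pipeline(task="text-classification", framework="pt", device=0)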
Using these parameters, you can easily adapt the 🤗 Transformers pipeline to your specific needs.
This makes access to state-of-the-art models for a variety of tasks more open than ever.
Now that we’ve seen the main pipeline parameters, let’s take a concrete look at how to use this class for:
- Natural Language Processing
- Computer Vision
- Audio Processing
- Multimodal
Natural Language Processing
A multitude of tasks
Natural Language Processing refers to processing text with Artificial Intelligence.
Thanks to 🤗 Transformers pipelines, we can perform a multitude of tasks in this field:
- conversational: textual communication with virtual agents or chatbots
- fill-mask: replacing “masks” with words or phrases in a text to complete or generate content
- ner: Named Entity Recognition – identification and classification of named entities (such as names of people, organizations or places) in a text
- question-answering: answering textual questions by extracting information from a given text
- summarization: reducing the length of a text while preserving relevant information to create a concise summary
- table-question-answering: answering questions using tabular data as a source of information
- text2text-generation: text generation from source text, which can include tasks such as translation, rewriting or paraphrasing
- text-classification: assigning a predefined category or label to a text based on its content
- text-generation: creating text from scratch or by adding to existing text
- token-classification: assigning labels to each token (unit of text, usually a word or character) in a text, often used for semantic annotation or entity detection (NER)
- translation: automatic translation from one language to another
It’s easy to use a pipeline to perform each of these tasks.
Usage
For example, here I’m using a pipeline for a text-classification task in just 3 lines of code:
from transformers import pipeline
pipe = pipeline("text-classification")
pipe("This restaurant is awesome")
Output: [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
The pipeline loads a default model automatically (here, distilbert-base-uncased-finetuned-sst-2-english). What’s more, the pipeline can process plain text directly, as it supports NLP preprocessing in addition to prediction.
Starting with the sentence “This restaurant is awesome”, the model predicts a positive tone from the author.
It is also possible to process several pieces of data at the same time. To do this, we simply pass a list of strings to the pipeline:
pipe(["This restaurant is awesome", "This restaurant is awful"])
Output: [{'label': 'POSITIVE', 'score': 0.9998743534088135},
{'label': 'NEGATIVE', 'score': 0.9996669292449951}]
Here, the default loaded model perfectly determines the polarity of our sentences: positive or negative.
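The other NLP tasks follow the same pattern. As a quick sketch, here is fill-mask with bert-base-uncased (an illustrative choice; each model defines its own mask token):

from transformers import pipeline

# bert-base-uncased marks the word to predict with [MASK]
unmasker = pipeline(task="fill-mask", model="bert-base-uncased")
unmasker("Paris is the [MASK] of France.")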
Computer Vision
Computer Vision is the processing of images and image sequences (video) by Artificial Intelligence.
Here are the main tasks that 🤗 Transformers pipelines can solve:
- depth-estimation: estimating the depth or distance of objects in an image
- image-classification: classifying an image into a predefined class or predicting its contents
- image-segmentation: dividing an image into regions or segments in order to identify the objects present in the image
- object-detection: locating and classifying specific objects in an image
- video-classification: classifying a video sequence into a given class according to its content
- zero-shot-image-classification: classifying an image into a class not seen during model training
- zero-shot-object-detection: locating and classifying specific objects in an image, including objects not seen during model training
Let’s illustrate the simplicity of the pipeline with a depth-estimation task.
First, let’s import an image:
from PIL import Image
import requests
from io import BytesIO
image_url = "https://cc0.photo/wp-content/uploads/2016/11/Fluffy-orange-cat-980x735.jpg"
Image.open(BytesIO(requests.get(image_url).content))
We then use a pipeline to process it. In addition, we specify a model of our own choosing: "Intel/dpt-large".
from transformers import pipeline
estimator = pipeline(task="depth-estimation", model="Intel/dpt-large")
result = estimator(images=image_url)
result
Output: {'predicted_depth': tensor([[
[ 6.3199, 6.3629, 6.4148, ..., 10.4104, 10.5109, 10.3847],
...,
[22.5176, 22.5275, 22.5218, ..., 22.6281, 22.6216, 22.6108]]]),
'depth': <PIL.Image.Image image mode=L size=640x480>}
The result is given in the form of both a PyTorch tensor and a PIL image.
The image can be displayed like this:
result['depth']
Using the model we’ve selected, the pipeline easily determines the depth of our image.
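Other vision tasks are just as direct. As a hedged sketch, here is image-classification on the same image, letting the pipeline pick its default model (which may change between library versions):

from transformers import pipeline

classifier = pipeline(task="image-classification")
# the pipeline accepts a URL, a local path or a PIL image
classifier("https://cc0.photo/wp-content/uploads/2016/11/Fluffy-orange-cat-980x735.jpg")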
Audio
An increasingly popular field in Artificial Intelligence is audio processing.
Here are the tasks that can be performed in this field with a 🤗 Transformers pipeline:
- audio-classification: assigning a category or label to an audio file based on its content
- automatic-speech-recognition: converting human speech into text
- text-to-audio: converting text into audio, usually using speech synthesis or automated narration techniques
- zero-shot-audio-classification: classifying audio files into predefined categories without specific training data for each category, using pre-trained models to predict labels
From the audio below, we’ll perform an automatic-speech-recognition task:
from transformers import pipeline
transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-small")
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
Output: {'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}
Our whisper-small model from OpenAI perfectly transcribes the audio into text.
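The audio-classification task is called the same way. A minimal sketch on the same file, using the library’s default model (an assumption; audio decoding also requires ffmpeg to be installed):

from transformers import pipeline

# the default model is chosen by the library and may change between versions
classifier = pipeline(task="audio-classification")
classifier("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")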
Multimodal
The multimodal domain in AI refers to the simultaneous use of multiple input modes such as text, image and audio.
The following multimodal tasks can be solved with a pipeline:
- document-question-answering: answering questions formulated in natural language, using documents as a source of information
- feature-extraction: extracting features or attributes from raw data
- image-to-text: converting the visual information contained in an image into descriptive text or metadata
- visual-question-answering: answering questions using visual information, usually images, combining image processing and natural language processing to provide answers
Here, we’re going to ask the model what the person in the photo is wearing:
from transformers import pipeline
oracle = pipeline(model="dandelin/vilt-b32-finetuned-vqa")
image_url = "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/lena.png"
oracle(question="What is she wearing ?", image=image_url)
Output: [{'score': 0.9480270743370056, 'answer': 'hat'},
{'score': 0.008636703714728355, 'answer': 'fedora'},
{'score': 0.003124275477603078, 'answer': 'clothes'},
{'score': 0.0029374377336353064, 'answer': 'sun hat'},
{'score': 0.0020962399430572987, 'answer': 'nothing'}]
The multimodal model takes an image and text as input. This enables it not only to understand our photo, but also to answer our question.
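The image-to-text task follows the same pattern. A minimal sketch, assuming the publicly available nlpconnect/vit-gpt2-image-captioning model:

from transformers import pipeline

# generate a short caption describing the image
captioner = pipeline(task="image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
captioner("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/lena.png")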
Where to find models? – Hugging Face 🤗 Transformers
By default, 🤗 Transformers loads a default model for the task when you call a pipeline. However, you can also choose any of the models hosted by Hugging Face.
You can discover the full range of available models on the company’s website.
Once there, follow these steps to find the model that suits your needs:
- Click on “Models” at the top of the screen
- Choose a task from the menu on the left
- Copy the name of the model of your choice from the main window
- Call the copied model with pipeline(model=modelName)
These steps are visually summarized below:
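As a hedged example, suppose you copied facebook/bart-large-cnn from the Hub; the task is then inferred from the model’s metadata:

from transformers import pipeline

# no task given: it is deduced automatically from the model
summarizer = pipeline(model="facebook/bart-large-cnn")
summarizer("Hugging Face hosts over one million models covering text, vision and audio tasks.")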
Thanks to its API, Hugging Face offers its users cutting-edge models.
The goal of Hugging Face is clear: to democratize AI within companies.
And it intends to become a major player in the field.
Today, it’s thanks to Deep Learning that tech leaders can create the most powerful Artificial Intelligences.
If you want to deepen your knowledge in the field, you can access my Action plan to Master Neural networks, a program of 7 free courses that I’ve prepared to guide you on your journey to master Deep Learning. If you’re interested, click here:
source: TechCrunch – Hugging Face raises $235M from investors, including Salesforce and Nvidia