How to use Hugging Face 🤗 Transformers and pipelines

In this article, we’ll explore how to use the Hugging Face 🤗 Transformers library, and in particular its pipelines.

With over 1 million hosted models, Hugging Face is THE platform bringing Artificial Intelligence practitioners together.

Its 🤗 Transformers library provides simplified access to transformer models – trained by experts.

Thus, beginners, professionals and researchers can easily use cutting-edge models in their projects.

In a previous article, you learned more about Hugging Face and its 🤗 Transformers library. We explored the company’s purpose and the added value it brings to the field of AI.

Today, you’re going to find out how to use the 🤗 Transformers library concretely, using pipelines.

Pipelines – Hugging Face 🤗 Transformers

Definition

A pipeline in 🤗 Transformers refers to a process where several steps are followed in a precise order to obtain a prediction from a model.

These steps can include data preparation, feature extraction and normalization.

You can think of it as a toolbox 🛠️ that automates complex tasks for you.

The main advantage of the pipeline is its ease of use. It takes just one line of code to load and another to use.

🤗 Transformers offers a clear and intuitive API. So even if you’re not a Machine Learning expert, you can use these cutting-edge models.

You’ll be able to solve a variety of tasks such as Named Entity Recognition (NER), sentiment analysis, object detection, and many more.

The pipeline is an abstraction: on the surface you write a single line of code, but under the hood many steps are carried out for you.

Depending on the parameters you select, the pipeline will use PyTorch or TensorFlow to build and provide you with a model.
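
To give you an idea of what happens under the hood, here’s a rough sketch of the manual equivalent of a text-classification pipeline. This is an illustration of the general steps, not the pipeline’s exact internal code:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Roughly what pipeline("text-classification") automates for you
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)                    # 1. preprocessing
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("This restaurant is awesome", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                                      # 2. prediction
probs = logits.softmax(dim=-1)                                           # 3. post-processing
print(model.config.id2label[probs.argmax().item()])                      # 'POSITIVE'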

I recommend this great article if you’d like to delve deeper into the subject.

Speaking of code and parameters, let’s take a closer look at pipelines.

Import – Hugging Face 🤗 Transformers

To install the 🤗 Transformers library, simply use the following command in your terminal:

pip install transformers

Note: if you’re working directly on a notebook, you can use !pip install transformers to install the library from your environment.

Once the library is installed, here’s how to import a pipeline into Python:

from transformers import pipeline

To use it, simply call pipeline(), specifying the required parameters in parentheses.

The first thing to note is that you can specify the task you wish to perform using the task parameter.

You can simply choose the task you’re interested in, and the pipeline will do the rest for you.
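
For instance, assuming an internet connection to download the default weights, creating a pipeline for a given task is as simple as:

from transformers import pipeline

summarizer = pipeline(task="summarization")  # downloads a default summarization model
ner = pipeline(task="ner")                   # downloads a default NER model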

Note: we’ll go into more detail about the different tasks that can be performed by the pipeline in the following sections of this article.

In addition to task, other parameters can be adjusted to adapt the pipeline to your needs.

Parameter selection – Hugging Face 🤗 Transformers

🤗 Transformers gives you the flexibility to choose the model you want to use.

If you have a preference, you can specify it using the model parameter. If you don’t specify one, the pipeline will automatically use the default model for the selected task.

The config parameter lets you customize the model configuration. For example, for text generation models, you can set the maximum number of tokens generated.

The tokenizer parameter, used for NLP tasks, and feature_extractor, used for computer vision and multimodal tasks, manage data encoding. Again, if you don’t specify anything, the pipeline will automatically choose the appropriate default values.

The choice of framework, whether PyTorch (framework="pt") or TensorFlow (framework="tf"), is also supported.

In addition to these key parameters, the 🤗 Transformers pipeline offers several additional options to customize your use.

For example, the device parameter lets you define the processor on which the pipeline will run: CPU or GPU.

Using these parameters, you can easily adapt the 🤗 Transformers pipeline to your specific needs.
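
As an illustration, here’s a hypothetical call combining several of the parameters discussed above (the model name is just one example checkpoint):

from transformers import pipeline

classifier = pipeline(
    task="text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    framework="pt",  # force PyTorch
    device=0,        # 0 = first GPU, -1 = CPU
)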

This makes access to state-of-the-art models for a variety of tasks more open than ever.

Now that we’ve seen the main pipeline parameters, let’s take a concrete look at how to use this class for:

  • Natural Language Processing
  • Computer Vision
  • Audio Processing
  • Multi-Modal

Natural Language Processing

A multitude of tasks

Natural Language Processing refers to the processing of text using Artificial Intelligence.

Thanks to 🤗 Transformers pipelines, we can perform a multitude of tasks in this field:

  • conversational: textual communication with virtual agents or chatbots
  • fill-mask: replacing “masks” with words or phrases in text to complete or generate content
  • ner: Named Entity Recognition – identification and classification of named entities (such as names of people, organizations or places) in text
  • question-answering: answering textual questions by extracting information from a given text
  • summarization: reducing the length of a text while preserving relevant information to create a concise summary
  • table-question-answering: answering questions using tabular data as a source of information
  • text2text-generation: text generation from source text, which can include tasks such as translation, rewriting or paraphrasing.
  • text-classification: assigning a predefined category or label to a text based on its content.
  • text-generation: creation of text from scratch or by adding to existing text
  • token-classification: assigning labels to each token (unit of text, usually a word or character) in a text, often used for semantic annotation or entity detection (NER)
  • translation: automatic translation

It’s easy to use a pipeline to perform each of these tasks.

Usage

For example, here I’m using a pipeline for a text-classification task in just 3 lines of code:

from transformers import pipeline

pipe = pipeline("text-classification")
pipe("This restaurant is awesome")

Output: [{'label': 'POSITIVE', 'score': 0.9998743534088135}]

The pipeline loads a default model automatically (here, the model loaded is distilbert-base-uncased-finetuned-sst-2-english). What’s more, the pipeline can process plain text directly, as it supports NLP preprocessing in addition to prediction.

Starting with the sentence “This restaurant is awesome”, the model predicts a positive tone from the author.

It is also possible to process several pieces of data at the same time. To do this, we simply pass a list of strings to the pipeline:

pipe(["This restaurant is awesome", "This restaurant is awful"])

Output:
[{'label': 'POSITIVE', 'score': 0.9998743534088135},
{'label': 'NEGATIVE', 'score': 0.9996669292449951}]

Here, the default loaded model perfectly determines the polarity of our sentences: positive or negative.
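
As a sketch of another task from the list above, here’s how an NER pipeline could be used. The exact entities and scores returned will depend on the default model that gets loaded:

from transformers import pipeline

# aggregation_strategy="simple" merges sub-word tokens into whole entities
ner = pipeline("ner", aggregation_strategy="simple")
ner("Hugging Face is based in New York City")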

By the way, if your goal is to master Deep Learning, I’ve prepared the Action Plan to Master Neural Networks for you.

7 days of free advice from an Artificial Intelligence engineer to learn how to master neural networks from scratch:

  • Plan your training
  • Structure your projects
  • Develop your Artificial Intelligence algorithms

I have based this program on scientific facts, on approaches proven by researchers, but also on my own techniques, which I have devised as I have gained experience in the field of Deep Learning.

To access it, click here:

GET MY ACTION PLAN

Now we can get back to what I was talking about earlier.

Computer Vision

Computer Vision is the processing of images and image sequences (video) by Artificial Intelligence.

Here are the main tasks that 🤗 Transformers pipelines can solve:

  • depth-estimation: estimating the depth or distance of objects in an image
  • image-classification: classification of an image into a predefined class or prediction of its contents
  • image-segmentation: division of an image into regions or segments in order to identify the objects present in the image
  • object-detection: location and classification of specific objects in an image
  • video-classification: classification of a video sequence into a given class according to its content
  • zero-shot-image-classification: classification of an image into a class not seen during model training
  • zero-shot-object-detection: location and classification of specific objects in an image, including objects not seen during model training

Let’s illustrate the simplicity of the pipeline with a depth-estimation task.

First, let’s load an image:

from PIL import Image
import requests
from io import BytesIO

# Download the image and open it with PIL (it displays inline in a notebook)
image_url = "https://cc0.photo/wp-content/uploads/2016/11/Fluffy-orange-cat-980x735.jpg"
Image.open(BytesIO(requests.get(image_url).content))

We then use a pipeline to process it. This time, we specify a model of our own choosing: "Intel/dpt-large".

from transformers import pipeline

estimator = pipeline(task="depth-estimation", model="Intel/dpt-large")
result = estimator(images=image_url)
result

Output:
{'predicted_depth': tensor([[
[ 6.3199, 6.3629, 6.4148, ..., 10.4104, 10.5109, 10.3847],
...,
[22.5176, 22.5275, 22.5218, ..., 22.6281, 22.6216, 22.6108]]]),
'depth': <PIL.Image.Image image mode=L size=640x480>}

The result is given in the form of both a PyTorch tensor and a PIL image.
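
If you’d rather work with the raw values, here’s a small sketch of how the tensor could be converted to a NumPy array and normalized, assuming the output keys shown above:

import numpy as np

depth = result["predicted_depth"].squeeze().numpy()
# scale to 0-255 to build a grayscale depth image of your own
depth_normalized = (255 * (depth - depth.min()) / (depth.max() - depth.min())).astype(np.uint8)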

The image can be displayed like this:

result['depth']
Hugging Face 🤗 Transformers – Depth Estimation

Using the model we’ve selected, the pipeline easily determines the depth of our image.

Audio

An increasingly popular field in Artificial Intelligence is audio processing.

Here are the tasks that can be performed in this field with a pipeline of 🤗 Transformers:

  • audio-classification: assigning a category or label to an audio file based on its content
  • automatic-speech-recognition: conversion of human speech into text
  • text-to-audio: conversion of text into audio files, usually using speech synthesizers or automated narration techniques
  • zero-shot-audio-classification: classification of audio files into predefined categories without the need for specific training data for each category, using pre-trained models to predict labels.

From the following audio file, we’ll perform an automatic-speech-recognition task:

from transformers import pipeline

transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-small")
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")

Output: {'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}

Our whisper-small model from OpenAI perfectly transcribes audio into text.
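
Note that the pipeline isn’t limited to URLs. Assuming ffmpeg is installed so the file can be decoded, you can also pass a local file path (the path below is a placeholder):

transcriber("path/to/your_audio.flac")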

Multimodal

The multimodal domain in AI refers to the simultaneous use of multiple input modes such as text, image and audio.

The following multimodal tasks can be solved with a pipeline:

  • document-question-answering: answering questions formulated in natural language, using documents as a source of information
  • feature-extraction: extracting features or attributes from raw data
  • image-to-text: conversion of visual information contained in an image into descriptive text or metadata
  • visual-question-answering: answering questions using visual information, usually based on images, combining image processing and natural language processing to provide answers.

Here, we’re going to ask the model what the person in the photo is wearing:

from transformers import pipeline

oracle = pipeline(model="dandelin/vilt-b32-finetuned-vqa")
image_url = "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/lena.png"
oracle(question="What is she wearing?", image=image_url)

Output:
[{'score': 0.9480270743370056, 'answer': 'hat'},
{'score': 0.008636703714728355, 'answer': 'fedora'},
{'score': 0.003124275477603078, 'answer': 'clothes'},
{'score': 0.0029374377336353064, 'answer': 'sun hat'},
{'score': 0.0020962399430572987, 'answer': 'nothing'}]

The multimodal model takes an image and text as input. This enables it not only to understand our photo, but also to answer our question.
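
Here’s a sketch of another multimodal task from the list, image-to-text (image captioning). The model name below is one possible choice among the captioning checkpoints hosted on the Hub:

from transformers import pipeline

captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
captioner("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/lena.png")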

Where to find models? – Hugging Face 🤗 Transformers

By default, 🤗 Transformers loads a default model when you call a pipeline.

However, you can also choose any of the models hosted on the Hugging Face Hub.

You can discover the full range of available models on the company’s website.

Once there, follow these steps to find the model that suits your needs:

  • Click on “Models” at the top of the screen
  • Choose a task from the menu on the left
  • Copy the name of the model of your choice from the main window
  • Call the copied model with pipeline(model="model-name")

These steps are visually summarized below:

Hugging Face 🤗 Transformers – How to use a model
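
For example, if you copied a summarization model from the Hub (the checkpoint below is just one possible choice), the call would look like this. When you pass only a model, the pipeline infers the task from the model’s configuration:

from transformers import pipeline

pipe = pipeline(model="facebook/bart-large-cnn")  # task inferred: summarization
pipe("Paste here the long text you want to summarize ...")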

Thanks to its API, Hugging Face offers its users cutting-edge models.

The goal of Hugging Face is clear: to democratize AI within companies.

And it intends to become a major player in the field.

Today, it’s thanks to Deep Learning that tech leaders can create the most powerful Artificial Intelligences.

If you want to deepen your knowledge in the field, you can access my Action plan to Master Neural networks.

A program of 7 free courses that I’ve prepared to guide you on your journey to master Deep Learning.

If you’re interested, click here:

GET MY ACTION PLAN

source: TechCrunch – Hugging Face raises $235M from investors, including Salesforce and Nvidia


Tom Keldenich

Artificial Intelligence engineer and data enthusiast!

Founder of the website Inside Machine Learning
