The Ultimate NLP Library – Question Answering, Text Summary and more

Let me introduce you to the transformers library, which solves some of the most complex NLP problems (Question Answering, Summarization, Text Generation and more) in just a few lines of code!

For a long time, I’ve been looking for an AI capable of producing a faithful summary of a text or an article.

I trained several models, but none of them gave convincing results. Then one afternoon, while browsing Twitter, I discovered THE ultimate NLP library: transformers.

Unlike TensorFlow or PyTorch, transformers is not a general-purpose deep learning framework (it actually runs on top of them). Instead, it gives you ready-to-use, pretrained state-of-the-art models such as GPT-2, BERT, MT5 and many others!

Today, I propose that we explore this library and test its capabilities. Our theme will be Napoleon, to mark the bicentenary of his death!

Le Sacre de Napoléon (The Coronation of Napoleon) by Jacques-Louis David (1808, Musée du Louvre)

Introduction

First of all, a bibliography of the sources we will use:

  • the transformers library, available here
  • Napoleon’s Wikipedia article, available here

Transformers is developed by HuggingFace, a company specialising in NLP models, and is presented in the 2020 paper “Transformers: State-of-the-Art Natural Language Processing”. As that title suggests, it gives you access to algorithms at the cutting edge of NLP research (state of the art).

Please note that we will use this library with English texts, as not all of the available models support other languages yet.

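To install it from a notebook such as Jupyter or Colab (hence the ! prefix), nothing could be easier. Note that transformers also needs a backend such as PyTorch or TensorFlow to run the models: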

!pip install transformers &> /dev/null

Then you can start using it. We will first demonstrate its performance on a basic task, sentiment analysis, and then on more complex tasks… and that’s where it gets interesting!

Sentiment analysis

Sentiment analysis is one of the most basic NLP tasks. We have written a separate tutorial showing how to build a Deep Learning model specialised for this problem.

With transformers, all you need is the pipeline function, to which you indicate that you want to do sentiment analysis:

from transformers import pipeline

classifier = pipeline('sentiment-analysis')

When you call pipeline, a download starts: the library fetches a default pretrained model (and its tokenizer) suited to the task.

Once this is done, we enter the sentence we want to analyse:

classifier('We are very pleased to introduce this tutorial for using transformers on Napoleon\'s Wikipedia.')

The algorithm classifies our sentence as positive with a confidence score of 0.99, that’s huge! And this is just the beginning!
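The pipeline also accepts a list of sentences and returns one {'label': ..., 'score': ...} dictionary per sentence. The sketch below uses this in a hypothetical helper, confident_predictions (not part of transformers), that keeps only the predictions above a confidence threshold. So that it runs offline, it is demonstrated with a stand-in fake_classifier that only mimics the pipeline's output format; in practice you would pass the classifier created above.

```python
from typing import Callable, Dict, List, Tuple

# Output format of a sentiment-analysis pipeline: one dict per input sentence.
Prediction = Dict[str, object]


def confident_predictions(
    classifier: Callable[[List[str]], List[Prediction]],
    sentences: List[str],
    threshold: float = 0.9,
) -> List[Tuple[str, str, float]]:
    """Classify several sentences at once and keep only confident predictions.

    `classifier` is assumed to behave like pipeline('sentiment-analysis'):
    called on a list of strings, it returns one {'label', 'score'} dict per string.
    """
    results = classifier(sentences)
    kept = []
    for sentence, result in zip(sentences, results):
        if result["score"] >= threshold:
            kept.append((sentence, result["label"], result["score"]))
    return kept


# Hypothetical stand-in so the sketch runs without downloading a model.
# It is NOT a sentiment model: it only looks for the word "pleased".
def fake_classifier(sentences: List[str]) -> List[Prediction]:
    return [
        {"label": "POSITIVE", "score": 0.99} if "pleased" in s
        else {"label": "NEGATIVE", "score": 0.6}
        for s in sentences
    ]


if __name__ == "__main__":
    sentences = [
        "We are very pleased to introduce this tutorial.",
        "The weather is grey today.",
    ]
    # With the real pipeline: confident_predictions(classifier, sentences)
    print(confident_predictions(fake_classifier, sentences))
```

Passing the whole list in one call lets the pipeline handle all the sentences together instead of calling it in a Python loop.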

Question Answering

Question Answering consists of using a Deep Learning model that can answer our questions.

Concretely, we give our algorithm the text we want to analyse and the question we want answered. The model then extracts from that text the passage that best answers the question.

We start by passing the task to be carried out to pipeline:

nlp = pipeline("question-answering")

Then we take an extract from Napoleon’s Wikipedia article:

context = r"""
Napoleon was exiled to the island of Elba, between Corsica and Italy.
In France, the Bourbons were restored to power.
However, Napoleon escaped from Elba in February 1815 and took control of France.
The Allies responded by forming a Seventh Coalition, which ultimately defeated Napoleon at the Battle of Waterloo in June 1815.
The British exiled him to the remote island of Saint Helena in the South Atlantic, where he died in 1821 at the age of 51.
"""

And finally we ask our question, passing our extract through the context variable:

result = nlp(question="What was Napoleon's last battle ?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

To “What was Napoleon’s last battle?”, our algorithm answers “Battle of Waterloo”. Not bad, is it?

The algorithm also tells us which part of the extract gave this answer: here, the characters from position 295 up to (but not including) position 313.
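Because start and end are character offsets into the context (end excluded, as in Python slicing), context[result['start']:result['end']] gives back exactly the answer. The hypothetical helper below, highlight_answer, uses this to display the answer inside its surrounding text. The result dictionary in the example is built by hand with the same keys as the pipeline's output; it is an illustration, not a model prediction.

```python
def highlight_answer(context: str, result: dict, window: int = 40) -> str:
    """Return the answer wrapped in [[ ]] with up to `window` characters on each side.

    `result` is assumed to carry the keys returned by the question-answering
    pipeline: 'answer', 'start' and 'end' (character offsets, end exclusive).
    """
    start, end = result["start"], result["end"]
    # The offsets should point exactly at the answer text.
    if context[start:end] != result["answer"]:
        raise ValueError("offsets do not match the answer")
    left = context[max(0, start - window):start]
    right = context[end:end + window]
    # Newlines are flattened so the snippet prints on one line.
    return (left + "[[" + result["answer"] + "]]" + right).replace("\n", " ")


if __name__ == "__main__":
    context = "The Allies defeated Napoleon at the Battle of Waterloo in June 1815."
    answer = "Battle of Waterloo"
    start = context.find(answer)
    # Hand-built dict with the pipeline's keys (illustrative, not a model prediction).
    result = {"answer": answer, "start": start, "end": start + len(answer)}
    print(highlight_answer(context, result, window=20))
    # With the real pipeline, pass the `result` returned by nlp(...) above instead.
```

This prints “ted Napoleon at the [[Battle of Waterloo]] in June 1815.”, which makes it easy to check visually that the model picked a sensible passage.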

The Battle of Austerlitz, 2 December 1805, by François Gérard (1810, Musée du château de Versailles)

Masked Language Modeling

Perhaps the least well-known NLP task: filling in the blanks of a text.

As in the previous examples, we tell pipeline the task to be solved:

from transformers import pipeline

nlp = pipeline("fill-mask")

To use fill-mask, we need to insert the model’s mask token, available as {nlp.tokenizer.mask_token}, where a word is missing. The algorithm then gives us the 5 most likely options.

We will test our algorithm on a trick sentence: “The most powerful man who governed France was … !” Will it follow our theme and answer “Napoleon”?

from pprint import pprint
pprint(nlp(f"The most powerful man who governed France was {nlp.tokenizer.mask_token} !"))

Impressive! It answered “Napoleon” with a confidence score of 73%!

Notice that the other options, which are much less likely, all have a confidence score below 10%.

This means that our algorithm is quite sure of its answer!
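Each prediction returned by the fill-mask pipeline is a dictionary that contains, among other keys, token_str (the proposed word) and score. The hypothetical helper below, fill_blank, lets you write the gap as ___ in a template, swaps it for the model’s mask token and keeps the candidates above a minimum score. It is exercised offline with a stand-in fake_fill_mask whose candidates and scores are made up; in practice you would pass nlp and nlp.tokenizer.mask_token.

```python
from typing import Callable, List, Tuple


def fill_blank(
    fill_mask: Callable[[str], List[dict]],
    template: str,
    mask_token: str,
    min_score: float = 0.1,
) -> List[Tuple[str, float]]:
    """Fill the '___' gap in `template` and return (word, score) candidates.

    `fill_mask` is assumed to behave like pipeline('fill-mask'): given a text
    containing one mask token, it returns a list of dicts with at least the
    keys 'token_str' and 'score'.
    """
    if template.count("___") != 1:
        raise ValueError("the template must contain exactly one '___' gap")
    predictions = fill_mask(template.replace("___", mask_token))
    # Some tokenizers prefix words with a space, so strip it for readability.
    return [
        (p["token_str"].strip(), p["score"])
        for p in predictions
        if p["score"] >= min_score
    ]


# Hypothetical stand-in returning made-up candidates in the pipeline's format.
def fake_fill_mask(text: str) -> List[dict]:
    assert "<mask>" in text
    return [
        {"token_str": " Napoleon", "score": 0.5},
        {"token_str": " Louis", "score": 0.05},
    ]


if __name__ == "__main__":
    template = "The most powerful man who governed France was ___ !"
    # With the real pipeline: fill_blank(nlp, template, nlp.tokenizer.mask_token)
    print(fill_blank(fake_fill_mask, template, "<mask>"))
```

Reading the mask token from the tokenizer rather than hard-coding it matters because different models use different tokens (for example <mask> or [MASK]).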

Text Generation

Let’s continue with text generation. This is one of the most complex tasks in NLP because, in addition to producing a sequence of words, the algorithm must create meaning: a sequence of words that signifies something.

You know the procedure: we give the task to pipeline:

from transformers import pipeline

text_generator = pipeline("text-generation")

Now, there are several options:

  • either we generate text without any context; in this case the algorithm will choose for us
  • or we tell the algorithm the beginning of a sentence; it will then have to complete it

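Here, we give it “During a long period of time, Napoleon was” and cap the output with max_length=40. Note that max_length counts tokens, not characters, and includes the prompt itself. With do_sample=False, the model does not sample randomly, so running the cell again gives the same text.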

print(text_generator("During a long period of time, Napoleon was", max_length=40, do_sample=False))

We get : “During a long period of time, Napoleon was able to make some changes to his military tactics. He started using the ‘pistol’ as a weapon, and used it to attack.”

Cool!

I don’t know if this is historically accurate, but it makes sense and sounds plausible!
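By default, the text-generation pipeline returns a list of dictionaries whose generated_text contains the prompt followed by the continuation, as in the sentence obtained above. The hypothetical helper below, continuation, keeps only the newly generated part. The sample output is built by hand from the beginning of that sentence, wrapped in the pipeline’s output format.

```python
def continuation(outputs: list, prompt: str) -> str:
    """Return only the text generated after `prompt`.

    `outputs` is assumed to be the list returned by pipeline('text-generation'),
    i.e. [{'generated_text': prompt + continuation}, ...]; only the first entry is used.
    """
    text = outputs[0]["generated_text"]
    if not text.startswith(prompt):
        # The pipeline may have returned the continuation alone.
        return text
    return text[len(prompt):].lstrip()


if __name__ == "__main__":
    prompt = "During a long period of time, Napoleon was"
    # Hand-built output in the pipeline's format, reusing the sentence obtained above.
    outputs = [{"generated_text": prompt + " able to make some changes to his military tactics."}]
    # With the real pipeline: continuation(text_generator(prompt, max_length=40, do_sample=False), prompt)
    print(continuation(outputs, prompt))
```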

Summarization

We move on to our last task, and not the least: text summarization.

We use pipeline and tell it the task to be performed:

from transformers import pipeline

summarizer = pipeline("summarization")

Next, we use the introduction of Napoleon’s Wikipedia article.

ARTICLE = """Napoléon Bonaparte (15 August 1769 – 5 May 1821) was a French military and political leader.
He rose to prominence during the French Revolution and led several successful campaigns during the Revolutionary Wars.
As Napoleon I, he was Emperor of the French from 1804 until 1814 and again in 1815.
Napoleon dominated European and global affairs for more than a decade while leading France against a series of coalitions in the Napoleonic Wars.
He won most of these wars and the vast majority of his battles, building a large empire that ruled over continental Europe before its final collapse in 1815.
One of the greatest commanders in history, his wars and campaigns are studied at military schools worldwide.
He remains one of the most celebrated and controversial political figures in human history.
Napoleon had an extensive and powerful impact on the modern world, bringing liberal reforms to the numerous territories that he conquered and controlled, especially the Low Countries, Switzerland, and large parts of modern Italy and Germany.
He implemented fundamental liberal policies in France and throughout Western Europe. His lasting legal achievement, the Napoleonic Code, has been highly influential.
Roberts says, "The ideas that underpin our modern world—meritocracy, equality before the law, property rights, religious toleration, modern secular education, sound finances, and so on—were championed, consolidated, codified and geographically extended by Napoleon.
To them he added a rational and efficient local administration, an end to rural banditry, the encouragement of science and the arts, the abolition of feudalism and the greatest codification of laws since the fall of the Roman Empire."
"""

Note that here we have copied and pasted the article, but for a longer text, such as a book or a research paper, it could be interesting to use web scraping to retrieve the text directly from the website.
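For such long texts, keep in mind that summarization models can only read a limited number of tokens at once (the exact limit depends on the model), so a book or a paper is usually split into chunks that are summarized one by one. Below is a minimal sketch of a hypothetical chunk_text helper that groups whole sentences into chunks of at most max_words words. Word counts are only a rough proxy for the model’s tokens, so the budget should stay well below the model’s limit.

```python
import re
from typing import List


def chunk_text(text: str, max_words: int = 400) -> List[str]:
    """Split `text` into chunks of whole sentences, each at most `max_words` words.

    Word counts only approximate model tokens; a single sentence longer than
    `max_words` is kept as its own chunk rather than cut in the middle.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    print(chunk_text("One two three. Four five. Six seven eight nine.", max_words=5))
```

Splitting on sentence boundaries rather than at a fixed word count avoids feeding the model half-sentences.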

To use our algorithm, we pass our text along with the minimum and maximum length (counted in tokens) desired for the summary.

print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))
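The pipeline returns a list with one {'summary_text': ...} dictionary per input text. Assuming a long text has already been split into chunks (for example by sentences), a hypothetical helper could summarize every chunk in one call and join the partial summaries, as sketched below. The fake_summarizer stand-in simply keeps the first sentence of each chunk so the sketch runs offline; it is not how the real model works.

```python
from typing import Callable, List


def summarize_chunks(
    summarizer: Callable[..., List[dict]],
    chunks: List[str],
    max_length: int = 130,
    min_length: int = 30,
) -> str:
    """Summarize each chunk separately and join the partial summaries.

    `summarizer` is assumed to behave like pipeline('summarization'): called on a
    list of texts, it returns one {'summary_text': ...} dict per text.
    """
    if not chunks:
        return ""
    results = summarizer(chunks, max_length=max_length, min_length=min_length, do_sample=False)
    return " ".join(r["summary_text"].strip() for r in results)


# Hypothetical stand-in: keeps the first sentence of each chunk (not a real model).
def fake_summarizer(texts: List[str], **kwargs) -> List[dict]:
    return [{"summary_text": t.split(". ")[0].rstrip(".") + "."} for t in texts]


if __name__ == "__main__":
    chunks = ["Napoleon was a leader. He won battles.", "He was exiled. He died in 1821."]
    # With the real pipeline: summarize_chunks(summarizer, chunks)
    print(summarize_chunks(fake_summarizer, chunks))
```

If the joined summary is still too long, the same function can be applied again to its output.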

Result: “Napoleon was one of the most successful military leaders in history. He ruled over Europe for more than a decade. He implemented liberal policies in France and the rest of Europe.”

The transformers library gives us this lovely closing note to celebrate Napoleon’s bicentenary and the ever increasing effectiveness of AI.

For more information on transformers, check the summary of the supported tasks in the HuggingFace documentation.

See you soon for a new article! 😉

Napoléon à Sainte-Hélène (Napoleon on Saint Helena), Musée national de Malmaison

Tom Keldenich

Data Engineer passionate about Artificial Intelligence!

Founder of the website Inside Machine Learning
