The 13 essential Python libraries to know for Data Science, and above all the code to start using them right away!
Data Science is the field that brings together manipulation, analysis and understanding of data.
Python is the most used language in this field. But what are the Data Science libraries that you should absolutely know?
That’s what we cover in this article!
Pandas
It hardly needs an introduction!
The Pandas library is the foundation of any Data Scientist's toolkit.
It lets you manipulate data easily and load it from an Excel, CSV or TXT file, and even from a web page!
You can also perform operations between the columns, rows and cells of a DataFrame.
It is ideal for working with any type of data: integer, float, text, date, etc.
To use it:
pip install pandas
import pandas as pd
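As a minimal sketch of the operations described above, here is a small example. The DataFrame and its column names are made up for illustration and built in memory; with a real file you would typically start from a call such as pd.read_csv instead.

```python
import pandas as pd

# Hypothetical sample data, built in memory so that no file is needed.
df = pd.DataFrame({
    "product": ["apple", "pear", "apple", "plum"],
    "price": [1.2, 0.8, 1.5, 2.0],
    "quantity": [10, 5, 4, 7],
})

# Operation between columns: create a new column from two existing ones.
df["total"] = df["price"] * df["quantity"]

# Operation on rows: keep only the rows that match a condition.
apples = df[df["product"] == "apple"]

# Aggregation: sum of the "total" column for each product.
revenue = df.groupby("product")["total"].sum()
print(revenue)
```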
Numpy
Numpy makes it easy to work with arrays.
Its set of functions makes complex mathematical operations straightforward.
On top of that, its low computation time lets your code run quickly.
To use it:
pip install numpy
import numpy as np
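To give an idea of what working with arrays looks like, here is a short sketch. The values are arbitrary and only serve to illustrate a few common operations.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([10.0, 20.0])

# Element-wise operation: every value is multiplied by 2.
scaled = a * 2

# Broadcasting: b is added to each row of a.
shifted = a + b

# Matrix product of a with itself.
product = a @ a

# Mean of each column.
col_means = a.mean(axis=0)

print(shifted)
print(product)
print(col_means)
```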
Scipy
Scipy builds on Numpy.
It lets you push your calculations even further, in particular for:
- optimization
- statistics
- signal processing
- linear algebra
To use it:
pip install scipy
import scipy
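Here is a small sketch touching three of the areas listed above: optimization, statistics and linear algebra. The function, the random samples and the linear system are invented for the example.

```python
import numpy as np
from scipy import linalg, optimize, stats

# Optimization: find the minimum of f(x) = (x - 3)^2 + 1.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)
print(result.x)

# Statistics: two-sample t-test on simulated groups with different means.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=200)
group_b = rng.normal(loc=1.0, scale=1.0, size=200)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(p_value)

# Linear algebra: solve the system 3x + y = 9 and x + 2y = 8.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)
print(solution)
```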
Matplotlib
Want to display graphics without the headache?
Matplotlib is the library you need!
It lets you make simple but powerful graphics, whether from a Pandas DataFrame or a Numpy array.
With Matplotlib you can make:
- continuous interval graphs
- discontinuous interval graphs
- scatter plots
- box plots (Tukey box-and-whisker plots)
- bar charts
- pie charts
- 3D volumes
- heatmaps
- time series visualizations
… and many more! I'll let you explore the documentation to see everything Matplotlib can do.
To use it:
pip install matplotlib
import matplotlib.pyplot as plt
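As a quick illustration, the sketch below draws a line plot and a scatter plot side by side from Numpy arrays. The data and the output file name (matplotlib_example.png) are assumptions chosen for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One figure with two side-by-side axes.
fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Left: a line plot.
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_title("Line plot")
ax_line.legend()

# Right: a scatter plot.
ax_scatter.scatter(x, np.cos(x), s=10)
ax_scatter.set_title("Scatter plot")

fig.tight_layout()
# Save to a file (hypothetical name); plt.show() would open a window instead.
fig.savefig("matplotlib_example.png")
```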
Seaborn
Just as Scipy is an extension of Numpy, Seaborn is an extension of Matplotlib.
Its major contribution?
A more pleasant way to use Matplotlib. Seaborn has pre-implemented functions that let you draw stylized graphics in a single line of code.
Example (image not shown): a scatter plot of miles per gallon against horsepower.
And the few lines of code to reproduce the example:
import seaborn as sns
sns.set_theme(style="white")
# Load the example mpg dataset
mpg = sns.load_dataset("mpg")
# Plot miles per gallon against horsepower with other semantics
sns.relplot(x="horsepower", y="mpg", hue="origin", size="weight",
            sizes=(40, 400), alpha=.5, palette="muted",
            height=6, data=mpg)
To use it:
pip install seaborn
import seaborn as sns
Plotly
Plotly is a more advanced library than matplotlib for data visualization.
The developers of the library say it can produce “publication-quality graphs”, i.e. professional-quality graphs, especially for scientific publications.
Personally, I like having interactive graphics with Plotly, in which you can zoom and navigate easily. But for simple analysis graphs, you can stick with Matplotlib.
To use it:
pip install plotly
import plotly.express as px
Statsmodels
Statsmodels is a Python library for statistics, estimation and data mining.
It gives you several models to better understand your data. For example, you can do linear regression, time series analysis or fit Generalized Additive Models (GAM).
To use it:
pip install statsmodels
import statsmodels.api as sm
Scikit-learn
Scikit-learn is THE most used library in Data Science to do Machine Learning.
It lets you do Machine Learning in a simple way by providing ready-to-use algorithms!
This makes it an essential foundation for Data Science and also a good entry point to Machine Learning.
To use it:
pip install scikit-learn
import sklearn
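Note that the package is installed as scikit-learn but imported as sklearn. As a sketch of what "ready-to-use algorithms" means in practice, the example below trains a classifier on the iris dataset bundled with the library. The choice of logistic regression and of the split parameters is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small dataset shipped with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the data to evaluate the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train the model, then predict on data it has never seen.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(accuracy)
```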
NLTK
NLTK is one of the best-known libraries for natural language processing (NLP).
This library offers functions for a wide variety of operations:
- tokenization
- lemmatization
- stemming
- entity and proper name detection
- stopwords removal
- sentiment analysis (and intensity)
The full list is too long to give here, but you can find more in our other articles in the NLP category!
To use it:
pip install nltk
import nltk
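Here is a small sketch of tokenization and stemming. It deliberately uses wordpunct_tokenize and PorterStemmer because, unlike several other NLTK tools, they work without downloading extra data. The sentence is made up for the example.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "The cats are running faster than the dogs."

# Tokenization: split the sentence into words and punctuation.
tokens = wordpunct_tokenize(text)
print(tokens)

# Stemming: reduce each token to its stem.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]
print(stems)
```

Other features, such as stopword lists or lemmatization, usually require fetching the matching resource first with nltk.download.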
Gensim
Gensim is used for a very specific NLP task: vector representation.
With Gensim you can represent text as a vector. And it works for any kind of text, be it a scientific document, a book or a press article!
Once a text is represented as a vector, there are many interesting analyses you can run. For example, you can calculate the similarity between two texts, even if they have no words in common.
To use it:
pip install gensim
import gensim
Spacy
Spacy is the last NLP library in this list.
It shares most of NLTK's features, but it specializes in production applications.
You would use Spacy to integrate text analysis tools into web apps rather than for pure analysis in Python.
Note that Spacy is particularly effective at understanding long and detailed text.
To use it:
pip install spacy
import spacy
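Here is a minimal sketch that runs without downloading anything: it uses a blank English pipeline, which only contains the tokenizer and the language's default stop word list. Features such as named entity recognition or lemmatization need a trained pipeline (for example en_core_web_sm), downloaded separately and loaded with spacy.load. The sentence is made up for the example.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")

doc = nlp("Apple is opening a new office in Paris next year.")

# Every token of the sentence, punctuation included.
tokens = [token.text for token in doc]
print(tokens)

# Keep only the tokens that are neither stop words nor punctuation.
content_words = [
    token.text for token in doc
    if not token.is_stop and not token.is_punct
]
print(content_words)
```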
BeautifulSoup
BeautifulSoup is a library for extracting data from HTML files.
Put simply, BeautifulSoup lets you retrieve data from other websites. This technique is called Web Scraping.
In addition, this library offers a simple way to navigate through the HTML file. For instance, to display the page title, we’ll use file.title, where file is the parsed BeautifulSoup object.
And for those who are less familiar with HTML, a function (get_text) is provided to convert HTML into text. Ideal if you want to use NLP!
To use it:
pip install beautifulsoup4
from bs4 import BeautifulSoup
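The sketch below parses a small HTML string written for the example, so it runs without any network access. On a real site you would first download the page, then pass its HTML to BeautifulSoup in the same way.

```python
from bs4 import BeautifulSoup

# Sample HTML, made up for the example.
html = """
<html>
  <head><title>Example page</title></head>
  <body>
    <h1>Libraries</h1>
    <p>Pandas and <a href="https://numpy.org">Numpy</a>.</p>
  </body>
</html>
"""

# Parse the HTML with Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

# Navigate the document: text of the <title> tag.
title = soup.title.string
print(title)

# Find every link and read its href attribute.
links = [a["href"] for a in soup.find_all("a")]
print(links)

# Convert the whole HTML document to plain text.
text = soup.get_text(separator=" ", strip=True)
print(text)
```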
NetworkX
NetworkX is a niche library. Only some Data Scientists need it.
It is a library that offers a class to manipulate Graphs and all kinds of functions associated with these objects.
Graphs are particularly useful objects to represent relationships between individuals (people, companies, …).
Be careful: here we are not talking about a graph in the sense of a chart, but about a Graph in the mathematical sense, made of nodes and edges.
To use it:
pip install networkx
import networkx as nx
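As a sketch of how a Graph represents relationships between people, the example below uses NetworkX (imported as nx) on a tiny, invented network of four names.

```python
import networkx as nx

# Each edge is a relationship between two people (hypothetical names).
G = nx.Graph()
G.add_edges_from([
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Carol", "Dave"),
    ("Alice", "Carol"),
])

# Number of connections of each person.
degrees = dict(G.degree())
print(degrees)

# Shortest chain of relationships between two people.
path = nx.shortest_path(G, "Alice", "Dave")
print(path)

# Degree centrality: share of the other nodes each person is connected to.
centrality = nx.degree_centrality(G)
print(centrality)
```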
This concludes our article on Data Science libraries!
If you are a beginner and you want to know more about Data and Machine Learning, it’s here 😉