How do you easily open and save a CSV with the Pandas library? Here you will find the most commonly used lines of code in the data field.
For this tutorial, we will use the happiness.csv file which is located on this GitHub link.
But the essential prerequisite is to import Pandas:
import pandas as pd
And we can start!
Open a csv
Classic method
The classic method is simply to call the read_csv() function with the path to the CSV file:
df = pd.read_csv('path/happiness.csv')
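To check that the file was loaded as expected, it helps to look at the shape, the inferred column types and the first rows. The sketch below is only an illustration: the sample rows are invented stand-ins for happiness.csv, and io.StringIO is used so the snippet runs without the real file (read_csv() accepts a path or a file-like object).

```python
import io

import pandas as pd

# Hypothetical sample standing in for happiness.csv (values are invented).
csv_text = "Gender,Mean,N=\nMale,7.2,100\nFemale,7.4,120\n"

# read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # type inferred by pandas for each column
print(df.head())  # first rows of the DataFrame
```

With the real file, replace io.StringIO(csv_text) with the path to happiness.csv.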
Column method
If we want to load only some of the columns of the CSV, we can tell pandas directly in the read_csv() function, with the usecols parameter as below:
df = pd.read_csv('path/happiness.csv', usecols=['Gender','Mean','N='])
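Here is a minimal sketch of what usecols does, again on invented sample data (the Country column is an assumption added only to have something to leave out). Only the listed columns end up in the DataFrame, and asking for a column that does not exist in the file raises a ValueError.

```python
import io

import pandas as pd

# Hypothetical sample data; the real happiness.csv may have other columns.
csv_text = (
    "Gender,Mean,N=,Country\n"
    "Male,7.2,100,France\n"
    "Female,7.4,120,Spain\n"
)

# Keep only two of the four columns.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Gender", "Mean"])
print(list(subset.columns))

# A column name that is not in the file is rejected by pandas.
try:
    pd.read_csv(io.StringIO(csv_text), usecols=["Gender", "Age"])
    missing_column_rejected = False
except ValueError as error:
    missing_column_rejected = True
    print("usecols error:", error)
```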
Separator method
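And finally, an essential option when your CSV files are saved with different separators, such as ‘;’ or a tab. The sep parameter tells pandas which character separates the values.
In this example our CSV uses the comma ‘,’ as separator, which is also the default value of sep: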
df = pd.read_csv('path/happiness.csv', sep = ',')
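To see why the separator matters, here is a small sketch with a semicolon-separated file (the sample rows are invented for the illustration). Read with the default comma, every line ends up in a single column; with sep=';' the three columns are recovered.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample (values are invented).
csv_text = "Gender;Mean;N=\nMale;7.2;100\nFemale;7.4;120\n"

# Default separator (','): each line is read as one single column.
wrong = pd.read_csv(io.StringIO(csv_text))

# Explicit separator: the columns are split correctly.
right = pd.read_csv(io.StringIO(csv_text), sep=";")

print(wrong.shape)
print(right.shape)
print(list(right.columns))
```

If a freshly loaded DataFrame has only one column with a very long name, a wrong separator is a likely cause.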
Now that we know how to open a CSV, let’s move on to how to create one!
Save a csv
Classic method
To save a DataFrame as a CSV, you simply have to call its to_csv() method with the path and the name of the desired file:
df.to_csv('path/new_happiness.csv')
Recommended method
The method we recommend at Inside Machine Learning is to use the index parameter and to give it the value False:
df.to_csv('path/new_happiness.csv', index=False)
In fact, if you don’t do this, the default value is True. This means the CSV will contain the columns and values of each row, plus an extra first column holding the DataFrame's index. When that file is read back, pandas creates a fresh index and keeps the saved one as an ordinary column (named ‘Unnamed: 0’).
In short, index=False avoids ending up with two columns indicating the index of each row when the CSV is reloaded!
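The difference is easy to see on a tiny DataFrame. This is a sketch on invented values; to_csv() is called without a path so that it returns the CSV text instead of writing a file.

```python
import io

import pandas as pd

# Small hypothetical DataFrame (values are invented).
df = pd.DataFrame({"Gender": ["Male", "Female"], "Mean": [7.2, 7.4]})

# Without a path, to_csv() returns the CSV content as a string.
with_index = df.to_csv()                # index=True is the default
without_index = df.to_csv(index=False)

print(with_index)     # first column holds the index values 0 and 1
print(without_index)  # only the Gender and Mean columns

# Reloading the version saved with the index gives an extra column.
reloaded = pd.read_csv(io.StringIO(with_index))
clean = pd.read_csv(io.StringIO(without_index))
print(list(reloaded.columns))
print(list(clean.columns))
```

If a file was already saved with its index, passing index_col=0 to read_csv() reuses that first column as the index instead of keeping it as data.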
Compressed method
To finish, another method exists for large DataFrames: compression.
We just have to add the compression parameter to the function and write our file as .zip rather than .csv:
df.to_csv('path/new_happiness.zip', index=False, compression='zip')
Note that several formats are available: ‘gzip’, ‘bz2’, ‘zip’ and ‘xz’ (recent pandas versions also accept ‘zstd’ and ‘tar’). The default value, ‘infer’, picks the format from the file extension.
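A compressed file can be read back directly with read_csv(), which also infers the compression from the extension. The sketch below is a round trip through a temporary folder, with invented values and a hypothetical file name (new_happiness.csv.gz); it relies on the default compression='infer' on both sides.

```python
import os
import tempfile

import pandas as pd

# Small hypothetical DataFrame (values are invented).
df = pd.DataFrame({"Gender": ["Male", "Female"], "Mean": [7.2, 7.4]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "new_happiness.csv.gz")

    # compression='infer' (the default) selects gzip from the .gz extension.
    df.to_csv(path, index=False)

    # read_csv() infers the compression from the extension as well.
    restored = pd.read_csv(path)

print(restored)
```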
That’s all for this tutorial. We hope you will find it useful 😉
Are you interested in happiness.csv? We use it in this article, along with a technique to boost your Machine Learning… a must-read!
Sources:
- Pandas documentation
- Photo by Chris Curry on Unsplash