Quickly upload public Google Drive files on Notebook and Colab

In this tip we will see how to load files from a public Drive into Google Colab, without an access key and without connecting your own Drive.

When we do Machine Learning on Jupyter Notebook or Google Colab, we usually need to load files into our session. We can import them from a local disk when working alone, but as soon as several people work together, the question of how to share the data arises.

This data can be very large, especially when it comes to pre-trained models or checkpoints.

Many people store these files on Google Drive or other storage platforms. This saves time and space on their local drive.

Python lets us load this kind of file, stored in the cloud, directly into our session, whether it is private, shared within a team, or public and accessible to all.

No need to download the models yourself, Python does it for you!

We will see in this article how to do it!

Upload files from your Google Drive folder

Google Colab ships with a library that lets you mount your own Drive:

from google.colab import drive
drive.mount('/content/drive')

After executing the code, Python gives us a link to find our access key.

We just have to enter the key in the corresponding field to access all the files of our Drive from our session.
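Once the Drive is mounted, its files can be read like any local file. Here is a minimal sketch that lists them, assuming the mount point /content/drive used above and that the Drive root shows up under a MyDrive folder; the helper name list_files is only illustrative.

```python
from pathlib import Path
from typing import List

# Assumption: the Drive was mounted at /content/drive,
# and its root folder appears under "MyDrive".
DRIVE_ROOT = Path("/content/drive/MyDrive")


def list_files(folder: Path, pattern: str = "*") -> List[str]:
    """Return the sorted names of the files in `folder` matching `pattern`."""
    return sorted(p.name for p in folder.glob(pattern) if p.is_file())


# In a Colab session with the Drive mounted, one could run for example:
# print(list_files(DRIVE_ROOT, "*.zip"))
```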

This is an easy solution if you want to access the files on your own Drive, but what about files in a public directory that you want to access without an access key?

This can be a problem, especially if we collaborate with people who don’t have access to our Drive credentials, or if we want to publish our freshly baked Machine Learning algorithm!

Don’t worry, there is a solution to load public Drive files, and it even works for big files!

Upload files from a public Google Drive folder

Google Colab lets us run shell commands such as pip, ls or wget… it’s the last one we are interested in 😉

We are going to use wget to download the file we want from a public Google Drive directory, but first we need to get the iD of this file.

For that, we open the Drive link where the file is located.

Then we right-click on the file we are interested in and click on Get link.

Once done you should get a link like this: https://drive.google.com/file/d/1ML-Rpeaftox3z7b1GThUZtl1Ql_lGUrb/view?usp=sharing

The iD you need to retrieve is between ‘https://drive.google.com/file/d/’ and ‘/view?usp=sharing’.

In our case the iD is: 1ML-Rpeaftox3z7b1GThUZtl1Ql_lGUrb
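Rather than copying the iD by hand, it can be extracted from the sharing link. This is a minimal sketch, assuming the link has the /file/d/<iD>/ shape shown above; the function name extract_drive_file_id is hypothetical.

```python
import re


def extract_drive_file_id(share_link: str) -> str:
    """Return the part of a Drive sharing link located right after '/file/d/'."""
    match = re.search(r"/file/d/([^/?#]+)", share_link)
    if match is None:
        raise ValueError(f"No file iD found in link: {share_link}")
    return match.group(1)


link = "https://drive.google.com/file/d/1ML-Rpeaftox3z7b1GThUZtl1Ql_lGUrb/view?usp=sharing"
print(extract_drive_file_id(link))  # prints 1ML-Rpeaftox3z7b1GThUZtl1Ql_lGUrb
```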

Then, depending on the size of the file, we have two options:

With small files

Use the command :

!wget -q --show-progress --no-check-certificate 'https://docs.google.com/uc?export=download&id=iD' -O Output_file_name

With :

  • iD : the iD of the file
  • Output_file_name : the name we want to give the file we downloaded with the associated extension (e.g.: .txt, .png, .pdf, .zip, .tar, … )

In our case:

!wget -q --show-progress --no-check-certificate 'https://docs.google.com/uc?export=download&id=1ML-Rpeaftox3z7b1GThUZtl1Ql_lGUrb' -O fichier.tar
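The same small-file download can be sketched in pure Python with the standard library, which may be handy outside Colab where the ! shell syntax is not available. The function names build_download_url and download_small_file are hypothetical, and the sketch assumes Drive serves the file directly at this URL, as it does in the wget command above.

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve


def build_download_url(file_id: str) -> str:
    """Build the direct-download URL used in the wget command above."""
    query = urlencode({"export": "download", "id": file_id})
    return f"https://docs.google.com/uc?{query}"


def download_small_file(file_id: str, output_file_name: str) -> str:
    """Download a small public Drive file and return the local file name."""
    urlretrieve(build_download_url(file_id), output_file_name)
    return output_file_name


# Example call (needs network access, so it is left commented out):
# download_small_file("1ML-Rpeaftox3z7b1GThUZtl1Ql_lGUrb", "fichier.tar")
```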

With large files

Use the command :

!wget -q --show-progress --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=iD' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=iD" -O Output_file_name && rm -rf /tmp/cookies.txt

With :

  • iD: the iD of the file
  • Output_file_name : the name we want to give the file we downloaded with the associated extension (e.g.: .txt, .png, .pdf, .zip, .tar, … )

In our case :

!wget -q --show-progress --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1ML-Rpeaftox3z7b1GThUZtl1Ql_lGUrb' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1ML-Rpeaftox3z7b1GThUZtl1Ql_lGUrb" -O fichier.tar && rm -rf /tmp/cookies.txt
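The large-file command runs wget twice: the inner call fetches a first page and saves the cookies, sed pulls the value that follows confirm= out of that page, and the outer call reuses the cookies and this value to get the actual file. The extraction step can be sketched in Python as follows; the function name extract_confirm_token is hypothetical, and the sketch only assumes, like the sed expression, that the page contains a confirm=<token> fragment.

```python
import re
from typing import Optional

# Same pattern as the sed expression: the greedy ".*" keeps the last
# "confirm=" of a line, and the token is letters, digits or underscores.
_CONFIRM_RE = re.compile(r".*confirm=([0-9A-Za-z_]+)")


def extract_confirm_token(page: str) -> Optional[str]:
    """Return the token of the first line containing 'confirm=<token>', or None."""
    for line in page.splitlines():
        match = _CONFIRM_RE.match(line)
        if match:
            return match.group(1)
    return None


# Made-up page fragment, for illustration only:
sample = '<a href="/uc?export=download&amp;confirm=AbC_1&amp;id=XYZ">Download anyway</a>'
print(extract_confirm_token(sample))  # prints AbC_1
```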

Tom Keldenich

Data Engineer & passionate about Artificial Intelligence !

Founder of the website Inside Machine Learning
