Parallelization in Python – Getting the most out of your CPU

Parallelization means distributing tasks to different workers (CPUs). These workers execute the code simultaneously and thus speed up the algorithm.

For example, take a for loop from 1 to 5 running on 3 CPUs. Each CPU runs the loop, but each one handles a different iteration.

The first CPU handles iteration 1, the second iteration 2 and the third iteration 3.

Once a CPU has finished its work, it immediately picks up the next available iteration.

The task is thus parallelized and the algorithm is much faster!

In this article we will parallelize our Python code thanks to the multiprocessing library.

Number of CPUs available

Before we can start parallelizing our execution, we need to know how many CPUs we can use.

To do so, we use the cpu_count() function:

import multiprocessing

print(multiprocessing.cpu_count())

Output : the number of CPUs you have.

Parallelization

Once we know our number of CPUs, we can finally parallelize.

We will use Pool. A Pool manages a set of worker processes that execute our code.

Each worker process can run on one of our CPUs.

import multiprocessing
from multiprocessing import Pool

def f(x):
  return x

# The __main__ guard is required on Windows and macOS,
# where worker processes are started by re-importing this module
if __name__ == "__main__":
  with Pool(processes=multiprocessing.cpu_count()) as pool:
    for i in pool.imap(f, range(10)):
      print(i)

Output : 0 1 2 3 4 5 6 7 8 9

In the above code, the number of worker processes in the Pool is set. Here we give it the number of available CPUs.

Then we distribute, with the imap() function, the task of applying the f function to each of the numbers from 0 to 9.

In fact the work will be distributed among the worker processes.

The first worker will apply the function f to one of the numbers while, at the same time, the second applies the function f to another one, and so on.

Whenever a worker finishes a task, it applies the function to the next iteration!


Is it really working?

But then, how do we know that the parallelization really works?

After all, the numbers in the code output are arranged in ascending order. Maybe the execution is just not parallelized?

Well, to find out we will use the imap_unordered() function.

This function yields each result as soon as its task finishes, in whatever order that happens. So we can see that the CPUs are working in parallel:

import multiprocessing
from multiprocessing import Pool

def f(x):
  return x

# Guard needed on Windows and macOS (spawn start method)
if __name__ == "__main__":
  with Pool(processes=4) as pool:
    for i in pool.imap_unordered(f, range(10)):
      print(i)

Output : 0 2 3 1 6 7 8 9 5 4 (the exact order varies from run to run)

Here, we can see that the output is not in ascending order. This means that the code was not executed sequentially but in parallel!

Parallelization is a technique to be used only in specific cases.

Indeed, for a task as simple as our f function, parallelization is not useful.

On the contrary, it takes more time than a classical execution, because the overhead of creating the worker processes and passing data between them is considerable.

However, it appears to be efficient in many cases where the calculation to be performed is complex.

To determine if you need parallelization, don't hesitate to refer to our article on the execution time of your algorithms!


Tom Keldenich

Data Engineer & passionate about Artificial Intelligence !

Founder of the website Inside Machine Learning
