Parallelization means distributing tasks across several workers (CPUs). The workers execute the code at the same time, which can speed up the algorithm.
For example, take a for loop from 1 to 5 running on 3 CPUs. Each CPU runs the loop body, but each one on a different iteration.
The first CPU handles iteration 1, the second iteration 2 and the third iteration 3.
As soon as a CPU has finished its work, it takes the next pending iteration.
The task is thus parallelized, and the algorithm can finish noticeably faster!
In this article we will parallelize our Python code with the multiprocessing library.
Number of available CPUs
Before we can start parallelizing our execution, we need to know how many CPUs we can use.
To do so, we use the cpu_count() function:
import multiprocessing

# Print the value so it is also visible when run as a script
print(multiprocessing.cpu_count())
Output: the number of CPUs in your machine.
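Note that cpu_count() reports the CPUs present in the machine, which is not always the number your process is allowed to use (for example inside a container or on a shared server). Here is a minimal sketch of how to check, assuming a platform where os.sched_getaffinity exists (it is only available on some Unix systems such as Linux). The helper name usable_cpu_count is ours, not part of the library:

```python
import multiprocessing
import os


def usable_cpu_count():
    """Return the number of CPUs this process is allowed to use."""
    # os.sched_getaffinity only exists on some Unix platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback: total number of logical CPUs in the machine.
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    print("CPUs in the machine:", multiprocessing.cpu_count())
    print("CPUs usable by this process:", usable_cpu_count())
```

On a personal computer both numbers are usually identical.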
Parallelization
Once we know our number of CPUs, we can finally parallelize.
We will use Pool. A Pool is a group of worker processes that execute our code.
Each worker process in the Pool can run on one of our CPUs.
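import multiprocessing
from multiprocessing import Pool

def f(x):
    return x

# The guard is required on platforms that start workers by re-importing this file
if __name__ == "__main__":
    # One worker process per available CPU
    with Pool(processes=multiprocessing.cpu_count()) as pool:
        # imap() yields the results in the same order as the inputs
        for i in pool.imap(f, range(10)):
            print(i)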
Output: 0 1 2 3 4 5 6 7 8 9
In the code above, we set the number of worker processes in the Pool. Here we give it the number of available CPUs.
Then, with the imap() function, we distribute to the workers the task of applying the f function to the numbers from 0 to 9.
The results come back in the same order as the inputs.
The first worker applies the function f to one of the numbers while, at the same time, the second worker applies it to another one, and so on.
At the end of each task, the worker applies the function to the next pending number!
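To make this distribution visible, here is a small sketch that returns, for each number, the ID of the process that handled it. It reuses the example from the introduction (5 iterations, 3 workers). The names tag_with_pid and run_tagged are hypothetical helpers written for this illustration:

```python
import os
from multiprocessing import Pool


def tag_with_pid(x):
    """Return the input together with the ID of the process that handled it."""
    return x, os.getpid()


def run_tagged(numbers, workers=3):
    """Apply tag_with_pid to each number using a Pool of `workers` processes."""
    with Pool(processes=workers) as pool:
        return list(pool.imap(tag_with_pid, numbers))


if __name__ == "__main__":
    # 5 iterations shared between 3 workers, as in the introduction
    for x, pid in run_tagged(range(1, 6)):
        print(f"iteration {x} handled by process {pid}")
```

The process IDs differ from one machine and one run to the next. With a task this short, a single worker may even grab several iterations in a row before the others get a chance.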
Is it really working?
But how do we know that the parallelization really works?
After all, the numbers in the output are in ascending order. Maybe the execution is simply not parallelized?
To find out, we will use the imap_unordered() function.
Unlike imap(), it returns each result as soon as it is ready instead of in input order, so we can see that the CPUs are working in parallel:
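from multiprocessing import Pool

def f(x):
    return x

# The guard is required on platforms that start workers by re-importing this file
if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # imap_unordered() yields each result as soon as a worker finishes it
        for i in pool.imap_unordered(f, range(10)):
            print(i)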
Output (the order can change from run to run): 0 2 3 1 6 7 8 9 5 4
Here, we can see that the output is not in ascending order. This means that the code was not executed sequentially but in parallel!
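With a function as fast as f, the results may still come back almost in order. A sketch that makes the effect easier to observe is to give each task a different duration, so the shortest tasks finish first. The function slow_identity and its delays are assumptions chosen for this illustration:

```python
import time
from multiprocessing import Pool


def slow_identity(x):
    """Return x after a pause that is longer for smaller values of x."""
    time.sleep(max(0, 3 - x) * 0.2)
    return x


def completion_order(numbers, workers=4):
    """Return the results in the order in which the workers finished them."""
    with Pool(processes=workers) as pool:
        return list(pool.imap_unordered(slow_identity, numbers))


if __name__ == "__main__":
    # Task 3 does not sleep at all, task 0 sleeps the longest
    print(completion_order(range(4)))
```

With one worker per task, the largest numbers should usually appear first, since they wait the least. The exact order still depends on how the operating system schedules the processes.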
Parallelization is a technique to use only in specific cases.
Indeed, for a task as simple as our f function, parallelization is not useful.
On the contrary, it takes more time than a sequential execution, because creating the worker processes and passing data to them has a significant cost.
However, it tends to pay off when the calculation to perform on each item is expensive.
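One way to check this on your own machine is to time the same work done sequentially and with a Pool. The sketch below uses a deliberately CPU-heavy function. The function heavy, the input size and the number of workers are arbitrary choices for illustration, and the timings will vary from one machine to another:

```python
import time
from multiprocessing import Pool


def heavy(n):
    """A deliberately CPU-bound task: the sum of the squares below n."""
    return sum(i * i for i in range(n))


def run_sequential(inputs):
    """Apply heavy to every input, one after the other."""
    return [heavy(n) for n in inputs]


def run_parallel(inputs, workers=4):
    """Apply heavy to every input using a Pool of worker processes."""
    with Pool(processes=workers) as pool:
        return pool.map(heavy, inputs)


if __name__ == "__main__":
    inputs = [2_000_000] * 8

    start = time.perf_counter()
    sequential = run_sequential(inputs)
    print(f"sequential: {time.perf_counter() - start:.2f} s")

    start = time.perf_counter()
    parallel = run_parallel(inputs)
    print(f"parallel:   {time.perf_counter() - start:.2f} s")

    # Both approaches must produce the same results
    assert sequential == parallel
```

If you shrink the inputs (for example to [10] * 8), the parallel version will typically become the slower one, because the cost of starting the workers outweighs the work itself.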
To determine whether you need parallelization, refer to our article on measuring the execution time of your algorithms!
Sources:
- Photo by Ales Nesetril on Unsplash