Let’s look in detail at the Pearson formula, also called the linear correlation coefficient, and at how to code it in Python without any library!
In mathematics, a correlation between several variables means that these variables are statistically dependent on each other.
A linear correlation, in turn, means that two variables have a linear relationship: the relationship between them can be approximated by a straight line.
To calculate the linear correlation coefficient, we use the Pearson formula: the covariance of the two variables divided by the product of their standard deviations.
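Thus, if we want to calculate the linear correlation between two variables X and Y, we use:
$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
where cov(X, Y) is the covariance of X and Y, and σ_X and σ_Y are their standard deviations.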
The coefficient always lies between -1 and 1. The closer its absolute value is to 1, the more strongly the two variables are linearly correlated (i.e. the better the relationship can be represented by a line).
However, a zero coefficient does not imply independence, because non-linear relationships between the variables are still possible.
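To make these two remarks concrete, here is a minimal sketch in plain Python. The helper name `pearson_r` and the sample values in `xs` are assumptions chosen for illustration; they are not part of the original exercise. The helper follows the definition given above (covariance divided by the product of the standard deviations) and is applied to an increasing line, a decreasing line, and a parabola.

```python
def pearson_r(x, y):
    """Pearson coefficient: covariance over the product of standard deviations."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)


# Hypothetical sample, symmetric around zero
xs = [-3, -2, -1, 0, 1, 2, 3]

r_line_up = pearson_r(xs, [2 * a + 1 for a in xs])  # y = 2x + 1
r_line_down = pearson_r(xs, [-a for a in xs])       # y = -x
r_parabola = pearson_r(xs, [a ** 2 for a in xs])    # y = x squared

print(round(r_line_up, 3), round(r_line_down, 3), round(r_parabola, 3))
```

The two lines should give coefficients of 1 and -1, while the parabola gives 0 even though y is entirely determined by x. That is the sense in which a zero coefficient does not imply independence.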
This formula is discussed in the HackerRank exercise for Statistics & Machine Learning: Correlation and Regression Lines – A Quick Recap #1
Pearson formula
First, we define our two variables as two lists of integers:
x = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
y = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
We calculate the average of these variables…
mX = sum(x)/len(x)
mY = sum(y)/len(y)
… and then calculate the covariance:
cov = sum((a - mX) * (b - mY) for a, b in zip(x, y)) / len(x)
Then we compute the standard deviation of each of the variables:
stdevX = (sum((a - mX)**2 for a in x)/len(x))**0.5
stdevY = (sum((b - mY)**2 for b in y)/len(y))**0.5
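Note that the covariance and both standard deviations here divide by the number of points n (the population versions). As a short sketch of why this choice does not affect the final coefficient, the derivation below shows the factor 1/n cancelling in the ratio. The notation is an assumption introduced for illustration: \bar{x} and \bar{y} stand for mX and mY, and n for len(x).

```latex
r
= \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
       {\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\;
        \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}}
= \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
       {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\;
        \sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
```

The same cancellation happens if n - 1 is used consistently in all three quantities, so the sample and population conventions lead to the same coefficient.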
Afterwards, we can calculate the linear correlation coefficient using Pearson’s formula!
(We have intentionally rounded the result to three decimal places.)
result = round(cov/(stdevX*stdevY),3)
Finally, we display the result:
print(result)
We get 0.145, which indicates that the two variables are only very weakly linearly correlated!
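For reuse, the steps above can be gathered into a single function. The following is only a sketch: the function name `pearson` and the input checks (equal lengths, at least two points, non-constant variables) are assumptions added here and are not part of the original exercise.

```python
def pearson(x, y):
    """Return the Pearson linear correlation coefficient of two sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("at least two data points are required")

    n = len(x)
    m_x = sum(x) / n
    m_y = sum(y) / n

    # Population covariance and standard deviations (division by n)
    cov = sum((a - m_x) * (b - m_y) for a, b in zip(x, y)) / n
    std_x = (sum((a - m_x) ** 2 for a in x) / n) ** 0.5
    std_y = (sum((b - m_y) ** 2 for b in y) / n) ** 0.5

    # A constant variable has zero standard deviation: the ratio is undefined
    if std_x == 0 or std_y == 0:
        raise ValueError("the coefficient is undefined when a variable is constant")

    return cov / (std_x * std_y)


# Same data as in the walkthrough
x = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
y = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
print(round(pearson(x, y), 3))
```

The explicit check for a zero standard deviation avoids a bare ZeroDivisionError when one of the lists is constant. On Python 3.10 and later, the standard library's `statistics.correlation` can serve as a cross-check of the result.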