
Pearson correlation


Description

Pearson devised a very common way of measuring correlation, often called the Pearson Product-Moment Correlation. It is used when both variables are at least at interval level and the data are parametric.

It is calculated by dividing the covariance of the two variables by the product of their standard deviations.

r = SUM((xi - xbar)(yi - ybar)) / ((n - 1) * sx * sy)

Where x and y are the variables, xi and yi are a single pair of values of x and y, xbar and ybar are the means of all x's and all y's, n is the number of paired observations, and sx and sy are the sample standard deviations of all x's and all y's.
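The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name pearson_r and its argument names are assumptions made for this example, and it uses the sample (n - 1) standard deviation as in the formula.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences.

    Hypothetical helper for illustration: sum of cross-products of
    deviations divided by (n - 1) * sx * sy.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sum of (xi - xbar)(yi - ybar) over all pairs
    sum_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Sample standard deviations (n - 1 in the denominator)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

    return sum_products / ((n - 1) * s_x * s_y)


# A perfectly linear relationship gives r = 1 (up to rounding)
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

Note that the sketch divides by zero if either variable is constant, since its standard deviation is then zero and the correlation is undefined.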

The square of r may also be considered as being:

r^2 = explained variation / total variation

where variation is calculated as the Sum of the Squares, SS.

In other words, r^2 is the proportion of variation that can be explained. A high explained proportion indicates a strong relationship, and a value of one is perfect correlation. For example, an r of 0.8 explains 64% of the variance.
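One way to see the 'explained over total' reading is to fit a straight line and compare sums of squares. The sketch below assumes that 'explained variation' means the variation of the values predicted by a least-squares line of y on x; that fitting step and the function name r_squared_from_variation are assumptions for illustration, not something stated in the text above.

```python
def r_squared_from_variation(xs, ys):
    """Explained SS / total SS for a least-squares line of y on x.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Least-squares slope and intercept
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * x for x in xs]

    # Explained variation: how far the fitted values move from the mean of y
    ss_explained = sum((p - y_bar) ** 2 for p in predicted)
    # Total variation: how far the observed values move from the mean of y
    ss_total = sum((y - y_bar) ** 2 for y in ys)

    return ss_explained / ss_total


# Points lying exactly on a line are fully explained
print(r_squared_from_variation([1, 2, 3, 4], [3, 5, 7, 9]))
```

For a simple straight-line fit like this, the ratio equals the square of Pearson's r computed on the same data.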

When calculated from a population, Pearson's coefficient is denoted with the Greek letter 'rho' (ρ). When calculated from a sample, it is denoted with 'r'.

The Coefficient of Determination is calculated as r^2.

Example

  x     y     x-xbar    y-ybar    (x-xbar) * (y-ybar)
  1     2      -3.7      -2.3             8.51
  3     5      -1.7       0.7            -1.19
  5     6       0.3       1.7             0.51
  6     6       1.3       1.7             2.21
  8     7       3.3       2.7             8.91
  9     7       4.3       2.7            11.61
  6     5       1.3       0.7             0.91
  4     3      -0.7      -1.3             0.91
  3     1      -1.7      -3.3             5.61
  2     1      -2.7      -3.3             8.91

n: 10
Totals: x = 47, y = 43, (x-xbar) * (y-ybar) = 46.90
Means: xbar = 4.70, ybar = 4.30 (xbar is the mean of x, ybar is the mean of y)
Std dev: sx = 2.58, sy = 2.36

Hence:

Pearson r = SUM((xi - xbar)(yi - ybar)) / ((n - 1) * sx * sy)

= 46.90 / (9 * 2.58 * 2.36)

= 0.855 (using the unrounded standard deviations)

This is quite high, showing a strong positive correlation between the two sets of numbers.
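The worked example can be checked with a short script. This is a sketch that uses Python's standard statistics module for the means and sample standard deviations; the variable names are chosen for illustration only.

```python
import statistics

# Paired observations from the example table
x = [1, 3, 5, 6, 8, 9, 6, 4, 3, 2]
y = [2, 5, 6, 6, 7, 7, 5, 3, 1, 1]

n = len(x)
x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# statistics.stdev is the sample standard deviation (n - 1 denominator)
s_x = statistics.stdev(x)
s_y = statistics.stdev(y)

# Sum of the final column of the table
sum_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sum_products / ((n - 1) * s_x * s_y)

print(f"n = {n}, xbar = {x_bar:.2f}, ybar = {y_bar:.2f}")
print(f"sx = {s_x:.2f}, sy = {s_y:.2f}")
print(f"sum of products = {sum_products:.2f}")
print(f"r = {r:.4f}")
```

Carried to four decimal places, r comes out at about 0.8547.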

Discussion

Pearson is a parametric statistic and assumes:

1. A normal distribution.
2. Interval or ratio data.
3. A linear relationship between X and Y.

The coefficient of determination, r^2, represents the proportion of the variance in the dependent variable that is explained by the independent variable.

Correlation explains a certain amount of variance, but not all. This works on a square law, so a correlation of 0.5 indicates that the independent variable explains 25% of the variance of the dependent variable, and a correlation of 0.9 accounts for 81% of the variance.

This means that the unexplained variance is indicated by (1 - r^2). This is typically due to random factors.
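The square law is easy to tabulate. The following sketch splits the variance into explained and unexplained shares for a few values of r; the function name variance_split is hypothetical and used only for this illustration.

```python
def variance_split(r):
    """Return (explained, unexplained) shares of variance for a correlation r.

    Hypothetical helper for illustration: explained = r^2, unexplained = 1 - r^2.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    explained = r ** 2
    return explained, 1.0 - explained


for r in (0.5, 0.8, 0.9):
    explained, unexplained = variance_split(r)
    print(f"r = {r}: explained {explained:.0%}, unexplained {unexplained:.0%}")

# Prints:
# r = 0.5: explained 25%, unexplained 75%
# r = 0.8: explained 64%, unexplained 36%
# r = 0.9: explained 81%, unexplained 19%
```

Because r is squared, the sign of the correlation does not affect the split: an r of -0.9 also accounts for 81% of the variance.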

Pearson's Correlation is also known as the Pearson Product-Moment Correlation or Sample Correlation Coefficient. 'r' is also known as 'Pearson's r'.
