Correlation Coefficients (2024)

Applied Statistics - Lesson 5

Lesson Overview

  • Correlation
  • Pearson Product Moment (r)
  • Spearman Rho
  • Factors Affecting the size of r
  • Homework

Correlation

In common usage, the word correlation refers to a relationship between two or more objects (ideas, variables...). In statistics, the word correlation refers to the relationship between two variables. We wish to be able to quantify this relationship, measure its strength, develop an equation for predicting scores, and ultimately draw testable conclusions about the parent population. This lesson focuses on measuring its strength, with the equation coming in the next lesson, and testing conclusions much later.

Examples: one variable might be the number of hunters in a region and the other variable could be the deer population. Perhaps as the number of hunters increases, the deer population decreases. This is an example of a negative correlation: as one variable increases, the other decreases. A positive correlation is where the two variables react in the same way, increasing or decreasing together. Temperatures in Celsius and Fahrenheit have a positive correlation.

Pearson Product Moment

How can you tell if there is a correlation? By observing a graph (scatterplot) of the data, a person can tell if there is a correlation by how closely the data resemble a line. If the points are scattered about, then there may be no correlation. If the points would closely fit a quadratic or exponential equation, etc., then they have a nonlinear correlation. In this course we will restrict ourselves to linear correlations and hence linear regression. When the data are almost linear, they can be enclosed by an ellipse. The length of the ellipse's major axis relative to its minor axis (width) is an indication of the degree of correlation.

How can you tell by inspection the type of correlation?
If the graph of the variables resembles a line with positive slope, then there is a positive correlation (as x increases, y increases). If the slope of the line is negative, then there is a negative correlation (as x increases, y decreases).

An important aspect of correlation is how strong it is. The strength of a correlation is measured by the correlation coefficient r. Another name for r is the Pearson product moment correlation coefficient, in honor of Karl Pearson, who developed it about 1900. There are at least three different formulae in common use to calculate this number, and these different formulae represent somewhat different approaches to the problem. However, the same value for r is obtained by any one of the different procedures. First we give the raw score formula. n has the usual meaning of how many ordered pairs are in our sample. It is also important to recognize the difference between the sum of the squares and the square of the sum!

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
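The raw score formula can be sketched in a few lines of Python. This is only an illustration: the function name pearson_raw and the five Celsius readings are assumptions chosen for the example, and the Fahrenheit values are derived from them with the usual conversion.

```python
import math

def pearson_raw(xs, ys):
    """Pearson r from the raw score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)  # sum of the squares
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # sum_x ** 2 is the square of the sum, not the sum of the squares
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Illustrative readings (assumed); Fahrenheit is an exact linear function of Celsius
celsius = [0, 10, 20, 30, 40]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]
r_temp = pearson_raw(celsius, fahrenheit)
print(round(r_temp, 6))
```

Because Fahrenheit is a linear function of Celsius with positive slope, the result should be 1 up to floating-point rounding.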

Next we present the deviation score formula. This formula is closer to the developmental history since it gives the average cross-product of the standard scores of the two variables, but in a computationally easier format.

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

A note regarding notation is needed here: the x and y in the formula above are deviation scores, that is, the original variables transformed by subtracting their means.
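As a rough sketch of the deviation score formula, the function below first subtracts the means and then applies the formula. The function name pearson_deviation is an assumption; the data are the total and first-page test scores from the worked example later in this lesson.

```python
import math

def pearson_deviation(raw_x, raw_y):
    """Pearson r from deviation scores (each value minus its mean)."""
    mean_x = sum(raw_x) / len(raw_x)
    mean_y = sum(raw_y) / len(raw_y)
    x = [v - mean_x for v in raw_x]  # deviation scores for x
    y = [v - mean_y for v in raw_y]  # deviation scores for y
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Test scores from the worked example in this lesson
total = [110, 107, 100, 96, 89, 78, 67, 66, 49]
page1 = [29, 32, 27, 29, 25, 25, 21, 26, 22]
r_dev = pearson_deviation(total, page1)
print(round(r_dev, 3))
```

The value should agree with the raw score formula applied to the same data, since the two formulas are algebraically equivalent.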

Lastly we present the covariance formula, which is yet another approach. Covariances are commonly given between two variables and this is one reason why. (It should be noted that the size of the covariance is dependent on the units of measurement used for each variable. However, the correlation coefficient is not.)

$$r = \frac{s_{xy}}{s_x s_y}$$
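The following sketch illustrates the covariance formula and the remark about units. It assumes the sample covariance with an n - 1 divisor, to match the sample standard deviation returned by statistics.stdev. The function names and the factor of 10 used to mimic a change of units are assumptions for illustration; the scores are those of the worked example later in this lesson.

```python
import statistics

def covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)

def pearson_cov(xs, ys):
    """Pearson r as covariance over the product of standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

total = [110, 107, 100, 96, 89, 78, 67, 66, 49]
page1 = [29, 32, 27, 29, 25, 25, 21, 26, 22]

# Hypothetical change of units: express x in units one tenth the size
total_rescaled = [10 * x for x in total]

cov_original = covariance(total, page1)
cov_rescaled = covariance(total_rescaled, page1)
r_original = pearson_cov(total, page1)
r_rescaled = pearson_cov(total_rescaled, page1)
print(cov_original, cov_rescaled)
print(r_original, r_rescaled)
```

The covariance grows by the same factor of 10 as the units, while r stays the same apart from floating-point rounding.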

r is often denoted as r_xy to emphasize the two variables under consideration. For samples, the correlation coefficient is represented by r, while the correlation coefficient for populations is denoted by the Greek letter rho (which can look like a p). Be aware that the Spearman rho correlation coefficient also uses the Greek letter rho, but generally applies to samples, and the data are rankings (ordinal data).

The closer r is to +1, the stronger the positive correlation is. The closer r is to -1, the stronger the negative correlation is. If |r| = 1 exactly, the two variables are perfectly correlated! Temperatures in Celsius and Fahrenheit are perfectly correlated.

Formal hypothesis testing can be applied to r to determine how significant a result is. That is the subject of Hinkle chapter 17 and of lesson 12. The Student t distribution with n-2 degrees of freedom is used.
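The lesson names the distribution but not the test statistic. The sketch below assumes the standard statistic for testing whether the population correlation is zero, t = r·sqrt(n - 2)/sqrt(1 - r²). The function name t_statistic is an assumption, and the values r = 0.843 and n = 9 are taken from the worked example later in this lesson.

```python
import math

def t_statistic(r, n):
    """t for testing zero population correlation; uses n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r_example = 0.843  # Pearson r from the worked example in this lesson
n_example = 9      # number of ordered pairs in that example
t_value = t_statistic(r_example, n_example)
degrees_of_freedom = n_example - 2
print(round(t_value, 2), degrees_of_freedom)
```

The resulting t would then be compared against a Student t table with 7 degrees of freedom. That lookup is not done here.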

Remember, correlation does not imply causation.

A value of zero for r does not mean that there is no correlation; there could be a nonlinear correlation. Confounding variables might also be involved. Suppose you discover that miners have a higher than average rate of lung cancer. You might be tempted to immediately conclude that their occupation is the cause, whereas perhaps the region has an abundance of radioactive radon gas leaking from the subterranean regions and all people in that area are affected. Or, perhaps, they are heavy smokers....

r² is frequently used and is called the coefficient of determination. It is the fraction of the variation in the values of y that is explained by least-squares regression of y on x. This will be discussed further in lesson 6 after least squares is introduced.
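A small illustration of the first point: the points below lie exactly on the parabola y = x², so y is completely determined by x, yet the linear correlation coefficient is zero. The data and the function name pearson are assumptions made up for this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson r via deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Symmetric x values with a purely quadratic (nonlinear) relationship
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
r_parabola = pearson(xs, ys)
print(r_parabola)
```

The positive and negative cross-products cancel because the x values are symmetric about their mean, which is why a scatterplot is worth examining before trusting r alone.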

The magnitude of a correlation coefficient can be interpreted as follows, with the corresponding range of r² in parentheses:

  • 0.9 < |r| < 1.0: the variables can be considered very highly correlated (0.81 < r² < 1.00)
  • 0.7 < |r| < 0.9: the variables can be considered highly correlated (0.49 < r² < 0.81)
  • 0.5 < |r| < 0.7: the variables can be considered moderately correlated (0.25 < r² < 0.49)
  • 0.3 < |r| < 0.5: the variables have a low correlation (0.09 < r² < 0.25)
  • 0.0 < |r| < 0.3: the variables have little if any (linear) correlation (0.0 < r² < 0.09)
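These categories can be turned into a small lookup, sketched below. The lesson states the ranges with strict inequalities and does not say where a boundary value such as exactly 0.7 belongs. Assigning each boundary to the higher category is an assumption of this sketch, as is the function name describe_strength.

```python
def describe_strength(r):
    """Verbal label for the magnitude of r (boundary handling is assumed)."""
    size = abs(r)
    if size >= 0.9:
        return "very high"
    if size >= 0.7:
        return "high"
    if size >= 0.5:
        return "moderate"
    if size >= 0.3:
        return "low"
    return "little if any"

for value in (0.95, -0.843, 0.6, -0.4, 0.1):
    # r squared is the coefficient of determination for the same value
    print(value, describe_strength(value), round(value * value, 2))
```

The sign of r is ignored on purpose: the label describes strength only, not direction.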

Spearman Rho for Ranked/Ordinal Data

It is often the case that the data we wish to measure the correlation for is not of the interval or ratio level of measurement. The Spearman rho correlation coefficient was developed to handle this situation. This is an unfortunate exception to the general rule that Greek letters are population parameters! There are others.

The formula for calculating the Spearman rho correlation coefficient is as follows.

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

n is the number of paired ranks and d is the difference between the paired ranks. If there are no tied ranks, this formula gives exactly the Pearson product moment correlation coefficient computed on the ranks; with ties it is only an approximation. The formula can be easily understood when you realize that the sum of the squares from 1 to n can be expressed as n(n + 1)(2n + 1)/6. From this you can show that the least sum of d² is zero (identical rankings) and the greatest sum of d² is n(n² - 1)/3 (exactly reversed rankings), so the formula scales such a sum between +1 and -1.
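The range of the formula can be checked numerically with a short sketch. The function name spearman_rho and the choice of n = 9 are assumptions for illustration. The two extreme cases are identical rankings and exactly reversed rankings.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rho from paired ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

ranks = list(range(1, 10))  # ranks 1 through 9
reversed_ranks = ranks[::-1]

rho_same = spearman_rho(ranks, ranks)               # identical rankings
rho_opposite = spearman_rho(ranks, reversed_ranks)  # exactly reversed rankings

# Largest possible sum of squared differences, for comparison with n(n^2 - 1)/3
max_sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks, reversed_ranks))
print(rho_same, rho_opposite, max_sum_d2)
```

Identical rankings give +1 and reversed rankings give -1, with every other pairing falling in between.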

Example: Suppose we have test scores of 110, 107, 100, 96, 89, 78, 67, 66, and 49. These correspond with ranks 1 through 9. If there were duplicates, then we would have to find the mean ranking for the duplicates and substitute that value for our ranks. The corresponding first page score totals were: 29, 32, 27, 29, 25, 25, 21, 26, 22. Thus these ranks are as follows: 2.5, 1, 4, 2.5, 6.5, 6.5, 9, 5, 8. (Note that if we reversed the order for only one of the variables, assigning its ranks from low to high instead of high to low, the resulting Spearman rho correlation coefficient would reverse sign.)
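The mean-rank rule for duplicates can be sketched as follows. The function name average_ranks is an assumption, and the approach (find the first position a score holds in the sorted list, then add half the number of extra ties) is just one simple way to do it, not necessarily how any statistics package does.

```python
def average_ranks(scores):
    """Rank from high to low (1 = highest); tied scores share their mean rank."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for score in scores:
        first = ordered.index(score) + 1  # first position this score occupies
        count = ordered.count(score)      # how many times the score occurs
        ranks.append(first + (count - 1) / 2)
    return ranks

page1 = [29, 32, 27, 29, 25, 25, 21, 26, 22]
page1_ranks = average_ranks(page1)
print(page1_ranks)
# prints [2.5, 1.0, 4.0, 2.5, 6.5, 6.5, 9.0, 5.0, 8.0]
```

The two scores of 29 would occupy positions 2 and 3, so each receives 2.5. The two scores of 25 would occupy positions 6 and 7, so each receives 6.5.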

We have constructed a table below from the information above. We have added additional columns of d and d² for ease in calculating the Spearman rho. The d² column sums to 23, so using the Spearman rho formula we get 1 - 6(23)/(9(80)) = 0.81.

Total (x)  Page 1 (y)  x rank  y rank      d     d²      xy      x²     y²
      110          29       1     2.5   -1.5   2.25    3190   12100    841
      107          32       2       1      1      1    3424   11449   1024
      100          27       3       4     -1      1    2700   10000    729
       96          29       4     2.5    1.5   2.25    2784    9216    841
       89          25       5     6.5   -1.5   2.25    2225    7921    625
       78          25       6     6.5   -0.5   0.25    1950    6084    625
       67          21       7       9     -2      4    1407    4489    441
       66          26       8       5      3      9    1716    4356    676
       49          22       9       8      1      1    1078    2401    484
---------  ----------  ------  ------  -----  -----  ------  ------  -----
      762         236           sums:      0     23   20474   68016   6286

We have added additional columns of xy, x², and y² to make it easier to calculate the Pearson product moment correlation coefficient. Using the raw score formula for the Pearson product moment correlation coefficient we get (9×20474 - 762×236)/sqrt((9×68016 - 762²)(9×6286 - 236²)) = 0.843. Then r² = 0.71, which means 71% of the variation in y is explained by the variation in x. It is also true and perhaps more useful to know that the same correlation coefficient is obtained when x and y are exchanged. However, a different regression equation will result. Perhaps it makes more sense to use the results of the first page to predict the final test score rather than the other way around!
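The Pearson arithmetic can be retraced directly from the column sums of the table. The variable names below are assumptions. Exchanging x and y only swaps the two factors under the square root, which is why the coefficient is unchanged.

```python
import math

# Column sums taken from the table
n = 9
sum_x, sum_y = 762, 236
sum_xy, sum_x2, sum_y2 = 20474, 68016, 6286

numerator = n * sum_xy - sum_x * sum_y  # symmetric in x and y
term_x = n * sum_x2 - sum_x ** 2
term_y = n * sum_y2 - sum_y ** 2

r_xy = numerator / math.sqrt(term_x * term_y)
r_yx = numerator / math.sqrt(term_y * term_x)  # roles of x and y exchanged

print(round(r_xy, 3), round(r_xy ** 2, 2))
# prints 0.843 0.71
```

The regression equation, by contrast, does depend on which variable is treated as the predictor.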

Factors Affecting the Size of r

We have now looked at how to calculate r and what various values mean, but it is also important to understand what factors affect it. First, remember, it is only meaningful to calculate the correlation coefficient if the data are paired observations measured on an interval or ratio scale. Next, since we are only concerned here with linear correlation, the Pearson product moment correlation coefficient will underestimate the relationship if there is a curvilinear relationship. It is a good idea to generate a scatterplot before calculating any correlation coefficients and then proceed only if the correlation is reasonably strong.

As the homogeneity of a group increases, the variance decreases and the magnitude of the correlation coefficient tends toward zero. It is thus incumbent on the researcher to ensure enough heterogeneity (variation) so that a relationship can manifest itself. In general, the correlation coefficient is not affected by the size of the group.
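The effect of homogeneity can be illustrated with the test-score data from the Spearman example in this lesson. The sketch compares r for all nine students with r for only the four highest total scores, which form a more homogeneous group. This is a single small sample, so it illustrates the tendency rather than proving it. The choice of four students and the function name pearson are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson r via deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

total = [110, 107, 100, 96, 89, 78, 67, 66, 49]
page1 = [29, 32, 27, 29, 25, 25, 21, 26, 22]

r_full = pearson(total, page1)          # all nine students
r_top4 = pearson(total[:4], page1[:4])  # only the four highest totals
print(round(r_full, 3), round(r_top4, 3))
```

In this sample, restricting the group to the top scorers shrinks the variation in both variables and gives a noticeably smaller r.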

  • e-mail: calkins@andrews.edu
  • voice/mail: 269 471-6629/ BCM&S Smith Hall 106; Andrews University; Berrien Springs,
  • classroom: 269 471-6646; Smith Hall 100/FAX: 269 471-3713; MI, 49104-0140
  • home: 269 473-2572; 610 N. Main St.; Berrien Springs, MI 49103-1013
  • URL: http://www.andrews.edu/~calkins/math/edrm611/edrm05.htm
  • Copyright ©1998-2005, Keith G. Calkins. Revised on or after July 18, 2005.