The Correlation Coefficient (r) (2024)

The sample correlation coefficient (r) measures how closely the points in a scatter plot cluster around a linear regression line fitted to those points, as in the example above for accumulated savings over time. Possible values of the correlation coefficient range from -1 to +1, with -1 indicating a perfectly linear negative (i.e., inverse) correlation (sloping downward) and +1 indicating a perfectly linear positive correlation (sloping upward).

A correlation coefficient close to 0 suggests little, if any, linear correlation. The scatter plot suggests that measurements of IQ do not change with increasing age, i.e., there is no evidence that IQ is associated with age.

Calculation of the Correlation Coefficient

The equations below show the calculations used to compute "r". You do not need to remember these equations, because we will use R to do the calculations for us. Nevertheless, the equations give a sense of how "r" is computed.

r = Cov(X,Y) / √(s_x² · s_y²)

where Cov(X,Y) is the covariance, i.e., a measure of how far each observed (X,Y) pair is from the mean of X and the mean of Y simultaneously, and s_x² and s_y² are the sample variances of X and Y.
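
As a rough illustration, the covariance-based calculation can be sketched in R and compared with the built-in cor() function. The vectors x and y below are small made-up values chosen for illustration; they are not data from this module.

```r
# Hypothetical example data (not from the course materials)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Sample covariance: sum of the cross-products of deviations, divided by n - 1
cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)

# r = Cov(X,Y) / sqrt(s_x^2 * s_y^2), using the sample variances var(x) and var(y)
r_manual <- cov_xy / sqrt(var(x) * var(y))

r_manual    # hand calculation
cor(x, y)   # built-in function; the two values should agree
```

In practice cor() is all that is needed; the hand calculation is only meant to show where the number comes from.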

Describing Correlation Coefficients

The table below provides some guidelines for how to describe the strength of correlation coefficients, but these are just guidelines for description. Also, keep in mind that even weak correlations can be statistically significant, as you will learn shortly.

Correlation Coefficient (r)	Description (Rough Guideline)
+1.0	Perfect positive association
+0.8 to +1.0	Very strong + association
+0.6 to +0.8	Strong + association
+0.4 to +0.6	Moderate + association
+0.2 to +0.4	Weak + association
0.0 to +0.2	Very weak + or no association
0.0 to -0.2	Very weak - or no association
-0.2 to -0.4	Weak - association
-0.4 to -0.6	Moderate - association
-0.6 to -0.8	Strong - association
-0.8 to -1.0	Very strong - association
-1.0	Perfect negative association
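
The guideline table could be turned into a small helper function, sketched below. The function name describe_r is hypothetical, and because the table's ranges share their endpoints, the sketch assumes that a boundary value (for example 0.8) is assigned to the stronger category.

```r
# Hypothetical helper: describe a correlation coefficient using the rough
# guidelines in the table. Boundary values go to the stronger category
# (an assumption; the table itself leaves this open).
describe_r <- function(r) {
  stopifnot(is.numeric(r), length(r) == 1, !is.na(r), abs(r) <= 1)
  if (r == 0) return("No association")
  direction <- if (r > 0) "positive" else "negative"
  a <- abs(r)
  template <- if (a == 1) {
    "Perfect %s association"
  } else if (a >= 0.8) {
    "Very strong %s association"
  } else if (a >= 0.6) {
    "Strong %s association"
  } else if (a >= 0.4) {
    "Moderate %s association"
  } else if (a >= 0.2) {
    "Weak %s association"
  } else {
    "Very weak %s or no association"
  }
  sprintf(template, direction)
}

describe_r(0.5653241)
describe_r(-0.85)
```

These labels are descriptive only; they say nothing about statistical significance.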

The four images below give an idea of how some correlation coefficients might look on a scatter plot.

Beware of Non-Linear Relationships

Many relationships between measurement variables are reasonably linear, but others are not. For example, the image below indicates that the risk of death is not linearly correlated with body mass index. Instead, this type of relationship is often described as "U-shaped" or "J-shaped," because the value of the Y-variable initially decreases with increases in X, but with further increases in X, the Y-variable increases substantially. The relationship between alcohol consumption and mortality is also "J-shaped."

Source: Calle EE, et al.: N Engl J Med 1999; 341:1097-1105
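
A small made-up example shows why r can be misleading for a U-shaped relationship. The values below are hypothetical and are not the body mass index data from the cited study; y is completely determined by x, yet the linear correlation is essentially zero.

```r
# Hypothetical U-shaped data: y depends entirely on x, but not linearly
x <- -5:5
y <- x^2

r_u <- cor(x, y)
r_u          # essentially 0, despite the obvious relationship

plot(x, y)   # the scatter plot reveals the U shape that r misses
```

This is why it is worth looking at a scatter plot before relying on the correlation coefficient.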

A simple way to evaluate whether a relationship is reasonably linear is to examine a scatter plot. To illustrate, look at the scatter plot below of height (in inches) and body weight (in pounds) using data from the Weymouth Health Survey in 2004. R was used to create the scatter plot and compute the correlation coefficient.

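# Drop rows with missing values before the analysis
wey <- na.omit(Weymouth_Adult_Part)
attach(wey)
plot(hgt_inch, weight)   # scatter plot of weight against height
cor(hgt_inch, weight)    # Pearson correlation coefficient
[1] 0.5653241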

There is quite a lot of scatter, and the large number of data points makes it difficult to fully evaluate the correlation, but the trend is reasonably linear. The correlation coefficient is approximately +0.57.

Beware of Outliers

Note also in the plot above that there are two individuals with apparent heights of 88 and 99 inches. A height of 88 inches (7 feet 4 inches) is plausible, but unlikely, and a height of 99 inches is certainly a coding error. Obvious coding errors should be excluded from the analysis, since they can have an inordinate effect on the results. It's always a good idea to look at the raw data in order to identify any gross mistakes in coding.
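
One way to exclude implausible values in R is sketched below. The data frame heights, its values, and the cutoff of 84 inches are all assumptions made for illustration; they are not taken from the Weymouth Health Survey.

```r
# Hypothetical data standing in for the survey; the last row mimics a coding error
heights <- data.frame(
  hgt_inch = c(62, 65, 67, 70, 72, 99),
  weight   = c(120, 150, 160, 180, 200, 130)
)

# Correlation with the implausible height included
r_all <- cor(heights$hgt_inch, heights$weight)

# Assumed cutoff for illustration only; choose a threshold suited to the data
max_plausible <- 84
cleaned <- subset(heights, hgt_inch <= max_plausible)

# Correlation after excluding the implausible height
r_clean <- cor(cleaned$hgt_inch, cleaned$weight)

r_all
r_clean
```

Even a single extreme point can change the correlation substantially in a small data set, which is the concern described above.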

After excluding the two outliers, the plot looks like this:

FAQs

What is the formula for the correlation coefficient in R? ›

r = Σ(x_i − x̄)(y_i − ȳ) / √[Σ(x_i − x̄)² · Σ(y_i − ȳ)²]

What is the correlation coefficient r2 and R? ›

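The Pearson correlation coefficient (r) measures the strength and direction of the linear association between two variables, whereas the coefficient of determination (R²) measures the proportion of the variance in the dependent variable that a regression model explains. In a simple linear regression with one independent variable, R² equals r².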

What is the R value in a regression? ›

In a simple linear regression, R is the correlation coefficient: it measures the strength of the linear relationship between the independent variable and the dependent variable.

How to find r on calculator? ›

TI-84: Correlation Coefficient

To view the correlation coefficient, turn on "DiagnosticOn": press [2nd] "Catalog" (above the '0'), scroll to DiagnosticOn, and press [Enter] twice. ...
You will now be able to see the 'r' and 'r^2' values. Note: Go to [STAT] "CALC" "8:" [ENTER] to view them.

How to find r correlation coefficient? ›

Use the formula (z_x)_i = (x_i - x̄) / s_x to calculate a standardized value for each x_i, and the formula (z_y)_i = (y_i - ȳ) / s_y to calculate a standardized value for each y_i. Multiply each pair of corresponding standardized values, (z_x)_i (z_y)_i, and add the products together. Divide the sum by n - 1, where n is the total number of points in the set of paired data. The result is the correlation coefficient r.
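
The standardized-value approach can be sketched in R as follows. The vectors x and y are hypothetical values used only to show the arithmetic; sd() is R's sample standard deviation, which divides by n - 1.

```r
# Hypothetical paired data for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Standardize each variable: subtract the mean, divide by the sample SD
z_x <- (x - mean(x)) / sd(x)
z_y <- (y - mean(y)) / sd(y)

# Multiply paired standardized values, add the products, divide by n - 1
r_z <- sum(z_x * z_y) / (length(x) - 1)

r_z
cor(x, y)   # should match r_z
```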

What does R-squared tell us? ›

R-Squared (R² or the coefficient of determination) is a statistical measure in a regression model that determines the proportion of variance in the dependent variable that can be explained by the independent variable. In other words, r-squared shows how well the data fit the regression model (the goodness of fit).
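
For a regression with a single independent variable, R² can be read from a fitted model in R and compared with the squared correlation coefficient. The sketch below uses hypothetical data; lm() and summary() are standard R functions.

```r
# Hypothetical paired data for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Fit a simple linear regression of y on x
fit <- lm(y ~ x)

# Proportion of the variance in y explained by the model
r_squared <- summary(fit)$r.squared

r_squared
cor(x, y)^2   # with one predictor, this equals R-squared
```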

How do you interpret the R of the correlation? ›

The relationship (or the correlation) between the two variables is denoted by the letter r and quantified with a number, which varies between −1 and +1. Zero means there is no linear correlation, whereas −1 or +1 means a complete or perfect correlation. The sign of r shows the direction of the correlation.

What does the R value tell you? ›

The correlation r measures the strength of the linear relationship between two quantitative variables, and it is always a number between -1 and 1. As a rule of thumb, the relationship is often considered strong when the absolute value of r is larger than 0.7, although cutoffs vary between sources.

How to find coefficient of correlation? ›

Here are the steps to take in calculating the correlation coefficient:

Determine your data sets. ...
Calculate the standardized value for your x variables. ...
Calculate the standardized value for your y variables. ...
Multiply and find the sum. ...
Divide the sum and determine the correlation coefficient.

How to find correlation between two variables? ›

The correlation coefficient is determined by dividing the covariance by the product of the two variables' standard deviations. Standard deviation is a measure of the dispersion of data from its average. Covariance is a measure of how two variables change together.

What is the formula for the correlation coefficient? ›

The correlation coefficient formula is: r = (n*sumXY - sumX*sumY)/sqrt{(n*sumX^2 - (sumX)^2)*(n*sumY^2 - (sumY)^2)}. The terms in that formula are: n = the number of data points, sumXY is the sum of the product of the x-value and y-value for each point in the data set, sumX is the sum of the x-values in the data set, ...
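
The sums in this formula can be computed directly in R, as in the sketch below. The data are hypothetical, and the variable names sum_x2 and sum_y2 are assumed here to mean the sums of the squared x-values and squared y-values (written sumX^2 and sumY^2 in the formula).

```r
# Hypothetical paired data for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

n      <- length(x)
sum_xy <- sum(x * y)   # sum of the products of paired values
sum_x  <- sum(x)
sum_y  <- sum(y)
sum_x2 <- sum(x^2)     # sum of squared x-values
sum_y2 <- sum(y^2)     # sum of squared y-values

r_sums <- (n * sum_xy - sum_x * sum_y) /
  sqrt((n * sum_x2 - sum_x^2) * (n * sum_y2 - sum_y^2))

r_sums
cor(x, y)   # should match r_sums
```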

How to use cor() function in R? ›

The cor() function calculates the correlation between two vectors, or creates a correlation matrix when given a matrix. For example, if apple and micr are vectors of prices for two stocks, cor(apple, micr) returns the correlation between the two stocks.
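
A brief sketch of both uses of cor() follows. The price values are made up for illustration; only the vector names apple and micr come from the text above.

```r
# Hypothetical prices for two stocks (illustrative values only)
apple <- c(150, 152, 149, 155, 158)
micr  <- c(300, 305, 298, 310, 312)

# Two vectors: a single correlation coefficient (Pearson by default)
r_stocks <- cor(apple, micr)
r_stocks

# A matrix: a correlation matrix with one row and column per variable
prices <- cbind(apple, micr)
cor_matrix <- cor(prices)
cor_matrix

# The method argument selects a different coefficient, e.g. Spearman's rank correlation
cor(apple, micr, method = "spearman")
```

The diagonal of the correlation matrix is always 1, because each variable is perfectly correlated with itself.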

What is the formula for the regression coefficient in R? ›

Formula and basics

The mathematical formula of the linear regression can be written as y = b0 + b1*x + e, where b0 and b1 are known as the regression beta coefficients or parameters and e is the error term. b0 is the intercept of the regression line, that is, the predicted value when x = 0. b1 is the slope of the regression line.
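
In R, the estimates of b0 and b1 can be obtained with lm() and coef(), as sketched below with hypothetical data. The last lines illustrate a standard identity for simple linear regression: the slope equals r multiplied by the ratio of the standard deviations of y and x.

```r
# Hypothetical paired data for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Fit y = b0 + b1*x by least squares
fit <- lm(y ~ x)

b0 <- coef(fit)[["(Intercept)"]]   # estimated intercept
b1 <- coef(fit)[["x"]]             # estimated slope
b0
b1

# Slope recovered from the correlation coefficient: b1 = r * s_y / s_x
b1_from_r <- cor(x, y) * sd(y) / sd(x)
b1_from_r
```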

What is the correlation coefficient r for the data set? ›

The correlation coefficient of two variables in a data set equals their covariance divided by the product of their individual standard deviations. It is a normalized measurement of how the two are linearly related.
