How to Calculate Pearson Correlation: Step-by-Step Guide

Learn to manually calculate the Pearson Correlation Coefficient (r) step-by-step. Understand the formula, worked examples, and common pitfalls for measuring linear relationships.

Step-by-Step Instructions

1

Gather and Organize Your Data

Begin by listing your paired data points (X and Y). Create a table to include columns for X, Y, XY (X multiplied by Y), X² (X squared), and Y² (Y squared). This structured approach will simplify the subsequent calculations.

2

Calculate Preliminary Sums

Fill in the `XY`, `X²`, and `Y²` values for each data pair. Once all rows are complete, sum each of the five columns (X, Y, XY, X², Y²) to obtain `Σx`, `Σy`, `Σxy`, `Σx²`, and `Σy²`. Also, count the total number of data pairs, `n`.

3

Apply the Pearson Correlation Formula

Insert the calculated sums (`n`, `Σx`, `Σy`, `Σxy`, `Σx²`, `Σy²`) into the Pearson correlation formula: `r = [nΣxy - (Σx)(Σy)] / sqrt([nΣx² - (Σx)²] × [nΣy² - (Σy)²])`. Substitute one term at a time, keeping `Σx²` (sum of squares) distinct from `(Σx)²` (square of the sum).

4

Compute the Numerator and Denominator Components

Independently calculate the numerator (`nΣ(xy) - (Σx)(Σy)`) and each part of the denominator. For the denominator, first calculate `[nΣx² - (Σx)²]` and `[nΣy² - (Σy)²]`, then multiply these two results, and finally take the square root of their product.

5

Calculate 'r' and Interpret the Coefficient

Divide the calculated numerator by the calculated denominator to find the Pearson correlation coefficient 'r'. Once you have 'r', interpret its value: a positive value indicates a positive linear relationship, a negative value indicates a negative linear relationship, and the closer 'r' is to +1 or -1, the stronger the linear relationship.

Understanding the relationship between two variables is a fundamental aspect of data analysis. The Pearson Product-Moment Correlation Coefficient, often denoted as 'r', is a statistical measure that quantifies the strength and direction of a linear relationship between two continuous variables. Its value ranges from -1 to +1.

  • A value of +1 indicates a perfect positive linear correlation: all points lie exactly on a straight line that rises as X increases.
  • A value of -1 indicates a perfect negative linear correlation: all points lie exactly on a straight line that falls as X increases.
  • A value of 0 indicates no linear correlation between the two variables.
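To make the three reference values concrete, here is a minimal Python sketch. It uses the mean-centered (definitional) form of Pearson's r, one of the equivalent forms, rather than the computational formula this guide works through. The function name `pearson_r` and the small datasets are illustrative assumptions, not part of the original guide.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r computed from deviations about the means (definitional form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # points on a rising line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # points on a falling line
r_flat = pearson_r(xs, [1, 3, 2, 3, 1])           # hypothetical data with no linear trend
print(r_up, r_down, r_flat)
```

The first two datasets lie exactly on a line, so r comes out as +1 and -1. In the third, the positive and negative deviation products cancel, giving r = 0.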

While software and calculators can quickly compute 'r', understanding the manual calculation provides deeper insight into how this crucial statistic is derived and what it truly represents.

Prerequisites

To effectively follow this guide, you should have a basic understanding of:

  • Arithmetic Operations: Addition, subtraction, multiplication, division, squaring, and square roots.
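  • Variables: Recognizing paired variables (X and Y). Pearson correlation is symmetric, so swapping which variable is labeled X and which is labeled Y does not change 'r'.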
  • Summation Notation (Σ): Understanding that Σ means "the sum of."

The Pearson Correlation Formula

There are several equivalent formulas for Pearson 'r'. For manual calculation, the computational formula is often preferred as it avoids intermediate mean calculations for each data point:

$$r = \frac{n\Sigma(xy) - (\Sigma x)(\Sigma y)}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}$$

Where:

  • n = the number of paired data points.
  • Σx = the sum of all X values.
  • Σy = the sum of all Y values.
  • Σxy = the sum of the product of each X and Y pair.
  • Σx² = the sum of the squared X values.
  • (Σx)² = the square of the sum of all X values.
  • Σy² = the sum of the squared Y values.
  • (Σy)² = the square of the sum of all Y values.
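The computational formula translates almost directly into code. The sketch below is one possible rendering; the function name `pearson_r_from_sums` and its parameter names are assumptions chosen for this illustration, and the three-point dataset is hypothetical.

```python
from math import sqrt

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Computational formula for Pearson r, taking the six summary values."""
    numerator = n * sum_xy - sum_x * sum_y
    left = n * sum_x2 - sum_x ** 2    # n*Σx² - (Σx)²
    right = n * sum_y2 - sum_y ** 2   # n*Σy² - (Σy)²
    return numerator / sqrt(left * right)

# Hypothetical check with three points on the line y = 2x:
# X = 1, 2, 3 and Y = 2, 4, 6, so Σx = 6, Σy = 12, Σxy = 28, Σx² = 14, Σy² = 56.
r = pearson_r_from_sums(n=3, sum_x=6, sum_y=12, sum_xy=28, sum_x2=14, sum_y2=56)
print(r)
```

Because the three points fall exactly on a rising line, the result is 1.0. Note that the function divides by zero if either variable is constant, since the matching bracket in the denominator is then 0.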

Worked Example: Study Hours vs. Exam Scores

Let's calculate the Pearson correlation coefficient for a small dataset representing the number of study hours (X) and the corresponding exam scores (Y) for 5 students.

| Student | Study Hours (X) | Exam Score (Y) |
| --- | --- | --- |
| 1 | 2 | 60 |
| 2 | 3 | 75 |
| 3 | 5 | 80 |
| 4 | 6 | 85 |
| 5 | 8 | 95 |

Step 1: Gather and Organize Your Data

First, list your paired data points. To facilitate the calculations, it's helpful to create a table with columns for X, Y, XY, X², and Y². This organizes all the intermediate values needed for the formula.

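| X | Y | XY | X² | Y² |
| --- | --- | --- | --- | --- |
| 2 | 60 | | | |
| 3 | 75 | | | |
| 5 | 80 | | | |
| 6 | 85 | | | |
| 8 | 95 | | | |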

Step 2: Calculate Preliminary Sums

Fill in the XY, X², and Y² columns for each row, then sum each column to get Σx, Σy, Σxy, Σx², and Σy². Also, identify n, the number of data pairs (which is 5 in this example).

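| X | Y | XY (X × Y) | X² | Y² |
| --- | --- | --- | --- | --- |
| 2 | 60 | 120 | 4 | 3600 |
| 3 | 75 | 225 | 9 | 5625 |
| 5 | 80 | 400 | 25 | 6400 |
| 6 | 85 | 510 | 36 | 7225 |
| 8 | 95 | 760 | 64 | 9025 |
| Σx = 24 | Σy = 395 | Σxy = 2015 | Σx² = 138 | Σy² = 31875 |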

From the table, we have:

  • n = 5
  • Σx = 24
  • Σy = 395
  • Σxy = 2015
  • Σx² = 138
  • Σy² = 31875
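The table and its column sums can be reproduced with a few lines of Python, which is a handy way to check hand arithmetic. This is a minimal sketch; the variable names (`hours`, `scores`, `rows`) are illustrative choices.

```python
hours = [2, 3, 5, 6, 8]        # X: study hours
scores = [60, 75, 80, 85, 95]  # Y: exam scores

# One row per student: (X, Y, XY, X², Y²)
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(hours, scores)]

n = len(rows)
# zip(*rows) transposes the rows into columns so each column can be summed.
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))

for row in rows:
    print(*row, sep="\t")
print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 5 24 395 2015 138 31875
```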

Step 3: Apply the Pearson Correlation Formula

Now, substitute these sums into the Pearson correlation formula:

$$r = \frac{n\Sigma(xy) - (\Sigma x)(\Sigma y)}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}$$

$$r = \frac{5(2015) - (24)(395)}{\sqrt{[5(138) - (24)^2][5(31875) - (395)^2]}}$$

Step 4: Compute the Numerator and Denominator Components

Calculate the numerator and each part of the denominator separately to manage complexity.

Numerator Calculation:

nΣ(xy) - (Σx)(Σy) = 5(2015) - (24)(395) = 10075 - 9480 = 595

Denominator Calculation (Left Bracket):

[nΣx² - (Σx)²] = [5(138) - (24)²] = [690 - 576] = 114

Denominator Calculation (Right Bracket):

[nΣy² - (Σy)²] = [5(31875) - (395)²] = [159375 - 156025] = 3350

Denominator Final Calculation:

Denominator = √[114 × 3350] = √381900 ≈ 617.9806

Step 5: Calculate 'r' and Interpret the Coefficient

Divide the numerator by the denominator to get the Pearson 'r' value.

r = 595 / 617.9806

r ≈ 0.9628

Interpretation

The calculated Pearson correlation coefficient r ≈ 0.9628 is very close to +1, which indicates a very strong positive linear relationship between study hours and exam scores. In practical terms, the students in this sample who studied more tended to achieve higher exam scores.
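Steps 4 and 5 can be checked the same way, computing each component separately so that an arithmetic slip (especially in the square root) is easy to spot. The sketch below starts from the sums found in Step 2; the variable names are illustrative.

```python
from math import sqrt

# Sums from Step 2 of the worked example.
n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 5, 24, 395, 2015, 138, 31875

numerator = n * sum_xy - sum_x * sum_y  # 10075 - 9480 = 595
left = n * sum_x2 - sum_x ** 2          # 690 - 576 = 114
right = n * sum_y2 - sum_y ** 2         # 159375 - 156025 = 3350
denominator = sqrt(left * right)        # square root of 381900
r = numerator / denominator

print(numerator, left, right)
print(round(denominator, 4), round(r, 4))
```

Keeping the numerator and the two brackets as exact integers until the final square root and division avoids accumulating rounding error in the intermediate steps.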

Common Pitfalls to Avoid

When working with Pearson correlation, be mindful of these common mistakes:

  • Correlation Does Not Imply Causation: A strong correlation means variables move together, but it does not mean one causes the other. There might be confounding variables or the relationship could be coincidental.
  • Non-Linear Relationships: Pearson 'r' only measures linear relationships. If the relationship between variables is non-linear (e.g., U-shaped), Pearson 'r' might be close to zero, even if there's a strong connection. Always visualize your data with a scatter plot first.
  • Outliers: Extreme data points can heavily influence the 'r' value, potentially distorting the true relationship. Consider checking for and addressing outliers.
  • Range Restriction: If the range of one or both variables is artificially limited, the correlation might be underestimated. For example, if you only study top-performing students, the correlation between study hours and grades might appear weaker than in the general student population.

When to Use a Calculator or Software

While manual calculation is excellent for understanding the underlying mechanics, it becomes impractical for larger datasets. For efficiency, accuracy, and additional analytical capabilities, consider using:

  • Statistical Calculators: Many scientific and graphing calculators have built-in functions for linear regression and correlation.
  • Spreadsheet Software: Programs like Microsoft Excel or Google Sheets can calculate 'r' using functions like CORREL().
  • Statistical Software: Tools like R, Python (with libraries like NumPy and SciPy), SPSS, SAS, or Stata are designed for robust statistical analysis, including correlation, and can also provide p-values, confidence intervals, and detailed scatter plots.


© 2026 PrimeCalcPro