How to Calculate Linear Regression
Linear regression finds the best-fitting straight line through a set of data points. It is widely used in statistics and data science to predict outcomes, identify trends, and quantify relationships between variables.
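The goal is to find the line y = mx + b, with slope m and y-intercept b, that minimizes the sum of squared vertical distances (residuals) between the data points and the line. This criterion is known as ordinary least squares.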
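To see where the slope and intercept formulas come from, the minimization can be written out. The derivation below is a sketch of the standard least-squares argument; the symbol S for the sum of squared errors and the index i over the n data points are notation introduced here for illustration.

```latex
\begin{align*}
S(m, b) &= \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2 \\
\frac{\partial S}{\partial m} &= -2 \sum_{i=1}^{n} x_i \bigl(y_i - m x_i - b\bigr) = 0 \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} \bigl(y_i - m x_i - b\bigr) = 0 \\
m \sum x_i^2 + b \sum x_i &= \sum x_i y_i \\
m \sum x_i + n b &= \sum y_i
\end{align*}
```

The last two lines are a pair of linear equations in m and b. Solving them gives the formulas for the slope and the y-intercept.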
The Formulas
Slope:
m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
Y-intercept:
b = (Σy − mΣx) / n
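The two formulas translate almost directly into code. The following is a minimal Python sketch rather than a reference implementation; the function name `fit_line` and its error handling are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b.

    Hypothetical helper: the name and the error handling are illustrative.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x = sum(xs)                               # Σx
    sum_y = sum(ys)                               # Σy
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σxy
    sum_x2 = sum(x * x for x in xs)               # Σx²

    # Denominator of the slope formula: nΣx² − (Σx)²
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        # Every x is identical, so no finite slope fits the data.
        raise ValueError("slope is undefined when every x is the same")

    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b
```

For instance, `fit_line([0, 1, 2], [1, 3, 5])` returns `(2.0, 1.0)`, because those three points lie exactly on y = 2x + 1. The explicit check on the denominator guards against the one case where the slope formula divides by zero.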
Step-by-Step Example
Data: (1,2), (2,4), (3,5), (4,4), (5,5)
| x | y | xy | x² |
|---|---|---|---|
| 1 | 2 | 2 | 1 |
| 2 | 4 | 8 | 4 |
| 3 | 5 | 15 | 9 |
| 4 | 4 | 16 | 16 |
| 5 | 5 | 25 | 25 |
| Σ=15 | Σ=20 | Σ=66 | Σ=55 |
n = 5
m = (5×66 − 15×20) / (5×55 − 15²) = (330 − 300) / (275 − 225) = 30 / 50 = 0.6
b = (20 − 0.6×15) / 5 = (20 − 9) / 5 = 2.2
Regression line: y = 0.6x + 2.2
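The arithmetic above can be checked with a short script. This is a sketch in plain Python that rebuilds the table's column sums from the five data points and applies the same two formulas; the variable names are illustrative and `predict` is a hypothetical helper.

```python
# Data points from the worked example.
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]

n = len(points)
sum_x = sum(x for x, _ in points)        # Σx
sum_y = sum(y for _, y in points)        # Σy
sum_xy = sum(x * y for x, y in points)   # Σxy
sum_x2 = sum(x * x for x, _ in points)   # Σx²

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n


def predict(x):
    """Hypothetical helper: evaluate the fitted line at x."""
    return m * x + b


print(f"sums: {sum_x}, {sum_y}, {sum_xy}, {sum_x2}")
print(f"y = {m:.1f}x + {b:.1f}")
print(f"prediction at x = 3: {predict(3):.1f}")
```

Running it prints the sums 15, 20, 66, 55, the line y = 0.6x + 2.2, and a prediction of 4.0 at x = 3. That last value is a useful sanity check: a least-squares line always passes through the point of means, here (3, 4).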
Interpreting the Results
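- Slope (m = 0.6): For each 1-unit increase in x, the predicted y increases by 0.6.
- Intercept (b = 2.2): When x = 0, the predicted y is 2.2.
- R² (coefficient of determination): The proportion of the variation in y that the fitted line explains, from 0 (none) to 1 (all). For the example data, R² = 0.6, so the line accounts for 60% of the variation in y.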
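One common way to compute R² is 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y from its mean. The sketch below applies that definition to the example data with the slope and intercept found earlier; the function name `r_squared` is an assumption made for illustration.

```python
def r_squared(xs, ys, m, b):
    """Hypothetical helper: R² of the line y = m*x + b on the given data."""
    mean_y = sum(ys) / len(ys)
    # Residual sum of squares: variation the line fails to explain.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: variation of y around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("R² is undefined when every y is the same")
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_squared(xs, ys, m=0.6, b=2.2), 3))
```

Here SS_res is 2.4 and SS_tot is 6, so the script prints 0.6.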