Introduction to Linear Regression
Simple linear regression is a statistical method that models the relationship between two variables, x and y, with a straight line: y = b0 + b1 * x. The goal is to choose the intercept b0 and the slope b1 so that the line minimizes the sum of squared differences between the observed and predicted values of y.
Prerequisites
Before starting, ensure you have a basic understanding of algebra and statistics. You will need a dataset with two variables, x and y.
Step-by-Step Calculation
To calculate the linear regression by hand, follow these steps:
Step 1: Gather Your Inputs
First, identify your dataset and calculate the following:
- The mean of x (x̄) and the mean of y (ȳ)
- The deviations from the mean for x and y
- The products of the deviations for x and y
- The squares of the deviations for x and y
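The inputs listed above can be sketched in a few lines of Python. This is a minimal illustration that reuses the dataset from the worked example in this article; the variable names (`dx`, `dy`, `products`, `dx_squared`) are chosen here for illustration only.

```python
# Data taken from the worked example in this article.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 7, 8]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Deviation of each observation from its mean.
dx = [x - x_mean for x in xs]
dy = [y - y_mean for y in ys]

# Products of paired deviations, plus squared x deviations for the slope.
products = [a * b for a, b in zip(dx, dy)]
dx_squared = [a * a for a in dx]

print("means:", x_mean, y_mean)
print("products:", products)
print("squared x deviations:", dx_squared)
```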
Step 2: Calculate the Slope (b1)
Next, plug the sums from Step 1 into the slope formula: b1 = Σ[(xi - x̄)(yi - ȳ)] / Σ(xi - x̄)^2
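As a rough sketch, the slope formula translates into the function below. The function name `slope` and the choice to raise an error when every x value is identical (which makes the denominator zero) are assumptions made for this example.

```python
def slope(xs, ys):
    """Least-squares slope b1 for paired lists xs and ys."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of products of deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: sum of squared x deviations.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    return sxy / sxx


print(slope([1, 2, 3, 4, 5], [2, 3, 5, 7, 8]))
```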
Step 3: Calculate the Intercept (b0)
Then, use the slope and the means to calculate the intercept: b0 = ȳ - b1 * x̄
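The intercept formula is equivalent to requiring that the fitted line pass through the point (x̄, ȳ). A small sketch, with the hypothetical helper name `intercept` and the means and slope from the worked example in this article:

```python
def intercept(x_mean, y_mean, b1):
    """Intercept b0 that makes the line pass through (x_mean, y_mean)."""
    return y_mean - b1 * x_mean


b0 = intercept(3, 5, 1.6)
# Rounding hides floating-point noise in the last digits.
print(round(b0, 10))
```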
Step 4: Calculate the Correlation Coefficient (r)
The correlation coefficient measures the strength and direction of the linear relationship between x and y, and always lies between -1 and 1: r = Σ[(xi - x̄)(yi - ȳ)] / sqrt(Σ(xi - x̄)^2 * Σ(yi - ȳ)^2)
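The formula for r can be sketched in the same way. The function name `correlation` is an assumption for this example, as is the decision to raise an error when either variable is constant, since the denominator is then zero.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return sxy / math.sqrt(sxx * syy)


print(round(correlation([1, 2, 3, 4, 5], [2, 3, 5, 7, 8]), 3))
```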
Step 5: Calculate the Residuals
Finally, calculate the residuals by subtracting the predicted values from the actual values: residuals = yi - (b0 + b1 * xi)
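A minimal sketch of the residual calculation, assuming the slope and intercept are already known. The function name `residuals` is illustrative, and the values passed in are the data, intercept, and slope from the worked example in this article.

```python
def residuals(xs, ys, b0, b1):
    """Observed y minus predicted y for each observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


res = residuals([1, 2, 3, 4, 5], [2, 3, 5, 7, 8], b0=0.2, b1=1.6)
# Adding 0.0 after rounding turns a possible -0.0 into 0.0 for display.
print([round(e, 2) + 0.0 for e in res])
```

A positive residual means the point lies above the fitted line; a negative one means it lies below.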
Worked Example
Suppose we have the following dataset:
| x | y |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 5 |
| 4 | 7 |
| 5 | 8 |
First, calculate the means: x̄ = 3, ȳ = 5.
Then, calculate the deviations and products:
| x | y | x - x̄ | y - ȳ | (x - x̄)(y - ȳ) | (x - x̄)^2 | (y - ȳ)^2 |
|---|---|---|---|---|---|---|
| 1 | 2 | -2 | -3 | 6 | 4 | 9 |
| 2 | 3 | -1 | -2 | 2 | 1 | 4 |
| 3 | 5 | 0 | 0 | 0 | 0 | 0 |
| 4 | 7 | 1 | 2 | 2 | 1 | 4 |
| 5 | 8 | 2 | 3 | 6 | 4 | 9 |
Now, calculate the slope: b1 = (6 + 2 + 0 + 2 + 6) / (4 + 1 + 0 + 1 + 4) = 16 / 10 = 1.6
Next, calculate the intercept: b0 = 5 - 1.6 * 3 = 5 - 4.8 = 0.2
For the correlation coefficient, first sum the squared y deviations: Σ(yi - ȳ)^2 = 9 + 4 + 0 + 4 + 9 = 26. Then r = 16 / sqrt(10 * 26) = 16 / sqrt(260) ≈ 16 / 16.12 ≈ 0.992
Finally, calculate the predicted values, b0 + b1 * x, and the residuals:
| x | y | predicted | residual |
|---|---|---|---|
| 1 | 2 | 1.8 | 0.2 |
| 2 | 3 | 3.4 | -0.4 |
| 3 | 5 | 5.0 | 0.0 |
| 4 | 7 | 6.6 | 0.4 |
| 5 | 8 | 8.2 | -0.2 |
The residuals sum to zero, as they should for a least-squares line with an intercept.
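The worked example can be recomputed with a short script, which is a convenient way to check hand arithmetic. This is a sketch under the formulas given in the steps above; the function name `fit_line` and its return order are assumptions made for this example.

```python
import math


def fit_line(xs, ys):
    """Return (b0, b1, r, residuals) for paired lists xs and ys."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    b1 = sxy / sxx                      # slope
    b0 = y_mean - b1 * x_mean           # intercept
    r = sxy / math.sqrt(sxx * syy)      # correlation coefficient
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return b0, b1, r, residuals


b0, b1, r, residuals = fit_line([1, 2, 3, 4, 5], [2, 3, 5, 7, 8])
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, r = {r:.3f}")
# Adding 0.0 after rounding turns a possible -0.0 into 0.0 for display.
print("residuals:", [round(e, 2) + 0.0 for e in residuals])
```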
Common Mistakes to Avoid
- Forgetting to calculate the means of x and y
- Incorrectly calculating the deviations and products
- Using the wrong formula for the slope or intercept
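Two quick checks catch most of these mistakes. For a least-squares line with an intercept, the residuals sum to zero, and the residuals multiplied by their x values also sum to zero. The sketch below applies both checks to a hand-computed intercept and slope; the function name `check_fit` and the tolerance are assumptions made for this example.

```python
import math


def check_fit(xs, ys, b0, b1, tol=1e-9):
    """Return a list of problems found in a fitted line (empty if none)."""
    res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    problems = []
    # Fails when the line does not pass through the point of means.
    if not math.isclose(sum(res), 0.0, abs_tol=tol):
        problems.append("residuals do not sum to zero")
    # Fails when the slope is wrong even if the line passes through the means.
    if not math.isclose(sum(x * e for x, e in zip(xs, res)), 0.0, abs_tol=tol):
        problems.append("x-weighted residuals do not sum to zero")
    return problems


xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 7, 8]
print(check_fit(xs, ys, b0=0.2, b1=1.6))   # correct fit
print(check_fit(xs, ys, b0=1.2, b1=1.6))   # intercept off by 1
```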
When to Use a Calculator
While it's possible to calculate linear regression by hand, it's often more convenient to use a calculator or statistical software, especially for larger datasets. This can save time and reduce the risk of errors.
Conclusion
Linear regression is a powerful tool for modeling the relationship between two variables. By following these steps and using the formulas provided, you can calculate the regression slope, intercept, correlation coefficient, and residuals by hand. However, for larger datasets or more complex analyses, it's often better to use a calculator or statistical software.