Linear Regression Calculator - Slope, Intercept & R²
Compute least-squares linear regression for a set of (x, y) data points. Get the slope, intercept, R² coefficient of determination, Pearson correlation, and a scatter plot with the best-fit line.
| Slope (m) | Intercept (b) | R² | Pearson r |
|---|---|---|---|
| 1.990000 | 0.050000 | 0.997305 | 0.998652 |
Least-Squares Method
The calculator uses the ordinary least-squares (OLS) method to find the line ŷ = mx + b that minimises the sum of squared residuals. It returns the slope m, y-intercept b, R² goodness-of-fit, and the Pearson correlation coefficient r.
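As a rough illustration of the closed-form solution, the sketch below computes the four reported quantities in plain Python. The function name `ols_fit` and its return layout are assumptions made for this example, not the calculator's actual code.

```python
import math

def ols_fit(xs, ys):
    """Fit y = m*x + b by ordinary least squares (illustrative sketch)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = sxy / sxx                # slope
    b = mean_y - m * mean_x      # intercept: the line passes through the means
    # Pearson r is undefined when y is constant, so return NaN in that case.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else float("nan")
    r2 = r * r                   # with one predictor and an intercept, R² = r²
    return m, b, r2, r

m, b, r2, r = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b, r2, r)  # 2.0 1.0 1.0 1.0
```

The usage line feeds in points that lie exactly on y = 2x + 1, so the fit recovers that line with R² = 1.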
Interpreting R²
| R² range | Fit quality | Interpretation |
|---|---|---|
| > 0.9 | Strong | The line explains more than 90% of the variance in y. |
| 0.7 – 0.9 | Good | Solid predictive power for many applied contexts. |
| 0.5 – 0.7 | Moderate | The model captures a meaningful trend but other variables matter. |
| < 0.5 | Weak | The linear model explains little of the observed variation. |
Important: R² measures how much of the variation in Y the line accounts for; it says nothing about causation. A high R² does not mean that X causes Y: both variables might be driven by a third, confounding factor.
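For reference, the standard definition of R² can be written as follows. The notation is introduced here rather than taken from the page: ȳ (written `\bar{y}`) is the mean of the observed y values and n is the number of data points.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2 = r^2 \quad \text{(one predictor, line fitted with an intercept)}
```

The numerator is the sum of squared residuals that OLS minimises; the denominator is the total variation of y about its mean.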
Residuals
A residual is the difference between an observed value and the model's prediction: e = y − ŷ. OLS minimises the sum of squared residuals. Examining a residual plot (residuals vs. fitted values) reveals model problems:
- Non-random patterns: suggest non-linearity - a higher-order or different model may fit better.
- Fan shape (heteroscedasticity): variance increases with fitted values - a log transformation of y often helps.
- Outliers: individual points with large residuals may unduly influence the slope estimate.
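To make the first bullet concrete, here is a small sketch that fits a line to deliberately curved data (y = x²) and prints the residuals. The helper names `fit_line` and `residuals` are made up for this illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x

def residuals(xs, ys, m, b):
    """Residual e = y - y_hat for each observation."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]   # curved data, so a straight line is the wrong model
m, b = fit_line(xs, ys)
res = residuals(xs, ys, m, b)
print(m, b)  # 4.0 -2.0
print(res)   # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals run positive, negative, then positive again as x increases. That U-shaped pattern, rather than random scatter around zero, is the signature of non-linearity.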
OLS assumptions
- Linearity: the true relationship between X and Y is linear.
- Independence: observations are independent of each other (often violated by time-series data, where successive observations tend to be correlated, unless this is corrected for).
- Homoscedasticity: the variance of residuals is constant across all values of X.
- Normality of residuals: residuals are approximately normally distributed (required for valid hypothesis tests and confidence intervals, not for the regression itself).
Worked example
| Hours studied (x) | Exam score (y) |
|---|---|
| 1 | 50 |
| 2 | 58 |
| 3 | 65 |
| 4 | 73 |
| 5 | 80 |
For this dataset: slope m = 7.5, intercept b = 42.7, giving the line ŷ = 7.5x + 42.7. R² ≈ 0.9995, a near-perfect linear fit. The slope says each additional hour of study is associated with 7.5 more points on the exam.
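The fit for the table above can be reproduced step by step. This is a plain-Python sketch with illustrative variable names; run it to check the slope, intercept, and R² for yourself.

```python
hours = [1, 2, 3, 4, 5]
scores = [50, 58, 65, 73, 80]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Sum of squared x-deviations and sum of cross-deviations about the means.
sxx = sum((x - mean_x) ** 2 for x in hours)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R² = 1 - SS_res / SS_tot
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(hours, scores))
ss_tot = sum((y - mean_y) ** 2 for y in scores)
r_squared = 1 - ss_res / ss_tot

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, R² = {r_squared:.6f}")
```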