Least-Squares Method

The calculator uses the ordinary least-squares (OLS) method to find the line ŷ = mx + b that minimises the sum of squared residuals. It returns the slope m, y-intercept b, R² goodness-of-fit, and the Pearson correlation coefficient r.
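
The calculator's own source is not shown here, so the following is only a minimal sketch of how these four quantities can be computed with the standard closed-form OLS formulas. The function name ols_fit and its return order are assumptions made for illustration.

```python
from math import sqrt

def ols_fit(xs, ys):
    """Fit y_hat = m*x + b by ordinary least squares.

    Returns (m, b, r, r_squared). Hypothetical helper, not the calculator's code.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = sxy / sxx               # slope
    b = mean_y - m * mean_x     # intercept: the line passes through the means
    # r is undefined when every y is the same; returning 0.0 is a convention chosen here.
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    # With one predictor and an intercept, R^2 equals r squared.
    return m, b, r, r * r
```

For points that lie exactly on a line, such as ols_fit([0, 1, 2], [1, 3, 5]), the sketch returns slope 2, intercept 1, and r = 1.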

Interpreting R²

R² range	Fit quality	Interpretation
> 0.9	Strong	The line explains more than 90% of the variance in the data.
0.7 – 0.9	Good	Solid predictive power for many applied contexts.
0.5 – 0.7	Moderate	The model captures a meaningful trend, but a large share of the variation remains unexplained.
< 0.5	Weak	The linear model explains little of the observed variation.

Important: R² measures how closely the data follow the fitted line, not causation. A high R² does not mean that X causes Y; both variables might be driven by a third, confounding factor.

Residuals

A residual is the difference between an observed value and the model's prediction: e = y − ŷ. OLS chooses m and b so that the sum of these squared residuals, Σe², is as small as possible. A residual plot (residuals vs. fitted values) can reveal problems with the model:

Non-random patterns: suggest non-linearity - a higher-order or different model may fit better.
Fan shape (heteroscedasticity): the spread of the residuals grows with the fitted values. A log transformation of y often helps, provided all y values are positive.
Outliers: individual points with large residuals may unduly influence the slope estimate.
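
As an illustration of the first pattern, the sketch below computes fitted values and residuals for a small made-up dataset that curves upward (y = x²) but is fitted with a straight line. The data, the helper name residuals, and the hard-coded line ŷ = 6x − 7 (the OLS line for these five points) are assumptions for the example, not output from the calculator.

```python
def residuals(xs, ys, m, b):
    """Return (fitted, resid) lists for the line y_hat = m*x + b."""
    fitted = [m * x + b for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]  # e = y - y_hat
    return fitted, resid

# Hypothetical curved data: y = x**2.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
m, b = 6.0, -7.0  # OLS line for these five points

fitted, resid = residuals(xs, ys, m, b)
for f, e in zip(fitted, resid):
    print(f"fitted={f:6.1f}  residual={e:+5.1f}")
```

The residuals come out positive at both ends and negative in the middle. That U shape is the kind of non-random pattern that points to a non-linear relationship, even though the residuals still sum to zero.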

OLS assumptions

Linearity: the true relationship between X and Y is linear.
Independence: observations are independent of each other (often violated by time-series data, where consecutive observations tend to be correlated).
Homoscedasticity: the variance of residuals is constant across all values of X.
Normality of residuals: residuals are approximately normally distributed (required for valid hypothesis tests and confidence intervals, not for the regression itself).

Worked example

Hours studied (x)	Exam score (y)
1	50
2	58
3	65
4	73
5	80

For this dataset the means are x̄ = 3 and ȳ = 65.2, so the slope is m = 7.5 and the intercept is b = 65.2 − 7.5 × 3 = 42.7, giving the line ŷ = 7.5x + 42.7. R² ≈ 0.999, a near-perfect linear fit. The slope says each additional hour of study is associated with about 7.5 more points on the exam.