What is a z-score?

A z-score (also called a standard score) measures how many standard deviations a data point is from the mean of its distribution. A z-score of 0 means exactly at the mean; +1.0 means one standard deviation above the mean; −2.0 means two standard deviations below.

Formula: z = (x − μ) / σ

Where x is the observed value, μ is the population mean, and σ is the standard deviation.

The standard normal distribution

The standard normal distribution (mean = 0, standard deviation = 1) is a bell-shaped curve symmetric around zero. The z-score tells you where your observation falls on this curve. The area under the curve to the left of your z-score is the cumulative probability (p-value or percentile):

z = −1.96 -> 2.5th percentile (bottom 2.5% of the distribution)
z = 0 -> 50th percentile (exactly at the mean)
z = +1.96 -> 97.5th percentile (top 2.5%)

The region between z = −1.96 and z = +1.96 captures 95% of the normal distribution - the basis of 95% confidence intervals and two-tailed hypothesis tests.

The 68-95-99.7 rule (empirical rule)

Range	Data within range
μ ± 1σ (z = −1 to +1)	~68.3% of observations
μ ± 2σ (z = −2 to +2)	~95.4% of observations
μ ± 3σ (z = −3 to +3)	~99.7% of observations

Common uses

Standardized testing: SAT/ACT scores, IQ scores, and other standardized tests convert raw scores to z-scores for comparison across different test versions.
Finance: the Altman Z-score is a classic bankruptcy prediction model. Portfolio risk uses z-scores to identify outlier returns.
Medical research: clinical lab values are often reported as z-scores compared to a reference population.
Quality control: in Six Sigma manufacturing, a process is “Six Sigma” quality if defects fall beyond ±6σ from the mean - approximately 3.4 defects per million opportunities.

p-value from z-score

The area under the standard normal curve to the left of a z-score is the cumulative probability - equivalently, the p-value for a one-tailed test or the percentile rank of that observation. The key distinction:

One-tailed p-value: the probability of observing a value at least as extreme as your z-score in one direction. For z = +1.645, the one-tailed p-value is 0.05 (5%).
Two-tailed p-value: the probability in both tails combined. For z = ±1.96, the two-tailed p-value is 0.05 - the conventional threshold for statistical significance in many fields (α = 0.05). This is why ±1.96 appears so frequently in confidence intervals and hypothesis tests.

A two-tailed test is appropriate when you are asking "is this different from the mean in either direction?" A one-tailed test is appropriate when you have a directional hypothesis ("is this greater than the mean?").

Practical examples

Standardising a test score: a class has a mean of 70 and a standard deviation of 10. A student scores 85. z = (85 − 70) / 10 = +1.5. The student scored 1.5 standard deviations above the mean - approximately the 93rd percentile.
Comparing across distributions: a runner finishes a 5K in 22 minutes (population mean 25 min, SD 3 min) and a 10K in 48 minutes (mean 55 min, SD 6 min). 5K z-score = (22 − 25) / 3 = −1.0; 10K z-score = (48 − 55) / 6 ≈ −1.17. The runner performed slightly better relative to peers in the 10K.
Outlier detection: a common threshold is |z| > 3. Any data point more than 3 standard deviations from the mean occurs in fewer than 0.3% of a normal distribution and warrants investigation as a potential outlier or data error.
Normalising for machine learning: z-score normalisation (standardisation) scales features to have mean = 0 and SD = 1. This prevents features with large ranges (e.g., income in dollars) from dominating features with small ranges (e.g., age in years) in distance-based algorithms like k-nearest neighbours and SVM.

Related distributions

The z-score assumes a known population standard deviation and a large sample size. When these assumptions don't hold, related distributions are used:

t-distribution: used when the population SD is unknown and the sample is small. Has heavier tails than the normal distribution, reflecting additional uncertainty. Converges to the normal distribution as sample size increases (n > 30 is often cited as the practical threshold).
Chi-squared distribution (χ²): used for categorical data tests (goodness of fit, independence in contingency tables). A chi-squared statistic is the sum of squared z-scores, so it is directly related to the normal distribution.
F-distribution: used in ANOVA (analysis of variance) to compare the variance between groups to the variance within groups. An F-statistic is a ratio of two chi-squared variables, making it an extension of the same family.