ANALYSIS OF VARIANCE

Most people learn statistics like they're memorizing a menu: t-test for this, ANOVA for that.

You measure something: a reaction time, a blood pressure, a click rate. You get a number. Measure it again, and you get a different number. Measure it a hundred times, and you get a cloud of numbers hovering around some center. That cloud is the heartbeat of all of statistics. Every method you've ever learned is, at its core, asking the same question: is what I'm seeing signal, or is it just noise doing noisy things?

The signal-to-noise ratio is the primitive. Everything else is arithmetic.

Why ANOVA Exists

Every measurement deviates from the center of its own group. That deviation is variation within: random noise, individual differences, uncontrolled factors. Groups can also differ from each other in their central values. That is variation between: the systematic signal of interest.

From first principles, ANOVA separates total variation into these two components and asks a single question: is the variation between groups large relative to the variation within groups?
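Here is a minimal sketch of that split in Python, using only the standard library. The three groups are invented numbers, and the helper name `anova_decomposition` is hypothetical, not taken from any library. Total variation is measured around the grand mean. Between-group variation weights each group mean's deviation by the group size. Within-group variation is measured around each group's own mean. In one-way ANOVA the first equals the sum of the other two, and the F-ratio divides each pile by its degrees of freedom.

```python
# Minimal one-way ANOVA sketch: split total variation into between and within.
# The groups below are made-up illustrative data, not results from any study.

def mean(xs):
    return sum(xs) / len(xs)

def anova_decomposition(groups):
    """Return (ss_total, ss_between, ss_within) for a list of groups (hypothetical helper)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Total: every observation's squared deviation from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Between: each group mean's squared deviation from the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within: each observation's squared deviation from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_total, ss_between, ss_within

groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
ss_t, ss_b, ss_w = anova_decomposition(groups)

k = len(groups)                   # number of groups
n = sum(len(g) for g in groups)   # total number of observations
# F = (between variation per degree of freedom) / (within variation per degree of freedom)
f_stat = (ss_b / (k - 1)) / (ss_w / (n - k))
print(ss_t, ss_b, ss_w, f_stat)
```

With these numbers the piles come out to 44 = 38 + 6 (up to floating-point rounding), so F is about 19: the group means sit far apart relative to the scatter inside each group.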

First-Principles Ideas

1. Start with one fluctuation

Take the simplest possible object: a deviation from some center. Let $X$ be a random quantity and let $\mu$ be its center (its mean). A fluctuation is the difference: $$ X - \mu $$ Example: a dart misses the bullseye by some amount. That miss is a fluctuation. It can be positive or negative depending on direction.

So at the most basic level, we begin with:
* one deviation $X - \mu$
* one random miss
* one movement away from the center

That is the raw material.

2. From one fluctuation to total variation

One deviation alone does not tell you much. You want to know how much randomness exists overall. So you take many deviations:
* square each one
* add them together

Formally, for observations $X_1, X_2, \dots, X_n$, the total variation is: $$ \sum_{i=1}^{n} \left(X_i - \mu\right)^2 $$

Why square?
* because opposite directions should not cancel
* because you care about size, not sign

Why add?
* because you want the total amount of scatter
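Both reasons show up in a four-number example. The measurements and the center $\mu = 10$ are invented for illustration: the raw deviations cancel to zero even though every point misses, while the squared deviations accumulate into a nonzero total.

```python
# Hypothetical measurements around a known center mu (illustrative numbers only).
mu = 10.0
xs = [12.0, 8.0, 11.0, 9.0]

deviations = [x - mu for x in xs]                 # signed misses: [2.0, -2.0, 1.0, -1.0]
signed_total = sum(deviations)                    # opposite directions cancel: 0.0
squared_total = sum(d ** 2 for d in deviations)   # sizes accumulate: 4 + 4 + 1 + 1 = 10.0

print(deviations, signed_total, squared_total)
```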

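This creates a single object: a pile of squared deviations. That is the χ² idea in substance. In fact, if the $X_i$ are independent normal draws with mean $\mu$ and standard deviation $\sigma$, standardising each deviation gives a sum that follows a χ² distribution with $n$ degrees of freedom: $$ \chi^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 $$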
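A quick simulation makes the χ² claim concrete. It assumes normally distributed data with known $\mu$ and $\sigma$ (the values 100 and 15 are arbitrary choices), and `chi_square_stat` is a hypothetical helper name. A χ² variable with $n$ degrees of freedom has mean $n$, so averaging many simulated statistics should land close to $n$.

```python
import random

def chi_square_stat(xs, mu, sigma):
    """Sum of squared standardized deviations (hypothetical helper)."""
    return sum(((x - mu) / sigma) ** 2 for x in xs)

random.seed(0)  # fixed seed so the run is reproducible
n, mu, sigma, trials = 5, 100.0, 15.0, 20_000

# Each trial: draw n normal values with the known mu and sigma, then build one chi-square statistic.
stats = [
    chi_square_stat([random.gauss(mu, sigma) for _ in range(n)], mu, sigma)
    for _ in range(trials)
]
avg = sum(stats) / trials
print(f"average chi-square over {trials} trials: {avg:.2f} (theory: {n})")
```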

So χ² is just: total variation built out of many squared fluctuations

3. From total variation to t

Now ask a new question.

Instead of asking, “How much variation is there overall?”, ask:

“Is this one deviation large relative to the background variation?”

Now you build a ratio.

Numerator: one deviation
Denominator: typical variation level

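For a sample mean $\bar{X}$, the t-statistic is: $$ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} $$ where $s$ is the sample standard deviation and $n$ is the sample size, so $s / \sqrt{n}$ is the typical noise level of the mean itself (its standard error).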

So t means:

one deviation, scaled by the amount of noise around it

If $|t|$ is small, the deviation is ordinary. If $|t|$ is large, the deviation stands out.

The key thing is that t still keeps direction:

positive means one side
negative means the other side

So t is a way of asking:

how far is one thing from expectation, relative to noise?
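The t-statistic translates directly into code. This sketch uses Python's standard `statistics` module; the reaction-time values and reference points are invented, and `one_sample_t` is a hypothetical helper. Testing the same sample against two reference values shows the sign flipping while the magnitude stays the same.

```python
import math
import statistics

def one_sample_t(xs, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return (statistics.mean(xs) - mu0) / (s / math.sqrt(n))

# Hypothetical reaction times (ms); the sample mean is 260 ms.
sample = [262.0, 255.0, 270.0, 248.0, 265.0]
t_low = one_sample_t(sample, 250.0)   # positive: sample sits above the reference
t_high = one_sample_t(sample, 270.0)  # negative: sample sits below the reference
print(t_low, t_high)
```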

4. From t to F

Now shift the question again.

Instead of comparing one deviation to background variation, compare:

one pile of variation to
another pile of variation

That ratio is F.

So now:

Numerator: variation from one source
Denominator: variation from another source, usually background noise

Formally, if $Q_1$ and $Q_2$ are independent χ² quantities (sums of squared standardised deviations) with degrees of freedom $d_1$ and $d_2$, then: $$ F = \frac{Q_1 / d_1}{Q_2 / d_2} $$

So F compares two scaled sums of squared deviations, asking whether one source of variation is larger than another.
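One concrete instance of this ratio compares the spread of two independent samples. Dividing each sample's sum of squared deviations (taken from its own mean) by its degrees of freedom, $n - 1$, gives a sample variance, so the ratio reduces to $s_1^2 / s_2^2$. If both samples are normal and share the same $\sigma$, the unknown $\sigma^2$ cancels and the ratio follows an F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. The data and the helper names `sum_sq` and `f_ratio` are hypothetical.

```python
import statistics

def sum_sq(xs):
    """Sum of squared deviations from the sample mean (hypothetical helper)."""
    m = statistics.mean(xs)
    return sum((x - m) ** 2 for x in xs)

def f_ratio(xs, ys):
    """(SS_x / d_x) / (SS_y / d_y), with d = n - 1 for each sample."""
    d1, d2 = len(xs) - 1, len(ys) - 1
    return (sum_sq(xs) / d1) / (sum_sq(ys) / d2)

# Hypothetical data: a noisy process versus a steadier one.
noisy = [3.0, 9.0, 5.0, 11.0, 2.0]
steady = [5.0, 6.0, 5.5, 6.5, 6.0]
f_value = f_ratio(noisy, steady)  # large: the first pile is much bigger than the reference pile
print(f_value)
```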

In plain English, F asks:

is the variation in the first pile large compared to the variation in the reference pile?

Unlike t, F has no direction. It only measures size.

That is because both numerator and denominator are made out of squared quantities, and squaring removes sign.

5. Why t and F are connected

Now the bridge becomes natural.

If the numerator pile in F comes from just one deviation, then that pile is simply:

the square of that one deviation

So:

t compares deviation to noise
F compares squared deviation to squared noise scale

That is why, when the numerator has a single degree of freedom ($d_1 = 1$): $$F = t^2$$
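A numerical check of this bridge: with exactly two groups, one-way ANOVA has $d_1 = k - 1 = 1$, so its F-statistic should equal the square of the pooled two-sample t-statistic. The sample values are made up, and both helpers are hypothetical sketches built on the standard library rather than a statistics package.

```python
import math
import statistics

def pooled_t(a, b):
    """Two-sample t-statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def one_way_f(groups):
    """One-way ANOVA F = (SS_between / (k - 1)) / (SS_within / (N - k))."""
    all_x = [x for g in groups for x in g]
    grand = statistics.mean(all_x)
    k, n = len(groups), len(all_x)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

a = [5.1, 4.8, 6.0, 5.5]
b = [6.2, 6.9, 5.8, 7.1, 6.5]
t = pooled_t(a, b)
f = one_way_f([a, b])
print(t, t ** 2, f)  # t ** 2 and f agree up to floating-point rounding
```

The t value is negative because the first group has the smaller mean; F discards that sign and keeps only the size.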

So F is not a separate universe. It is what happens when a one-direction comparison is turned into a pure size comparison.

Synthesis

The whole chain is:

start with one fluctuation
square and add fluctuations to build total variation
compare one fluctuation to variation to get t
compare one variation pile to another to get F

So each object is built from the previous one.

Nothing appears out of nowhere.

Fluctuation is the primitive
χ² is accumulated fluctuation
t is one fluctuation relative to accumulated fluctuation
F is one accumulated fluctuation relative to another

Connect to Linear Regression

Now regression is just the same logic wearing different clothes.

Suppose you are trying to explain exam scores using hours studied.

Regression splits the total variation in scores into two parts:

1. Explained variation

This is the variation captured by the model.

In words:

how much the fitted values differ from the overall average

If the model has real explanatory power, this pile becomes larger.

2. Leftover variation

This is the variation the model could not explain.

In words:

how much scatter remains after fitting the model

If the model is poor, this pile stays large.

Now regression constructs an F-ratio:

Numerator: explained variation per degree of freedom
Denominator: leftover variation per degree of freedom

So the regression F-statistic asks:

is the variation created by the model large relative to the variation still left as noise?

If yes, the model is likely capturing something real. If not, there is no evidence that the model does better than chance.
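Here is a sketch of the regression F-test for the exam-score example. It fits a straight line by ordinary least squares using only the standard library. The hours and scores are invented, and `regression_f` is a hypothetical helper. With one predictor, the explained pile has 1 degree of freedom and the leftover pile has $n - 2$, because two parameters (intercept and slope) are estimated.

```python
import statistics

def regression_f(x, y):
    """Fit y = b0 + b1*x by least squares; return (ss_explained, ss_leftover, F)."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx          # least-squares slope
    b0 = my - b1 * mx       # least-squares intercept
    fitted = [b0 + b1 * xi for xi in x]
    ss_explained = sum((fi - my) ** 2 for fi in fitted)               # fitted values vs overall average
    ss_leftover = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # scatter remaining after the fit
    df_model, df_resid = 1, n - 2
    f_stat = (ss_explained / df_model) / (ss_leftover / df_resid)
    return ss_explained, ss_leftover, f_stat

# Hypothetical data: hours studied vs exam score (invented for illustration).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 70, 75, 79]
ss_exp, ss_left, f_value = regression_f(hours, scores)
ss_total = sum((s - statistics.mean(scores)) ** 2 for s in scores)
print(ss_exp, ss_left, f_value)
```

Because the fit includes an intercept, the explained and leftover piles add up to the total variation of the scores around their mean, which is the same two-pile split that ANOVA uses.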

Core Intuition

The same idea is being reused at every stage:

deviation is the raw object
squaring turns deviation into magnitude
summing builds total variation
ratios compare one magnitude to another

That is the entire architecture.

So when you see χ², t, F, and regression, do not think of four separate topics.

Think:

one idea, rebuilt layer by layer

Common Mistakes

Treating χ², t, and F as disconnected formulas instead of one construction
Missing that squaring removes direction, which is why F is always nonnegative
Thinking regression invents a new concept, when it is really just organizing variation into two piles and comparing them
