Most people learn statistics like they're memorizing a menu. T-test for this, ANOVA for You measure something - a reaction time, a blood pressure, a click rate. You get a number. Measure it again. You get a different number. Measure it a hundred times. You get a cloud of numbers hovering around some center.That cloud is the heartbeat of all of statistics. Every method you've ever learned is, at its core, asking the same question: is what I'm seeing signal, or is it just the noise doing noisy things?

The signal-to-noise ratio is the primitive. Everything else is arithmetic.

Why ANOVA Exists?

Every measurement deviates from a reference point. That deviation is variation within: random noise, individual differences, uncontrolled factors. Groups can also differ from each other in their central values. That is variation between: the systematic signal of interest.

From first principles, ANOVA separates total variation into these two components and asks a single question: is the variation between groups large relative to the variation within groups?

First Principle Ideas

1. Start with one fluctuation

Take the simplest possible object: a deviation from some center. Let \(X\) be a random quantity and let \(\mu\) be its center \(mean\). A fluctuation is the difference: $$ X - \mu $$ Example: a dart misses the bullseye by some amount. That miss is a fluctuation. It can be positive or negative depending on direction.

So at the most basic level, we begin with: * one deviation \(X - \mu\) * one random miss * one movement away from the center That is the raw material.

2. From one fluctuation to total variation

One deviation alone does not tell you much. You want to know how much randomness exists overall. So you take many deviations: * square each one * add them together Formally, for observations \(X_1, X_2, \dots, X_n\), the total variation is: $$ \sum_{i=1}^{n} \left(X_i - \mu\right)^2 $$

Why square? * because opposite directions should not cancel * because you care about size, not sign

Why add? * because you want the total amount of scatter

This creates a single object: a pile of squared deviations. That is the χ² idea in substance. In fact, when standardised, this becomes: $$ \chi^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 $$

So χ² is just: total variation built out of many squared fluctuations

3. From total variation to t

Now ask a new question.

Instead of asking, “How much variation is there overall?”, ask:

“Is this one deviation large relative to the background variation?”

Now you build a ratio.

  • Numerator: one deviation
  • Denominator: typical variation level

For a sample mean \(\bar{X}\), the t-statistic is: $$ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} $$

So t means:

one deviation, scaled by the amount of noise around it

If t is small, the deviation is ordinary. If t is large, the deviation stands out.

The key thing is that t still keeps direction:

  • positive means one side
  • negative means the other side

So t is a way of asking:

how far is one thing from expectation, relative to noise?

4. From t to F

Now shift the question again.

Instead of comparing one deviation to background variation, compare:

  • one pile of variation to
  • another pile of variation

That ratio is F.

So now:

  • Numerator: variation from one source
  • Denominator: variation from another source, usually background noise

Formally, if \(S_1^2\) and \(S_2^2\) are two variance estimates with degrees of freedom \(d_1\) and \(d_2\), then: $$ F = \frac{S_1^2 / d_1}{S_2^2 / d_2} $$

So F compares two scaled sums of squared deviations, asking whether one source of variation is larger than another.

In plain English, F asks:

is the variation in the first pile large compared to the variation in the reference pile?

Unlike t, F has no direction. It only measures size.

That is because both numerator and denominator are made out of squared quantities, and squaring removes sign.

5. Why t and F are connected

Now the bridge becomes natural.

If the numerator pile in F comes from just one deviation, then that pile is simply:

the square of that one deviation

So:

  • t compares deviation to noise
  • F compares squared deviation to squared noise scale

That is why, in the one-dimensional case: $\(F = t^2\)$

So F is not a separate universe. It is what happens when a one-direction comparison is turned into a pure size comparison.

Synthesis

The whole chain is:

  • start with one fluctuation
  • square and add fluctuations to build total variation
  • compare one fluctuation to variation to get t
  • compare one variation pile to another to get F

So each object is built from the previous one.

Nothing appears out of nowhere.

  • Fluctuation is the primitive
  • χ² is accumulated fluctuation
  • t is one fluctuation relative to accumulated fluctuation
  • F is one accumulated fluctuation relative to another

Connect to Linear Regression

Now regression is just the same logic wearing different clothes.

Suppose you are trying to explain exam scores using hours studied.

Regression splits the total variation in scores into two parts:

1. Explained variation

This is the variation captured by the model.

In words:

how much the fitted values differ from the overall average

If the model has real explanatory power, this pile becomes larger.

2. Leftover variation

This is the variation the model could not explain.

In words:

how much scatter remains after fitting the model

If the model is poor, this pile stays large.

Now regression constructs an F-ratio:

  • Numerator: explained variation per degree of freedom
  • Denominator: leftover variation per degree of freedom

So the regression F-statistic asks:

is the variation created by the model large relative to the variation still left as noise?

If yes, the model explains something real. If no, the model is not doing better than randomness.

Core Intuition

The same idea is being reused at every stage:

  1. deviation is the raw object
  2. squaring turns deviation into magnitude
  3. summing builds total variation
  4. ratios compare one magnitude to another

That is the entire architecture.

So when you see χ², t, F, and regression, do not think of four separate topics.

Think:

one idea, rebuilt layer by layer

Common Mistakes

  • Treating χ², t, and F as disconnected formulas instead of one construction
  • Missing that squaring removes direction, which is why F is always nonnegative
  • Thinking regression invents a new concept, when it is really just organizing variation into two piles and comparing them