Grinstead & Snell Chapters 4-6: Probability Distributions and Entropy

These chapters transition from discrete to continuous probability, introduce expected value and variance as key summary statistics, and culminate in Shannon entropy as a measure of uncertainty. This material is foundational for information theory.

  1. [[Continuous Random Variables]] — Variables taking values on a continuum; described by probability density functions (PDFs) rather than probability mass functions
  2. [[Expected Value]] — The “center of mass” of a distribution; $E[X] = \int x f(x)\, dx$
  3. [[Variance and Standard Deviation]] — Measures of spread; $\text{Var}(X) = E[(X - \mu)^2]$
  4. [[Common Distributions]] — Uniform, exponential, normal, and their properties
  5. [[Shannon Entropy]] — The expected surprise of a random variable; $H(X) = -\sum p(x) \log p(x)$

For a continuous random variable $X$ with PDF $f(x)$:

$$P(a \leq X \leq b) = \int_a^b f(x) \, dx$$

Note: $P(X = a) = 0$ for any specific value $a$. Probability mass is “smeared” over intervals.
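
A minimal numerical check of this, assuming NumPy and SciPy are available (the Exponential($\lambda = 2$) PDF and the interval $[0.5, 1.5]$ are arbitrary choices):

```python
# Numerical check (NumPy and SciPy assumed): interval probabilities come from
# integrating the PDF, and a single point carries zero probability.
import numpy as np
from scipy.integrate import quad

lam = 2.0
pdf = lambda x: lam * np.exp(-lam * x)          # f(x) = λ e^{-λx} for x >= 0

a, b = 0.5, 1.5
p_interval, _ = quad(pdf, a, b)                 # P(a <= X <= b) = ∫_a^b f(x) dx
p_point, _ = quad(pdf, a, a)                    # integral over a degenerate interval

print(p_interval)                               # ≈ 0.318
print(np.exp(-lam * a) - np.exp(-lam * b))      # closed-form CDF difference, same value
print(p_point)                                  # 0.0: P(X = a) = 0
```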

For any random variables $X$ and $Y$ (not necessarily independent):

$$E[aX + bY] = aE[X] + bE[Y]$$

This is surprisingly powerful—it holds even when $X$ and $Y$ are dependent.
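
A quick simulation sketch of this (NumPy assumed; the choices $Y = X^2$, $a = 3$, $b = -2$ are arbitrary, picked to make the dependence extreme):

```python
# Simulation sketch (NumPy assumed): linearity of expectation holds even when
# Y is completely determined by X.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)
y = x ** 2                                   # strongly dependent on x

a, b = 3.0, -2.0
lhs = np.mean(a * x + b * y)                 # estimate of E[aX + bY]
rhs = a * np.mean(x) + b * np.mean(y)        # aE[X] + bE[Y]
print(lhs, rhs)                              # both ≈ 3·1 − 2·5 = −7
```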

For independent $X$ and $Y$:

$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$$

Note: Independence is required here, unlike for expectation.
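
A companion simulation sketch (NumPy assumed; the exponential samples are an arbitrary choice): additivity holds for an independent pair but fails badly in the fully dependent case $Y = X$.

```python
# Simulation sketch (NumPy assumed): Var(X + Y) = Var(X) + Var(Y) needs independence.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)   # Var(X) = 4
y = rng.exponential(scale=2.0, size=1_000_000)   # independent copy, Var(Y) = 4

print(np.var(x + y), np.var(x) + np.var(y))      # ≈ 8 and ≈ 8: additivity holds
print(np.var(x + x), 2 * np.var(x))              # ≈ 16 vs ≈ 8: fails when Y = X
```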

| Term | Definition |
| --- | --- |
| PDF | A function $f(x) \geq 0$ with $\int f(x)\, dx = 1$; $P(X \in A) = \int_A f(x)\, dx$ |
| CDF | $F(x) = P(X \leq x) = \int_{-\infty}^x f(t)\, dt$ |
| Expected value | $E[X] = \int x f(x)\, dx$ (continuous) or $\sum x\, p(x)$ (discrete) |
| Variance | $\text{Var}(X) = E[X^2] - (E[X])^2$ |
| Entropy | $H(X) = -\sum p(x) \log p(x)$ |
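
These identities can be checked numerically for a concrete case. A sketch assuming SciPy, using the Exponential($\lambda = 2$) distribution (an arbitrary choice), whose exact values are $E[X] = 1/\lambda$, $E[X^2] = 2/\lambda^2$, $\text{Var}(X) = 1/\lambda^2$:

```python
# Sketch (NumPy and SciPy assumed): verify the PDF/CDF/mean/variance identities
# for the Exponential(λ = 2) distribution by numerical integration.
import numpy as np
from scipy.integrate import quad

lam = 2.0
pdf = lambda x: lam * np.exp(-lam * x)

total, _ = quad(pdf, 0, np.inf)                       # ∫ f = 1
mean, _ = quad(lambda x: x * pdf(x), 0, np.inf)       # E[X] = 1/λ = 0.5
second, _ = quad(lambda x: x**2 * pdf(x), 0, np.inf)  # E[X²] = 2/λ² = 0.5
var = second - mean**2                                # Var(X) = E[X²] − (E[X])² = 0.25
cdf_at_1, _ = quad(pdf, 0, 1.0)                       # F(1) = 1 − e^{-λ}

print(total, mean, var)
print(cdf_at_1, 1 - np.exp(-lam))                     # the two CDF values agree
```
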
| Name | Equation | Use |
| --- | --- | --- |
| Normal PDF | $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$ | Most common distribution; Central Limit Theorem |
| Exponential PDF | $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ | Waiting times; memoryless property |
| Binary entropy | $H(p) = -p\log p - (1-p)\log(1-p)$ | Entropy of a biased coin |
| Entropy bound | $H(X) \leq \log n$ | Maximum entropy on $n$ outcomes is attained by the uniform distribution |
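
A sketch illustrating the last two rows (NumPy assumed; entropy is computed in nats, and the Dirichlet draws are just a convenient way to generate random distributions):

```python
# Sketch (NumPy assumed): the binary entropy curve peaks at p = 1/2 with value
# log 2, and random distributions on n outcomes never exceed log n.
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

# Binary entropy H(p) = -p log p - (1-p) log(1-p)
ps = np.linspace(0.0, 1.0, 101)
H = np.array([entropy([p, 1 - p]) for p in ps])
print(ps[np.argmax(H)], H.max(), np.log(2))      # max at p = 0.5, value log 2 ≈ 0.693

# Entropy bound H(X) <= log n, with equality only for the uniform distribution
rng = np.random.default_rng(0)
n = 6
for _ in range(5):
    q = rng.dirichlet(np.ones(n))                # a random distribution on n outcomes
    print(entropy(q) <= np.log(n) + 1e-12)       # always True
print(entropy(np.full(n, 1 / n)), np.log(n))     # the uniform distribution attains the bound
```
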
  1. Entropy maximization (uniform) — Key insight: use Lagrange multipliers with the normalization constraint; the uniform distribution maximizes $-\sum p_i \log p_i$ subject to $\sum p_i = 1$ (a derivation is sketched after this list).

  2. Variance decomposition — Key insight: $\text{Var}(X) = E[X^2] - (E[X])^2$ follows from expanding $(X - \mu)^2$ and using linearity.
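
Sketches of both derivations, using the natural log and ignoring the constraints $p_i \geq 0$ (they are inactive at the optimum). For the entropy maximization:

$$\mathcal{L} = -\sum_{i=1}^{n} p_i \log p_i + \lambda\Big(\sum_{i=1}^{n} p_i - 1\Big), \qquad \frac{\partial \mathcal{L}}{\partial p_i} = -\log p_i - 1 + \lambda = 0 \;\Rightarrow\; p_i = e^{\lambda - 1}.$$

Every $p_i$ equals the same constant, so normalization forces $p_i = 1/n$, and $H_{\max} = \log n$. For the variance identity, with $\mu = E[X]$:

$$\text{Var}(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - (E[X])^2.$$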

  • Entropy is the central concept; everything flows from here
  • Expected value is used in defining entropy ($H$ is the expected surprisal)
  • Normal distribution appears in maximum entropy with fixed mean and variance
  • The exponential distribution is maximum entropy with fixed mean (see [[Maxent Mean Constraint]])
  • [[How Does Differential Entropy Differ From Discrete Entropy?]]
  • Why can differential entropy be negative while discrete entropy can’t?
  • Problem 4.2: Entropy of biased coin
  • Problem 5.15: Expected value of geometric distribution
  • Problem 6.8: Entropy of mixture distribution
  • Problem 6.12: Prove entropy maximization for uniform

The transition from discrete to continuous distributions requires careful handling. In particular:

  • $P(X = a) = 0$ is counterintuitive; you have to think about intervals, not points
  • Differential entropy can be negative, which seems to violate the intuition that “uncertainty is non-negative” (see the example after this list)
  • The definition of a conditional PDF appears to require dividing by a probability that equals zero; this is resolved by taking a limit over shrinking intervals
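
A concrete instance of the second point: for $X \sim \text{Uniform}(0, \tfrac{1}{2})$, the density is $f(x) = 2$ on the interval, so

$$h(X) = -\int_0^{1/2} 2 \log 2 \, dx = -\log 2 < 0,$$

whereas discrete entropy is always non-negative because each term $-p(x)\log p(x) \geq 0$ for $p(x) \in [0, 1]$.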

Linearity of expectation doesn’t require independence! This is incredibly useful. For example, to find the expected number of fixed points in a random permutation, you can sum indicator variables without worrying about their complex dependencies.
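
A simulation sketch of the fixed-point example (NumPy assumed; $n = 10$ is arbitrary): each indicator $1\{\pi(i) = i\}$ has expectation $1/n$, so the expected count is $n \cdot (1/n) = 1$ regardless of the dependence between the indicators.

```python
# Simulation sketch (NumPy assumed): expected number of fixed points of a
# uniformly random permutation. The indicators 1{π(i) = i} are dependent,
# but linearity of expectation still gives E[count] = n · (1/n) = 1.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 200_000
counts = [np.sum(rng.permutation(n) == np.arange(n)) for _ in range(trials)]
print(np.mean(counts))   # ≈ 1.0, independent of n
```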

  • Review section on conditional expectation
  • Work through the entropy exercises more carefully
  • Connect to [[Cover & Thomas Chapter 2]] for information-theoretic perspective
  • Build the [[Binary Entropy Explorer]] demonstration