Grinstead & Snell Chapters 4-6: Probability Distributions and Entropy

These chapters transition from discrete to continuous probability, introduce expected value and variance as key summary statistics, and culminate in Shannon entropy as a measure of uncertainty. This material is foundational for information theory.

  1. [[Continuous Random Variables]] — Variables taking values on a continuum; described by probability density functions (PDFs) rather than probability mass functions
  2. [[Expected Value]] — The “center of mass” of a distribution; $E[X] = \int x f(x)\, dx$
  3. [[Variance and Standard Deviation]] — Measures of spread; $\text{Var}(X) = E[(X - \mu)^2]$
  4. [[Common Distributions]] — Uniform, exponential, normal, and their properties
  5. [[Shannon Entropy]] — The expected surprise of a random variable; $H(X) = -\sum p(x) \log p(x)$

For a continuous random variable $X$ with PDF $f(x)$:

$$P(a \leq X \leq b) = \int_a^b f(x) \, dx$$

Note: $P(X = a) = 0$ for any specific value $a$. Probability mass is “smeared” over intervals.
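
A minimal numerical check of this, assuming NumPy and SciPy are available (the Exponential($\lambda = 2$) PDF and the interval $[0.5, 1.5]$ are arbitrary choices):

```python
# Numerical check (NumPy and SciPy assumed): interval probabilities come from
# integrating the PDF, and a single point carries zero probability.
import numpy as np
from scipy.integrate import quad

lam = 2.0
pdf = lambda x: lam * np.exp(-lam * x)          # f(x) = λ e^{-λx} for x >= 0

a, b = 0.5, 1.5
p_interval, _ = quad(pdf, a, b)                 # P(a <= X <= b) = ∫_a^b f(x) dx
p_point, _ = quad(pdf, a, a)                    # integral over a degenerate interval

print(p_interval)                               # ≈ 0.318
print(np.exp(-lam * a) - np.exp(-lam * b))      # closed-form CDF difference, same value
print(p_point)                                  # 0.0: P(X = a) = 0
```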

For any random variables $X$ and $Y$ (not necessarily independent):

$$E[aX + bY] = aE[X] + bE[Y]$$

This is surprisingly powerful—it holds even when $X$ and $Y$ are dependent.
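
A quick simulation sketch of this (NumPy assumed; the choices $Y = X^2$, $a = 3$, $b = -2$ are arbitrary, picked to make the dependence extreme):

```python
# Simulation sketch (NumPy assumed): linearity of expectation holds even when
# Y is completely determined by X.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)
y = x ** 2                                   # strongly dependent on x

a, b = 3.0, -2.0
lhs = np.mean(a * x + b * y)                 # estimate of E[aX + bY]
rhs = a * np.mean(x) + b * np.mean(y)        # aE[X] + bE[Y]
print(lhs, rhs)                              # both ≈ 3·1 − 2·5 = −7
```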

For independent $X$ and $Y$:

$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$$

Note: Independence is required here, unlike for expectation.
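
A companion simulation sketch (NumPy assumed; the exponential samples are an arbitrary choice): additivity holds for an independent pair but fails badly in the fully dependent case $Y = X$.

```python
# Simulation sketch (NumPy assumed): Var(X + Y) = Var(X) + Var(Y) needs independence.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)   # Var(X) = 4
y = rng.exponential(scale=2.0, size=1_000_000)   # independent copy, Var(Y) = 4

print(np.var(x + y), np.var(x) + np.var(y))      # ≈ 8 and ≈ 8: additivity holds
print(np.var(x + x), 2 * np.var(x))              # ≈ 16 vs ≈ 8: fails when Y = X
```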

| Term | Definition |
| --- | --- |
| PDF | A function $f(x) \geq 0$ with $\int f(x)\, dx = 1$; $P(X \in A) = \int_A f(x)\, dx$ |
| CDF | $F(x) = P(X \leq x) = \int_{-\infty}^x f(t)\, dt$ |
| Expected value | $E[X] = \int x f(x)\, dx$ (continuous) or $\sum x\, p(x)$ (discrete) |
| Variance | $\text{Var}(X) = E[X^2] - (E[X])^2$ |
| Entropy | $H(X) = -\sum p(x) \log p(x)$ |
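
These identities can be checked numerically for a concrete case. A sketch assuming SciPy, using the Exponential($\lambda = 2$) distribution (an arbitrary choice), whose exact values are $E[X] = 1/\lambda$, $E[X^2] = 2/\lambda^2$, $\text{Var}(X) = 1/\lambda^2$:

```python
# Sketch (NumPy and SciPy assumed): verify the PDF/CDF/mean/variance identities
# for the Exponential(λ = 2) distribution by numerical integration.
import numpy as np
from scipy.integrate import quad

lam = 2.0
pdf = lambda x: lam * np.exp(-lam * x)

total, _ = quad(pdf, 0, np.inf)                       # ∫ f = 1
mean, _ = quad(lambda x: x * pdf(x), 0, np.inf)       # E[X] = 1/λ = 0.5
second, _ = quad(lambda x: x**2 * pdf(x), 0, np.inf)  # E[X²] = 2/λ² = 0.5
var = second - mean**2                                # Var(X) = E[X²] − (E[X])² = 0.25
cdf_at_1, _ = quad(pdf, 0, 1.0)                       # F(1) = 1 − e^{-λ}

print(total, mean, var)
print(cdf_at_1, 1 - np.exp(-lam))                     # the two CDF values agree
```
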
| Name | Equation | Use |
| --- | --- | --- |
| Normal PDF | $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$ | Most common distribution; Central Limit Theorem |
| Exponential PDF | $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ | Waiting times; memoryless property |
| Binary entropy | $H(p) = -p\log p - (1-p)\log(1-p)$ | Entropy of a biased coin |
| Entropy bound | $H(X) \leq \log n$ | Maximum entropy on $n$ outcomes is attained by the uniform distribution |
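
A sketch illustrating the last two rows (NumPy assumed; entropy is computed in nats, and the Dirichlet draws are just a convenient way to generate random distributions):

```python
# Sketch (NumPy assumed): the binary entropy curve peaks at p = 1/2 with value
# log 2, and random distributions on n outcomes never exceed log n.
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

# Binary entropy H(p) = -p log p - (1-p) log(1-p)
ps = np.linspace(0.0, 1.0, 101)
H = np.array([entropy([p, 1 - p]) for p in ps])
print(ps[np.argmax(H)], H.max(), np.log(2))      # max at p = 0.5, value log 2 ≈ 0.693

# Entropy bound H(X) <= log n, with equality only for the uniform distribution
rng = np.random.default_rng(0)
n = 6
for _ in range(5):
    q = rng.dirichlet(np.ones(n))                # a random distribution on n outcomes
    print(entropy(q) <= np.log(n) + 1e-12)       # always True
print(entropy(np.full(n, 1 / n)), np.log(n))     # the uniform distribution attains the bound
```
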
  1. Entropy maximization (uniform) — Key insight: use Lagrange multipliers with the normalization constraint; the uniform distribution maximizes $-\sum p_i \log p_i$ subject to $\sum p_i = 1$ (a derivation is sketched after this list).

  2. Variance decomposition — Key insight: $\text{Var}(X) = E[X^2] - (E[X])^2$ follows from expanding $(X - \mu)^2$ and using linearity.
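
Sketches of both derivations, using the natural log and ignoring the constraints $p_i \geq 0$ (they are inactive at the optimum). For the entropy maximization:

$$\mathcal{L} = -\sum_{i=1}^{n} p_i \log p_i + \lambda\Big(\sum_{i=1}^{n} p_i - 1\Big), \qquad \frac{\partial \mathcal{L}}{\partial p_i} = -\log p_i - 1 + \lambda = 0 \;\Rightarrow\; p_i = e^{\lambda - 1}.$$

Every $p_i$ equals the same constant, so normalization forces $p_i = 1/n$, and $H_{\max} = \log n$. For the variance identity, with $\mu = E[X]$:

$$\text{Var}(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - (E[X])^2.$$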

  • Entropy is the central concept; everything flows from here
  • Expected value is used in defining entropy ($H$ is the expected surprisal)
  • Normal distribution appears in maximum entropy with fixed mean and variance
  • The exponential distribution is maximum entropy with fixed mean (see [[Maxent Mean Constraint]])
  • [[How Does Differential Entropy Differ From Discrete Entropy?]]
  • Why can differential entropy be negative while discrete entropy can’t?
  • Problem 4.2: Entropy of biased coin
  • Problem 5.15: Expected value of geometric distribution
  • Problem 6.8: Entropy of mixture distribution
  • Problem 6.12: Prove entropy maximization for uniform

The transition from discrete to continuous distributions requires careful handling. In particular:

  • $P(X = a) = 0$ is counterintuitive; you have to think about intervals, not points
  • Differential entropy can be negative, which seems to violate the intuition that “uncertainty is non-negative” (see the example after this list)
  • The definition of a conditional PDF appears to require dividing by a probability that equals zero; this is resolved by taking a limit over shrinking intervals
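
A concrete instance of the second point: for $X \sim \text{Uniform}(0, \tfrac{1}{2})$, the density is $f(x) = 2$ on the interval, so

$$h(X) = -\int_0^{1/2} 2 \log 2 \, dx = -\log 2 < 0,$$

whereas discrete entropy is always non-negative because each term $-p(x)\log p(x) \geq 0$ for $p(x) \in [0, 1]$.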

Linearity of expectation doesn’t require independence! This is incredibly useful. For example, to find the expected number of fixed points in a random permutation, you can sum indicator variables without worrying about their complex dependencies.
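
A simulation sketch of the fixed-point example (NumPy assumed; $n = 10$ is arbitrary): each indicator $1\{\pi(i) = i\}$ has expectation $1/n$, so the expected count is $n \cdot (1/n) = 1$ regardless of the dependence between the indicators.

```python
# Simulation sketch (NumPy assumed): expected number of fixed points of a
# uniformly random permutation. The indicators 1{π(i) = i} are dependent,
# but linearity of expectation still gives E[count] = n · (1/n) = 1.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 200_000
counts = [np.sum(rng.permutation(n) == np.arange(n)) for _ in range(trials)]
print(np.mean(counts))   # ≈ 1.0, independent of n
```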

  • Review section on conditional expectation
  • Work through the entropy exercises more carefully
  • Connect to [[Cover & Thomas Chapter 2]] for information-theoretic perspective
  • Build the [[Binary Entropy Explorer]] demonstration