
Grinstead 4.2: Entropy of a Biased Coin

A biased coin has probability $p$ of heads and $1-p$ of tails.

(a) Find the entropy $H(p)$ as a function of $p$. (b) For what value of $p$ is the entropy maximized? (c) Sketch the graph of $H(p)$ for $p \in [0, 1]$.

  • Two outcomes: Heads (probability $p$), Tails (probability $1-p$)
  • $p \in [0, 1]$
  • $H(p)$ — the entropy as a function of bias
  • $p^*$ — the value that maximizes entropy
  • Shape of the entropy curve

Apply the Shannon entropy formula directly, then use calculus to find the maximum.

By definition of Shannon entropy:

$$H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$$

This is called the binary entropy function, often denoted $H_b(p)$ or $h(p)$.
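As a quick numerical sanity check, here is a minimal Python sketch of this formula (the name `binary_entropy` is my own, not from the text), using the convention $0 \log_2 0 = 0$ so the endpoints come out as zero:

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, using the convention 0 * log2(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # H(0) = H(1) = 0: the outcome is certain
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))                       # 1.0 bit, the maximum
print(binary_entropy(0.0), binary_entropy(1.0))  # 0.0 0.0 at the endpoints
print(binary_entropy(0.3), binary_entropy(0.7))  # equal up to rounding: H(p) = H(1 - p)
```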

Take the derivative:

$$
\begin{aligned}
\frac{dH}{dp} &= -\log_2 p - \frac{p}{p \ln 2} + \log_2(1-p) + \frac{1-p}{(1-p) \ln 2} \\
&= -\log_2 p + \log_2(1-p) - \frac{1}{\ln 2} + \frac{1}{\ln 2} \\
&= \log_2 \frac{1-p}{p}
\end{aligned}
$$

Setting $\frac{dH}{dp} = 0$:

$$\log_2 \frac{1-p}{p} = 0 \implies \frac{1-p}{p} = 1 \implies p = \frac{1}{2}$$
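To double-check the calculus symbolically, a short sketch using SymPy (assuming it is installed; the exact printed form of the simplified derivative may vary) reproduces the derivative and the critical point:

```python
import sympy as sp

p = sp.symbols('p', positive=True)
H = -p * sp.log(p, 2) - (1 - p) * sp.log(1 - p, 2)

dH = sp.simplify(sp.diff(H, p))
print(dH)                         # equivalent to log2((1 - p)/p); printed form may vary
print(sp.solve(sp.Eq(dH, 0), p))  # expect [1/2]
```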

Verification: The second derivative is:

$$\frac{d^2H}{dp^2} = -\frac{1}{p \ln 2} - \frac{1}{(1-p) \ln 2} < 0$$

Since the second derivative is negative everywhere on $(0, 1)$, $p = 1/2$ is indeed a maximum.
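The concavity claim can be checked the same way; this repeats the SymPy setup so the snippet runs on its own:

```python
import sympy as sp

p = sp.symbols('p', positive=True)
H = -p * sp.log(p, 2) - (1 - p) * sp.log(1 - p, 2)

d2H = sp.simplify(sp.diff(H, p, 2))
print(d2H)  # equivalent to -1/(p*(1 - p)*ln 2); printed form may vary

# Spot-check negativity on the open interval (0, 1).
print(all(float(d2H.subs(p, q)) < 0 for q in (0.01, 0.25, 0.5, 0.75, 0.99)))  # True
```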

Key points:

  • $H(0) = 0$ (certain tails)
  • $H(1/2) = 1$ bit (maximum uncertainty)
  • $H(1) = 0$ (certain heads)
  • Symmetric about $p = 1/2$
  • Concave (dome-shaped, opening downward)
```
 H(p)
 1 |        ___
   |      /     \
   |    /         \
   |  /             \
 0 |/_________________\___
    0        0.5        1   p
```
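For an actual plot rather than the ASCII sketch, a few lines of NumPy and matplotlib (assuming both are installed) reproduce the curve:

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(1e-6, 1 - 1e-6, 500)              # stay off the endpoints to avoid log(0)
H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)

plt.plot(p, H)
plt.xlabel("p")
plt.ylabel("H(p)  [bits]")
plt.title("Binary entropy function")
plt.show()
```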

> [!success] Answer
> (a) $H(p) = -p \log_2 p - (1-p) \log_2(1-p)$
>
> (b) Maximum at $p^* = 1/2$, where $H(1/2) = 1$ bit
>
> (c) Symmetric, concave curve with its maximum at the center and zeros at the endpoints

The binary entropy function is fundamental; it appears throughout information theory:

  • Channel capacity of the binary symmetric channel: $C = 1 - H_b(p)$ (see the numeric sketch after this list)
  • Bounds on error-correcting codes
  • Rate-distortion theory for binary sources
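As a concrete illustration of the first bullet, here is a hedged sketch (the helper names `binary_entropy` and `bsc_capacity` are my own) that tabulates $C = 1 - H_b(p)$ for a few crossover probabilities:

```python
import math

def binary_entropy(p: float) -> float:
    """H_b(p) in bits, with the convention 0 * log2(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p: float) -> float:
    """Capacity C = 1 - H_b(p) of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

for crossover in (0.0, 0.01, 0.11, 0.5):
    print(f"p = {crossover:.2f}: C = {bsc_capacity(crossover):.3f} bits/use")
# At p = 0.5 the capacity is 0: a maximally noisy BSC carries no information.
```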

The symmetry $H(p) = H(1-p)$ reflects that a coin biased toward heads has the same entropy as one equally biased toward tails. What matters is the degree of bias, not its direction.

  • What if we use natural log? Same shape, but the maximum is $\ln 2 \approx 0.693$ nats (see the sketch after this list).
  • What about a three-sided die? See [[Ternary Entropy]].
  • [[Conditional Entropy of Binary Channel]]
  • [[Mutual Information of BSC]]
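On the natural-log follow-up above, a minimal sketch (the function name is my own) showing that the same formula in nats peaks at $\ln 2$ and converts to bits by dividing by $\ln 2$:

```python
import math

def binary_entropy_nats(p: float) -> float:
    """Same formula with the natural log: entropy measured in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

print(binary_entropy_nats(0.5))                # ln 2 ≈ 0.693 nats at the maximum
print(binary_entropy_nats(0.5) / math.log(2))  # 1.0: divide by ln 2 to convert nats to bits
```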

First attempt: forgot to apply the chain rule when differentiating $\log_2(1-p)$. The base-2 log introduces a factor of $1/\ln 2$ that’s easy to miss.