
Boltzmann Entropy as Shannon Entropy

> [!abstract] Core Insight
> [[Boltzmann Entropy]] (physics) and [[Shannon Entropy]] (information theory) are the same mathematical quantity measuring the same conceptual thing: uncertainty about the microstate given the macrostate.

They differ only in:

  1. Units ($k_B$ vs. bits)
  2. Historical context
  3. Typical applications

In statistical mechanics, we have a system with many particles. The macrostate (temperature, pressure, volume) is what we measure. The microstate (exact positions and momenta of all particles) is unknowable.

Boltzmann entropy:

$$S = k_B \ln \Omega$$

where:

  • $k_B = 1.38 \times 10^{-23}$ J/K (Boltzmann’s constant)
  • $\Omega$ = number of microstates compatible with the macrostate

For a probability distribution over microstates:

$$S = -k_B \sum_i p_i \ln p_i$$

This Gibbs form reduces to $S = k_B \ln \Omega$ when the distribution over microstates is uniform. Key properties:

  1. Extensive: $S(A+B) = S(A) + S(B)$ for independent systems
  2. Maximum at equilibrium (uniform distribution over accessible states)
  3. Second Law: $\Delta S \geq 0$ for isolated systems
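A minimal numerical sketch of the Gibbs form (using NumPy; `gibbs_entropy` is just an illustrative helper name, not a standard API), showing that a uniform distribution over $\Omega$ microstates recovers $S = k_B \ln \Omega$:

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant, J/K

def gibbs_entropy(p):
    """S = -k_B * sum_i p_i ln p_i for a probability distribution over microstates."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * ln(0) -> 0 by convention
    return -K_B * np.sum(p * np.log(p))

# Uniform distribution over Omega microstates recovers S = k_B ln(Omega)
omega = 10**6
print(gibbs_entropy(np.full(omega, 1.0 / omega)))   # ~1.91e-22 J/K
print(K_B * np.log(omega))                          # same value
```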

We have a random variable $X$ representing some uncertain outcome. We want to quantify how much we don’t know before observing it.

Shannon entropy:

$$H(X) = -\sum_i p_i \log_2 p_i$$

where $p_i$ is the probability of outcome $i$.

Key properties:

  1. Additive: $H(X,Y) = H(X) + H(Y)$ for independent $X, Y$
  2. Maximum for uniform distribution
  3. Non-negative, zero only for deterministic outcomes
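A quick numerical check of these properties (NumPy sketch; `shannon_entropy` is an illustrative helper):

```python
import numpy as np

def shannon_entropy(p):
    """H(X) = -sum_i p_i log2 p_i, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                                  # 0 * log(0) -> 0 by convention
    return -np.sum(p * np.log2(p))

# Maximum for the uniform distribution: H = log2(4) = 2 bits
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(shannon_entropy([0.7, 0.1, 0.1, 0.1]))      # ~1.36 < 2.0

# Additivity for independent X, Y: H(X, Y) = H(X) + H(Y)
px, py = np.array([0.5, 0.5]), np.array([0.9, 0.1])
joint = np.outer(px, py).ravel()                  # product distribution
print(shannon_entropy(joint), shannon_entropy(px) + shannon_entropy(py))
```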
| Thermodynamics | Information Theory |
| --- | --- |
| Microstate | Outcome / Message |
| Macrostate | Constraint / What we know |
| $\Omega$ (multiplicity) | $2^H$ (effective number of outcomes) |
| $k_B$ | Conversion factor to physical units |
| Equilibrium | Maximum entropy distribution |
| Heat bath | Noisy channel |
| Temperature | Inverse of Lagrange multiplier |
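The $\Omega \leftrightarrow 2^H$ row can be made concrete: for a uniform distribution $2^H$ equals the number of states exactly, while a skewed distribution gives a smaller “effective” count (sometimes called the perplexity). A rough sketch, with `entropy_bits` as an illustrative helper:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Uniform over Omega states: 2^H recovers Omega exactly
omega = 16
print(2 ** entropy_bits(np.full(omega, 1 / omega)))   # 16.0

# Skewed distribution over 5 states: 2^H < 5, the effective number of outcomes
p = np.array([0.5, 0.25, 0.125, 0.0625, 0.0625])
print(2 ** entropy_bits(p))                           # ~3.67
```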

The key equation:

$$S_{\text{Boltzmann}} = k_B \ln 2 \cdot H_{\text{Shannon}}$$

Or equivalently:

$$H = \frac{S}{k_B \ln 2}$$

When physicists use natural logs and information theorists use $\log_2$:

$$S = k_B H_{\text{nats}} = k_B \ln 2 \cdot H_{\text{bits}}$$
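A small unit-conversion sketch based on these relations (the helper names are illustrative only):

```python
import numpy as np

K_B = 1.380649e-23  # J/K

def bits_from_thermo(S):
    """Shannon entropy in bits equivalent to a thermodynamic entropy S in J/K."""
    return S / (K_B * np.log(2))

def thermo_from_bits(H_bits):
    """Thermodynamic entropy (J/K) equivalent to H bits of missing information."""
    return K_B * np.log(2) * H_bits

# A modest lab-scale entropy of 100 J/K corresponds to an astronomical
# amount of missing microstate information:
print(bits_from_thermo(100.0))   # ~1.0e25 bits
print(thermo_from_bits(1.0))     # ~9.57e-24 J/K for a single bit
```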

Both entropies answer the same question: “How many yes/no questions do I need to specify the exact state?”

  • In physics: given macroscopic measurements, how many bits to specify the microstate?
  • In information theory: given the probability distribution, how many bits on average to identify the outcome?

The underlying structure is identical: a probability distribution over states, and a desire to quantify the “spread” or “uncertainty” of that distribution.
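A worked example of the yes/no-question reading. For a dyadic distribution (all probabilities are powers of 1/2), an optimal questioning strategy identifies outcome $i$ in exactly $-\log_2 p_i$ questions, so the average number of questions equals $H(X)$:

```python
import numpy as np

# Dyadic distribution: optimal yes/no questions per outcome are -log2(p_i)
p = np.array([0.5, 0.25, 0.125, 0.125])
questions = -np.log2(p)                    # 1, 2, 3, 3 questions
print(np.sum(p * questions))               # average questions: 1.75
print(-np.sum(p * np.log2(p)))             # Shannon entropy: 1.75 bits

# Uniform case: 8 equally likely microstates need log2(8) = 3 questions
print(-np.log2(1 / 8))                     # 3.0
```

For general distributions the optimal average number of questions lies between $H$ and $H + 1$ (the source-coding bound).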

  • The Second Law is about information: isolated systems evolve toward states we can’t distinguish (maximum ignorance about microstates).
  • Entropy increase = losing track of microscopic details.
  • Maxwell’s demon is defeated by Landauer’s principle: erasing information costs $k_B T \ln 2$ per bit.
  • There’s a thermodynamic cost to computation (especially erasure).
  • Channel capacity has a physical interpretation.
  • Compression is fighting the natural tendency toward maximum entropy.
  • Jaynes’ MaxEnt: Use maximum entropy as an inference principle—it’s not just physics, it’s rational belief formation.
  • Landauer’s bound: $k_B T \ln 2 \approx 2.9 \times 10^{-21}$ J at room temperature is the minimum energy to erase one bit (see the numeric check after this list).
  • Reversible computation: In principle, computation needn’t cost energy; only erasure does.
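A quick numeric check of the Landauer figure quoted above, taking T = 300 K as room temperature (`landauer_bound` is an illustrative helper):

```python
import numpy as np

K_B = 1.380649e-23  # J/K

def landauer_bound(T):
    """Minimum energy (J) dissipated by erasing one bit at temperature T (K)."""
    return K_B * T * np.log(2)

print(landauer_bound(300))          # ~2.87e-21 J per bit at room temperature

# For scale: erasing 1 GB (8e9 bits) at the Landauer limit
print(8e9 * landauer_bound(300))    # ~2.3e-11 J, far below what real hardware dissipates
```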
Timeline:

  1. 1865: Clausius introduces entropy as $dS = \delta Q / T$
  2. 1877: Boltzmann connects to microstates: $S = k \ln W$
  3. 1929: Szilard analyzes Maxwell’s demon information-theoretically
  4. 1948: Shannon defines information entropy (reportedly suggested by von Neumann to “call it entropy—no one knows what it means”)
  5. 1957: Jaynes unifies via MaxEnt principle
  6. 1961: Landauer proves erasure costs energy

The connection was suspected early (von Neumann’s quip) but not rigorously established until Jaynes.

  • Negative temperatures: Thermodynamic entropy accommodates population inversions (entropy decreasing with energy gives a negative temperature); Shannon entropy itself is always non-negative.
  • Continuous distributions: Differential entropy has different properties (can be negative, not invariant under coordinate change).
  • Quantum: von Neumann entropy $S = -\mathrm{Tr}(\rho \ln \rho)$ extends both, but has additional subtleties (entanglement entropy).
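A minimal numerical sketch of the von Neumann entropy, computed from the eigenvalues of the density matrix (`von_neumann_entropy` is an illustrative helper):

```python
import numpy as np

def von_neumann_entropy(rho):
    """S = -Tr(rho ln rho), evaluated via the eigenvalues of the density matrix."""
    eigs = np.linalg.eigvalsh(rho)
    eigs = eigs[eigs > 1e-12]            # 0 * ln(0) -> 0 by convention
    return -np.sum(eigs * np.log(eigs))

pure = np.array([[1.0, 0.0], [0.0, 0.0]])   # pure state |0><0|
mixed = np.eye(2) / 2                       # maximally mixed qubit
print(von_neumann_entropy(pure))            # 0.0
print(von_neumann_entropy(mixed))           # ~0.693 = ln 2
```

For a diagonal density matrix this reduces to the Shannon entropy (in nats) of the eigenvalue distribution, which is how the quantum definition contains the classical one.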
References:

  • Jaynes, E.T. (1957). “Information Theory and Statistical Mechanics”
  • Landauer, R. (1961). “Irreversibility and Heat Generation in the Computing Process”
  • Bennett, C. (1982). “The Thermodynamics of Computation—A Review”
  • Cover & Thomas, Elements of Information Theory, Chapter 2