Expected Information as Expected Utility
Published on 2025-06-08.
Imagine a scientist planning a clinical trial to determine the optimal dosage of a new diabetes drug. Due to constraints on budget and patient safety, only a limited number of tests can be conducted, so the scientist must decide which dosage levels to test to most effectively learn the relationship between dose and patient response. To make these decisions systematically, the scientist can turn to Bayesian experimental design (BED), which is a principled framework for designing optimal experiments under uncertainty.
Suppose the scientist models the dose–response relationship using a parametric form governed by unknown parameters $\theta$—such as the maximum effect of the drug, the dose achieving half that effect, and the baseline response. To learn about these parameters, she must choose a design $\xi$, which could specify, for example, a set of dosage levels and how patients are allocated to them. A widely used objective in BED is to choose the design that maximizes the expected information gain (EIG) (Lindley 1956):
$$\mathrm{EIG}(\xi) = \mathbb{E}_{p(y \mid \xi)}\Big[ H\big[p(\theta)\big] - H\big[p(\theta \mid y, \xi)\big] \Big].$$
It quantifies how much we expect to reduce our uncertainty about $\theta$, starting from some prior belief $p(\theta)$, on applying the design $\xi$. Here, $H[\cdot]$ denotes Shannon entropy (or differential entropy for continuous $\theta$), and the expectation is over possible outcomes $y$ with distribution $p(y \mid \xi)$.
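To make the objective concrete, here is a minimal numerical sketch (not from the original post) of how the EIG might be estimated for a single-dose design. It uses the equivalent mutual-information form $\mathrm{EIG}(\xi) = \mathbb{E}_{p(\theta)\,p(y \mid \theta, \xi)}\big[\log p(y \mid \theta, \xi) - \log p(y \mid \xi)\big]$ together with a nested Monte Carlo approximation of the marginal. The one-parameter dose–response model, the $\mathcal{N}(1, 1)$ prior, the noise level, and the helper name `eig_nested_mc` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-parameter dose-response model (illustrative assumption):
# response y = theta * dose / (dose + 1) + Gaussian noise.
NOISE_SD = 0.5

def simulate(theta, dose):
    """Sample a noisy response for a given parameter value and dose."""
    mean = theta * dose / (dose + 1.0)
    return mean + NOISE_SD * rng.normal(size=np.shape(mean))

def log_lik(y, theta, dose):
    """Gaussian log-likelihood log p(y | theta, dose)."""
    mean = theta * dose / (dose + 1.0)
    return -0.5 * ((y - mean) / NOISE_SD) ** 2 - np.log(NOISE_SD * np.sqrt(2 * np.pi))

def eig_nested_mc(dose, n_outer=2000, n_inner=2000):
    """Nested Monte Carlo estimate of the EIG for a single-dose design.

    Uses the mutual-information form of the EIG:
        EIG = E_{p(theta) p(y|theta)} [ log p(y|theta) - log p(y) ],
    where p(y) is approximated by an inner Monte Carlo average over the prior.
    """
    theta_outer = rng.normal(1.0, 1.0, size=n_outer)   # prior draws (assumed N(1, 1))
    y = simulate(theta_outer, dose)                    # one simulated outcome per prior draw
    theta_inner = rng.normal(1.0, 1.0, size=n_inner)   # fresh prior draws for the marginal

    # log p(y_i | theta_i) for each outer sample
    log_cond = log_lik(y, theta_outer, dose)

    # log p(y_i) ~= log mean_j p(y_i | theta_j), computed stably via logaddexp
    ll_matrix = log_lik(y[:, None], theta_inner[None, :], dose)  # shape (n_outer, n_inner)
    log_marg = np.logaddexp.reduce(ll_matrix, axis=1) - np.log(n_inner)

    return np.mean(log_cond - log_marg)

# Compare a few candidate doses; a higher EIG means a more informative experiment.
for dose in [0.1, 1.0, 10.0]:
    print(f"dose = {dose:5.1f}   estimated EIG = {eig_nested_mc(dose):.3f}")
```

In this toy model larger doses should yield a larger estimated EIG, since the mean response approaches $\theta$ itself while the noise level stays fixed.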
The EIG is intuitive—it is the expected reduction in entropy from prior to posterior. But intuition aside, it raises a natural question. The reduction in Shannon entropy is only one of many ways to measure uncertainty reduction, or even more generally, the utility of an experiment (Huan et al. 2024). Is there, then, any fundamental reason to prefer the EIG over alternative utility measures?
It turns out the answer is yes. In Bernardo (1979), José M. Bernardo, a student of Dennis Lindley, gave a justification for the EIG grounded in decision theory. He showed that the EIG arises naturally from a decision problem under reasonable assumptions on the utility function. I find this result extremely remarkable, and this is an appreciation post, about half a century later :)
The Decision-Theoretic Setup
Consider a decision problem where a scientist has to report a distribution $q$ for $\theta$, where the decision space $\mathcal{P}(\Theta)$ is the set of probability measures on $\Theta$.1 The utility obtained if she reports $q$ when the true parameter is $\theta$ is quantified by means of a utility function $u(q, \theta)$.2 Suppose the scientist applies a design $\xi$ and observes an outcome $y$, after which she updates her belief to the posterior $p(\theta \mid y, \xi)$ using Bayes' rule. Then, her expected utility when reporting $q$ is
$$\int_\Theta u(q, \theta)\, p(\theta \mid y, \xi)\, \mathrm{d}\theta.$$
Under the principle of maximum expected utility, the scientist should report the distribution $q$ that maximizes the above integral, which may not be her actual belief $p(\theta \mid y, \xi)$. Hence, to discourage lying, we require that $u$ is a strictly proper utility function, meaning that
$$\sup_{q \in \mathcal{P}(\Theta)} \int_\Theta u(q, \theta)\, p(\theta \mid y, \xi)\, \mathrm{d}\theta = \int_\Theta u\big(p(\cdot \mid y, \xi), \theta\big)\, p(\theta \mid y, \xi)\, \mathrm{d}\theta,$$
with the supremum only attained at $q = p(\cdot \mid y, \xi)$. In other words, truth-telling should be the optimal strategy.
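As a quick numerical illustration (my own toy example, not from the paper), the logarithmic utility $u(q, \theta) = \log q(\theta)$ is strictly proper: over a discrete parameter space, the expected utility under a fixed belief $p$ is maximized exactly by reporting $q = p$, and no other report does better.

```python
import numpy as np

rng = np.random.default_rng(1)

def expected_log_utility(q, p):
    """Expected utility E_p[log q(theta)] for discrete distributions p and q."""
    return np.sum(p * np.log(q))

# A fixed "true" belief p over a discrete parameter space.
p = np.array([0.5, 0.3, 0.2])

# Truthful report versus a few dishonest reports.
candidates = {
    "q = p (truthful)":  p,
    "q = uniform":       np.array([1/3, 1/3, 1/3]),
    "q = overconfident": np.array([0.9, 0.05, 0.05]),
}
for name, q in candidates.items():
    print(f"{name:20s}  expected utility = {expected_log_utility(q, p):.4f}")

# Random search over many reported distributions: none should beat reporting p itself.
best = max(expected_log_utility(rng.dirichlet(np.ones(3)), p) for _ in range(10_000))
print(f"best random report    expected utility = {best:.4f}")
print(f"truthful report       expected utility = {expected_log_utility(p, p):.4f}")
```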
The second restriction we need on the utility function is that it is local, i.e., $u(q, \theta) = f\big(q(\theta), \theta\big)$ for all $q \in \mathcal{P}(\Theta)$ and $\theta \in \Theta$, for some function $f$. Locality means the utility function only depends on the density of the reported distribution at the true parameter $\theta$, which feels reasonable (though admittedly not as easy to motivate as properness). The striking result of Bernardo (1979) is that if the utility function is strictly proper, local, and sufficiently smooth (as a function of $q(\theta)$), then it is of the form
$$u(q, \theta) = A \log q(\theta) + B(\theta),$$
where the constant $A > 0$ and the function $B(\cdot)$ can be arbitrary. Plugging this back into the requirement that $u$ is proper, we see that the maximum expected utility is
$$\sup_{q \in \mathcal{P}(\Theta)} \int_\Theta u(q, \theta)\, p(\theta \mid y, \xi)\, \mathrm{d}\theta = \int_\Theta \big[ A \log p(\theta \mid y, \xi) + B(\theta) \big]\, p(\theta \mid y, \xi)\, \mathrm{d}\theta = -A\, H\big[p(\theta \mid y, \xi)\big] + \mathbb{E}_{p(\theta \mid y, \xi)}\big[B(\theta)\big].$$
Lo and behold, the Shannon entropy appears!
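As a sanity check (not spelled out above), any utility of this form with $A > 0$ is indeed strictly proper, because the $B(\theta)$ terms cancel in the comparison and what remains is a Kullback–Leibler divergence: for any reported $q$,
$$\int_\Theta \big[ u\big(p(\cdot \mid y, \xi), \theta\big) - u(q, \theta) \big]\, p(\theta \mid y, \xi)\, \mathrm{d}\theta = A \int_\Theta p(\theta \mid y, \xi) \log \frac{p(\theta \mid y, \xi)}{q(\theta)}\, \mathrm{d}\theta = A \cdot \mathrm{KL}\big( p(\cdot \mid y, \xi) \,\big\|\, q \big) \ge 0,$$
with equality if and only if $q = p(\cdot \mid y, \xi)$ (almost everywhere), so truth-telling is uniquely optimal.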
Back to Experimental Design
Now, the gain in maximum expected utility for the scientist from the experiment, relative to reporting under her prior belief $p(\theta)$ alone, is
$$\int_\Theta \big[ A \log p(\theta \mid y, \xi) + B(\theta) \big]\, p(\theta \mid y, \xi)\, \mathrm{d}\theta - \int_\Theta \big[ A \log p(\theta) + B(\theta) \big]\, p(\theta)\, \mathrm{d}\theta,$$
which simplifies to
$$A \Big( H\big[p(\theta)\big] - H\big[p(\theta \mid y, \xi)\big] \Big) + \mathbb{E}_{p(\theta \mid y, \xi)}\big[B(\theta)\big] - \mathbb{E}_{p(\theta)}\big[B(\theta)\big].$$
To assess the quality of a design $\xi$, we take the expectation over the observation distribution $p(y \mid \xi)$. The terms involving $B$ then cancel, since $\mathbb{E}_{p(y \mid \xi)}\big[\mathbb{E}_{p(\theta \mid y, \xi)}[B(\theta)]\big] = \mathbb{E}_{p(\theta)}[B(\theta)]$, leaving us with
$$A\, \mathbb{E}_{p(y \mid \xi)}\Big[ H\big[p(\theta)\big] - H\big[p(\theta \mid y, \xi)\big] \Big] = A \cdot \mathrm{EIG}(\xi).$$
This shows that, for the purpose of choosing the best design, the choice of $A$ and $B$ is irrelevant, and the utility function can simply be taken to be the log density, $u(q, \theta) = \log q(\theta)$.
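This invariance is easy to verify numerically. The following sketch (an arbitrary discrete toy model of my own, not from the paper) computes the expected utility gain exactly for a few choices of $A$ and $B$ and checks that it always equals $A \cdot \mathrm{EIG}(\xi)$:

```python
import numpy as np

# A small discrete toy model (illustrative): 3 parameter values, 4 possible outcomes.
prior = np.array([0.5, 0.3, 0.2])                      # p(theta)
lik = np.array([[0.7, 0.1, 0.1, 0.1],                  # p(y | theta), rows sum to 1
                [0.1, 0.6, 0.2, 0.1],
                [0.1, 0.1, 0.2, 0.6]])

marginal = prior @ lik                                  # p(y)
posterior = (prior[:, None] * lik) / marginal           # p(theta | y), one column per y

def entropy(p):
    """Shannon entropy of a discrete distribution."""
    return -np.sum(p * np.log(p))

# Expected information gain: prior entropy minus expected posterior entropy.
post_entropies = np.array([entropy(posterior[:, y]) for y in range(lik.shape[1])])
eig = entropy(prior) - np.sum(marginal * post_entropies)

def expected_gain(A, B):
    """Expected gain in maximum expected utility for u(q, theta) = A log q(theta) + B(theta)."""
    gain_per_y = [
        np.sum(posterior[:, y] * (A * np.log(posterior[:, y]) + B))
        - np.sum(prior * (A * np.log(prior) + B))
        for y in range(lik.shape[1])
    ]
    return np.sum(marginal * np.array(gain_per_y))

# The B-dependent terms cancel after averaging over outcomes, so gain = A * EIG in every case.
for A, B in [(1.0, np.zeros(3)), (2.5, np.array([5.0, -3.0, 100.0])), (0.1, np.array([0.0, 1.0, 2.0]))]:
    print(f"A = {A:3.1f}   expected gain = {expected_gain(A, B):.6f}   A * EIG = {A * eig:.6f}")
```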
Concluding Remarks
To motivate his paper, Bernardo writes in the abstract:
… a scientist typically does not have, nor can be normally expected to have, a clear idea of the utility of his results.
By considering a decision problem of choosing the best distribution to report, and showing that the EIG arises naturally under reasonable assumptions on the utility function, Bernardo makes a compelling argument for using the EIG in precisely those settings where the scientist cannot quantify the utility of her results a priori.
References
1. This decision problem is also adopted by Bissiri et al. (2016) to devise a more general framework for Bayesian inference.
2. Such a utility function is also known as a scoring rule (Gneiting and Raftery 2007), which measures how well a distribution explains an observed value of a random variable.