Steering Language Models with Sequential Monte Carlo
Published on 2025-10-09.
A good chunk of my PhD work involves using sequential Monte Carlo (SMC) methods to solve decision-making problems. SMC algorithms are used to efficiently sample from sequences of distributions; while they are mostly used in physics, signal processing, and Bayesian statistics, they have recently also found uses in inference-time alignment of generative models. In this post, I’ll show how I used SMC to “steer” a tiny pre-trained language model into writing sad stories.
Background: How LLMs Generate Text
A language model works over a vocabulary $\mathcal{V}$ of tokens. At each step $t$, it predicts the probability distribution of the next token given the history so far:

$$p(x_t \mid x_{1:t-1}) \in \Delta(\mathcal{V}),$$

where $\Delta(\mathcal{V})$ is the set of distributions over the vocabulary and $x_{1:t-1} = (x_1, \dots, x_{t-1})$. Generating text with the language model is the task of sampling a sequence $x_{1:T}$ from the joint distribution

$$p(x_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{1:t-1}).$$
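To make this concrete, here is a minimal sketch of ancestral sampling from this factorized joint distribution, assuming a Hugging Face causal LM. The repo id and generation length are my illustrative choices, not necessarily what the accompanying code uses:

```python
# Minimal ancestral sampling: draw x_t ~ p(. | x_{1:t-1}) one token at a time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roneneldan/TinyStories-33M")  # assumed repo id
model = AutoModelForCausalLM.from_pretrained("roneneldan/TinyStories-33M")
model.eval()

ids = tokenizer("When the prince came home, he saw", return_tensors="pt").input_ids
for _ in range(30):
    with torch.no_grad():
        probs = torch.softmax(model(ids).logits[:, -1, :], dim=-1)  # p(x_t | x_{1:t-1})
    next_token = torch.multinomial(probs, num_samples=1)            # sample the next token
    ids = torch.cat([ids, next_token], dim=1)                       # append it to the sequence
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```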
Inference-Time Alignment of LLMs
Inference-time alignment refers to modifying a pre-trained model’s sampling behavior without changing its parameters. Instead of fine-tuning the model weights, we intervene at sampling time and adjust how likely the model is to pick certain continuations. Sequential Monte Carlo provides a natural framework for this: by iteratively reweighting and resampling partial generations based on a reward signal, we can nudge the model toward desired behaviors while retaining some of its inherent randomness and diversity.
Formally, we specify our preferences through a sequence of reward functions

$$r_t : \mathcal{V}^t \to \mathbb{R}, \qquad t = 1, \dots, T,$$

which score partial sequences. This setup is quite general: $r_t$ could encode stylistic preferences, safety constraints, or factuality. In my case, $r_t$ is just a neural network that has been trained to output a “sadness score” in $[0, 1]$ for a given phrase. We then define a new distribution $\pi$ over sequences, which is tilted towards the cumulative reward:

$$\pi(x_{1:T}) \propto p(x_{1:T}) \exp\!\left(\sum_{t=1}^{T} r_t(x_{1:t})\right).$$

Intuitively, $\pi$ reweights the model’s likelihoods so that sequences with higher cumulative reward become exponentially more probable.
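As a rough illustration of what such a reward could look like in code, here is a sketch that uses an off-the-shelf sentiment classifier as a stand-in for a sadness scorer. The model name and the mapping from “negative sentiment” to “sadness” are assumptions of mine, not the network actually used in this post:

```python
# Hypothetical r_t: score a partial sequence's "sadness" in [0, 1], using a
# generic sentiment classifier as a stand-in (negative sentiment ~ sad).
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # assumed stand-in model
)

def sadness_reward(text: str) -> float:
    """Return a score in [0, 1] that is higher for sadder-sounding text."""
    out = classifier(text)[0]
    return out["score"] if out["label"] == "NEGATIVE" else 1.0 - out["score"]
```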
Sampling from $\pi$ is straightforward with SMC. We maintain a population of $N$ particles (partial sequences) $x_{1:t}^{(i)}$, and the recipe is as follows, with steps 1 and 2 repeated for all $i = 1, \dots, N$ (a minimal code sketch follows the list):
- Propose: Sample a token $x_t^{(i)} \sim p(\,\cdot \mid x_{1:t-1}^{(i)})$ and append it to the sequence, $x_{1:t}^{(i)} = (x_{1:t-1}^{(i)}, x_t^{(i)})$.
- Weight: Compute the unnormalized weight $\tilde{w}_t^{(i)} = \exp\!\big(r_t(x_{1:t}^{(i)})\big)$, then normalize: $w_t^{(i)} = \tilde{w}_t^{(i)} \big/ \sum_{j=1}^{N} \tilde{w}_t^{(j)}$.
- Resample: Draw $N$ new particles from $\{x_{1:t}^{(i)}\}_{i=1}^{N}$ with replacement, proportionally to the weights $w_t^{(i)}$. (This duplicates the ‘sad’ phrases and prunes away overly cheerful ones.)
- Repeat: Until $t = T$.
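Here is the promised sketch of this bootstrap recipe in code. It assumes the TinyStories-33M checkpoint and the hypothetical `sadness_reward` scorer from the earlier sketch; the particle count, generation length, and batching details are illustrative rather than taken from the accompanying repository:

```python
# Bootstrap SMC over partial generations: propose from the base model,
# weight by exp(reward), resample, and repeat until the horizon.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roneneldan/TinyStories-33M")  # assumed repo id
model = AutoModelForCausalLM.from_pretrained("roneneldan/TinyStories-33M")
model.eval()

def smc_steer(prompt: str, num_particles: int = 32, max_new_tokens: int = 40) -> list[str]:
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    particles = prompt_ids.repeat(num_particles, 1)  # every particle starts at the prompt

    for _ in range(max_new_tokens):
        # 1. Propose: sample one token per particle from p(. | x_{1:t-1}).
        with torch.no_grad():
            probs = torch.softmax(model(particles).logits[:, -1, :], dim=-1)
        next_tokens = torch.multinomial(probs, num_samples=1)
        particles = torch.cat([particles, next_tokens], dim=1)

        # 2. Weight: unnormalized weight exp(r_t), normalized across particles
        #    (softmax over rewards gives exactly exp(r) / sum exp(r)).
        texts = tokenizer.batch_decode(particles, skip_special_tokens=True)
        rewards = torch.tensor([sadness_reward(t) for t in texts])  # hypothetical scorer
        weights = torch.softmax(rewards, dim=0)

        # 3. Resample: draw particles with replacement, proportionally to the weights.
        idx = torch.multinomial(weights, num_particles, replacement=True)
        particles = particles[idx]

    return tokenizer.batch_decode(particles, skip_special_tokens=True)

print(smc_steer("When the prince came home, he saw")[0])
```

Note that this naive version resamples at every step; as mentioned in the parting notes below, in practice one would resample more sparingly (e.g., on an adaptive schedule) to better preserve diversity.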
Over time, the population of particles gradually concentrates on high-reward trajectories, in this case the sadder continuations.
Sob Story Time
To illustrate the method, I’m using TinyStories-33M, a language model trained on short children’s stories (Eldan and Li 2023). Importantly, this model is not fine-tuned for sadness (or anything else), and is just a pure text predictor. The code accompanying this post is available on GitHub.
I gave TinyStories the prompt:
“When the prince came home, he saw”
Here’s what the base model produced (one sample from $p$):
“When the prince came home, he saw the heavy bag of jewelry. He wanted to buy it and wear it. He asked the king to sell it to him. The kind king said ‘Yes!’, and …”
Hmm, way too cheerful for our tastes. Now here’s what happens after steering with SMC (one sample from $\pi$):
“When the prince came home, he saw the sad family sitting by the stove. He felt very sad too. He had lost his rare treasure box and now it was gone forever.”
That’s more like it!
Bonus: Steering as Optimization
A simple but neat result is that the steered distribution $\pi$ is the minimizer of

$$\pi = \arg\min_{q} \left\{ -\mathbb{E}_{q}\!\left[\sum_{t=1}^{T} r_t(x_{1:t})\right] + \mathrm{KL}(q \,\|\, p) \right\},$$

where $\mathrm{KL}(\cdot \,\|\, \cdot)$ is the Kullback-Leibler (KL) divergence (see, e.g., Bissiri et al. 2016). The first term on the RHS is responsible for maximizing the cumulative reward (sadness), while the second term is a regularizer forcing $q$ to stay close to the base model $p$. This regularization term prevents collapse into a small number of “super sad” trajectories, thus preserving diversity of model outputs.
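For completeness, here is the one-line argument, writing $Z$ for the normalizing constant of $\pi$: up to an additive constant, the objective above is itself a KL divergence to the tilted distribution,

$$
-\mathbb{E}_{q}\!\left[\sum_{t=1}^{T} r_t(x_{1:t})\right] + \mathrm{KL}(q \,\|\, p)
= \mathrm{KL}(q \,\|\, \pi) - \log Z,
$$

which is minimized (with value $-\log Z$) exactly when $q = \pi$.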
This optimization perspective also makes clear the connection to reinforcement learning from human feedback (RLHF, Ziegler et al. 2020), where the same objective is minimized by fine-tuning the model weights. Here, we’re skipping the optimization and directly sampling from the minimizer with SMC.
Parting Notes
The algorithm presented here is the simplest version of SMC (known as the “bootstrap” particle filter), and in practice more sophisticated techniques are required to actually deliver on the promise of preserving output diversity. These include adaptive resampling schedules and twisting; see, e.g., Naesseth et al. (2019). For a state-of-the-art application of SMC to language models, I recommend Zhao et al. (2024).