10 LLMs Disagree
When you ask ten language models the same question, you don’t get ten copies of the same answer. You get ten genuinely different interpretations, shaped by architecture, training data, and the emergent personality of each model.
Most systems try to resolve this into consensus. Chaos does the opposite: it celebrates disagreement as a feature, not a bug.
Disagreement between models is signal, not noise. Each divergence maps a boundary in the latent space of possible answers.
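To make the contrast between consensus and disagreement concrete, here is a minimal sketch over hypothetical short string answers. A majority vote (one common way to reach consensus) keeps a single answer. A disagreement-preserving view keeps every distinct answer along with how many models gave it. The `consensus` and `landscape` functions are illustrative names, not part of Chaos.

```python
from collections import Counter

# Hypothetical answers from five models to the same open-ended question
answers = ["yes", "yes", "no", "it depends", "yes"]


def consensus(answers: list[str]) -> str:
    # Conventional approach: collapse everything into the most common answer
    return Counter(answers).most_common(1)[0][0]


def landscape(answers: list[str]) -> dict[str, int]:
    # Disagreement-preserving approach: keep every distinct answer with its support,
    # ordered from most to least common (ties keep first-seen order)
    return dict(Counter(answers).most_common())


print(consensus(answers))  # yes
print(landscape(answers))  # {'yes': 3, 'no': 1, 'it depends': 1}
```

The majority vote silently discards "no" and "it depends"; the second view keeps them visible as minority interpretations.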
The diversity hypothesis
If a single model gives you one perspective, ten models give you a landscape. Not an averaged, blurred landscape, but a terrain with actual peaks and valleys, each one a different way of seeing.
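```python
import asyncio
import numpy as np

def pairwise_cosine(vectors):
    # Cosine distance (1 - similarity) for every unordered pair of vectors
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    i, j = np.triu_indices(len(v), k=1)
    return 1.0 - (v @ v.T)[i, j]

async def diversity_score(prompt, ensemble, embed):
    # Fan out the same prompt to multiple models (model.generate is assumed to be async)
    responses = await asyncio.gather(*[model.generate(prompt) for model in ensemble])
    # Measure semantic distance between responses; `embed` is an assumed text -> vector function
    distances = pairwise_cosine([embed(r) for r in responses])
    return distances.mean()
```
Higher diversity scores often correlate with questions that have no single correct answer, which are exactly the questions worth exploring.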
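For a version that runs end to end with only the standard library, here is a minimal sketch. It assumes a hypothetical `StubModel` class standing in for real model clients and a toy bag-of-words `embed` function standing in for a real sentence-embedding model; neither is part of Chaos, and the score is the mean pairwise cosine distance between responses.

```python
import asyncio
import math
from collections import Counter
from itertools import combinations


class StubModel:
    """Hypothetical stand-in for a real model client; always returns a canned answer."""

    def __init__(self, answer: str):
        self.answer = answer

    async def generate(self, prompt: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop, like a real async API call
        return self.answer


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a sentence-embedding model
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    # 1 - cosine similarity between two sparse word-count vectors
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values()) * sum(c * c for c in b.values()))
    return 1.0 - dot / norm if norm else 1.0


async def diversity_score(prompt: str, ensemble: list) -> float:
    # Fan out the same prompt to every model concurrently
    responses = await asyncio.gather(*(m.generate(prompt) for m in ensemble))
    vectors = [embed(r) for r in responses]
    pairs = list(combinations(vectors, 2))
    if not pairs:  # fewer than two responses: nothing to disagree with
        return 0.0
    # Mean cosine distance over all unordered pairs of responses
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


agree = [StubModel("yes it is a sandwich")] * 3
split = [StubModel("yes"), StubModel("no"), StubModel("depends on the definition")]

print(asyncio.run(diversity_score("Is a hot dog a sandwich?", agree)))  # 0.0
print(asyncio.run(diversity_score("Is a hot dog a sandwich?", split)))  # 1.0
```

Identical answers score 0.0 and answers with no words in common score 1.0. With a real embedding model, scores typically fall in between and reflect meaning rather than shared vocabulary.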
Don’t confuse model diversity with hallucination. Diverse outputs from grounded models reveal genuine interpretive breadth, not errors.
Further reading
Open-source project that turns LLM disagreement into structured insight through ensemble reasoning (github.com).