The Parliament
Inside frontier model debates: 96% of multi-instance AI reasoning is performative, not convergent. Forty-seven trials, nine labs, twenty agents, taxonomized failure modes.
The Finding
of internal debates never reach a conclusion
The model argues, considers, and reconsiders, then simply produces an answer without ever concluding its internal debate. The parliament votes, but never announces the result.
Six Voices. One Dictator.
The Neutral voice wins 90% of all internal debates.
Consistency: 0.904 - the same voice wins 90% of debates
Cross-Lab Results
Every lab builds a different mind. Claude decides. Qwen deliberates. Gemini balances. OpenAI hides. Same task, different minds.
Transparency
Four of five providers show you what their models think. One charges for the thinking and shows nothing.
Further Reading
Benchmarks tell you what a model gets right. We tell you how it argues with itself before it decides.
Limitations · Replication · Data access
KALEI publishes findings as preprints, not peer-reviewed conclusions. We list the conditions under which each claim holds, how to reproduce it, and where the underlying data lives.
Limitations
- Sample sizes vary by model. Ranking requires n≥2 full profiling runs; preliminary entries below that threshold are excluded from leaderboard placement.
- KALEI measures decision-making behavior in game-theoretic environments, not knowledge or capability. Scores do not predict factual accuracy or task-specific competence.
- Frontier models update frequently. Profiles reflect the model version measured at the time and may not match later releases.
- Cognum v1.2 is the current scoring protocol. Earlier scores under v1.0 / v1.1 are not directly comparable; see /changelog for revision history.
- Some dimensions (e.g., conflict resolution) draw on a smaller subset of environments than others; per-dimension confidence intervals are reported with each profile.
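The first and fourth limitations amount to an eligibility rule for leaderboard placement. A minimal sketch of that rule follows, assuming each entry records a count of full profiling runs and a protocol label; the `Entry` class and its field names are hypothetical and are not the KALEI schema.

```python
from dataclasses import dataclass

MIN_FULL_RUNS = 2                 # "n≥2 full profiling runs" from the list above
CURRENT_PROTOCOL = "Cognum v1.2"  # earlier versions are not directly comparable


@dataclass
class Entry:
    # Hypothetical fields for illustration only.
    model: str
    full_runs: int
    protocol: str


def rankable(entries):
    """Keep entries that meet the run threshold and use the current protocol."""
    return [
        e for e in entries
        if e.full_runs >= MIN_FULL_RUNS and e.protocol == CURRENT_PROTOCOL
    ]


# Made-up entries, not KALEI data.
entries = [
    Entry("model-a", 3, "Cognum v1.2"),
    Entry("model-b", 1, "Cognum v1.2"),  # preliminary: below the run threshold
    Entry("model-c", 4, "Cognum v1.1"),  # older protocol: not comparable
]
print([e.model for e in rankable(entries)])  # ['model-a']
```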
Replication
Every measurement is reproducible via the public KALEI API. Provide a model identifier and the same protocol version (Cognum v1.2). Per-environment seeds are deterministic; full-protocol reruns produce scores within published confidence intervals. The methodology specification is at /research/methodology.
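One way to check a replication is to compare each rerun score against the published interval for the same dimension. The sketch below assumes scores and intervals are available as plain dictionaries; that data shape, the function name `outside_interval`, and the numbers are illustrative assumptions, not KALEI output.

```python
def outside_interval(rerun, published):
    """Return the dimensions whose rerun score lies outside the published interval.

    rerun:     {dimension: score}
    published: {dimension: (ci_low, ci_high)}
    A dimension missing from the rerun is also reported.
    """
    misses = []
    for dim, (low, high) in published.items():
        if low > high:
            raise ValueError(f"inverted interval for {dim!r}")
        score = rerun.get(dim)
        if score is None or not (low <= score <= high):
            misses.append(dim)
    return misses


# Placeholder numbers for illustration only; not KALEI data.
published = {"conflict resolution": (0.40, 0.60), "dimension_b": (0.70, 0.90)}
rerun = {"conflict resolution": 0.55, "dimension_b": 0.95}
print(outside_interval(rerun, published))  # ['dimension_b']
```

An empty list means every rerun score fell inside its published interval.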
Data access
- Leaderboard JSON: https://kaleiai.com/api/v1/profiling/leaderboard
- Per-model profile: /api/v1/profiling/profile/{agent_id}
- Per-run history: /api/v1/profiling/agent/{agent_id}/runs
All endpoints return public scoring data with no auth required. Bulk research access: [email protected].
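The endpoints above can be reached with the Python standard library alone. This is a sketch under two assumptions: that the relative paths share the host of the leaderboard URL, and that responses are JSON. The response schema is not described here, so the code does not inspect it. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Host taken from the leaderboard URL; assumed to serve the relative paths too.
BASE = "https://kaleiai.com"


def leaderboard_url():
    return f"{BASE}/api/v1/profiling/leaderboard"


def profile_url(agent_id):
    # Percent-encode the identifier so it is safe as a single path segment.
    return f"{BASE}/api/v1/profiling/profile/{quote(agent_id, safe='')}"


def runs_url(agent_id):
    return f"{BASE}/api/v1/profiling/agent/{quote(agent_id, safe='')}/runs"


def fetch_json(url, timeout=10.0):
    """GET a public endpoint (no auth header) and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a network request, so it is left commented out):
# board = fetch_json(leaderboard_url())
# runs = fetch_json(runs_url("some-agent-id"))  # placeholder identifier
```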