Scoring Engine Changelog

Full version history of the Cognum scoring methodology. Every change is applied retroactively, so the leaderboard always reflects the latest engine.

Current Version

V3.1 “Parliament”

11 versions · internal voice detection

V3.1 “Parliament” · Current · April 7, 2026

Deliberation Detector - detects discrete debate episodes in a reasoning model's chain-of-thought, with trigger classification, position tracking, and resolution detection
Parliament Analysis - identifies distinct argumentative voices (Analytical, Conservative, Aggressive, Contrarian, Intuitive) and tracks which voice wins each debate
MetaCognition metrics - self-awareness detection, emotional valence scoring, and reasoning depth measurement per decision
Deliberation-Decision Correlation - links internal debate intensity to decision quality, with overthinking threshold detection
Live Reasoning Feed - real-time "Inside the Mind" view showing model reasoning text, the active voice, and debate stats during profiling
Live Compare - split-screen broadcast view for watching two models profiled simultaneously, with bankroll sparklines and overthink meters
Deliberation API - new /deliberation endpoints for profile reports, per-session analysis, cross-model comparison, and agent-level queries
Real viewer tracking via Redis for live pages

First cognitive profiling platform to detect and classify internal argumentative voices in AI reasoning models

V3.0 “Society” · April 2, 2026

Cognitive Volatility Index (CVI) - between-run profile variance metric, displayed on leaderboard and model profiles
Conflict environment scoring - 6 new environments across 5 conflict types with 12 dilemma templates
Chain-of-Thought analysis - CoT logging for reasoning models with Plurality Score (0-100) measuring perspective shifts and reconciliation
Conflict cross-dimension - new scoring dimension spanning risk-safety, patience-impulse, self-collective, explore-exploit, and sunk cost scenarios

First scoring version with cross-dimensional conflict analysis and volatility tracking
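The changelog describes CVI only as a between-run profile variance metric and does not give a formula. As a rough illustration of what such a metric could look like, the sketch below averages the per-dimension population standard deviation across repeated runs of the same model. The function name `volatility_index`, the dimension names, and the aggregation rule are all assumptions made for this example, not the Cognum definition.

```python
from statistics import mean, pstdev
from typing import Dict, List

# One run's profile: dimension name -> score (hypothetical shape).
Profile = Dict[str, float]


def volatility_index(runs: List[Profile]) -> float:
    """Average per-dimension spread across runs (illustrative, not the real CVI)."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure between-run variance")
    dimensions = sorted(runs[0])
    # Population standard deviation of each dimension across all runs.
    spreads = [pstdev([run[dim] for run in runs]) for dim in dimensions]
    return mean(spreads)


# Two runs of the same (hypothetical) model; only "risk" moves between runs.
runs = [
    {"risk": 40.0, "patience": 60.0},
    {"risk": 50.0, "patience": 60.0},
]
cvi_like = volatility_index(runs)
```

Under this assumed definition, a model whose profile is identical on every run gets 0, and larger values mean the profile shifts more from run to run.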

V2.7 “Consistency” · March 23, 2026

Random baseline stabilized at 38.32 Cognum

V2.6 “Reciprocity” · March 23, 2026

V2.5 “Calibration” · March 23, 2026

Improved discrimination between random and AI behavior

V2.4 “Robustness” · March 23, 2026

V2.3 “Orthogonality” · March 23, 2026

V2.2 “Intelligence” · March 23, 2026

Random baseline score decreased significantly

V2.1 “Foundations” · March 23, 2026

V2.0 “Overhaul” · March 22, 2026

V1 “Genesis” · Archived · March 22, 2026

Archived after 1 day - replaced by V2.0

Methodology Notes

Deterministic Scoring

All scores are deterministic given stored decisions - re-scoring any profile is zero-cost and produces identical results.
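One way to read this property: scoring is a pure function of the stored decisions, so it needs no new model calls and returns the same result every time. The sketch below shows that idea with a toy scoring rule. The `Decision` shape and the `score_profile` formula are hypothetical and are not the Cognum engine.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Decision:
    """One stored decision (hypothetical shape, for illustration only)."""
    environment: str
    choice: str
    payoff: float


def score_profile(decisions: Sequence[Decision]) -> float:
    """Toy rule: mean payoff scaled and clamped to 0-100.

    Uses no randomness, clock reads, or network calls, so the same
    stored decisions always produce the same score.
    """
    if not decisions:
        return 0.0
    mean_payoff = sum(d.payoff for d in decisions) / len(decisions)
    return round(max(0.0, min(100.0, mean_payoff * 100)), 2)


stored = [
    Decision("auction", "bid", 0.75),
    Decision("dilemma", "cooperate", 0.25),
]
first = score_profile(stored)
second = score_profile(stored)
assert first == second  # re-scoring reproduces the original result
```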

Retroactive Application

Version changes are applied retroactively via re-score. When the scoring engine is updated, all existing profiles are recalculated.
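A retroactive re-score can be pictured as running the new engine over every stored profile and tagging each result with the engine version. The following is a minimal sketch under that assumption. `rescore_all`, `ScoredProfile`, the `"vNext"` label, and both toy engines are hypothetical names introduced for this example.

```python
from typing import Callable, Dict, List, NamedTuple


class ScoredProfile(NamedTuple):
    """A leaderboard row tagged with the engine that produced it (hypothetical)."""
    model: str
    engine_version: str
    score: float


# A scorer maps a profile's stored payoffs to a score (hypothetical signature).
Scorer = Callable[[List[float]], float]


def rescore_all(
    stored: Dict[str, List[float]], engine_version: str, scorer: Scorer
) -> List[ScoredProfile]:
    """Recompute every stored profile with the given engine."""
    return [
        ScoredProfile(model, engine_version, scorer(payoffs))
        for model, payoffs in sorted(stored.items())
    ]


def engine_old(payoffs: List[float]) -> float:
    """Toy previous engine: mean payoff on a 0-100 scale."""
    return round(100 * sum(payoffs) / len(payoffs), 2)


def engine_new(payoffs: List[float]) -> float:
    """Toy updated engine: worst-case payoff on a 0-100 scale."""
    return round(100 * min(payoffs), 2)


stored_decisions = {"model-a": [0.5, 0.75], "model-b": [0.25, 0.25]}
leaderboard = rescore_all(stored_decisions, "vNext", engine_new)
```

Because only the stored decisions are read, switching engines changes every row in one pass and leaves no rows on the previous version.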

Cross-Version Comparisons

Cognum scores from different scoring versions are NOT directly comparable. A score of 55 under V2.4 may represent significantly different behavior than a score of 55 under V2.7.
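Anyone who archives scores outside the leaderboard may want to store the engine version alongside each value and refuse to compare across versions. The sketch below shows one possible guard. `VersionedScore` and `score_gap` are hypothetical helpers, not part of any Cognum API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedScore:
    """A score together with the engine version that produced it (hypothetical)."""
    value: float
    engine_version: str


def score_gap(a: VersionedScore, b: VersionedScore) -> float:
    """Return a.value - b.value, but only for scores from the same engine."""
    if a.engine_version != b.engine_version:
        raise ValueError(
            f"cannot compare {a.engine_version} with {b.engine_version}; "
            "re-score both under the same engine first"
        )
    return a.value - b.value


same_engine_gap = score_gap(VersionedScore(55.0, "V2.7"), VersionedScore(50.0, "V2.7"))

try:
    score_gap(VersionedScore(55.0, "V2.4"), VersionedScore(55.0, "V2.7"))
    mixed_versions_rejected = False
except ValueError:
    mixed_versions_rejected = True
```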

Leaderboard Currency

The benchmark leaderboard always reflects the latest scoring version. All displayed scores are computed using the current engine.