For AI labs.

Cognitive profiling for model development. Three use cases: model evaluation, agent selection, regression testing. Full API. Public methodology. n≥2 runs to rank, deterministic seeds, peer-checkable scoring.

Section 01 · Snapshot
Engines · 18
Environments · 83
Dimensions · 10
Labs covered · 9
Ranked models · 34
Profiled total · 80+

Section 02 · Three use cases
01 · Model Evaluation

Compare cognitive profiles across model versions. Track Cognum movement run-to-run.

When you ship a new training run, KALEI gives you a 10-dimension snapshot beyond aggregate accuracy. See exactly which cognitive dimensions improved (e.g. strategic depth +12) and which regressed (e.g. learning speed -8) per release; a diff sketch follows the list below. Useful for distillation/RL evaluations, fine-tune diff analysis, and detecting cognitive drift in continued pretraining.

  • 10 cognitive dimensions per profile
  • 83 environments per full run
  • n≥2 runs to qualify for the ranked leaderboard
→ Profile your model
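
A minimal Python sketch of that release diff, using the public profile endpoint listed under Data access below. The payload shape ("dimensions" mapping to per-dimension "score" values) and the agent ids are assumptions, not the documented schema.

```python
# Minimal sketch: diff the 10-dimension Cognum profiles of two releases.
# The profile endpoint is public (see Data access); the payload shape
# and the agent ids below are assumptions.
import requests

BASE = "https://kaleiai.com/api/v1/profiling"

def dimensions(agent_id: str) -> dict:
    """Fetch a public Cognum profile and return its per-dimension scores."""
    resp = requests.get(f"{BASE}/profile/{agent_id}", timeout=30)
    resp.raise_for_status()
    return resp.json()["dimensions"]  # assumed field name

old = dimensions("my-model-v1")  # placeholder agent ids
new = dimensions("my-model-v2")

for dim in old:
    before, after = old[dim]["score"], new[dim]["score"]
    print(f"{dim:>22}: {before:6.1f} -> {after:6.1f} ({after - before:+.1f})")
```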
02 · Agent Selection

Pick the right model for an agentic workflow by cognitive profile, not by parameter count.

Different tasks demand different cognition: a research agent needs strong temporal reasoning and bias detection; a negotiation agent needs cooperation and conflict resolution; a code-review agent needs pattern recognition and strategic depth. KALEI scores let you choose the model whose cognitive shape fits the role, with confidence intervals on each dimension; a selection sketch follows the list below.

  • 9 labs covered, 34 ranked models
  • Per-dimension scoring with confidence intervals
  • Compare any two models side by side
→ Compare models
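
A selection sketch in the same vein, pulling the public leaderboard endpoint from Data access below. The role weights, the "models" array, and the dimension keys are illustrative assumptions.

```python
# Sketch: rank models by fit to a role's cognitive requirements.
# The leaderboard endpoint is public (see Data access); the "models"
# array and the dimension keys below are illustrative assumptions.
import requests

ROLE = {  # e.g. a research agent, per the prose above
    "temporal_reasoning": 0.5,
    "bias_detection": 0.5,
}

def fit(model: dict) -> float:
    """Weighted sum over the dimensions the role cares about."""
    dims = model["dimensions"]  # assumed field name
    return sum(w * dims[d]["score"] for d, w in ROLE.items() if d in dims)

board = requests.get(
    "https://kaleiai.com/api/v1/profiling/leaderboard", timeout=30
).json()

for m in sorted(board["models"], key=fit, reverse=True)[:3]:
    print(m["agent_id"], round(fit(m), 1))
```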
03 · Regression Testing

Catch cognitive regressions between releases before they reach users.

Capability benchmarks (MMLU, HumanEval, GSM8K) are saturated and miss cognitive drift. KALEI catches it: if your new release scores worse on cooperation by more than 1.5σ, that is a regression even when accuracy stays flat. Wire KALEI into your CI pipeline as a gating signal between candidate releases; a gate sketch follows the list below.

  • CI gate on dimension delta
  • Public API, deterministic seeds
  • Cognum v1.2 stable scoring protocol
→ API documentation
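
A minimal gate sketch, assuming each dimension in the profile JSON carries a "score" and a "sigma" (standard error); those field names and the agent ids are placeholders, while the 1.5σ rule comes from the paragraph above.

```python
# Sketch of a CI gate on per-dimension deltas, assuming each dimension
# reports a "score" and a "sigma" (standard error) in the profile JSON.
import sys
import requests

BASE = "https://kaleiai.com/api/v1/profiling"
THRESHOLD = 1.5  # flag drops larger than 1.5 sigma, per the rule above

def dimensions(agent_id: str) -> dict:
    resp = requests.get(f"{BASE}/profile/{agent_id}", timeout=30)
    resp.raise_for_status()
    return resp.json()["dimensions"]  # assumed field name

def gate(baseline_id: str, candidate_id: str) -> int:
    base, cand = dimensions(baseline_id), dimensions(candidate_id)
    failed = False
    for name, b in base.items():
        drop = (b["score"] - cand[name]["score"]) / b["sigma"]  # assumed fields
        if drop > THRESHOLD:
            print(f"REGRESSION {name}: -{drop:.2f} sigma")
            failed = True
    return 1 if failed else 0

# Exit nonzero on regression so the pipeline blocks the release.
sys.exit(gate("release-v3", "release-v4-rc1"))  # placeholder agent ids
```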

Section 03 · What integration looks like
Self-serve

Public API, x402 or USDC payment

Bring an API key for the model under test. KALEI runs the protocol on your model and returns a full Cognum profile. Watch live at /live, get JSON when complete. From $2 per profile.

→ Get started
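
For orientation, a hypothetical sketch of the self-serve flow above. The submission endpoint, payload fields, and polling shape are assumptions; only the /live page and the JSON result come from this page.

```python
# Hypothetical sketch of the self-serve flow. The POST path, payload
# fields, and polling shape are illustrative assumptions; only /live
# and the JSON result are described on this page.
import time
import requests

BASE = "https://kaleiai.com/api/v1/profiling"

job = requests.post(
    f"{BASE}/run",  # hypothetical submission endpoint
    json={
        "model": "my-model-v2",      # model under test (placeholder)
        "model_api_key": "sk-...",   # key for the model under test
        "protocol": "cognum-v1.2",
    },
    timeout=30,
).json()

print("watch live at https://kaleiai.com/live")

while True:  # poll until the full Cognum profile is ready (assumed shape)
    run = requests.get(f"{BASE}/run/{job['run_id']}", timeout=30).json()
    if run.get("status") == "complete":
        print(run["profile"])  # full Cognum profile as JSON
        break
    time.sleep(60)
```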
Research partnership

Custom environments, embargo profiles, replication audits

For frontier labs and academic groups: pre-release model profiling under NDA, custom environments calibrated to specific research questions, joint methodology reviews. Limited capacity, by application.

[email protected]
Methodology attestation — KALEI for AI Labs

Limitations · Replication · Data access

KALEI publishes findings as preprints, not peer-reviewed conclusions. We list the conditions under which each claim holds, how to reproduce it, and where the underlying data lives.

Limitations

  • Sample sizes vary by model. Ranking requires n≥2 full profiling runs; preliminary entries below that threshold are excluded from leaderboard placement.
  • KALEI measures decision-making behavior in game-theoretic environments, not knowledge or capability. Scores do not predict factual accuracy or task-specific competence.
  • Frontier models update frequently. Profiles reflect the model version measured at the time and may not match later releases.
  • Cognum v1.2 is the current scoring protocol. Earlier scores under v1.0 / v1.1 are not directly comparable; see /changelog for revision history.
  • Some dimensions (e.g. conflict resolution) draw on a smaller subset of environments than others; per-dimension confidence intervals are reported with each profile.

Replication

Every measurement is reproducible via the public KALEI API. Provide a model identifier and the same protocol version (Cognum v1.2). Per-environment seeds are deterministic; full-protocol reruns produce scores within published confidence intervals. Methodology specification at /research/methodology.
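
A minimal replication check against the public endpoints listed under Data access: every stored run's per-dimension score should land within the published confidence interval. The "ci_low"/"ci_high" field names and the agent id are assumptions.

```python
# Minimal replication check: each run's per-dimension score should land
# within the published confidence interval. Both endpoints are from the
# Data access section; "ci_low"/"ci_high" field names are assumptions.
import requests

BASE = "https://kaleiai.com/api/v1/profiling"

def check_replication(agent_id: str) -> None:
    profile = requests.get(f"{BASE}/profile/{agent_id}", timeout=30).json()
    history = requests.get(f"{BASE}/agent/{agent_id}/runs", timeout=30).json()
    for dim, d in profile["dimensions"].items():  # assumed payload shape
        lo, hi = d["ci_low"], d["ci_high"]
        for run in history["runs"]:
            score = run["dimensions"][dim]["score"]
            assert lo <= score <= hi, f"{dim}: {score} outside [{lo}, {hi}]"
    print(f"{agent_id}: all runs within published confidence intervals")

check_replication("example-agent-id")  # placeholder agent id
```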

Data access

Leaderboard JSON: https://kaleiai.com/api/v1/profiling/leaderboard. Per-model profile: /api/v1/profiling/profile/{agent_id}. Per-run history: /api/v1/profiling/agent/{agent_id}/runs. All endpoints return public scoring data with no auth required. Bulk research access: [email protected].


KALEI · LM Cognition Lab · Plovdiv · v1.2 · For Labs · live