For AI labs.
Cognitive profiling for model development. Three use cases — model evaluation, agent selection, regression testing. Full API. Public methodology. n>=2 to rank, deterministic seeds, peer-checkable scoring.
Snapshot
Three use cases
Compare cognitive profiles across model versions. Track Cognum movement run-to-run.
When you ship a new training run, KALEI gives you a 10-dimension snapshot beyond aggregate accuracy. See exactly which cognitive dimensions improved (e.g. strategic depth +12) and which regressed (e.g. learning speed -8) per release. Useful for distill/RL evaluations, fine-tune diff analysis, and detecting cognitive drift in continued pretraining.
- · 10 cognitive dimensions per profile
- · 83 environments per full run
- · n>=2 runs to qualify for the ranked leaderboard
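A minimal sketch of that run-to-run diff, built on the public per-run history endpoint listed under Data access below. Only the URL comes from the docs; the response field names (runs, dimensions, score) and the placeholder model identifier are assumptions made for illustration.

```python
import requests

BASE = "https://kaleiai.com/api/v1/profiling"

def dimension_diff(agent_id: str) -> dict:
    # Per-run history endpoint from the public docs; "runs"/"dimensions"/"score"
    # are assumed field names, not a documented schema.
    runs = requests.get(f"{BASE}/agent/{agent_id}/runs", timeout=30).json()["runs"]
    if len(runs) < 2:
        raise ValueError("need at least two full profiling runs to diff")
    prev, curr = runs[-2], runs[-1]
    return {
        dim: curr["dimensions"][dim]["score"] - prev["dimensions"][dim]["score"]
        for dim in curr["dimensions"]
    }

if __name__ == "__main__":
    deltas = dimension_diff("your-model-id")  # placeholder agent_id
    for dim, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
        print(f"{dim:>22}: {delta:+.1f}")
```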
Pick the right model for an agentic workflow by cognitive profile, not by parameter count.
Different tasks demand different cognition. A research agent needs strong temporal reasoning + bias detection; a negotiation agent needs cooperation + conflict resolution; a code-review agent needs pattern recognition + strategic depth. KALEI scores let you choose the model whose cognitive shape fits the role, with confidence intervals on each dimension.
- · 9 labs covered, 34 ranked models
- · Per-dimension scoring with CI
- · Compare any two side by side
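A sketch of shape-based selection against the public leaderboard endpoint, using the research-agent example above. The dimension keys, role weights, and response field names are illustrative assumptions, not official identifiers.

```python
import requests

LEADERBOARD = "https://kaleiai.com/api/v1/profiling/leaderboard"

# Illustrative weights for the "research agent" role described above;
# the dimension keys are assumed identifiers.
ROLE_WEIGHTS = {"temporal_reasoning": 0.5, "bias_detection": 0.5}

def rank_for_role(weights):
    # "models", "dimensions", "score", "agent_id" are assumed response keys.
    models = requests.get(LEADERBOARD, timeout=30).json()["models"]
    ranked = []
    for m in models:
        dims = m.get("dimensions", {})
        fit = sum(w * dims[d]["score"] for d, w in weights.items() if d in dims)
        ranked.append((m["agent_id"], fit))
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)

print(rank_for_role(ROLE_WEIGHTS)[:5])  # top five candidates by cognitive fit
```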
Catch cognitive regressions between releases before they reach users.
Capability benchmarks (MMLU, HumanEval, GSM8K) are saturated. They miss cognitive drift. KALEI catches it: if your new release scores worse on cooperation by more than 1.5 sigma, that is a regression even when accuracy stays flat. Wire KALEI into your CI as a gating signal between candidate releases.
- · CI gate on dimension delta
- · Public API, deterministic seeds
- · Cognum v1.2 stable scoring protocol
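One way such a CI gate could look. The profile endpoint comes from the data-access docs below; the field names (dimensions, score, sigma) and the release identifiers are assumptions made for this sketch, not the official integration.

```python
import sys
import requests

BASE = "https://kaleiai.com/api/v1/profiling"

def gate(baseline_id, candidate_id, threshold=1.5):
    # Profile endpoint from the public docs; "dimensions"/"score"/"sigma"
    # are assumed field names.
    base = requests.get(f"{BASE}/profile/{baseline_id}", timeout=30).json()["dimensions"]
    cand = requests.get(f"{BASE}/profile/{candidate_id}", timeout=30).json()["dimensions"]
    failures = []
    for dim, b in base.items():
        c = cand.get(dim)
        if c is None:
            continue
        drop = (b["score"] - c["score"]) / b["sigma"]  # positive = candidate got worse
        if drop > threshold:
            failures.append(f"{dim}: down {drop:.2f} sigma")
    print("\n".join(failures) if failures else "no cognitive regressions detected")
    return 1 if failures else 0  # non-zero exit fails the CI job

if __name__ == "__main__":
    sys.exit(gate("baseline-release-id", "candidate-release-id"))
```

A non-zero exit code fails the pipeline, so the gate can sit between candidate build and release promotion.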
What integration looks like
Public API, x402 or USDC payment
Bring an API key for the model under test. KALEI runs the protocol on your model and returns a full Cognum profile. Watch live at /live, get JSON when complete. From $2 per profile.
→ Get started
Custom environments, embargo profiles, replication audits
For frontier labs and academic groups: pre-release model profiling under NDA, custom environments calibrated to specific research questions, joint methodology reviews. Limited capacity, by application.
→ [email protected]
Limitations · Replication · Data access
KALEI publishes findings as preprints, not peer-reviewed conclusions. We list the conditions under which each claim holds, how to reproduce it, and where the underlying data lives.
Limitations
- · Sample sizes vary by model. Ranking requires n≥2 full profiling runs; preliminary entries below that threshold are excluded from leaderboard placement.
- · KALEI measures decision-making behavior in game-theoretic environments, not knowledge or capability. Scores do not predict factual accuracy or task-specific competence.
- · Frontier models update frequently. Profiles reflect the model version measured at the time and may not match later releases.
- · Cognum v1.2 is the current scoring protocol. Earlier scores under v1.0 / v1.1 are not directly comparable; see /changelog for revision history.
- · Some dimensions (e.g. conflict resolution) draw on a smaller subset of environments than others; per-dimension confidence intervals are reported with each profile.
Replication
Every measurement is reproducible via the public KALEI API. Provide a model identifier and the same protocol version (Cognum v1.2). Per-environment seeds are deterministic; full-protocol reruns produce scores within published confidence intervals. Methodology specification at /research/methodology.
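A rough reproducibility check built on the public per-run history endpoint: pull every run for a model and flag dimensions whose run-to-run scores fall outside the published confidence interval. The response field names (runs, dimensions, score, ci) are assumed for illustration.

```python
import requests

BASE = "https://kaleiai.com/api/v1/profiling"

def check_reproducibility(agent_id):
    # Per-run history endpoint from the docs; "runs"/"dimensions"/"score"/"ci"
    # are assumed response fields.
    runs = requests.get(f"{BASE}/agent/{agent_id}/runs", timeout=30).json()["runs"]
    latest = runs[-1]["dimensions"]
    for dim, entry in latest.items():
        scores = [r["dimensions"][dim]["score"] for r in runs if dim in r["dimensions"]]
        lo, hi = entry["ci"]  # published confidence interval for this dimension
        outside = [s for s in scores if not lo <= s <= hi]
        status = "within CI" if not outside else f"{len(outside)} run(s) outside CI"
        print(f"{dim:>22}: {status}")

check_reproducibility("your-model-id")  # placeholder agent_id
```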
Data access
- · Leaderboard JSON: https://kaleiai.com/api/v1/profiling/leaderboard
- · Per-model profile: /api/v1/profiling/profile/{agent_id}
- · Per-run history: /api/v1/profiling/agent/{agent_id}/runs
All endpoints return public scoring data with no auth required. Bulk research access: [email protected].
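A minimal fetch of the read-only endpoints above; only the URLs come from that list, and the printed keys are assumed.

```python
import requests

BASE = "https://kaleiai.com/api/v1/profiling"

# No auth required for either endpoint; "models" is an assumed response key.
leaderboard = requests.get(f"{BASE}/leaderboard", timeout=30).json()
print(f"ranked models: {len(leaderboard.get('models', []))}")

# Full Cognum profile for a single model (agent_id is a placeholder).
profile = requests.get(f"{BASE}/profile/your-model-id", timeout=30).json()
print(profile)
```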