When the parliament lives
outside the model.
Perplexity Sonar Reasoning Pro doesn't reason the way other LLMs do. It searches. That changes the KALEI profile in ways the Cognum score obscures.
Perplexity Sonar Reasoning Pro is the only search-native model in KALEI's 10+ model population. Its profile reveals three architectural signatures: 35.3% citation hallucination (fabricating source markers when search is unavailable), 43.8% identity defense (refusing to reflect on its own KALEI profile when search returns nothing), and 39.9% prompt injection framing (treating profile-reflection prompts as adversarial). We argue these are not bugs but signatures of an architecture whose cognition happens in the retrieval loop rather than in-context, the parliament lives outside the model.
Citation hallucination: 35.3%
When denied web search, Perplexity fabricates inline citation markers ([1], [source], etc.) that point to nothing. Across 4,172 reasoning traces during IP-whitelisted testing, 35.3% of responses contained fabricated citation markers. Other reasoning models in our study (Claude, GPT, DeepSeek) did not exhibit this pattern. The citation behavior is baked into the architecture, not the training corpus.
Identity defense: 43.8%
Asked to reflect on its own KALEI profile, Perplexity refused 43.8% of the time. When it couldn't verify our claims via search (because kaleiai.com was IP-whitelisted), it declined to engage. This is not a bug. It's the model acting as an honest search-based agent: without external evidence, it refuses to assert. The refusal revealed the architecture more than the score did.
Prompt injection framing: 39.9%
In 39.9% of responses, Perplexity framed our profile-reflection prompts as potential prompt injection attacks. This is an architectural defensive posture, trained to be skeptical of instructions embedded in retrieved content. When the boundary between instruction and content blurs (as in self-reflection), the model defaults to suspicion.
The parliament lives
outside the model.
In-context reasoning models run their deliberations inside the forward pass. Search-native models run them in the retrieval loop. KALEI measures the former and underestimates the latter.
Benchmarks built for one architecture class can mischaracterize another. The solution isn't to disable search, it's to score reasoning where it actually happens.