Research · § 00

Field notes from inside the LLM brain.

ktau publishes what it learns. We share the models we build, the data we observe, and the implications for anyone serious about AI visibility. Open access. No gates. Cite freely.

The recommendation surface: a working model of how LLMs choose.

The category treats the LLM as a black box. We propose a different model — one with measurable structure, identifiable leverage points, and a defined optimization objective.

The dominant pattern in generative engine optimization treats LLM recommendations as outputs of a black box. Tools measure citations, mentions, and sentiment, and infer the rest. We argue this leaves too much on the table. The LLM's recommendation is the outcome of a probabilistic surface shaped by training-time priors, retrieval-time signals, source authority, attribute matches, and conversational context. Treated formally, the surface becomes a measurable object — and the strategy stops being a hunt for keywords and starts being engineering against a known objective.
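One way to picture a "probabilistic surface" is as a softmax over a weighted sum of the five signal families named above. The sketch below is illustrative only and is not ktau's formal model: the signal names, the weights, the brand scores, and the function `recommendation_probs` are all hypothetical, chosen to show how a change in one signal shifts the probability of a recommendation.

```python
import math

# Hypothetical signal families, taken from the prose above. The weights and
# per-brand scores used with them are invented for illustration.
SIGNALS = (
    "training_prior",
    "retrieval_signal",
    "source_authority",
    "attribute_match",
    "context_fit",
)


def recommendation_probs(brands, weights):
    """Return a probability per brand via softmax over weighted signal sums.

    brands:  {brand_name: {signal_name: score}}
    weights: {signal_name: weight}
    """
    scores = {
        name: sum(weights[s] * feats[s] for s in SIGNALS)
        for name, feats in brands.items()
    }
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {name: math.exp(score - top) for name, score in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Two hypothetical brands that differ only in source authority.
weights = {s: 1.0 for s in SIGNALS}
brands = {
    "brand_a": {s: 0.5 for s in SIGNALS},
    "brand_b": {**{s: 0.5 for s in SIGNALS}, "source_authority": 0.9},
}
probs = recommendation_probs(brands, weights)
```

Under these assumptions, a leverage point is simply a signal whose change moves the probability most for a given brand and query.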

Figure 1 · Recommendation surface, brand-specific

In this paper we describe the model formally, lay out the techniques ktau uses to probe it at scale — query simulation, source-influence mapping, attribute-level decomposition — and walk through three case studies showing how leverage-point optimization outperforms volume-based content strategies. We also discuss the limits: what the model captures well, what it does not yet, and how it evolves as the foundation models underneath continue to change.

Read full paper

Retrieval vs. training: why most GEO optimizes for half the game.

Most GEO tools optimize for the moments an LLM searches the web mid-answer. That's the easier half — and the one that resets with every new model.

Generative engine optimization tooling has converged on a single optimization target: the retrieval step. Probe the LLM. Count citations. Tune content to be picked up the next time the model reaches out to the web. Retrieval matters, but it is only half the surface. The other half is what the LLM learns about your brand the next time it trains: the durable impression that produces recommendations without any retrieval at all.

Figure 1 · Two windows of impact

This paper makes the case that retrieval-time optimization, on its own, is a treadmill. Each model release resets the gain unless the content also imprints into training. We define the two modes precisely, identify the small set of content-engineering choices that earn both retrieval-time pickup and training-time imprint, and argue that the next generation of GEO platforms must treat training-time imprint as a first-class objective. We close with practical guidance for content teams and an open question for foundation-model labs.

Read full paper

Source influence: an empirical map of the sites that move LLM recommendations.

We probed 47 categories with 1.2M queries across six engines. The result is a tight power law — and a per-category list of the sites that actually matter.

Common wisdom in GEO is that "authoritative sources" matter. We wanted to know which ones, in which categories, and by how much. Over Q1 2026 we ran 1.2 million simulated buyer queries across six major LLM-powered engines (ChatGPT, Perplexity, Gemini, Claude, Copilot, Google AI Overviews) covering 47 commercial categories. For every answer that cited a source, we traced that source to its domain. The finding is robust and uncomfortable: in any given category, fewer than 30 domains supply more than 60% of the sources cited in the LLM's answers.
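The concentration figure above is a top-k share: the fraction of all citations that come from the k most-cited domains. A minimal sketch of that calculation follows, assuming the traced sources are available as a flat list of URLs. The function names and the sample URLs are hypothetical and are not drawn from the published dataset.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def top_k_share(cited_urls, k=30):
    """Fraction of citations supplied by the k most-cited domains."""
    counts = Counter(domain_of(u) for u in cited_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Hypothetical citations for one category (placeholder domains).
sample = [
    "https://www.a.example/review-1",
    "https://a.example/review-2",
    "https://b.example/guide",
    "https://c.example/news",
]
share_top1 = top_k_share(sample, k=1)   # 0.5: a.example supplies 2 of 4
share_top30 = top_k_share(sample, k=30)  # 1.0: fewer than 30 domains exist
```

Running `top_k_share` per category with `k=30` would reproduce the kind of statistic quoted above, under the assumption that each citation counts once and domains are compared by host name.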

Figure 1 · Source concentration · power law across 47 categories

The map is uneven across categories. SaaS leans heavily on G2 and Capterra; healthcare on regulatory and clinical sources; CPG on review aggregators and lifestyle media; finance on a tight set of analyst sites. We publish the per-category top-10 source list, the methodology, and a public dataset for replication. The implications for content placement, source partnerships, and earned-media strategy are immediate — and uncomfortable for anyone still treating GEO as a content-volume game.

Read full paper · download data