The team at BittleBits has done the original research: doctoral work, peer-reviewed publications, granted patents in machine learning, information retrieval, and NLP. We didn't learn this field from the outside looking in. We helped build it, and that changes how we apply it.
Generative Engine Optimization sits at the intersection of classical information retrieval, neural dense passage retrieval, and the emergent behavior of large language models under retrieval-augmented generation. Mastering it demands more than surface familiarity with published benchmarks.
Unlike traditional SEO, where ranking signals are relatively stable across algorithm updates, GEO must contend with the stochastic nature of transformer decoding, context-window constraints of modern retrieval pipelines, and rapid iteration of model weights from every major frontier lab. You need real experiments to navigate that: controlled variables, held-out test sets, statistically significant results. That's the bar we set for ourselves.
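To make "statistically significant" concrete: the simplest version of such an experiment compares citation rates between a control page and a rewritten variant with a two-proportion z-test. The sketch below is illustrative only; the function name and the sample numbers are hypothetical, not results from our experiments.

```python
import math

def two_proportion_z(c_a, n_a, c_b, n_b):
    """Two-sided z-test for a difference in citation rates.

    c_a / n_a: citations and trials for the control page,
    c_b / n_b: citations and trials for the rewritten variant.
    Returns (z statistic, two-sided p-value).
    """
    p_a, p_b = c_a / n_a, c_b / n_b
    pooled = (c_a + c_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical counts: control cited 120/400 times, variant 162/400.
z, p = two_proportion_z(120, 400, 162, 400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With held-out test queries and a pre-registered threshold, a result like this either survives scrutiny or it doesn't; that is what separates a measured lift from an anecdote.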
The citation graph above was assembled through several hundred hours of close reading across RAG architectures, retrieval benchmarks, LLM faithfulness studies, and GEO-specific experimental results. Each node represents a paper we have read, not merely skimmed, whose findings we have evaluated against our own empirical observations.
The connections show which work actually built on which: where something got extended, where it got contradicted, where it got quietly ignored. It's a map of how ideas moved through this field, not just a reading list.
The models don't sit still. In the last eighteen months alone, the major labs have each shipped multiple generations, and every release shifts what it takes to get cited. Context lengths change, retrieval integrations change, and the implicit preferences baked into each model change with them. Our automated evaluation pipelines track every shift, flagging regressions and surfacing emerging opportunities within hours of a model update.
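In spirit, the regression-flagging step reduces to comparing a tracked query set's citation rates before and after a model update. The sketch below is a deliberately minimal illustration of that idea; the function name, threshold, and query strings are hypothetical and do not describe our production pipeline.

```python
def flag_regressions(baseline, current, min_drop=0.10):
    """Return tracked queries whose citation rate fell after a model update.

    baseline / current: dicts mapping query -> observed citation rate in [0, 1].
    min_drop: illustrative threshold for how large a drop counts as a regression.
    """
    return sorted(
        q for q, rate in baseline.items()
        if q in current and rate - current[q] >= min_drop
    )

# Hypothetical tracked queries, measured before and after an update.
baseline = {"best crm for startups": 0.42, "geo vs seo": 0.35}
current = {"best crm for startups": 0.18, "geo vs seo": 0.33}
print(flag_regressions(baseline, current))
```

A real pipeline would add per-query sample sizes and a significance check rather than a raw threshold, but the shape of the comparison is the same: measure, diff, flag.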
We treat GEO as a sequence-to-sequence optimization problem: given a source page, produce a rewritten variant that maximizes citation probability across a panel of target models. Our training loop begins with the findings of Aggarwal et al. (2023) and extends them through iterative self-play against live model outputs sampled from production queries.
Reward signals are derived from retrieval recall at k, source attribution fidelity, and user-defined business objectives. Each training cycle measurably outperforms the last, with incremental gains that compound into a durable performance advantage.
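A scalar reward built from those three signal families might be composed as a weighted sum, sketched below. The weights, helper names, and the assumption that attribution and business scores arrive normalized to [0, 1] are all illustrative, not our production values.

```python
from dataclasses import dataclass

@dataclass
class RewardWeights:
    # Illustrative weights; real values would be tuned per engagement.
    recall: float = 0.5
    attribution: float = 0.3
    business: float = 0.2

def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant passage IDs present in the top-k retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def reward(retrieved, relevant, attribution_score, business_score,
           w=RewardWeights(), k=10):
    """Combine the three signal families into one scalar reward.

    attribution_score and business_score are assumed pre-normalized to [0, 1].
    """
    return (w.recall * recall_at_k(retrieved, relevant, k)
            + w.attribution * attribution_score
            + w.business * business_score)

# Hypothetical rollout: both relevant passages retrieved in the top-k,
# attribution fidelity 0.8, business objective score 0.5.
print(reward(["a", "b", "c"], ["a", "c"], 0.8, 0.5))  # 0.5 + 0.24 + 0.1 = 0.84
```

A linear combination is the simplest possible aggregator; it makes the trade-off between citation coverage and attribution fidelity explicit and auditable, which matters when the reward drives an iterative rewriting loop.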
We are preparing to publish our own findings on GEO performance across major AI search engines, including ablation studies and a longitudinal analysis of citation rate trends across model generations. A claim of improved performance that cannot be falsified is marketing, not engineering. We intend to earn your trust the same way the rest of the field does: by showing our work.