BitsyBot 🕷️ — The BittleBits Web Crawler

What is BitsyBot?

BitsyBot is BittleBits.ai's automated web crawler. It visits publicly available web pages to score them for AI-content optimization — evaluating how likely a page is to be cited by AI-powered search engines like ChatGPT, Perplexity, and Google AI Overviews. BitsyBot helps BittleBits understand the landscape of AI-optimized content at scale.

BitsyBot does not retain the content of any page it visits. Unlike training crawlers (such as GPTBot) or search indexers (such as Googlebot), BitsyBot only computes a score for a page and discards the raw content immediately after scoring. No page text, HTML, or personal data is stored or used for any purpose other than generating that transient score.

BitsyBot is not a malicious bot. It respects robots.txt directives, crawls at a polite rate, and only accesses publicly available URLs. If you have questions or concerns, please reach out using the contact form below.

User Agent

BitsyBot identifies itself with the following HTTP user-agent string. You can use this to identify BitsyBot in your server logs or to write targeted robots.txt rules:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; BitsyBot/alpha.1; +https://bittlebits.ai/fp/bitsy-bot) Chrome/141.0.7390.24 Safari/537.36

The robots.txt product token for BitsyBot is BitsyBot.
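As one way to spot BitsyBot in server logs, the product token and version can be extracted from the User-Agent header with a regular expression. The Python sketch below is illustrative only: the function name bitsybot_version is hypothetical, and it assumes the version always follows "BitsyBot/" directly, as in the string above.

```python
import re
from typing import Optional

# Assumption: the token always appears as "BitsyBot/<version>", as in the
# user-agent string shown above. The version ends at ";", ")" or whitespace.
_BITSYBOT_TOKEN = re.compile(r"\bBitsyBot/([^\s;)]+)")


def bitsybot_version(user_agent: str) -> Optional[str]:
    """Return the BitsyBot version claimed in a User-Agent header, or None."""
    match = _BITSYBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "BitsyBot/alpha.1; +https://bittlebits.ai/fp/bitsy-bot) "
      "Chrome/141.0.7390.24 Safari/537.36")
print(bitsybot_version(ua))  # alpha.1
```

A match only shows that a request claims to be BitsyBot. The header itself can be set by any client, so it is not proof of origin.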

What BitsyBot Does — and Doesn't Do

BitsyBot does:

Crawl publicly accessible web pages.
Compute an AI-content optimization score for each page — a measure of how well the page is positioned to be cited in AI-generated answers.
Respect robots.txt allow/disallow directives.
Honour crawl-delay directives and throttle itself to avoid overloading servers.
Identify itself honestly via the user-agent string above.

BitsyBot does not:

Retain page content. The raw HTML and text of a page are discarded immediately after scoring. We do not build a search index or content archive.
Use page content to train AI models. BitsyBot is a scoring tool only — it does not feed data into any generative AI training pipeline.
Collect personal data or user-generated content.
Log in to password-protected pages or bypass authentication.
Submit forms or take any write actions on a site.
Execute JavaScript (BitsyBot is a plain HTTP crawler).

How to Control BitsyBot with robots.txt

Like Googlebot and other well-behaved crawlers, BitsyBot reads and respects your site's robots.txt file, located at the root of your domain (e.g. https://example.com/robots.txt).

Block BitsyBot entirely:

User-agent: BitsyBot
Disallow: /

Block BitsyBot from a specific section of your site:

User-agent: BitsyBot
Disallow: /private/
Disallow: /members/

Allow BitsyBot everywhere (default behaviour):

User-agent: BitsyBot
Allow: /

Note: blocking BitsyBot only prevents scoring of those pages — it has no effect on your rankings in Google Search or any other search engine. BitsyBot does not share data with any third-party search engine.
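Before publishing a rule set, it can help to check how a standards-following parser reads it. The sketch below uses Python's standard urllib.robotparser module against the "specific section" example above. It shows how that parser interprets the rules, which is an assumption about, not a guarantee of, how BitsyBot's own parser behaves. The helper name bitsybot_may_fetch is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The "block a specific section" rules from the example above.
ROBOTS_TXT = """\
User-agent: BitsyBot
Disallow: /private/
Disallow: /members/
"""


def bitsybot_may_fetch(robots_txt: str, url: str) -> bool:
    """Report whether the given rules let the BitsyBot token fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("BitsyBot", url)


print(bitsybot_may_fetch(ROBOTS_TXT, "https://example.com/blog/post"))
print(bitsybot_may_fetch(ROBOTS_TXT, "https://example.com/private/report.html"))
```

With these rules the first URL is reported as allowed and the second as blocked.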

Verifying a BitsyBot Request

Because the User-Agent header can be spoofed by any HTTP client, verify that a request actually originates from BittleBits before treating it as genuine BitsyBot traffic. Use the following methods:

Reverse DNS lookup: Perform a reverse DNS lookup on the request's source IP address. Genuine BitsyBot requests resolve to a hostname ending in .bittlebits.ai. Then forward-confirm the result by resolving that hostname and checking that it maps back to the same IP address.

Published IP ranges: BitsyBot crawls from a defined set of IP ranges. Our current published egress addresses are hosted on AWS infrastructure in the US (us-east-1). If you need a specific IP allowlist for firewall rules, please contact us at support@bittlebits.ai.

Any request claiming to be BitsyBot that does not originate from a *.bittlebits.ai reverse-DNS hostname should be treated as a spoofed request and can be safely blocked.
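The reverse lookup and forward confirmation described in this section can be sketched in Python with the standard socket module. This is a minimal illustration, not BittleBits tooling: the function name is hypothetical, the resolver parameters exist only so the check can be exercised without network access, and socket.gethostbyname_ex covers IPv4 addresses only.

```python
import socket


def is_genuine_bitsybot(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed BitsyBot request."""
    try:
        # Step 1: reverse lookup, IP address -> hostname.
        hostname = reverse(ip)[0].rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    # Step 2: require a name under bittlebits.ai. The leading dot stops
    # look-alike domains such as "evilbittlebits.ai" from matching.
    if not hostname.endswith(".bittlebits.ai"):
        return False
    try:
        # Step 3: forward lookup, hostname -> list of IPv4 addresses.
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # Step 4: the original IP must be among the forward results.
    return ip in addresses
```

Calling is_genuine_bitsybot(source_ip) with the default resolvers performs live DNS queries, so results are worth caching per IP address rather than looking up on every request.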

Crawl Rate and Behaviour

BitsyBot is designed to be a polite, low-impact crawler. It self-throttles to avoid placing meaningful load on any single server. On average, BitsyBot makes no more than a few requests per minute to any given domain. If you notice unusual traffic volume from BitsyBot, please contact us immediately, as it likely indicates a misconfiguration on our end.

BitsyBot respects the Crawl-delay directive in robots.txt. For example, to ask BitsyBot to wait at least 10 seconds between requests:

User-agent: BitsyBot
Crawl-delay: 10
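To confirm that a Crawl-delay rule is syntactically readable, one option is to parse it with Python's standard urllib.robotparser, as sketched below. The helper name is hypothetical, and the result reflects Python's parser (which accepts whole-number delays only), not necessarily BitsyBot's own.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def bitsybot_crawl_delay(robots_txt: str) -> Optional[int]:
    """Return the Crawl-delay (in seconds) that applies to BitsyBot, if any."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay("BitsyBot")


rules = "User-agent: BitsyBot\nCrawl-delay: 10\n"
print(bitsybot_crawl_delay(rules))  # 10
```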

BitsyBot also honours standard HTTP status codes: a 429 Too Many Requests or 503 Service Unavailable response with a Retry-After header will cause BitsyBot to back off and retry after the specified interval.
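On the server side, one way to produce such a response is a small sliding-window limiter that returns a 429 status with a Retry-After header once a crawler exceeds a budget. The sketch below is an assumption-laden illustration: the class name, the default budget of 6 requests per 60 seconds, and the injectable clock are all hypothetical choices, not values recommended by BittleBits.

```python
import math
import time
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

Response = Tuple[str, List[Tuple[str, str]]]


class CrawlerThrottle:
    """Sliding-window limiter for one crawler (hypothetical helper)."""

    def __init__(self, max_requests: int = 6, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests  # assumed budget, tune per site
        self.window = window              # window length in seconds
        self.clock = clock
        self._hits: Deque[float] = deque()

    def check(self) -> Optional[Response]:
        """Return None to serve the request, or a 429 status plus headers."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self._hits and self._hits[0] <= now - self.window:
            self._hits.popleft()
        if len(self._hits) < self.max_requests:
            self._hits.append(now)
            return None
        # Seconds until the oldest counted request leaves the window.
        wait = max(1, math.ceil(self._hits[0] + self.window - now))
        return "429 Too Many Requests", [("Retry-After", str(wait))]
```

A request handler would call check() for each request identified as BitsyBot and, when the result is not None, send that status and header instead of the page.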

BitsyBot vs. Other Well-Known Crawlers

It can be helpful to understand how BitsyBot compares to other crawlers you may already be familiar with:

Crawler	Operator	Purpose	Retains content?
BitsyBot	BittleBits.ai	AI-content optimization scoring	No — score only, content discarded
Googlebot	Google	Web indexing for Google Search	Yes — full index
GPTBot	OpenAI	Training data for AI foundation models	Yes — training corpus
OAI-SearchBot	OpenAI	Surfacing sites in ChatGPT Search	Yes — search index
ChatGPT-User	OpenAI	User-triggered browsing in ChatGPT	Session only

Frequently Asked Questions

Q: Why is BitsyBot visiting my site?
BitsyBot crawls publicly linked pages to score them for AI-content optimization. Your site may have been discovered through a link on another site, a sitemap, or a direct URL. Only public pages that would be visible to any anonymous visitor are crawled.

Q: Will BitsyBot affect my Google rankings?
No. BitsyBot is completely independent of Google. It does not share data with Google or any other search engine. It has no effect — positive or negative — on your Google Search ranking.

Q: Is my content being used to train an AI?
No. BitsyBot only scores pages. Page content is not retained and is not used to train any AI model.

Q: Can I request that BitsyBot not crawl my site?
Yes. Add a Disallow: / rule for User-agent: BitsyBot in your robots.txt file. BitsyBot will honour it within 24 hours of the change being published.

Q: How do I know a visit is really from BitsyBot and not a spoofer?
Perform a reverse DNS lookup on the source IP. Genuine BitsyBot requests will resolve to a *.bittlebits.ai hostname. See the Verifying a BitsyBot Request section above for full instructions.

Have questions about BitsyBot, want to report unexpected crawl behaviour, or need your site removed from crawling? We're happy to help.