robots.txt

robots.txt is a plain-text file served from the root of a host (for example, https://example.com/robots.txt) that tells search engine crawlers which paths they're allowed to crawl. It's an old, simple, surprisingly powerful tool, and the source of more accidental traffic disasters than almost any other SEO file.

What it does (and doesn't do)

robots.txt controls crawling, not indexing. A URL blocked in robots.txt can still appear in search results if Google learns about it from elsewhere (links from other sites, sitemaps, etc.) — but Google can't crawl the page, so the SERP entry will be sparse ("No information available for this page").

To keep a URL out of the index entirely, use a noindex meta tag or HTTP header on the page itself, and leave it crawlable so Google can see the noindex directive.
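As a rough illustration of the two noindex signals mentioned above, the Python sketch below checks a response for either an X-Robots-Tag header or a robots meta tag. The helper name has_noindex and the sample inputs are assumptions made for this example, and the substring check is deliberately simplistic (it does not handle directives scoped to one crawler).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def has_noindex(headers, html):
    """Return True if the X-Robots-Tag header or a robots meta tag says noindex.

    `headers` is a plain dict of response headers; `html` is the page body.
    Both are hypothetical inputs for this sketch.
    """
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value.lower()

    parser = RobotsMetaParser()
    parser.feed(html)
    in_meta = any("noindex" in directive for directive in parser.directives)
    return "noindex" in header_value or in_meta


# Meta tag form
print(has_noindex({}, '<head><meta name="robots" content="noindex"></head>'))  # True
# HTTP header form
print(has_noindex({"X-Robots-Tag": "noindex"}, "<html></html>"))  # True
```

Either signal only works if the crawler can fetch the page, which is why the URL must not also be disallowed in robots.txt.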

Minimum useful structure

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml
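To see how the rules in this file interact, here is a small evaluator sketched in Python. It assumes the longest-match precedence described in RFC 9309 (the most specific matching rule wins, and Allow wins a tie), matches the User-agent value exactly, and ignores wildcards (* and $ inside paths). The names parse_rules and is_allowed are hypothetical helpers written for this example, not part of any library.

```python
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml
"""


def parse_rules(robots_txt, agent="*"):
    """Return (is_allow, path) pairs from groups whose User-agent equals `agent`."""
    rules, current_agents, in_rules = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            # An empty Disallow value blocks nothing, so it is skipped.
            if agent.lower() in current_agents and value:
                rules.append((field == "allow", value))
    return rules


def is_allowed(robots_txt, path, agent="*"):
    """Longest matching rule wins; Allow wins ties; no match means allowed."""
    matches = [
        (len(rule_path), is_allow)
        for is_allow, rule_path in parse_rules(robots_txt, agent)
        if path.startswith(rule_path)
    ]
    if not matches:
        return True
    return max(matches)[1]


print(is_allowed(ROBOTS_TXT, "/blog/post"))    # True
print(is_allowed(ROBOTS_TXT, "/admin/users"))  # False
print(is_allowed(ROBOTS_TXT, "/admin"))        # True (no trailing slash, so /admin/ does not match)
```

Under this precedence the Allow: / line is redundant, since anything not disallowed is crawlable by default. Parsers that apply the first matching rule instead of the longest one may treat the file differently.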

Common mistakes

Blocking the entire site by accident: a Disallow: / rule that applies to all user agents is a real cause of "we shipped a redesign and our traffic dropped to zero." Check robots.txt on every deploy.
Blocking CSS and JS: Google needs to render your page to evaluate it. Don't disallow asset paths such as /static/ or /_next/.
Using robots.txt for sensitive paths: robots.txt is public. Listing /admin/secret-page advertises its existence to anyone curious. Use authentication, not robots.txt.
Forgetting to declare the sitemap: a Sitemap: https://... line helps crawlers find it.
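The first mistake in this list is easy to guard against automatically. The Python sketch below uses the standard library's urllib.robotparser to test whether the site root is blocked. The function name site_is_blocked and the agent name ExampleBot are assumptions for this example. Note that urllib.robotparser applies the first matching rule and does not understand wildcards, so it can disagree with Google on unusual files.

```python
from urllib.robotparser import RobotFileParser


def site_is_blocked(robots_txt, agent="ExampleBot"):
    """Return True if `agent` may not fetch the site root under these rules.

    `agent` is a made-up crawler name, so it falls through to the
    `User-agent: *` group unless the file names it explicitly.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


print(site_is_blocked("User-agent: *\nDisallow: /\n"))        # True
print(site_is_blocked("User-agent: *\nDisallow: /admin/\n"))  # False
```

A deploy pipeline could run a check like this against the robots.txt it is about to publish and fail the build when the result is True.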

Per-bot directives

You can target specific crawlers:

User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

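This is how many sites have blocked AI training crawlers since 2023. The operators of GPTBot, ClaudeBot, and CCBot state that their crawlers honor robots.txt, and Google-Extended is a token that Google honors for controlling AI training use rather than a separate crawler. Compliance is voluntary: robots.txt is a request, not an enforcement mechanism. Whether you should block these crawlers is a separate strategic question.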
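To check how a per-bot file like the one above is interpreted, one option is Python's standard urllib.robotparser. The sketch below is only an approximation of how a given crawler behaves. The /pricing URL, the helper name may_crawl, and the agent name SomeOtherBot are made up for this example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(agent, url="https://example.com/pricing"):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    return parser.can_fetch(agent, url)


print(may_crawl("Googlebot"))     # True
print(may_crawl("GPTBot"))        # False
print(may_crawl("SomeOtherBot"))  # True: no group names it and there is no * group
```

The last result is worth noting: a crawler that matches no group and finds no User-agent: * group is unrestricted, so blocking one bot by name does nothing about the others.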
