robots.txt Generator - Build Your Crawler Rules File
Generate a valid robots.txt file for your website. Control which bots can crawl which pages, add sitemap URLs, set crawl delays, and use presets for common configurations.
What is robots.txt?
robots.txt is a plain-text file stored at the root of your domain (e.g.,
https://example.com/robots.txt) that follows the
Robots Exclusion Protocol. Web crawlers read this file before visiting your
site to understand which pages they are permitted to crawl.
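As a rough illustration of what a well-behaved crawler does with the file, the sketch below uses Python's standard `urllib.robotparser` module to check URLs against a small rule set. The rules, the `ExampleBot` name, and the `may_crawl` helper are hypothetical, chosen only for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network request is made here.
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(may_crawl("ExampleBot", "https://example.com/blog/"))
print(may_crawl("ExampleBot", "https://example.com/admin/users"))
```

A real crawler would first download the file (for example with `set_url()` and `read()` on the same parser) and then run this check before each request.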
Syntax reference
- User-agent: *: Applies to all crawlers. Use a specific name (e.g., Googlebot) to target one.
- Allow: /path/: Explicitly permits a path that would otherwise be disallowed.
- Disallow: /path/: Prevents the crawler from accessing the path and anything below it.
- Crawl-delay: N: Requests a pause of N seconds between requests (not honoured by Google).
- Sitemap: URL: Points crawlers to your XML sitemap for faster discovery.
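The directives above can be assembled into a file programmatically. The following is a minimal sketch of one way a generator might do that, not the implementation behind this tool; the `RuleBlock` class and `render_robots_txt` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RuleBlock:
    """One User-agent group and its rules (hypothetical helper type)."""
    user_agent: str = "*"
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)
    crawl_delay: Optional[int] = None


def render_robots_txt(blocks, sitemaps=()):
    """Render rule blocks and sitemap URLs as robots.txt text."""
    lines = []
    for block in blocks:
        lines.append(f"User-agent: {block.user_agent}")
        # Allow lines are written first; parsers that stop at the first
        # matching rule would otherwise never reach the exception.
        lines.extend(f"Allow: {path}" for path in block.allow)
        lines.extend(f"Disallow: {path}" for path in block.disallow)
        if block.crawl_delay is not None:
            lines.append(f"Crawl-delay: {block.crawl_delay}")
        lines.append("")  # blank line separates groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines).rstrip("\n") + "\n"


example = render_robots_txt(
    [RuleBlock(allow=["/private/public-page.html"],
               disallow=["/private/"],
               crawl_delay=5)],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(example)
```

Keeping each group as a small data object makes it easy to add presets later: a preset is just a predefined list of `RuleBlock` values.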
Common AI crawler user-agents
Add separate rule blocks for any you want to block:
- GPTBot: OpenAI
- CCBot: Common Crawl (used by many AI models)
- anthropic-ai: Anthropic Claude
- Google-Extended: Google Bard / Gemini training
- PerplexityBot: Perplexity AI
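As a sketch of the "separate rule blocks" approach, the snippet below builds one `Disallow: /` group per user-agent from the list above, leaves all other crawlers allowed, and then checks the result with Python's standard `urllib.robotparser`. The `block_crawlers` helper is a hypothetical name, and the user-agent list is copied from this page rather than from the vendors' current documentation.

```python
from urllib.robotparser import RobotFileParser

# User-agent names as listed on this page; verify against each vendor's docs.
AI_CRAWLERS = ["GPTBot", "CCBot", "anthropic-ai", "Google-Extended", "PerplexityBot"]


def block_crawlers(user_agents):
    """Return robots.txt text with one full-site Disallow group per agent."""
    groups = [f"User-agent: {ua}\nDisallow: /" for ua in user_agents]
    return "\n\n".join(groups) + "\n"


# Blocked AI crawlers first, then a catch-all group that allows everyone else.
robots_txt = block_crawlers(AI_CRAWLERS) + "\nUser-agent: *\nAllow: /\n"

checker = RobotFileParser()
checker.parse(robots_txt.splitlines())

print(robots_txt)
print(checker.can_fetch("GPTBot", "https://example.com/"))
print(checker.can_fetch("Googlebot", "https://example.com/page"))
```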
Deploying your robots.txt
Place the generated file at exactly /robots.txt on your web server. Many site
frameworks (e.g., Astro, Next.js) serve any file placed in the public/ directory
from the site root, so a robots.txt there is served automatically; Gatsby uses
static/ for the same purpose.
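Crawlers look for the file only at the root of each host, so a copy at a deeper path such as /blog/robots.txt is ignored, and each subdomain needs its own file. The small sketch below derives the location a crawler would request for any page URL; `robots_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=1"))
print(robots_url("https://shop.example.com/cart"))
```

After deploying, requesting that URL in a browser is a quick way to confirm the file is being served from the expected location.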
Robots.txt limitations
The robots.txt protocol is voluntary: well-behaved crawlers (Googlebot, Bingbot) respect it, but malicious scrapers and vulnerability scanners ignore it entirely. To protect genuinely private content, use:
- HTTP authentication: requires a username and password to access the page.
- noindex meta tag: keeps a page out of search results, but only if crawlers are allowed to fetch the page and see the tag (a page disallowed in robots.txt never has its noindex read).
- Server-side access controls: the only reliable way to block unwanted access.
robots.txt is best used to manage crawl budget (directing crawlers away from low-value pages), not for security.
Sitemap in robots.txt
You can list your sitemap URL(s) in robots.txt to help crawlers discover the pages you want indexed:
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
Multiple sitemaps can be listed. All URLs must be absolute (including protocol). While submitting sitemaps directly through Google Search Console is the primary method, the robots.txt declaration helps crawlers from all engines discover them automatically.
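To check the sitemap entries in a generated file, one option is to read them back with Python's standard `urllib.robotparser` (its `site_maps()` method requires Python 3.8 or later) and confirm each URL is absolute. This is a minimal sketch; `sitemap_urls` and `is_absolute` are hypothetical helper names and the sample file is made up.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs declared in robots.txt text, in file order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


def is_absolute(url: str) -> bool:
    """True if the URL has an http(s) scheme and a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


sample = """\
User-agent: *
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
"""

for url in sitemap_urls(sample):
    print(url, is_absolute(url))
```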