
What is Robots.txt? A Complete Guide for SEO and Crawlers

The robots.txt file is one of the smallest files on your website, but it can have a huge SEO impact. A single incorrect rule can block search engines from crawling important pages.

What is Robots.txt?

Robots.txt is a text file placed at the root of your website that gives crawl instructions to search engine bots and other automated agents. It uses the Robots Exclusion Protocol (REP) to define which URL paths are allowed or disallowed for specific user-agents.

Important: robots.txt controls crawling, not indexing. If a blocked URL is linked from other sites, search engines can still index it without crawling its content, so it may appear in results with no description.

Where Robots.txt Lives

The file must be available at the exact root path:

https://yourdomain.com/robots.txt

If you place it anywhere else (for example, /files/robots.txt), crawlers may ignore it.
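Because the location depends only on the scheme and host, you can derive it from any page URL. A minimal sketch using Python's standard library (the function name is just for illustration):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # robots.txt must sit at the root of the host; the page path is irrelevant
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://yourdomain.com/files/page.html"))
# https://yourdomain.com/robots.txt
```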

How Crawlers Read Robots.txt

  1. A crawler requests /robots.txt before crawling.
  2. It looks for rules matching its user-agent name.
  3. It applies Allow and Disallow directives by path.
  4. It crawls URLs permitted by the most specific matching rule; for major engines like Google, the longest matching path wins when Allow and Disallow overlap.

Different bots may support directives differently, so always prioritize standards-supported syntax and test in search engine tools.
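You can simulate this lookup with Python's built-in parser. Note one such difference: urllib.robotparser applies rules in file order rather than by longest match, which is why the Allow line is placed before the broader Disallow below (the domain is a placeholder):

```python
from urllib import robotparser

rules = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://yourdomain.com/admin/help/faq"))  # True
print(rp.can_fetch("*", "https://yourdomain.com/admin/users"))     # False
print(rp.can_fetch("*", "https://yourdomain.com/blog/post"))       # True: no rule matches
```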

Robots.txt Syntax and Directives

The most common directives are:

  • User-agent: Defines which crawler a rule block applies to.
  • Disallow: Blocks crawling for matching paths.
  • Allow: Explicitly allows a path, useful when a parent path is disallowed.
  • Sitemap: Points crawlers to your XML sitemap URL.

Put together, a minimal file looks like this:

User-agent: *
Disallow: /admin/
Allow: /admin/help/

Sitemap: https://yourdomain.com/sitemap.xml
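On Python 3.8+, the standard-library parser can also surface Sitemap lines via site_maps(), shown here with the example file above (the URL is a placeholder):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /admin/
Allow: /admin/help/

Sitemap: https://yourdomain.com/sitemap.xml
""".splitlines())

print(rp.site_maps())  # ['https://yourdomain.com/sitemap.xml']
```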

Common Examples

1) Allow everything

User-agent: *
Disallow:

2) Block all crawling

User-agent: *
Disallow: /

3) Block internal folders

User-agent: *
Disallow: /private/
Disallow: /tmp/
Disallow: /checkout/

Common Mistakes to Avoid

  • Blocking the entire site in production after launching from a staging setup.
  • Trying to hide sensitive data in robots.txt. It is publicly accessible and should never be treated as security.
  • Blocking JS/CSS assets that search engines need for rendering.
  • Using robots.txt instead of noindex when your goal is to keep pages out of search results.
  • Forgetting to include a sitemap URL for faster discovery of important pages.
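The first mistake above, shipping a staging "Disallow: /" to production, is easy to catch automatically. A minimal deploy-time guard, sketched with the standard-library parser (example.com is a placeholder):

```python
from urllib import robotparser

def site_fully_blocked(robots_txt: str) -> bool:
    """True if the wildcard agent cannot crawl the site root."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch("*", "https://example.com/")

# A leftover staging file should fail the deploy check:
assert site_fully_blocked("User-agent: *\nDisallow: /")
# A permissive production file should pass:
assert not site_fully_blocked("User-agent: *\nDisallow:")
```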

Robots.txt vs Noindex

Method                      | Controls          | Best For
robots.txt                  | Crawling          | Reducing crawl load, blocking non-public sections
meta robots / x-robots-tag  | Indexing behavior | Keeping specific pages out of search results
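For the indexing-side control, the standard mechanism is a meta robots tag; crawling must remain allowed so engines can actually see it:

```html
<!-- In the page <head>: allow crawling, but keep the page out of results -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent is the X-Robots-Tag: noindex HTTP response header. Note that a page blocked by robots.txt cannot have its noindex seen, so do not combine the two on the same URL.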

Best Practices

  • Keep the file simple and intentional. Avoid unnecessary wildcards and duplicate rules.
  • Only disallow paths you truly do not want crawled.
  • Test with search engine webmaster tools after each major change.
  • Add your sitemap directive at the bottom of the file.
  • Version-control your robots.txt changes so mistakes are easy to roll back.

Generate a Robots.txt File

If you want to quickly build a valid robots.txt with common rule templates, use our free generator.
