robots.txt

Sitecheck Team

Small text file that tells crawlers which parts of a site to crawl or avoid.

robots.txt is a plain-text file placed at the root of a site (e.g., https://example.com/robots.txt) that tells web crawlers which paths they may and may not fetch, following the Robots Exclusion Protocol (standardized in RFC 9309). It uses simple directives such as User-agent, Allow, and Disallow. Remember that robots.txt is advisory — it relies on cooperative crawlers and is not an access-control mechanism.
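For example, a minimal robots.txt might look like the following (the /admin/ path and sitemap URL are illustrative, not required names):

```
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
```

Here the single User-agent: * group applies to all crawlers, blocking /admin/ while leaving the rest of the site crawlable.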

Best practices:

  • Keep it short and canonical.
  • Declare your sitemap with a Sitemap: directive, using the absolute URL of your sitemap.xml.
  • Avoid relying on it for sensitive content (use authentication instead).
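To check how a cooperative crawler would interpret these directives, you can use Python's standard-library robots.txt parser. This is a sketch with illustrative rules and a hypothetical example.com domain:

```python
from urllib.robotparser import RobotFileParser

# Parse robots.txt rules supplied as lines; in practice you would call
# rp.set_url("https://example.com/robots.txt") and rp.read() instead.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /",
])

# can_fetch() answers: may this user agent fetch this URL?
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/public/page"))   # True
```

Note that this only models what a rule-following crawler would do; it does not enforce anything on the server side, which is why authentication remains the right tool for sensitive content.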