Free Tool

LLMs.txt & AI Crawler Checker

Validate your llms.txt and ai.txt. Check crawl rules, training policy, privacy guidance, and get a readiness score.

What is llms.txt?

llms.txt is a plain-text file placed at the root of a website (e.g. https://example.com/llms.txt) that gives AI crawlers, large language models, and retrieval-augmented generation (RAG) systems structured guidance about a site's content, policies, and preferences. Proposed in 2024, it uses a simple key-value format — similar to robots.txt — to declare fields such as Site, Contact, License, Training/FTU Policy, and crawl directives for AI agents. Together with ai.txt, it forms the emerging standard for communicating AI consent and content context on the web.
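A minimal sketch of what such a file might look like, using the fields named above. The exact field set and values here are illustrative, not a fixed schema — the proposal is still evolving:

```
# llms.txt — illustrative example
Site: https://example.com
Contact: webmaster@example.com
License: CC-BY-4.0; commercial reuse requires permission
Training: not permitted
Last-Updated: 2026-01-15
```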

llms.txt vs ai.txt — what's the difference?

There's no single universal standard yet. llms.txt originated as a way to surface structured, LLM-friendly documentation — think of it as a guided index for AI agents and RAG systems. ai.txt (promoted by Spawning.ai) follows a key-value style similar to robots.txt and focuses primarily on training-data opt-out. Having both maximises coverage across different pipelines.
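To make the contrast concrete, a hedged sketch of an ai.txt opt-out in the robots.txt-like style described above (directive names and wildcard support are illustrative — check Spawning.ai's current generator for the exact syntax):

```
# ai.txt — illustrative training-data opt-out
User-Agent: *
Disallow: /
Disallow: *.jpg
Disallow: *.png
```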

What actually works — best practices in the wild

The AI crawler landscape is evolving rapidly. These practices reflect what's observed as of early 2026 — specific crawler behaviours, field support, and compliance levels may change as standards mature.
  • robots.txt is still the most honoured crawl control. GPTBot, ClaudeBot, PerplexityBot, and Google-Extended all respect User-agent blocks in robots.txt. Use llms.txt as supplementary guidance, not a replacement.
  • Explicit beats implicit. "NOT permitted" outperforms ambiguous language. LLMs and automated pipelines interpret "transient use" and "fair use" differently — state intent directly.
  • Example prompts improve GEO. Sites with Example-Summary-Prompt or Example-Structured-Extraction fields see more accurate, structured outputs from LLMs in RAG settings.
  • Keep it fresh. A stale Last-Updated date signals an abandoned file. Most AI crawlers de-prioritise outdated directives. Aim to update at least every 6 months.
  • License clarity protects you. Without a License field, content is treated as unconstrained by many training pipelines. Specify whether commercial reuse requires permission.
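The robots.txt point above can be made concrete. A minimal robots.txt that opts the named AI crawlers out of crawling, using their documented User-agent tokens:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Note that Google-Extended only controls AI training use; it does not remove pages from Google Search.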

FAQ

Do LLMs actually read llms.txt?
Not directly during inference. The file is consumed during crawling and indexing. Agentic tools such as Cursor, Claude Projects, and Perplexity fetch it to shape their site understanding. Adoption is accelerating in 2026 but compliance still varies by crawler.
Which AI crawlers support llms.txt?
Named crawlers that process llms.txt or ai.txt signals include GPTBot, OAI-SearchBot, ClaudeBot, anthropic-ai, PerplexityBot, CCBot, Google-Extended, Amazonbot, Diffbot, and Bytespider. Compliance level varies — robots.txt remains the most universally enforced mechanism.
Is llms.txt legally enforceable?
Not on its own — it is an advisory signal, not a contract. For enforceability, combine it with explicit terms of service and robots.txt blocks. Documented intent does matter in licensing and regulatory disputes.
Which file should I prioritise?
Deploy both if possible. llms.txt is gaining traction as a GEO and RAG hint layer. ai.txt has broader recognition in training-data curation tools. They serve slightly different audiences, and deploying both takes minutes.
How do I improve my readiness score?
High-impact improvements: add a License field, state training policy explicitly ("allowed" or "not permitted"), include a valid Last-Updated date, add PII-Policy, and deploy both llms.txt and ai.txt for the dual-file bonus.
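As a rough sketch of how such a score could be computed, here is a hypothetical checker that parses "Key: value" lines and scores the four high-impact fields listed above. The field names and weighting are assumptions for illustration, not this tool's actual scoring logic:

```python
# Hypothetical llms.txt readiness check (illustrative scoring only).
REQUIRED_FIELDS = {"License", "Training", "Last-Updated", "PII-Policy"}

def parse_llms_txt(text: str) -> dict:
    """Parse simple 'Key: value' lines, skipping blanks and # comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

def readiness_score(fields: dict) -> int:
    """Score 0-100: equal weight per required field present and non-empty."""
    present = sum(1 for f in REQUIRED_FIELDS if fields.get(f))
    return round(100 * present / len(REQUIRED_FIELDS))

sample = """\
Site: https://example.com
License: CC-BY-4.0
Training: not permitted
Last-Updated: 2026-01-15
"""
print(readiness_score(parse_llms_txt(sample)))  # 3 of 4 fields -> 75
```

Adding a PII-Policy line to the sample would bring it to 100; deploying ai.txt alongside it would then earn the dual-file bonus mentioned above.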
Does llms.txt affect Google rankings?
Not directly — Google does not use llms.txt as a ranking signal for traditional organic search. However, it does influence how Google-Extended (Google's AI training crawler) and AI Overviews interpret and cite your content, which increasingly affects GEO (Generative Engine Optimization) visibility.