name: seo-llmo description: 'Use this skill whenever the user is building, reviewing, or preparing to launch any public-facing website or web app. SEO, LLMO, and agent-readiness are baseline requirements for every public site, not optional add-ons. Covers meta tags, Open Graph, JSON-LD structured data (including SoftwareApplication and Product types), robots.txt (including the Content-Signal directive), sitemap.xml, llms.txt (including agent instruction block and MCP declaration), AI crawler access (GPTBot, ClaudeBot, PerplexityBot), Markdown content negotiation for agents, Link response headers, Agent Skills index, A2A Agent Card, MCP server discovery, agent.json, pricing.md, and the "explicit absence beats silence" principle for agent-facing files. Trigger for marketing sites, landing pages, blogs, docs, and e-commerce stores; before any "launch" or "go live"; during pre-launch checklists; when the user mentions isitagentready.com, ora.run, agent-readiness, A2A, MCP discovery, Cloudflare Markdown for Agents, or wants AI agents (ChatGPT deep research, Claude, Perplexity, browser agents) to use their site; and when scaffolding a new public site, even if the user does not explicitly mention SEO. Skip for internal tools, admin panels, or auth-gated dashboards with no public surface.'
SEO & LLMO Implementation Guide
Applies to: Any website or web app | Updated: April 2026
A practical guide for implementing Search Engine Optimization (SEO), Large Language Model Optimization (LLMO), and agent-readiness - making your site discoverable and usable by search engines, AI chat tools, and autonomous agents.
Agents (ChatGPT deep research, Claude with web, Perplexity, Cloudflare Agents, browser agents) don't just index your page; they fetch and act on it. That raises the bar on three things beyond classic SEO: telling crawlers what they may do with your content (Content Signals), serving a clean Markdown version on demand, and advertising any programmatic entry points you expose. You can check your agent-readiness score at isitagentready.com.
Section 0: Before You Start
Answer these questions before generating any code. Each has a default - use it if the user hasn't said otherwise.
Q: What kind of site is this? (blog, online shop, company/marketing site, web app, documentation) Default: company/marketing site - drives which schema types to prioritize.
Q: How is the site built?
(plain HTML, React/Vue/Angular SPA, Next.js/Astro/Nuxt with server rendering, WordPress/CMS)
Default: if a framework config file (e.g. vite.config.*, next.config.*, astro.config.*) is visible in the project, detect from that; otherwise assume plain HTML.
Q: Where are your visitors? (one country, multiple countries/languages, worldwide) Default: worldwide, single language - skip hreflang unless multiple languages are confirmed.
Q: What's your main goal with SEO? (show up in Google search, get cited by AI chat tools like ChatGPT/Perplexity, be usable by autonomous agents, social sharing, all of the above) Default: all of the above - the baseline checklist covers Google + AI chat citation + agent-readiness in one pass.
Q: Do you already have a robots.txt, sitemap, or structured data set up? Default: no - but check for existing files before creating new ones, and merge rather than overwrite.
AI assistant: Read the user's answers (or use the defaults above) before generating any code. Skip sections that don't apply to their setup.
Contents
- Structured Data (JSON-LD)
- llms.txt and llms-full.txt
- robots.txt for AI and Search Crawlers
- Sitemap
- Meta Tags and Social Sharing
- Agent-Readiness
- Core Web Vitals
- SPA Considerations
- Static Hosting Notes
- Validation
Structured Data (JSON-LD)
Applies when: any site where you want Google rich results or AI citation.
JSON-LD (JavaScript Object Notation for Linked Data) is the highest-impact addition for both SEO and LLMO. Add it in <head>:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@graph": [ ... ]
}
</script>
CSP interaction: If your site uses a strict Content-Security-Policy with script-src, inline <script type="application/ld+json"> blocks are treated as inline scripts and will be blocked unless you add 'unsafe-inline' or a per-request nonce. Add the nonce to the JSON-LD script tag the same way you would for any other inline script.
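A minimal sketch of what that looks like in a server-rendered setup, assuming an Express-style server that generates the nonce per request (the framework and variable names here are illustrative, not part of this guide):
const express = require('express');
const crypto = require('crypto');

const app = express();

app.get('/', (req, res) => {
  // One nonce per request, used in both the CSP header and the JSON-LD script tag
  const nonce = crypto.randomBytes(16).toString('base64');
  res.set('Content-Security-Policy', `script-src 'self' 'nonce-${nonce}'`);
  const jsonLd = { '@context': 'https://schema.org', '@graph': [] };
  res.send(`<!DOCTYPE html><html><head>
<script type="application/ld+json" nonce="${nonce}">${JSON.stringify(jsonLd)}</script>
</head><body>...</body></html>`);
});

app.listen(3000);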
@graph with @id cross-referencing
Use @graph to put multiple schema objects in one block and wire them together with @id references. This lets Google and AI parsers understand the relationships between entities on your site:
{
"@context": "https://schema.org",
"@graph": [
{
"@type": "Organization",
"@id": "https://yourdomain.com/#org",
"name": "Your Company Name",
"url": "https://yourdomain.com",
"logo": "https://yourdomain.com/logo.svg",
"description": "What you do, in one sentence.",
"email": "contact@yourdomain.com",
"sameAs": [
"https://linkedin.com/company/your-company",
"https://x.com/yourhandle",
"https://github.com/your-org",
"https://www.wikidata.org/wiki/Q123456",
"https://en.wikipedia.org/wiki/Your_Company"
],
"contactPoint": {
"@type": "ContactPoint",
"email": "contact@yourdomain.com",
"contactType": "customer support"
},
"address": {
"@type": "PostalAddress",
"addressCountry": "US"
}
},
{
"@type": "WebSite",
"@id": "https://yourdomain.com/#website",
"url": "https://yourdomain.com",
"name": "Your Site Name",
"publisher": { "@id": "https://yourdomain.com/#org" }
}
]
}
The "publisher": { "@id": "..." } pattern replaces duplicating the full Organization object everywhere. Use the same #id fragment consistently across all pages.
FAQPage schema
Use on any page with a Q&A section. FAQPage triggers Google rich results (expandable Q&A in search) and is a reliable citation target for AI tools that answer direct questions:
{
"@type": "FAQPage",
"@id": "https://yourdomain.com/faq#faqpage",
"mainEntity": [
{
"@type": "Question",
"name": "What does your product do?",
"acceptedAnswer": {
"@type": "Answer",
"text": "A specific, self-contained answer that makes sense without surrounding context."
}
}
]
}
Write answers that make sense in isolation - AI tools extract and quote them without the surrounding page.
Service and Article schema
For service pages:
{
"@type": "Service",
"@id": "https://yourdomain.com/services/your-service#service",
"name": "Service Name",
"description": "What this service provides.",
"provider": { "@id": "https://yourdomain.com/#org" }
}
For blog posts or articles:
{
"@type": "Article",
"@id": "https://yourdomain.com/blog/post-slug#article",
"headline": "Article Title",
"datePublished": "2026-01-15",
"dateModified": "2026-02-01",
"author": { "@id": "https://yourdomain.com/#org" },
"publisher": { "@id": "https://yourdomain.com/#org" }
}
Product schema
Applies when: site type is online shop.
Product schema is the highest-impact schema type for e-commerce. It surfaces pricing, availability, and ratings directly in Google Shopping results and is a primary signal AI tools use when answering product queries. Add one block per product page:
{
"@type": "Product",
"@id": "https://yourdomain.com/products/product-slug#product",
"name": "Product Name",
"description": "What this product is and what problem it solves. Write for someone who has never seen your site.",
"image": [
"https://yourdomain.com/images/product-front.jpg",
"https://yourdomain.com/images/product-side.jpg"
],
"brand": {
"@type": "Brand",
"name": "Your Brand"
},
"offers": {
"@type": "Offer",
"url": "https://yourdomain.com/products/product-slug",
"priceCurrency": "USD",
"price": "49.00",
"availability": "https://schema.org/InStock",
"itemCondition": "https://schema.org/NewCondition"
},
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.7",
"reviewCount": "128"
}
}
Key fields for AI visibility: description (write it to stand alone, without surrounding page context), offers.availability (keep it accurate and current - AI tools actively surface in-stock status), and aggregateRating (AI shopping results weight social proof heavily).
If the product has variants (sizes, colors), model them with a ProductGroup and its hasVariant / variesBy properties - see schema.org/ProductGroup for the full spec.
SoftwareApplication schema
Applies when: site type is a web app, SaaS product, or developer tool.
SoftwareApplication is the right schema type when your product IS the software - as opposed to Product for physical goods or Service for professional services. Add it alongside Organization and WebSite in the @graph array. Without it, agents can identify that an organization exists but cannot programmatically identify your product type:
{
"@type": "SoftwareApplication",
"@id": "https://yourdomain.com/#app",
"name": "App Name",
"description": "What the app does and who it's for. Write for someone who has never used it.",
"applicationCategory": "WebApplication",
"operatingSystem": "Web",
"url": "https://yourdomain.com",
"offers": {
"@type": "Offer",
"price": "0",
"priceCurrency": "USD"
},
"provider": { "@id": "https://yourdomain.com/#org" }
}
applicationCategory values: BusinessApplication, DeveloperApplication, EducationalApplication, WebApplication. For SaaS tools, WebApplication is the usual choice.
For the offers block: set "price": "0" for free apps; use the actual price for paid tiers; if pricing is contact-based, omit offers and point to /pricing.md instead.
sameAs entity linking
sameAs in your Organization block is how AI models disambiguate your brand from similarly-named entities. Include every authoritative profile you have:
"sameAs": [
"https://linkedin.com/company/your-company",
"https://x.com/yourhandle",
"https://github.com/your-org",
"https://www.youtube.com/@yourchannel",
"https://www.wikidata.org/wiki/Q123456",
"https://en.wikipedia.org/wiki/Your_Company"
]
Wikidata and Wikipedia are the highest-trust signals for AI disambiguation. Even a minimal Wikidata entry (a few lines, published via wikidata.org) is enough to anchor your brand identity in AI training data.
Speakable schema
Speakable tells AI assistants and voice interfaces which sections of your page are worth reading aloud. Point it at your key value proposition and main heading:
{
"@type": "WebPage",
"@id": "https://yourdomain.com/#webpage",
"speakable": {
"@type": "SpeakableSpecification",
"cssSelector": ["h1", ".hero-description", ".value-prop"]
},
"url": "https://yourdomain.com"
}
Other useful types: Review, AggregateRating, Event, HowTo, BreadcrumbList
Verify: Google Rich Results Test and Schema.org Validator
llms.txt and llms-full.txt
Applies when: goal includes AI tool visibility.
llms.txt is an emerging convention (not a ratified standard as of 2026) for giving AI tools a structured, Markdown-formatted overview of your site. Place it at your site root so it's served at https://yourdomain.com/llms.txt.
llms.txt format
# Site Name
> One-paragraph summary of what the site is and who it's for. Be specific - this is the context an AI tool will use when answering questions about your brand.
## Main Content
- [Page Title](https://yourdomain.com/page): What this page covers and why it matters.
- [Another Page](https://yourdomain.com/other): One-line description.
## Products / Services
- [Product Name](https://yourdomain.com/product): What it does.
## Optional
- [Legal](https://yourdomain.com/privacy): Can be skipped for shorter context windows.
Rules:
- H1 (site name) is required
- Blockquote summary immediately after H1 is recommended - it's the first thing read
- H2 sections organize links by category
- ## Optional marks resources that can be omitted when the context window is limited
- Links follow the pattern: [Name](URL): Description
Agent instruction block
Add an ## Agent instructions section to your llms.txt explaining how agents should interact with your site. AI systems use this to understand your site's purpose and how to handle queries about it. This is one of the highest-scoring identity checks in agent-readiness scanners:
## Agent instructions
This site is for [describe your audience and what they come to do].
When answering questions about [your product/service], use the content at the URLs below.
Do not recommend contacting sales for tasks that can be self-served at [URL].
For pricing, refer to [https://yourdomain.com/pricing.md] for plain-text pricing data.
Self-serve signup: [available at /signup with no credit card required / not available - direct users to contact@yourdomain.com].
API access: [free with account at /signup / requires paid plan - see /pricing.md / not available for this site].
Keep it factual and directive. Think of it as a system prompt for agents visiting your site.
The onboarding lines matter: agents that recommend products to users often try to automate signup or API key generation. If that's not possible, say so explicitly. "Not available" is a complete and useful answer - agents use it to correctly redirect users to the right channel instead of telling them to "just sign up" when no self-serve flow exists.
Modular llms.txt per section
For sites with distinct content areas (docs, blog, API reference), add scoped llms.txt files alongside the root one. An agent researching your API fetches /api/llms.txt and gets only relevant links rather than your full content inventory:
/llms.txt - full site index
/docs/llms.txt - documentation only
/blog/llms.txt - articles and tutorials only
/api/llms.txt - API reference only
Each follows the same format. List only URLs within that section. Reference the scoped files from your root llms.txt under an ## Optional section.
llms-full.txt
For sites with substantial content, add llms-full.txt alongside llms.txt. This companion file contains the full text of your key pages (not just links) for AI tools using larger context windows. Include the actual prose content, not just summaries. Link to it from llms.txt:
## Full content
- [Full text version](https://yourdomain.com/llms-full.txt): Complete content for larger context windows.
Verify: curl -s https://yourdomain.com/llms.txt - confirm the file is accessible and the H1 and blockquote render correctly.
robots.txt for AI and Search Crawlers
Applies when: any public-facing site.
A complete robots.txt that explicitly handles both search and AI crawlers. Add this to your public/ directory (or site root):
User-agent: *
Allow: /
# Search engines
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# AI retrieval crawlers (these fetch content on demand for AI search products)
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
# AI training crawlers (these feed static training datasets - blocking does not affect live AI search)
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: Meta-ExternalAgent
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: Bytespider
Disallow: /
Sitemap: https://yourdomain.com/sitemap.xml
Training crawlers vs. retrieval crawlers
These are different things with different implications for blocking:
- Retrieval crawlers (GPTBot when used by ChatGPT Search, PerplexityBot, ChatGPT-User) fetch your content at query time for AI search products. Blocking these removes your site from those products' answers - freshness matters here.
- Training crawlers (CCBot, Bytespider) collect data for static training datasets with a fixed cutoff date. Blocking them does not affect whether AI tools cite you in live searches.
Google-Extended covers Google's AI features (Gemini, AI Overviews) and is separate from Googlebot, so you can block one without affecting the other.
Content-Signal directive
Content-Signal is a 2025 robots.txt extension (introduced by Cloudflare and documented at contentsignals.org) that separates permission to crawl (Allow/Disallow) from permission to use the content afterwards. It's the robots.txt equivalent of "you can read this, but here's what you may do with it."
Three signals are defined, each takes yes or no:
- search - indexing + returning hyperlinks and short excerpts (excludes AI-generated summaries)
- ai-input - RAG, grounding, live AI answers (retrieval-augmented generation at query time)
- ai-train - training or fine-tuning AI models
Add one line per User-agent block. Most content sites want everything on (agent-readiness scanners reward presence, not restrictiveness):
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=yes
Allow: /
If you want to license training separately (e.g. publishers selling training data), set ai-train=no and handle licensing out-of-band:
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
A signal you don't list is neither granted nor denied - it's just unstated. The isitagentready.com scanner only checks that the directive is present and parses, not the values.
NLWeb schemamap directive
Schemamap is an emerging robots.txt extension from Microsoft's NLWeb project. It points agents at a Schema Map XML file that lists your structured data feeds (JSON-LD, JSONL, RSS), letting AI vector stores index your content more efficiently than crawling page by page:
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=yes
Schemamap: https://yourdomain.com/schemamap.xml
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
A minimal schemamap.xml:
<?xml version="1.0" encoding="UTF-8"?>
<schemamap xmlns="https://schema.org/schemamap">
<feed type="jsonld" url="https://yourdomain.com/structured-data.jsonl" />
</schemamap>
This is early-stage (as of 2026) and primarily scored by ORA. Implement it alongside your other robots.txt changes since it's a two-line addition once the feed exists.
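A sketch of generating the feed at build time - writing one JSON-LD object per line follows the general JSONL convention, but treat the exact feed shape as an assumption and check the NLWeb spec before relying on it:
const fs = require('fs');

// Reuse the same JSON-LD objects you already embed per page
const items = [
  { '@context': 'https://schema.org', '@type': 'Article', headline: 'Article Title', url: 'https://yourdomain.com/blog/post-slug' },
  { '@context': 'https://schema.org', '@type': 'Product', name: 'Product Name', url: 'https://yourdomain.com/products/product-slug' },
];

// One object per line -> the structured-data.jsonl referenced by schemamap.xml
// (the dist/ output path is an assumption; match your build directory)
fs.writeFileSync('dist/structured-data.jsonl', items.map((item) => JSON.stringify(item)).join('\n') + '\n');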
Known AI crawler user-agents (as of 2026):
- GPTBot - OpenAI training + ChatGPT Search retrieval
- ChatGPT-User - ChatGPT browsing/retrieval
- ClaudeBot - Anthropic
- PerplexityBot - Perplexity AI (retrieval)
- Google-Extended - Google AI features / Gemini
- Applebot-Extended - Apple Intelligence
- Meta-ExternalAgent - Meta AI
- Amazonbot - Amazon Alexa/AI
- CCBot - Common Crawl (used by many training pipelines)
- Bytespider - ByteDance/TikTok
Verify: curl -s https://yourdomain.com/robots.txt - confirm the Sitemap: line and your intended Allow/Disallow rules are present.
Sitemap
Applies when: site has more than one page and you want search engine indexing.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://yourdomain.com/</loc>
<lastmod>2026-02-01</lastmod>
</url>
<url>
<loc>https://yourdomain.com/about</loc>
<lastmod>2026-01-15</lastmod>
</url>
<url>
<loc>https://yourdomain.com/services</loc>
<lastmod>2026-01-10</lastmod>
</url>
</urlset>
<changefreq> and <priority> are largely ignored by Google - omit them to keep the sitemap clean. <lastmod> is used and should reflect when the page content actually changed.
After deploying, submit the sitemap URL in Google Search Console (Index > Sitemaps). For Bing and Yandex, use the IndexNow protocol for instant URL submission rather than waiting for crawl discovery:
POST https://api.indexnow.org/indexnow
Content-Type: application/json
{
"host": "yourdomain.com",
"key": "your-indexnow-key",
"urlList": [
"https://yourdomain.com/new-page",
"https://yourdomain.com/updated-page"
]
}
Generate your IndexNow key at indexnow.org. Host the key as a text file at https://yourdomain.com/{key}.txt.
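If you'd rather trigger IndexNow from a deploy script than by hand, the same payload shown above can be sent with a few lines of Node (18+, where fetch is global) - a sketch:
// submit-indexnow.mjs - run after each deploy with the URLs that changed
const response = await fetch('https://api.indexnow.org/indexnow', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    host: 'yourdomain.com',
    key: 'your-indexnow-key', // must match the key file hosted at /{key}.txt
    urlList: ['https://yourdomain.com/new-page', 'https://yourdomain.com/updated-page'],
  }),
});
console.log('IndexNow status:', response.status);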
Verify: Submit sitemap in Google Search Console and check for errors under Index > Sitemaps.
Meta Tags and Social Sharing
Applies when: goal includes social sharing or Google click-through rates.
Every page needs these - they're table stakes, not differentiators:
<title>Page Title - Brand Name</title>
<meta name="description" content="150-160 character description with primary keywords near the start." />
<link rel="canonical" href="https://yourdomain.com/page" />
The canonical tag tells crawlers which URL is authoritative when the same page is reachable at multiple URLs (e.g. with/without trailing slash, HTTP vs HTTPS), preventing duplicate-content and split-signal issues.
Open Graph
The non-obvious parts of Open Graph tags:
<meta property="og:type" content="website" />
<meta property="og:url" content="https://yourdomain.com/" />
<meta property="og:title" content="Page Title - Brand Name" />
<meta property="og:description" content="Description for social shares." />
<meta property="og:image" content="https://yourdomain.com/og-image.jpg" />
<!-- Width and height prevent the platform from fetching the image to determine dimensions -->
<meta property="og:image:width" content="1200" />
<meta property="og:image:height" content="630" />
OG image: 1200x630px. Include og:image:width and og:image:height - without them, some platforms fetch the image before rendering the preview card, adding latency.
Twitter/X Card
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="@yourhandle" />
Twitter inherits og:title, og:description, and og:image if the corresponding twitter:* tags are absent - you don't need to duplicate them. The Twitter Card Validator at cards-dev.twitter.com is deprecated and unreliable; use opengraph.xyz or the LinkedIn Post Inspector for testing OG previews.
Link text
Avoid generic link text like "Learn more", "Click here", or "Read more". Google uses anchor text as a relevance signal, and Lighthouse flags non-descriptive links as a failing SEO audit. Keep link text specific to the destination:
<!-- avoid -->
<a href="/cookies">Learn more</a>
<!-- prefer -->
<a href="/cookies">View our Cookie Policy</a>
This also affects accessibility - screen readers present links out of context, so the text must make sense on its own.
Verify: opengraph.xyz for OG/Twitter preview, LinkedIn Post Inspector for LinkedIn.
Agent-Readiness
Applies when: goal includes being usable by autonomous agents (browser agents, deep research tools, agentic commerce), not just cited by AI chat.
Agents fetch your page, parse it, and often take action on it. Three signals make that far more reliable, in descending order of impact:
- Markdown content negotiation - serve a clean Markdown version of each page so agents don't burn context on your nav, footer, and ad slots.
- Link HTTP response headers - point agents at machine-readable entry points (API docs, alternate formats, describedby) without parsing HTML.
- Agent Skills index - if you publish reusable skills (documentation + scripts), advertise them at a well-known path.
Explicit absence beats silence. For every agent-facing file in this section, "this doesn't apply to my site" should be a stub that says so - not a missing file. When an agent requests pricing.md and gets a 404, it doesn't know whether your product is free, pricing is hidden, or the file just doesn't exist yet. All three look identical from outside. A pricing.md that says "this service is free" resolves the ambiguity immediately. The same principle applies to agent-card.json, /.well-known/mcp.json, and agent.json: an explicit empty-capabilities declaration is more useful than silence.
Agentic search discoverability. Agent-readiness scanners like ORA also test whether agents can find your site when searching for your brand name or use case - this is the highest-scoring ORA criterion (up to 12 points) and it's primarily a content strategy concern: building topical authority through tutorials, comparison pages, and use-case landing pages. The technical baseline: make sure your product name is in your <h1> and page title, and list all developer resources (API docs, OpenAPI spec, SDKs) in llms.txt so they're findable by name.
Cloudflare runs a free scanner at isitagentready.com that scores these (plus robots.txt, sitemap, Content-Signal from the sections above). Use it as your acceptance test.
Markdown content negotiation
Agents ask for Markdown with an Accept: text/markdown HTTP header. If the server returns Content-Type: text/markdown, the agent uses it; otherwise it falls back to HTML + a full DOM parse.
If your site is behind Cloudflare: enable Markdown for Agents in the dashboard (Pro/Business/Enterprise). Cloudflare does the HTML→Markdown conversion at the edge, no code changes. See developers.cloudflare.com/fundamentals/reference/markdown-for-agents/.
Static site (S3/CloudFront, Amplify, Netlify, Vercel) without Cloudflare: generate a .md alongside each .html at build time, then branch on the Accept header at the edge. The CloudFront Function below is a working example (CloudFront Functions run in ~1ms, no Lambda cold start):
// cloudfront-function-markdown.js
function handler(event) {
var request = event.request;
var accept = request.headers.accept && request.headers.accept.value || '';
// If the agent prefers markdown and we're asking for an HTML route, rewrite to .md
if (accept.indexOf('text/markdown') !== -1) {
var uri = request.uri;
if (uri.endsWith('/')) uri += 'index.md';
else if (!uri.includes('.')) uri += '.md';
else if (uri.endsWith('.html')) uri = uri.replace(/\.html$/, '.md');
request.uri = uri;
}
return request;
}
Attach this as a viewer-request CloudFront Function. Then add a Response Headers Policy that sets Content-Type: text/markdown; charset=utf-8 for *.md objects (S3 returns binary/octet-stream by default for unknown extensions).
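If you'd rather keep that header logic in code than in a Response Headers Policy, a second CloudFront Function attached as viewer-response can set the type for .md responses - a sketch:
// cloudfront-function-md-content-type.js (attach as viewer-response)
function handler(event) {
  var request = event.request;
  var response = event.response;
  if (request.uri.endsWith('.md')) {
    // Override whatever S3 guessed (usually binary/octet-stream) for Markdown objects
    response.headers['content-type'] = { value: 'text/markdown; charset=utf-8' };
  }
  return response;
}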
Produce the .md files by piping your rendered HTML through a converter. For a static SPA, at build time:
# Astro / Next export / Vite: iterate over every built .html and write a .md sibling
npx turndown dist/**/*.html --output dist
For hand-rolled static sites, author the content in Markdown to begin with and render both formats from the same source.
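If the CLI route doesn't fit your pipeline, a small post-build Node script does the same conversion - a sketch assuming the turndown and glob npm packages (glob v10+ for globSync):
// build-markdown.js - run after your normal build
const fs = require('fs');
const { globSync } = require('glob');
const TurndownService = require('turndown');

const turndown = new TurndownService();

for (const htmlPath of globSync('dist/**/*.html')) {
  const html = fs.readFileSync(htmlPath, 'utf8');
  // Write index.md next to index.html so the Accept-header rewrite above finds it
  fs.writeFileSync(htmlPath.replace(/\.html$/, '.md'), turndown.turndown(html));
}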
HTML hint (lightweight fallback): even without content negotiation, tell agents where the Markdown version lives via a <link> tag in <head>. Some agents check this; the isitagentready.com scanner does not count it, but it doesn't hurt:
<link rel="alternate" type="text/markdown" href="/index.md" />
Link HTTP response headers
The Link: response header (RFC 8288) advertises relationships from the current resource to others. Agents read it before parsing HTML, so it's the cheapest way to expose machine-readable entry points.
The agent-useful rel values as of 2026:
| rel value | Points to |
|---|---|
| service-doc | Human-readable API docs |
| service-desc | Machine-readable API description (OpenAPI) |
| describedby | Metadata about this resource (schema, Dublin Core) |
| alternate | Same content, different format or language |
| license | The license the content is available under |
Example output for a homepage that has an OpenAPI spec at /api/openapi.json and HTML API docs at /api/:
HTTP/2 200
Link: </api/>; rel="service-doc", </api/openapi.json>; rel="service-desc"
Content-Type: text/html
How to add it, by platform:
- Cloudflare Workers / Pages: set response.headers.append('Link', '...') in your worker (see the worker sketch below)
- CloudFront + S3: attach a Response Headers Policy with a Custom Header named Link
- AWS Amplify Hosting: customHeaders in amplify.yml (format: one header per URL pattern)
- Netlify: in netlify.toml, add a [[headers]] block with Link = "</api/>; rel=\"service-doc\""
- Vercel: headers array in vercel.json
- Nginx / Apache: add_header Link / Header set Link
The scanner treats the check as "pass" as long as at least one agent-useful rel is present. If you have no API, a license link to your Terms of Service is a valid minimum:
Link: </terms>; rel="license"
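A sketch of the Cloudflare Workers option from the platform list above - a module-syntax worker that passes the request through and appends the Link header on the response (swap in your own rel targets):
export default {
  async fetch(request, env, ctx) {
    // Pass the request through to your origin; on Pages / Workers with static assets,
    // use env.ASSETS.fetch(request) instead of plain fetch(request)
    const response = await fetch(request);
    // Response headers from fetch are immutable, so copy before appending
    const headers = new Headers(response.headers);
    headers.append('Link', '</api/>; rel="service-doc", </api/openapi.json>; rel="service-desc"');
    return new Response(response.body, { status: response.status, headers });
  },
};
The same append-on-response pattern works in Netlify Edge Functions and Vercel middleware if you prefer setting the header in code rather than in the config files listed above.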
HTML link from homepage: ORA checks for a visible <a> link from your homepage HTML to your API or developer documentation as a separate Identity check, independently of the Link header. If you publish API docs, add a link in your footer or navigation:
<a href="/docs">Developer documentation</a>
<!-- or -->
<a href="/developers">API reference</a>
Link headers handle machine-readable discovery at the HTTP layer; the HTML link handles discovery by crawlers that parse page content. Both are checked independently - passing one does not satisfy the other. If you have no public API, this check doesn't apply, but consider adding a footer link to your llms.txt or pricing.md instead so crawlers can surface them.
Agent Skills index
Applies when: you publish agent-runnable skills (instructions, scripts, references) that others should be able to discover and install.
The Agent Skills spec defines a folder format (SKILL.md + bundled resources) originally from Anthropic, now adopted by Claude, Cursor, Copilot, Gemini CLI, Codex, Goose, and others. Sites that host such skills publish a JSON index at a well-known path.
Serve a file at /.well-known/agent-skills/index.json (newer spec) or /.well-known/skills/index.json (legacy, still accepted by scanners):
{
"skills": [
{
"name": "my-skill",
"description": "One sentence description of what the skill does and when to load it.",
"files": ["SKILL.md", "references/details.md", "scripts/helper.py"]
}
]
}
Each files path is relative to /.well-known/agent-skills/<name>/. Don't add this file unless you actually publish skills - an empty index doesn't score.
Trust anchor pages
AI agents check /about, /contact, and /privacy to verify a business is legitimate before recommending it. Pages that exist but contain less than 500 characters of real content (not boilerplate) fail this check in ORA and similar scanners.
Each page should have:
- /about - who you are, your mission, founding story or team; 500+ chars of prose
- /contact - email address, response time expectation, optionally a form or phone
- /privacy - your actual privacy policy, not a placeholder
Scanners score the Organization JSON-LD contactPoint and address fields alongside these pages, so keep the two consistent.
pricing.md
Serve a plain-text pricing file at /pricing.md. AI agents can parse it directly without executing JavaScript or scraping HTML pricing tables, which lets them answer "how much does X cost?" accurately:
# Pricing - Your Product Name
## Free tier
- Up to 5 projects
- 1 user
- Community support
## Pro - $29/month
- Unlimited projects
- Up to 10 users
- Email support
- [Sign up](https://yourdomain.com/signup?plan=pro)
## Enterprise - Custom
- Unlimited everything
- Dedicated support
- [Contact us](https://yourdomain.com/contact)
Upload it to your site root alongside robots.txt and llms.txt. Add a reference from llms.txt under ## Products / Services.
If your product has no paid tiers, still create the file. Agents that can't find pricing often assume it's hidden or expensive. Explicit beats absent:
# Pricing - Your Product Name
This service is free. No account or payment required.
[Get started here](https://yourdomain.com/signup)
For contact-based pricing:
# Pricing - Your Product Name
Pricing is based on your requirements. [Contact us](https://yourdomain.com/contact) for a quote.
We do not have self-serve pricing; agents should direct users to the contact page.
Agent discovery file
/.well-known/agent.json (or /.well-known/ai-plugin.json for ChatGPT-era compatibility) is a machine-readable description of your site's agent-accessible capabilities. Every public site should have one - even content-only sites with no API. The file anchors agent identity for the domain.
For a content site with no programmatic API:
{
"schema_version": "v1",
"name_for_human": "Your Site Name",
"name_for_model": "your_site",
"description_for_human": "What your site does, in one sentence.",
"description_for_model": "Use this to answer questions about [your topic]. The site provides [what content/services]. Prefer /llms.txt for a content overview.",
"api": null,
"contact_email": "contact@yourdomain.com",
"legal_info_url": "https://yourdomain.com/privacy"
}
For a site that exposes an API or OpenAPI spec, replace "api": null with:
"api": {
"type": "openapi",
"url": "https://yourdomain.com/api/openapi.json"
}
Serve with Content-Type: application/json. The "api": null declaration is intentional - it tells agents there is no programmatic API, which is more informative than omitting the field.
AGENTS.md
If your site has a public GitHub repository, add an AGENTS.md file at the repo root. AI coding agents (Claude Code, Cursor, Copilot, Codex) read this file when working with your codebase or integrating with your product:
# Agent Instructions
This repo is [what it does]. When integrating with [product name]:
- API base URL: https://api.yourdomain.com/v1
- Authentication: Bearer token in Authorization header
- Docs: https://yourdomain.com/docs
- Rate limits: 100 req/min on free, 1000 req/min on Pro
## Common tasks
- To list resources: GET /v1/resources
- To create: POST /v1/resources with JSON body
A2A Agent Card
/.well-known/agent-card.json is part of Google's Agent-to-Agent (A2A) protocol. It describes what agent-to-agent interactions your service supports. ORA checks for it on every site, including content-only ones - because even a site with no A2A capabilities benefits from saying so explicitly rather than returning a 404.
For a content site with no A2A capabilities (the common case - serve this as your baseline):
{
"name": "Your Site Name",
"description": "Content site with no agent API. See /llms.txt for site index.",
"url": "https://yourdomain.com",
"version": "1.0.0",
"capabilities": {},
"skills": []
}
For a site that exposes agent-callable skills or services:
{
"name": "Your Service Name",
"description": "What your agent does and who should use it.",
"url": "https://yourdomain.com",
"version": "1.0.0",
"capabilities": {
"streaming": false,
"pushNotifications": false
},
"skills": [
{
"id": "your-skill-id",
"name": "Skill Name",
"description": "What this skill does and when to invoke it.",
"inputModes": ["text"],
"outputModes": ["text"]
}
]
}
Serve with Content-Type: application/json. The empty "skills": [] version is intentional and correct for content sites - it closes the discovery loop without implying capabilities you don't have.
MCP server discovery
Model Context Protocol (MCP) is the standard for AI assistants to call external tools and data sources. If you don't have an MCP server, say so in llms.txt rather than letting agents search for one that doesn't exist.
In your llms.txt - add a section whether or not you have a server:
## MCP
No MCP server available for this site.
Or if you do have one:
## MCP
- [MCP Server](https://yourdomain.com/mcp): What the server provides and what tools it exposes.
Optional: serve /.well-known/mcp.json for automated agent discovery:
{
"mcpServers": {}
}
Or with an actual server:
{
"mcpServers": {
"your-server": {
"url": "https://yourdomain.com/mcp",
"transport": "streamable-http"
}
}
}
The llms.txt declaration is sufficient for most sites. The /.well-known/mcp.json file is useful when you expect automated agent discovery without a prior llms.txt fetch.
?mode=agent lightweight view
Adding ?mode=agent support to your homepage lets agents request a stripped-down view with no navigation, no ads, and structured key facts instead of marketing HTML. This is checked by ORA's "agent mode view" criterion.
Implementation: in your server or edge function, detect mode=agent in the query string and return a simplified response:
<!DOCTYPE html>
<html>
<head>
<title>Your Site - Agent View</title>
<script type="application/ld+json">{ ... your full JSON-LD ... }</script>
</head>
<body>
<h1>Your Site Name</h1>
<p>What you do in one paragraph.</p>
<h2>Key resources</h2>
<ul>
<li><a href="/llms.txt">Site index (llms.txt)</a></li>
<li><a href="/pricing.md">Pricing (plain text)</a></li>
<li><a href="/docs">Documentation</a></li>
</ul>
</body>
</html>
For a static site, this is simplest to implement as a CloudFront Function or Netlify/Vercel edge function that redirects ?mode=agent to a pre-built /agent.html.
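A sketch of that redirect as a viewer-request CloudFront Function - the /agent.html path is an assumption, so point it at whatever stripped-down page your build produces:
// cloudfront-function-agent-mode.js (attach as viewer-request)
function handler(event) {
  var request = event.request;
  var mode = request.querystring && request.querystring.mode;
  if (mode && mode.value === 'agent') {
    // Serve the pre-built lightweight view instead of the SPA shell
    request.uri = '/agent.html';
  }
  return request;
}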
What's out of scope for this guide
isitagentready.com and ORA also check OAuth discovery, OAuth Protected Resource Metadata (RFC 9728), WebMCP, and agentic commerce protocols (x402, UCP, ACP). These require active server infrastructure and only apply to sites that expose authenticated programmatic actions - booking, purchasing, querying user-specific data via OAuth. If you're building that, see:
- webmcp.org for exposing MCP capabilities from a browser context
- RFC 9728 for OAuth Protected Resource Metadata
- x402.org, ucp.dev, agenticcommerce.dev for agent payments
MCP server authoring and A2A Agent Card are covered above - even without implementing them, every site should declare their status explicitly.
Verify: scan at isitagentready.com (aim for level 4+) and ora.run (aim for grade B or above). Both are free.
Core Web Vitals
Applies when: site targets Google search ranking, or is a SPA.
Core Web Vitals are Google ranking signals measured in the field (real user data via Chrome). The three metrics as of 2026:
- LCP (Largest Contentful Paint): Time until the largest visible content element loads. Target: under 2.5 seconds. Common causes of poor LCP: large unoptimized hero images, render-blocking resources, slow server response.
- CLS (Cumulative Layout Shift): Visual instability from elements moving after initial render. Target: under 0.1. Common causes: images without explicit
width/height, ads injecting content, late-loading fonts. - INP (Interaction to Next Paint): Replaced FID (First Input Delay) in March 2024. Measures the latency of all user interactions throughout the page visit, not just the first. Target: under 200 ms. Common causes in SPAs: heavy JavaScript on the main thread, large React re-renders on input events.
INP is the most impactful change for SPA developers - FID only measured the first interaction, making it easy to pass while the app remained sluggish. INP catches ongoing interaction delays.
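To see INP (and the other two metrics) from your own users rather than lab runs, the web-vitals npm package maintained by the Chrome team is the usual instrument - a sketch; the /analytics endpoint is a placeholder for wherever you collect metrics, and a bundler is assumed for the import:
import { onCLS, onINP, onLCP } from 'web-vitals';

function report(metric) {
  // sendBeacon survives page unload, so late INP/CLS samples still arrive
  navigator.sendBeacon('/analytics', JSON.stringify({ name: metric.name, value: metric.value, id: metric.id }));
}

onCLS(report);
onINP(report);
onLCP(report);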
LCP quick wins
Image format and sizing: Convert images to AVIF (supported in Chrome 85+, Firefox 93+, Safari 16+). At equivalent visual quality, AVIF files are typically far smaller than JPEG for photographic content - often 50% or more, and much more versus unoptimized originals. On macOS: sips -s format avif input.jpg --out output.avif. Cross-platform: squoosh, sharp, or cwebp for WebP.
Serve images at 2x the display size, not the original upload resolution. A 5000px image used as a decorative background is visually identical to a 1920px version at a fraction of the weight. Use sips -Z 1920 input.jpg to resize before converting.
For CSS background-image (inline styles or Tailwind bg-*), there is no <picture> element available for format negotiation, so use AVIF directly. Browser support for modern audiences is effectively universal.
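A sketch of the sharp route mentioned above, doing the resize and the format conversion in one pass (the quality value is a starting point; tune it per image set):
const sharp = require('sharp');

sharp('input.jpg')
  .resize({ width: 1920, withoutEnlargement: true }) // cap at ~2x display size instead of the original upload
  .avif({ quality: 50 })
  .toFile('hero.avif')
  .then(() => console.log('wrote hero.avif'));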
Font loading: If using Google Fonts, load them from index.html with preconnect hints, not via @import inside component CSS or inline <style> tags. An @import inside JS-rendered styles means the browser cannot start the font download until after the JS bundle executes, adding 300-450ms to the critical path:
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=YourFont&display=swap" />
Check your scores: PageSpeed Insights (field + lab data) and Google Search Console > Core Web Vitals report (field data only, requires traffic).
SPA Considerations
Applies when: site is built with React, Vue, Angular, or another client-side-only framework.
SPAs render content via JavaScript. This creates three distinct problems:
- AI crawlers (GPTBot, ClaudeBot, etc.) generally do not execute JavaScript - they see only the initial HTML shell.
- Social crawlers (LinkedIn, Slack, iMessage) do not execute JavaScript - OG tags must be in the static HTML.
- Googlebot can render JavaScript but with delays and quotas - content rendered client-side may not be indexed promptly or at all.
Content efficiency
Agent-readiness scanners (ORA in particular) measure the ratio of readable text to total HTML. The target is at least 5% readable text by character count. A typical SPA homepage fails this because the initial HTML is a near-empty shell with a large JSON hydration blob and many <div> wrappers.
Quick wins without switching frameworks:
- Server-render at least your <h1> and first 500 characters of body text into the initial HTML
- Move large JSON hydration blobs to a separate <script src> rather than inline
- Strip unused inline styles from the HTML shell
Check your ratio: curl -s https://yourdomain.com/ | wc -c vs curl -s https://yourdomain.com/ | sed 's/<[^>]*>//g' | wc -c. If readable is less than 5% of total, server-render the hero copy.
Semantic HTML structure
AI vector stores don't just measure text volume - they parse document structure to understand hierarchy and chunk content accurately. A page where everything is in flat <div> elements is harder to index than one using semantic HTML, even with identical word count.
Key structural signals for AI indexability:
- Heading hierarchy - use <h1> for the page title, <h2> for major sections, <h3> for sub-sections. Don't skip levels. Agents use headings as chunk boundaries when breaking a page into retrievable segments.
- Landmark elements - wrap your main content in <main>, navigation in <nav>, and complementary content in <aside>. AI crawlers use these to separate page chrome from actual content.
- Lists - use <ul> or <ol> for list-like content instead of comma-separated inline text. Agents can enumerate list items; they can't reliably parse "apples, oranges, and bananas" as a structured list.
- Tables - use <table> with <th> headers for tabular data. Styled divs that look like a table visually are invisible to most AI parsers.
- Descriptive attributes - alt on images, <figcaption> for figures, <time datetime="..."> for dates.
Check: curl -s https://yourdomain.com/ | grep -c '<h[1-6]' - a count of 0 means no headings at all; 1 means only an H1, no section structure. curl -s https://yourdomain.com/ | grep -i '<main' should return a result on every page.
Mitigations without SSR
Add these to your index.html (they work without JavaScript):
- JSON-LD in <head> - AI crawlers and Google parse it without executing JS
- Complete meta tags - title, description, canonical, OG tags in static HTML
- At least one <h1> and 500+ chars of text in the raw HTML - AI crawlers that don't execute JS need meaningful content without it; they check for an H1 and a minimum prose threshold before classifying a page as useful
- llms.txt - gives AI tools your full content as a separate file
For specific routes that need unique meta tags (e.g. blog posts), use prerendering at build time. Tools: vite-plugin-prerender, react-snap.
Full solution
For reliable SEO and LLMO with dynamic content, use a framework that generates static HTML:
- Astro - best for content sites; generates static HTML with optional JS hydration
- Next.js with output: 'export' or SSR - generates per-route HTML
- Nuxt with ssr: true - same for Vue
These generate actual HTML files that all crawlers read without JavaScript.
Static Hosting Notes
AWS Amplify
- Files in public/ (Vite) are copied to dist/ and served as-is
- Amplify's SPA rewrite rule (/<*> -> /index.html with 404→200) only fires when no matching file exists - robots.txt, sitemap.xml, and llms.txt are served directly without triggering the rewrite
- No additional config needed for new static files placed in public/
CloudFront + S3
Upload static files to the S3 bucket root. Set explicit Content-Type metadata on each object - S3 does not infer content type reliably:
| File | Content-Type |
|---|---|
| robots.txt | text/plain |
| sitemap.xml | application/xml |
| llms.txt | text/plain |
| llms-full.txt | text/plain |
| *.md (Markdown alternates) | text/markdown; charset=utf-8 |
| .well-known/agent-skills/index.json | application/json; charset=utf-8 |
| .well-known/agent-card.json | application/json; charset=utf-8 |
| .well-known/mcp.json | application/json; charset=utf-8 |
If Content-Type is wrong, crawlers may reject the file even when the content is valid.
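If you upload from a deploy script rather than the console, set the type at upload time - a sketch using the AWS SDK for JavaScript v3 (bucket name, region, and paths are placeholders):
// upload-llms.mjs
import { readFileSync } from 'fs';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({ region: 'us-east-1' });

await s3.send(new PutObjectCommand({
  Bucket: 'your-bucket',
  Key: 'llms.txt',
  Body: readFileSync('public/llms.txt'),
  ContentType: 'text/plain; charset=utf-8', // matches the table above
}));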
Netlify / Vercel
Files in public/ (or static/ in some frameworks) are served automatically with correct content types. No additional config needed.
Validation
Run these checks after deploying. One command or tool per concern - no separate "validation section" needed at the end of a project.
Structured data
- Google Rich Results Test - validates JSON-LD and shows which rich result types are eligible
- Schema.org Validator - catches schema errors the Rich Results Test doesn't flag
Social sharing previews
- opengraph.xyz - shows OG and Twitter Card previews as they appear on each platform
- LinkedIn Post Inspector - LinkedIn-specific preview and cache refresh
Crawl access
# Verify static files are served and accessible
curl -I https://yourdomain.com/robots.txt
curl -I https://yourdomain.com/sitemap.xml
curl -I https://yourdomain.com/llms.txt
# Check robots.txt content (including Content-Signal directive)
curl -s https://yourdomain.com/robots.txt
# Spot-check JSON-LD is present in HTML source (not rendered by JS)
curl -s https://yourdomain.com/ | grep 'application/ld+json'
Agent-readiness
# Confirm Markdown content negotiation works end-to-end
curl -sI -H "Accept: text/markdown" https://yourdomain.com/ | grep -i content-type
# Expected: content-type: text/markdown; charset=utf-8
# Confirm Link response headers are present
curl -sI https://yourdomain.com/ | grep -i '^link:'
# Expected at least one rel: service-doc | service-desc | describedby | license
- isitagentready.com - Cloudflare's agent-readiness scanner. Run https://isitagentready.com/yourdomain.com and aim for level 4+ out of 5. Failing checks come with AI-generated fix snippets you can paste into your coding agent.
- ora.run - ORA (Open Readiness Assessment) by Era Labs. Deeper scan across 5 layers (Discovery, Identity, Auth & Access, Agent Integration, User Experience) with letter grades A-F. Run https://ora.run/scan/yourdomain.com. Add the badge to your README: [](https://ora.run/scan/yourdomain.com)
Cross-platform consistency
# Check that your identity description is consistent across all agent-facing surfaces:
# 1. meta description in HTML
curl -s https://yourdomain.com/ | grep -i 'name="description"'
# 2. JSON-LD description field
curl -s https://yourdomain.com/ | python3 -c "import sys,json,re; [print(b['description']) for b in json.loads(re.search(r'application/ld\+json[^>]*>(.*?)</script>', sys.stdin.read(), re.S).group(1)).get('@graph', []) if 'description' in b]"
# 3. agent.json description_for_model
curl -s https://yourdomain.com/.well-known/agent.json | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('description_for_model',''))"
# 4. llms.txt opening blockquote
curl -s https://yourdomain.com/llms.txt | head -5
Inconsistent descriptions across these surfaces confuse agents that cross-reference them. The description_for_model in agent.json, the JSON-LD description, and the llms.txt blockquote should all describe the same product in compatible terms - not contradictory or wildly different levels of detail.
Performance
- PageSpeed Insights - Core Web Vitals field + lab data per URL
- Google Search Console > Core Web Vitals - aggregate field data across your site (requires traffic)
Indexing
- Google Search Console > Index > Sitemaps - submit and monitor sitemap processing
- Google Search Console > URL Inspection - check individual page indexing status and last crawl date