LLM Override Documentation
Complete technical reference for configuring and extending LLM Override — the M2M translation engine for WordPress.
Introduction
0.1 What is LLM Override?
LLM Override is a B2B trust infrastructure plugin for WordPress that ensures perfect content accessibility for AI systems.
When ChatGPT, Claude, or Perplexity answer a question about your brand, they don’t show your webpage — they synthesize it. They crawl raw HTML, which is a format built for humans, full of scripts, navigation menus, and layout tags. If they cannot parse the structure accurately, they hallucinate the gaps. Traditional SEO cannot fix this.
LLM Override provides a Machine-to-Machine (M2M) translation layer before WordPress renders any HTML. It responds to compliant AI crawlers with a clean, structured Markdown document containing exactly the factual truth of your site — ensuring 100% faithfulness to your visible content without the UI noise.
0.2 The Problem: Why HTML Breaks AI Translation
HTML was designed for browsers, not for AI models.
When an AI crawler visits your site, it receives the same file a human browser does: cookie banners, JavaScript bundles, inline styles, and your actual content — all mixed together. The AI has to guess what matters. It makes that guess at scale, across thousands of pages, with no ability to ask for clarification.
The result is structural parsing failure. The AI fills gaps with the most statistically plausible text it has seen during training, which may be outdated, inaccurate, or lifted from a competitor’s page.
The three most common failure modes:
- Outdated facts: The AI describes a product you discontinued two years ago.
- Terminology mismatch: The AI uses outdated names for your features or pricing tiers.
- Missing context: The AI doesn’t know what makes your infrastructure different, so it defaults to generic industry statements.
A sitemap tells an AI where your URLs are. LLM Override fixes the root issue by standardizing the exact document the AI receives, ensuring verifiable facts and zero hallucinations.
0.3 GEO Compliance vs. SEO
Search Engine Optimization (SEO) is built around one assumption: a human will see the results.
Google crawls your page, ranks it, and shows a link. The entire system is designed to earn clicks. Generative Engine Optimization (GEO) operates under a different assumption: no human will see the raw results. An AI model reads your content, synthesizes an answer, and presents that answer directly to the user. Your page is never shown.
| Dimension | SEO | GEO Compliance |
|---|---|---|
| Target | Googlebot, Bingbot | GPTBot, ClaudeBot, PerplexityBot |
| Output | A ranked link | A synthesized answer |
| Human sees | Your page (if they click) | The AI’s mathematical summary of your page |
| Control mechanism | Keywords, backlinks, metadata | Structured Markdown, Terminology Standardization |
| Failure mode | Low ranking | Brand hallucination / Fact corruption |
| Outcome | Traffic generation | M2M Content Faithfulness |
SEO and GEO are not in conflict. They simply require different infrastructure. LLM Override is the GEO Compliance layer.
0.4 How M2M Translation Works
M2M stands for Machine-to-Machine. It describes the communication channel between an AI crawler and your server — a channel that exists entirely outside the browser, outside your theme, and outside any caching layer.
Here is the exact sequence of events when an AI crawler visits a page protected by LLM Override:
Step 1 — Content Negotiation Signal
LLM Override adds a <link rel="alternate" type="text/markdown"> tag to the <head> of every page on your site. This is a standard Content Negotiation signal that tells compliant AI crawlers: “A machine-readable version of this page exists at this URL.”
Step 2 — Crawler discovery
The AI crawler reads your page’s <head>, finds the alternate link, and follows it. This is automatic behavior for crawlers that comply with the Content Negotiation standard.
Step 3 — Translation Request
The crawler appends ?view=raw to your URL and sends a new request. LLM Override translates this request at the WordPress routing layer — before any theme template loads and before any caching plugin serves a stale response.
Step 4 — M2M Translation Delivery
Instead of HTML, the crawler receives a clean Markdown document containing:
- A YAML frontmatter block with your page title, canonical URL, and last modified date
- Your Site Manifest — a block of verifiable brand facts
- Your page content, stripped of scripts, ads, layout, and UI noise, with Terminology Standardization already applied to ensure naming consistency
Step 5 — Human experience unchanged
Human visitors follow none of this path. They never receive Markdown, and their experience — including page speed, theme rendering, and caching — is completely untouched.
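As a rough illustration of Step 1, the discovery tag could be built along these lines. This is a minimal sketch, not the plugin's code: the function name `llmo_sketch_alternate_link` is hypothetical, and only the `rel`, `type`, and `?view=raw` details come from the steps above.

```php
<?php
// Hypothetical helper (not the plugin's API): builds the Step 1 discovery tag.
function llmo_sketch_alternate_link(string $permalink): string
{
    // Append view=raw, respecting any query string already present.
    $separator = (strpos($permalink, '?') === false) ? '?' : '&';
    $href = $permalink . $separator . 'view=raw';

    // Escape the URL for safe use inside an HTML attribute.
    return sprintf(
        '<link rel="alternate" type="text/markdown" href="%s">',
        htmlspecialchars($href, ENT_QUOTES, 'UTF-8')
    );
}

echo llmo_sketch_alternate_link('https://yoursite.com/how-it-works/'), PHP_EOL;
```

A crawler that follows the tag then requests the `href` value, which is the Step 3 request described above.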
0.5 The Two-Plugin Architecture
LLM Override uses a two-plugin model.
The Free plugin is a complete, standalone GEO engine. Every core feature — M2M translation, Markdown delivery, Site Manifest, Terminology Standardization, /llms.txt integration, bot detection, Shadow Analytics, and per-post controls — works without a license key.
The Pro addon is a separate plugin at llmoverride.com that extends the Free engine with capabilities designed for large B2B sites and agencies:
- M2M Precision Parser (Copilot) — per-post strict 1:1 structural translations using an LLM to generate perfect Markdown, plus RAG JSON-LD extraction
- Batch Compilation Engine — site-wide background compilation via Action Scheduler
- GEO Analytics — GDPR-compliant IP hashing telemetry, bot fingerprinting, and Content Faithfulness Score checking
- Autopilot llms.txt — AI-drafted site manifesto strictly grounded in your actual content
- Agency MCP Server — expose a full Model Context Protocol endpoint for remote agent orchestration
0.6 What Does NOT Change for Human Visitors
Nothing.
LLM Override operates on a separate delivery channel that human browsers never trigger.
Specifically:
- Your theme renders exactly as it does today
- Your page speed is unaffected — M2M requests bypass your cache, but human requests continue to be served from cache normally
- Your SEO rankings are unaffected: LLM Override adds X-Robots-Tag: noindex to all Markdown responses, telling search engine bots to ignore them
- Your URLs do not change: there is no parallel site or redirect
- Your existing plugins continue to work seamlessly
The only thing that changes is the mathematical precision with which AI models ingest your verified facts.
Getting Started
1.1 System Requirements
Before installing LLM Override, verify your environment meets these requirements:
| Requirement | Minimum | Recommended |
|---|---|---|
| WordPress | 6.0 | Latest stable |
| PHP | 7.4 | 8.1 or higher |
| MySQL / MariaDB | 5.7 / 10.3 | 8.0 / 10.6 |
| Hosting type | Any (shared, VIP, managed) | Any — including read-only filesystems |
| SSL | Recommended | Required for Agency MCP endpoints |
LLM Override does not write to the filesystem. It uses only the WordPress Options API, Transients API, and standard Rewrite Rules — ensuring verifiable operation on every enterprise hosting environment, including Kinsta, WP Engine, WordPress VIP, and Cloudways.
1.2 Installing the Free Plugin
The Free plugin is available directly from the WordPress.org Plugin Directory.
Option A — Install from the WordPress dashboard:
- Go to Plugins → Add New Plugin in your WordPress admin.
- Search for LLM Override.
- Click Install Now, then Activate.
Option B — Manual upload:
- Download the .zip file from wordpress.org/plugins/llm-override.
- Go to Plugins → Add New Plugin → Upload Plugin.
- Select the .zip file and click Install Now, then Activate.
No configuration is required after activation. The M2M translation engine starts working immediately to ensure content accessibility for AI systems.
1.3 Installing the Pro Addon
The Pro addon is a separate plugin available at llmoverride.com. It requires the Free plugin to be installed and active.
- Purchase a Pro license at llmoverride.com.
- Download the llm-override-pro.zip file from your account.
- Go to Plugins → Add New Plugin → Upload Plugin.
- Select the .zip file and click Install Now, then Activate.
- Go to LLM Override → License and enter your license key.
Important: Do not deactivate or delete the Free plugin. The Pro addon hooks into the Free engine’s architecture as an extensibility layer. Both plugins must be active simultaneously.
1.4 First 5 Minutes: What to Do After Activation
The plugin works out of the box, but three crucial actions will establish your baseline GEO Compliance.
Action 1 — Verify the engine is running (30 seconds)
Go to LLM Override → Dashboard. You will see the engine status indicator. If it shows active, the M2M translation layer is functioning. If it shows a warning, the dashboard will tell you exactly what requires adjustment.
Action 2 — Define your Site Manifest (3 minutes)
Go to LLM Override → Semantic Rules. In the Site Manifest field, write 3–5 sentences establishing the indisputable facts of your brand: who you are, what infrastructure you provide, who you serve, and compliance data. This text will anchor every AI payload from this moment forward.
Acme Corp is a B2B SaaS company founded in 2019. We build invoice automation software
for construction companies with 10–200 employees. Our core product reduces invoice
processing time from 4 days to 6 hours. We are headquartered in Madrid, Spain, and
serve clients across the EU and Latin America. We are SOC2 Type II compliant.
Action 3 — Verify the M2M Payload (1 minute)
Open any post or page on your site. In the WordPress Admin Bar at the top, click View as AI. You will see the exact Markdown document that an AI crawler receives from that page, including the YAML frontmatter, your Site Manifest, and any Terminology Standardization rules that apply.
That is your baseline. Everything from here is mathematical refinement.
1.5 The Dashboard: Understanding Your KPIs
The LLM Override Dashboard is your compliance command center. Here is what each metric means and why it matters.
Engine Status
A live health check confirming the M2M translation layer is active and the ?view=raw endpoint is correctly handling content negotiation.
Bot Hits (Total)
The cumulative number of M2M translations delivered since activation. Each hit represents an AI crawler reading your optimized Markdown instead of unstructured HTML.
Intercepted URLs
The number of distinct pages successfully processed and delivered to AI crawlers.
Before vs. After Simulation
A diagnostic tool that allows you to paste any URL from your site and compare the raw HTML an AI would receive without the plugin versus the strict, token-optimized Markdown it receives with the plugin active.
Health Checklist
A list of configuration items the plugin audits automatically:
- Site Manifest: set or empty
- Terminology Map: configured or not
- /llms.txt: generation integrity
- robots.txt directive: active or missing
- Cache bypass: functioning for your hosting environment
Any item showing a warning has a direct link to the setting that resolves it.
M2M Interception Engine
2.1 How the Interceptor Works
The M2M Interception Engine is the core infrastructure of LLM Override. Everything else — Site Manifest, Terminology Standardization, /llms.txt, and GEO Analytics — sits on top of it.
Most tools for AI optimization generate a list of URLs that tell a crawler where your content is. The M2M engine defines how the crawler processes your content.
When an AI crawler requests a page on your site, LLM Override handles that request on the template_redirect hook, before any template loads, before any database query runs for theme output, and before any page builder renders its layout logic. The HTML WordPress would normally generate is never produced for the machine. Instead, the engine responds directly with a structured Markdown translation, sets the correct HTTP headers (Content-Type: text/markdown; charset=UTF-8), and cleanly exits the process.
The result: the crawler receives clean, dense, semantically structured facts. Your theme never runs. Your server does less work, and the AI parses the data flawlessly.
2.2 The Four-Layer Detection Cascade
LLM Override uses a four-layer detection cascade to identify AI crawler requests. Each layer operates independently — if any layer triggers, the engine serves the M2M payload.
| Layer | Method | Description |
|---|---|---|
| 1 | ?view=raw | Explicit M2M endpoint — advertised via <link rel="alternate"> in <head> |
| 2 | User-Agent | 52 known AI crawler strings across 3 categories: training, query, agent |
| 3 | Content Negotiation | Accept: text/markdown HTTP header |
| 4 | Stealth Fingerprinting | Detects browser-like UAs missing Sec-Fetch-* headers (localhost excluded) |
Layer 1 — The ?view=raw parameter
LLM Override adds a <link rel="alternate" type="text/markdown"> tag, pointing at the page's ?view=raw URL, to the <head> of every singular post and page.
This is a standard HTML Content Negotiation signal. AI crawlers that follow this standard — including GPTBot, ClaudeBot, and PerplexityBot — read this tag, identify the alternate machine-readable version, and fetch it automatically. Human browsers ignore this tag entirely.
Layer 2 — Passive bot detection via User-Agent
For AI crawlers that don’t proactively follow Content Negotiation but are known to the plugin, LLM Override leverages safe User-Agent matching. When a request arrives from a recognized bot, the engine intercepts it automatically and delivers Markdown.
LLM Override includes a dictionary of over 50 known AI crawlers across 4 behavioral categories:
| Category | Description | Examples |
|---|---|---|
| Training | Bots harvesting data for model training | CCBot, Common Crawl |
| Query | Real-time RAG fetch during inference | GPTBot, ClaudeBot, PerplexityBot |
| Discovery | Crawlers mapping site structure and manifests | Amazonbot, Applebot |
| Scraping | Unclassified AI traffic | Bytespider, DataForSEO |
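The first three layers of the cascade can be sketched as a single function. This is an illustrative approximation under stated assumptions: the function name, the three-entry bot list, and the substring matching are hypothetical, while the returned trigger names follow the values listed for the analytics `trigger` field in section 6.1. Layer 4 is left out here because it is described separately in section 6.5.

```php
<?php
// Illustrative sketch only; the plugin's real detection code is not shown in this document.
// Returns the trigger name for layers 1-3, or null when none of them matches.
function llmo_sketch_detect_trigger(array $query, array $headers, array $knownBots): ?string
{
    // Normalise header names so lookups are case-insensitive.
    $headers = array_change_key_case($headers, CASE_LOWER);

    // Layer 1: explicit ?view=raw endpoint.
    if (isset($query['view']) && $query['view'] === 'raw') {
        return 'view_raw';
    }

    // Layer 2: the User-Agent contains a known AI crawler token.
    $userAgent = $headers['user-agent'] ?? '';
    foreach ($knownBots as $token) {
        if (stripos($userAgent, $token) !== false) {
            return 'user_agent';
        }
    }

    // Layer 3: the client asks for Markdown via the Accept header.
    if (stripos($headers['accept'] ?? '', 'text/markdown') !== false) {
        return 'content_negotiation';
    }

    return null; // Layer 4 (stealth fingerprinting) is not modelled here.
}

$bots = ['GPTBot', 'ClaudeBot', 'PerplexityBot']; // tiny sample, not the plugin's dictionary
var_dump(llmo_sketch_detect_trigger(['view' => 'raw'], [], $bots));
var_dump(llmo_sketch_detect_trigger([], ['User-Agent' => 'Mozilla/5.0 (compatible; GPTBot/1.0)'], $bots));
var_dump(llmo_sketch_detect_trigger([], ['Accept' => 'text/markdown, text/html;q=0.8'], $bots));
```

Returning the name of the layer that fired, rather than a plain boolean, makes it straightforward to log which detection method matched.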
2.3 HTML → Markdown Translation
When the engine processes a request, it retrieves the raw content of the requested post or page from the WordPress database and runs it through a deterministic translation pipeline.
Stage 1 — Content extraction
The engine extracts the post content directly from the database, bypassing the theme. This isolates the factual editorial content from navigation, sidebars, footers, cookie banners, and visual UI components.
Stage 2 — Element stripping
Before conversion, the engine strips elements that add noise to an AI payload: <script>, <style>, <iframe>, <object>, <embed>, empty elements, BOM markers, and Zero-Width Spaces (U+200B) that cause parser errors in ChatGPT and Claude.
Stage 3 — Structural Markdown conversion
The cleaned HTML is processed by a parser that converts standard HTML elements to their Markdown equivalents: headings become # markers, bold text becomes **text**, lists become - bullets, links become [text](url). This is a strict 1:1 structural map.
Stage 4 — Data Standardization
The engine prepends a YAML frontmatter metadata block and your global Site Manifest, and then applies Terminology Standardization filtering to the core content to ensure all nomenclature matches your official brand dictionary.
Stage 5 — Delivery
The engine sets Content-Type: text/markdown; charset=UTF-8, disables all caching layers, adds X-Robots-Tag: noindex, and outputs the Markdown document.
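To make Stage 2 more concrete, here is a rough sketch of element stripping. It is an approximation for illustration only: the function name is hypothetical, and regex-based stripping is an assumption made for brevity (a production implementation would more likely use a real HTML parser).

```php
<?php
// Rough approximation of Stage 2. Regex stripping is an assumption for brevity,
// not a description of the plugin's internals.
function llmo_sketch_strip_noise(string $html): string
{
    // Remove BOM markers (U+FEFF) and zero-width spaces (U+200B).
    $html = preg_replace('/[\x{FEFF}\x{200B}]/u', '', $html) ?? $html;

    // Remove noisy elements together with their contents.
    $html = preg_replace('#<(script|style|iframe|object|embed)\b[^>]*>.*?</\1\s*>#is', '', $html) ?? $html;

    // Remove <embed> tags that have no closing tag.
    $html = preg_replace('#<embed\b[^>]*>#i', '', $html) ?? $html;

    // Remove empty elements such as <p></p> (single pass, so nested empties may remain).
    $html = preg_replace('#<(\w+)\b[^>]*>\s*</\1>#', '', $html) ?? $html;

    return trim($html);
}

$dirty = "\xEF\xBB\xBF<p>Hello\u{200B} world</p><script>alert(1)</script><style>p{}</style><p></p>";
echo llmo_sketch_strip_noise($dirty), PHP_EOL;
```

The cleaned HTML would then be handed to the Stage 3 converter.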
2.4 YAML Frontmatter
Every M2M payload begins with a YAML frontmatter block — a structured metadata section placed at the very start of the Markdown document, delimited by ---.
A typical frontmatter block looks like this:
---
title: How Invoice Automation Works
canonical_url: https://yoursite.com/how-it-works/
last_updated: 2026-03-01T14:22:00+00:00
plugin_version: 1.0.0
---
# How Invoice Automation Works
Acme Corp is a B2B SaaS company founded in 2019. We build invoice automation
software for construction companies with 10–200 employees. Our core product
reduces invoice processing time from 4 days to 6 hours.
Why YAML frontmatter matters
YAML frontmatter is a widely used Markdown convention that language models encounter often, so they tend to parse it reliably. Because the frontmatter comes before the body, it gives you control over the context at the very top of the document.
| Field | Source | Purpose |
|---|---|---|
| title | WordPress post title | Gives the AI the canonical name of the document |
| canonical_url | WordPress permalink | Anchors the content to a specific, citable source |
| last_updated | WordPress post modified date | Ensures verifiable payload freshness |
| plugin_version | Plugin version | Identifies the delivery infrastructure |
The Site Manifest position
The Site Manifest does not sit inside the YAML frontmatter. It is positioned in the actual Markdown document immediately following the H1 heading. This ensures that the AI reads your indisputable brand facts as the very first sentences of every single page on your site.
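The documented layout (frontmatter, then the H1, then the Site Manifest) can be sketched as follows. The function name and the shape of the `$post` array are hypothetical, and the title-quoting rule is a defensive assumption rather than documented plugin behavior.

```php
<?php
// Hypothetical builder mirroring the documented layout: frontmatter, H1, then Site Manifest.
function llmo_sketch_payload_header(array $post, string $manifest, string $pluginVersion): string
{
    // Quote the title only when YAML could misread it (for example when it contains ":" or "#").
    $title = $post['title'];
    if (preg_match('/[:#"\'\[\]{}]|^\s|\s$/', $title)) {
        $title = json_encode($title, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
    }

    $lines = [
        '---',
        'title: ' . $title,
        'canonical_url: ' . $post['url'],
        // DATE_ATOM yields ISO 8601 with a numeric offset, e.g. 2026-03-01T14:22:00+00:00.
        'last_updated: ' . $post['modified']->format(DATE_ATOM),
        'plugin_version: ' . $pluginVersion,
        '---',
        '# ' . $post['title'],
        $manifest,
    ];

    return implode("\n", $lines) . "\n";
}

echo llmo_sketch_payload_header(
    [
        'title'    => 'How Invoice Automation Works',
        'url'      => 'https://yoursite.com/how-it-works/',
        'modified' => new DateTimeImmutable('2026-03-01 14:22:00', new DateTimeZone('UTC')),
    ],
    'Acme Corp is a B2B SaaS company founded in 2019.',
    '1.0.0'
);
```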
2.5 Cache Bypass System
Caching plugins serve pre-rendered HTML from disk or memory, bypassing WordPress’s routing layer entirely. LLM Override solves this by detecting active caching systems at request time and programmatically disabling them — exclusively for M2M requests.
| System | Method used |
|---|---|
| WP Rocket | DONOTCACHEPAGE constant |
| LiteSpeed Cache | LSCACHE_NO_CACHE constant + X-LiteSpeed-Cache-Control: no-cache header |
| W3 Total Cache | DONOTCACHEPAGE constant |
| FastCGI / Nginx | Cache-Control: no-store, no-cache header |
| Varnish / Cloudflare | Cache-Control: no-store, no-cache header |
Human visitors are never affected. Only M2M requests bypass the cache — guaranteeing that crawlers always receive your latest published verifiable facts.
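A per-request bypass using the constants and headers from the table above might look like the following sketch. The function name and structure are assumptions; only the constant names and header values come from the table, and the function is meant to be called for M2M requests only.

```php
<?php
// Sketch of a per-request cache bypass. Call it only when a request has been
// identified as M2M traffic; the function name is hypothetical.
function llmo_sketch_bypass_cache(): array
{
    // Constants honoured by WP Rocket, W3 Total Cache and LiteSpeed Cache.
    foreach (['DONOTCACHEPAGE', 'LSCACHE_NO_CACHE'] as $constant) {
        if (!defined($constant)) {
            define($constant, true);
        }
    }

    $headers = [
        'Cache-Control: no-store, no-cache',
        'X-LiteSpeed-Cache-Control: no-cache',
    ];

    // Only emit headers while that is still possible.
    if (!headers_sent()) {
        foreach ($headers as $header) {
            header($header);
        }
    }

    return $headers;
}

llmo_sketch_bypass_cache();
```

Because the constants are defined at request time, requests that never reach this code path (human visitors) keep their normal caching behavior.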
2.6 X-Robots-Tag: noindex
Every Markdown response served by LLM Override includes the following HTTP header:
X-Robots-Tag: noindex
This header tells Googlebot and Bingbot to ignore the response entirely. Without it, Google could potentially see your Markdown pages as duplicate content, conflicting with the HTML versions that are your actual SEO-ranked pages.
Important: This applies only to search engine bots. AI language models (ChatGPT, Claude, Perplexity) do not obey X-Robots-Tag; they read the content regardless, which is the intended behavior. The header protects your SEO while enabling your GEO.
2.7 The Kill Switch
LLM Override includes a global kill switch that instantly disables the entire M2M translation layer.
Go to LLM Override → Settings → Enable M2M Interception and uncheck the checkbox. Instantly:
- All ?view=raw requests will return standard HTML
- The <link rel="alternate"> tag is removed from your pages’ <head>
- The /llms.txt endpoint is deactivated
- Shadow Analytics stops recording
Your WordPress site returns to standard behavior as if the plugin did not exist. Re-enabling the switch restores all functionality immediately — no data is lost.
Site Manifest
3.1 What the Site Manifest Does
The Site Manifest is a block of factual text that anchors your brand identity for AI crawlers. It is injected into the /llms.txt and /llms-full.txt discovery endpoints — the site-wide indexes that AI crawlers read to understand your entire content inventory. This ensures your verifiable brand facts are delivered at the site level without injecting them into individual per-page M2M payloads.
This is not a marketing tagline. This is a factual positioning instrument that anchors the AI’s understanding of your company before it processes the rest of the page content.
Think of it as a data commitment: you write exactly what is true, and the AI uses exactly that when answering questions about your brand.
3.2 How to Write an Effective Site Manifest
The Site Manifest is configured at LLM Override → Semantic Rules → Site Manifest.
A high-impact Site Manifest typically contains five pieces of factual information:
1. Who you are (legal entity + founding date)
This is the anchor. AI models will cite this when asked “Who is [company]?” — if you don’t provide it, the AI invents one from its training data.
2. What you do (specific, not generic)
Avoid “we help companies grow.” That is marketing copy. Instead: “We build invoice automation software for construction companies.” This gives the AI a precise retrieval token for RAG queries.
3. Who you serve (ICP)
This stops the AI from generating answers that position you in the wrong market. “We serve construction companies with 10–200 employees in the EU and Latin America” is far more useful than “we serve businesses worldwide.”
4. What you are not
This is the most underused tactic. If the AI is consistently confusing you with a competitor or a different type of product, state the negative clearly: “We are not a general-purpose ERP. We do not offer HR management features.” This creates a hard boundary in the AI’s reasoning.
5. Non-negotiable facts
Anything verifiable that, if stated incorrectly by the AI, would damage your credibility: compliance certifications (SOC2, ISO 27001), headquarters location, founding year, number of customers.
Full example:
Acme Corp is a B2B SaaS company founded in 2019. We build invoice automation
software for construction companies with 10–200 employees. Our core product
reduces invoice processing time from 4 days to 6 hours. We are headquartered
in Madrid, Spain, and serve clients across the EU and Latin America. We are
SOC2 Type II compliant. We are not an ERP. We do not offer HR or payroll
functionality.
Length: Keep it under 300 words. AI models have a context window — a long Site Manifest dilutes the per-page content. Be factual, not poetic.
llms.txt Standard
4.1 What Is /llms.txt?
The /llms.txt file is a machine-readable site index specifically designed for AI crawlers. It is the equivalent of robots.txt for language models — a single endpoint that lists all pages on your site with contextual metadata, giving any AI crawler an immediate, structured overview of your entire content inventory.
LLM Override generates your /llms.txt automatically. No manual configuration is required.
Access it at: https://yoursite.com/llms.txt
Here is the typical structure:
# Acme Corp
> Acme Corp is a B2B SaaS company founded in 2019.
> We build invoice automation software for construction companies.
## Pages
- [How Invoice Automation Works](https://yoursite.com/how-it-works/?view=raw)
- [Pricing Plans](https://yoursite.com/pricing/?view=raw)
- [About Acme Corp](https://yoursite.com/about/?view=raw)
- [Contact Sales](https://yoursite.com/contact/?view=raw)
Each URL in the manifest points directly to the M2M endpoint (?view=raw) of that page — meaning an AI can follow any link and receive the optimized Markdown version instantly.
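The structure shown above could be generated along these lines. This is a minimal sketch with a hypothetical function name and input format; how the plugin actually collects pages and manifest lines is not described in this document.

```php
<?php
// Hypothetical generator for the /llms.txt structure shown above.
// $pages maps page titles to permalinks.
function llmo_sketch_llms_txt(string $siteName, array $manifestLines, array $pages): string
{
    $out = ['# ' . $siteName];

    // Site Manifest lines are rendered as a Markdown blockquote.
    foreach ($manifestLines as $line) {
        $out[] = '> ' . $line;
    }

    $out[] = '## Pages';
    foreach ($pages as $title => $url) {
        // Point each entry at the page's M2M endpoint.
        $separator = (strpos($url, '?') === false) ? '?' : '&';
        $out[] = sprintf('- [%s](%s%sview=raw)', $title, $url, $separator);
    }

    return implode("\n", $out) . "\n";
}

echo llmo_sketch_llms_txt(
    'Acme Corp',
    ['Acme Corp is a B2B SaaS company founded in 2019.'],
    ['Pricing Plans' => 'https://yoursite.com/pricing/']
);
```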
4.2 robots.txt Integration
LLM Override automatically adds the following line to your WordPress robots.txt output:
# LLM Override - AI Content Index
Sitemap-LLM: https://yoursite.com/llms.txt
This is passive bot discovery: AI crawlers that check robots.txt before crawling your site will find the llms.txt reference and fetch it first, giving them a complete map of your content before visiting individual pages.
4.3 <meta name="llms"> Tag
In addition to robots.txt, LLM Override injects a <meta name="llms"> tag into the <head> of every page. It points to your /llms.txt endpoint.
This tag serves a different purpose from robots.txt and llms.txt. While those are site-level discovery mechanisms, the <meta> tag is a page-level signal. An AI crawler processing your HTML will find this tag in the document it is already reading, and can immediately discover the /llms.txt endpoint without a separate robots.txt fetch.
4.4 Post Type Inclusion Rules
By default, LLM Override includes all public post types in your /llms.txt index: posts, pages, and any custom post types registered with public = true.
You can control which post types are included at LLM Override → Settings → Post Type Selection. Unchecking a post type:
- Removes it from /llms.txt
- Disables M2M interception for that post type’s content
- Hides the LLM Override metabox from that post type’s editor
4.5 SEO Plugin Synchronization
If a post is marked as noindex in your SEO plugin, LLM Override respects that configuration and automatically excludes it from the /llms.txt manifest and from M2M interception.
Supported SEO plugins:
- Yoast SEO
- Rank Math
- SEOPress
- All in One SEO (AIOSEO)
This synchronization is automatic. No additional configuration is needed. If a page shouldn’t appear in Google, it won’t appear in your AI manifest either.
Payload Precision
5.1 The Per-Post Metabox
Every post and page in WordPress has a dedicated LLM Override metabox in the editor sidebar. This is your per-page control center for the M2M payload.
The metabox provides the following controls:
Exclusion Controls (3 levels)
| Checkbox | Scope | Effect |
|---|---|---|
| Exclude from LLM Override entirely | Master exclusion | Hides post from M2M delivery, /llms.txt, JSON-LD, and <link alternate> |
| Exclude from /llms.txt | Discovery only | Post is still M2M-accessible via ?view=raw, but doesn’t appear in the manifest |
| Exclude from JSON-LD | Schema only | Disables JSON-LD Semantic Enclosure for this post while keeping everything else active |
Use the master exclusion for pages that contain no content relevant to AI (login pages, thank-you pages, internal dashboards, or pages with content you explicitly do not want synthesized).
M2M Payload Override (Bypass)
A full text area where you can write a complete custom Markdown payload for this specific post. When this field contains content, the M2M engine will serve exactly that content — bypassing the HTML → Markdown conversion engine entirely.
Important: When using a bypass, you are responsible for the content of the payload. The engine will still enforce Terminology Standardization on the bypass content, but the HTML → Markdown conversion is skipped entirely. Make sure your content is valid Markdown — including proper heading hierarchy — before saving.
5.2 How the Priority System Works
The M2M engine uses a strict priority hierarchy for each post when building the payload:
- M2M Precision Parser (Copilot) output — If a Copilot-compiled payload exists (Pro only), it is used. This is the highest-fidelity output available.
- Manual bypass — If a user has written custom Markdown in the bypass field, it is used.
- Automated conversion — If neither of the above exists, the engine extracts post content and performs the standard HTML → Markdown translation.
This hierarchy means you can use the automated engine for most pages and override only the ones that require high-precision control.
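The three-level hierarchy reduces to a short selection function. The sketch below is illustrative only: the function name, argument order, and returned array keys are hypothetical, and the automated conversion is passed in as a callback so it only runs when needed.

```php
<?php
// Illustrative only: function and key names are hypothetical, not the plugin's API.
function llmo_sketch_select_payload(?string $copilot, ?string $bypass, callable $convert): array
{
    // 1. A Copilot-compiled payload (Pro only) wins when present.
    if ($copilot !== null && trim($copilot) !== '') {
        return ['source' => 'copilot', 'markdown' => $copilot];
    }

    // 2. Otherwise use the manual bypass field if it has content.
    if ($bypass !== null && trim($bypass) !== '') {
        return ['source' => 'bypass', 'markdown' => $bypass];
    }

    // 3. Fall back to the automated HTML to Markdown conversion.
    return ['source' => 'automated', 'markdown' => $convert()];
}

$result = llmo_sketch_select_payload(null, "# Pricing\nCustom payload.", function () {
    return '# Pricing (auto)';
});
echo $result['source'], PHP_EOL; // bypass
```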
5.3 Customizing the llms-full.txt Excerpt
The automated /llms-full.txt endpoint includes a brief excerpt for each page in your site index. By default, this excerpt is generated automatically by stripping Markdown to plain text and truncating.
You can override this excerpt for any individual post in the metabox using the Custom llms.txt Description field. When this field contains text, the automated truncation is bypassed and your custom description appears in the full index instead.
Tip: Use the custom description for your highest-traffic pages — your homepage, pricing page, and core product pages. These are the pages most likely to appear in AI-generated summaries, and a precise excerpt in the full manifest gives the AI better context before it even visits the page.
5.4 The “View as AI” Button
The “View as AI” button is an empirical verification tool that shows you the exact Markdown document an AI crawler would receive for any page on your site.
You can access it in two ways:
- Admin Bar: When viewing any post or page on the front end, the WordPress Admin Bar shows a “View as AI” button. Clicking it opens the ?view=raw endpoint in a new tab.
- Post Editor: Inside the LLM Override metabox, a “Preview M2M Output” link opens the same endpoint.
What you see is exactly what the AI sees. There is no simulation or approximation. The button simply requests the same URL with the same parameter that an AI crawler would use.
Use this after every significant content edit to verify that:
- Your Site Manifest appears correctly
- Terminology Standardization rules are firing as expected
- The heading hierarchy is clean
- Code blocks, tables, and lists are properly formatted
- No UI noise leaked into the payload
5.5 JSON-LD Semantic Enclosure
LLM Override automatically injects a JSON-LD <script type="application/ld+json"> block into the HTML <head> of every eligible post. This structured data enclosure contains the M2M-translated Markdown inside a Schema.org articleBody field — allowing discovery bots and RAG pipelines to ingest your clean content directly from the DOM without visiting the ?view=raw endpoint.
Schema type logic:
| Condition | @type | Reason |
|---|---|---|
| Yoast SEO or Rank Math active | TechArticle | Avoids duplicate Article schema collision |
| No SEO plugin detected | Article | Standard schema type |
Key properties injected:
- headline: Post title
- articleBody: Full Markdown content (truncated at 10,000 characters by default)
- url: Canonical permalink
- dateModified: Last modified timestamp in ISO 8601
- author / creator: Post author display name
Controls:
- Per-post: “Exclude from JSON-LD” checkbox in the metabox
- Programmatic: llm_override_jsonld_enabled filter (return false to disable)
- Max body length: llm_override_jsonld_max_length filter (default: 10,000)
- Schema customization: llm_override_jsonld_schema filter
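The filters listed above could be used from a theme's functions.php or a small custom plugin, roughly as sketched here. The filter names come from this section, but the callback argument lists are assumptions (the document does not give the signatures), and the `mytheme_` function names and the `inLanguage` property are purely illustrative.

```php
<?php
// Callbacks for two of the documented filters. Argument lists are assumed:
// the document names the filters but not their signatures.

function mytheme_jsonld_max_length($length = 10000)
{
    // Truncate articleBody at 5,000 characters instead of the default 10,000.
    return 5000;
}

function mytheme_jsonld_schema($schema = [])
{
    // Assumes the filter passes the schema as an associative array.
    if (is_array($schema)) {
        $schema['inLanguage'] = 'en'; // Schema.org property, added as an example
    }
    return $schema;
}

// Register the callbacks only when WordPress is loaded.
if (function_exists('add_filter')) {
    add_filter('llm_override_jsonld_max_length', 'mytheme_jsonld_max_length');
    add_filter('llm_override_jsonld_schema', 'mytheme_jsonld_schema');
}
```

It is worth checking the actual filter signatures in the plugin source before relying on the arguments.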
Shadow Analytics Lite
6.1 What Shadow Analytics Tracks
Shadow Analytics Lite is a lightweight bot telemetry layer built into the free version of LLM Override. It records every M2M translation request served by your site — giving you empirical visibility into AI crawler activity without requiring any external analytics tools.
Every time an AI crawler triggers the M2M endpoint, the engine logs the following data in the WordPress database:
| Field | Description |
|---|---|
| bot_name | Identified crawler (e.g., “GPTBot”, “ClaudeBot”, “Stealth-Bot”) |
| url | The page URL that was requested |
| timestamp | Exact date and time of the request |
| user_agent | Full User-Agent string from the request |
| trigger | Detection method: view_raw, user_agent, content_negotiation, or stealth_fingerprint |
| bot_type | Bot category: training, query, agent, or unknown |
6.2 Viewing the Log
Go to LLM Override → Analytics to view the log. The table shows the most recent requests, sorted by timestamp.
Filtering: You can filter by bot name to isolate traffic from a specific crawler (e.g., show only GPTBot activity) or by date range to analyze trends.
Export: The log can be exported as CSV for external analysis or client reporting.
6.3 Log Retention
Shadow Analytics Lite stores the most recent 1,000 events by default. Older entries are automatically pruned to prevent database bloat.
This limit is configurable via the filter llm_override_analytics_retention:
add_filter( 'llm_override_analytics_retention', function() {
return 5000; // Keep the last 5,000 events
});
6.4 GDPR Compliance
Shadow Analytics Lite was designed from the beginning to be GDPR-compliant by default:
- No personal data is collected. The system records bot identifiers and URLs, not human visitor data.
- No data is transmitted externally. All analytics data stays in your WordPress database — no third-party services, no external API calls, no tracking pixels.
- No cookies are set. The analytics layer operates entirely at the server level.
- IP addresses are not stored. The engine identifies bots by their User-Agent string, not by IP.
For enterprise organizations that need granular privacy controls, the Pro addon’s GEO Analytics module offers GDPR-compliant IP hashing for deeper traffic analysis — but even that module uses irreversible hashing (SHA-256 with rotating salts), ensuring no raw IP is ever stored.
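As an illustration of what irreversible hashing with rotating salts can look like, here is a minimal sketch. It does not describe the Pro module's actual implementation: the function name, the daily rotation period, the site secret, and the use of HMAC-SHA-256 to combine salt and IP are all assumptions.

```php
<?php
// Sketch of irreversible IP hashing with a salt that rotates daily.
// The secret, rotation period and HMAC construction are assumptions for illustration.
function llmo_sketch_hash_ip(string $ip, string $secret, DateTimeInterface $now): string
{
    // Derive a per-day salt so hashes from different days cannot be correlated.
    $dailySalt = hash_hmac('sha256', $now->format('Y-m-d'), $secret);

    // Only this 64-character hex digest would be stored, never the raw IP.
    return hash_hmac('sha256', $ip, $dailySalt);
}

$today = new DateTimeImmutable('2026-03-01', new DateTimeZone('UTC'));
echo strlen(llmo_sketch_hash_ip('203.0.113.7', 'site-secret', $today)), PHP_EOL; // 64
```

With a rotating salt, the same IP produces the same digest within one rotation window (useful for counting distinct visitors) but a different digest afterwards.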
6.5 Stealth Bot Detection
Beyond User-Agent matching, LLM Override includes a fingerprinting layer that detects AI crawlers disguised as regular browsers. Real browsers always send specific Sec-Fetch-* headers — bots impersonating browsers typically omit them.
| Header | Expected in Real Browser | Missing = Bot Signal |
|---|---|---|
| Sec-Fetch-Mode | navigate | ✅ Counted |
| Sec-Fetch-Site | none or same-origin | ✅ Counted |
| Sec-Fetch-Dest | document | ✅ Counted |
A request with a browser-like User-Agent (Chrome, Firefox, Safari, Edge) that is missing 2 or more of these headers is classified as a stealth bot and served the M2M payload.
Safeguards:
- Requests from 127.0.0.1, ::1, or hostnames ending in .local are always excluded from stealth detection to prevent false positives during local development.
- Known search engine crawlers (Googlebot, Bingbot, Applebot, etc.) are whitelisted and never flagged regardless of their headers.
- Stealth detection can be toggled off globally via LLM Override → Settings → Enable Stealth Bot Detection.
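The classification rule and its safeguards can be sketched as a single function. This is an illustration under stated assumptions, not the plugin's implementation: the names `is_stealth_bot`, `BROWSER_MARKERS`, and `SEARCH_ENGINES` are hypothetical, and the whitelist here contains only the three crawlers named above.

```python
# Hypothetical constants for the sketch; the plugin's real lists are not shown in this document.
BROWSER_MARKERS = ("Chrome", "Firefox", "Safari", "Edg")
SEARCH_ENGINES = ("Googlebot", "Bingbot", "Applebot")
SEC_FETCH_HEADERS = ("sec-fetch-mode", "sec-fetch-site", "sec-fetch-dest")
LOCAL_ADDRESSES = {"127.0.0.1", "::1"}

def is_stealth_bot(headers: dict, remote_addr: str, host: str) -> bool:
    """Apply the rule: browser-like User-Agent missing 2 or more Sec-Fetch-* headers."""
    h = {k.lower(): v for k, v in headers.items()}
    ua = h.get("user-agent", "")

    # Safeguard 1: local development requests are never flagged.
    if remote_addr in LOCAL_ADDRESSES or host.lower().endswith(".local"):
        return False
    # Safeguard 2: known search engine crawlers are whitelisted.
    if any(name in ua for name in SEARCH_ENGINES):
        return False
    # Only browser-like User-Agents are candidates for stealth detection.
    if not any(marker in ua for marker in BROWSER_MARKERS):
        return False

    missing = sum(1 for name in SEC_FETCH_HEADERS if not h.get(name))
    return missing >= 2
```

A request that declares itself as Chrome but carries none of the three headers is flagged, while the same User-Agent with all three headers present is treated as a real browser.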
M2M Precision Parser
This feature requires LLM Override Pro.
7.1 What the Precision Parser Does
The M2M Precision Parser (formerly “AI Copilot”) is a per-post translation engine that uses an AI model to generate a strict 1:1 structural translation of your page content. Instead of the deterministic HTML → Markdown conversion performed by the free engine, the Precision Parser sends your content to a language model with a tightly constrained system prompt that enforces absolute factual faithfulness.
The key difference:
| Engine | Translation Method | Best for |
|---|---|---|
| Free (Automated) | Deterministic HTML → Markdown conversion | Content that is already clean and well-structured |
| Pro (Precision Parser) | AI-driven structural translation with facts enforcement | Complex pages, visual builders, dynamic content |
The Precision Parser does not add, remove, rewrite, or paraphrase any content. It performs a strict structural 1:1 translation: the same information, presented in perfect Markdown hierarchy, verified against the source document.
7.2 BYOK Architecture
The Precision Parser uses a Bring Your Own Key (BYOK) architecture. You provide your own API key from any supported AI provider, and LLM Override Pro sends translation requests directly to that provider’s API.
Supported providers:
| Provider | Configuration location |
|---|---|
| OpenAI | LLM Override → Copilot → OpenAI API Key |
| Anthropic | LLM Override → Copilot → Anthropic API Key |
| DeepSeek | LLM Override → Copilot → DeepSeek API Key |
| OpenRouter | LLM Override → Copilot → OpenRouter API Key (supports 100+ models) |
Your API key is encrypted at rest using WordPress’s AUTH_SALT constant and is never transmitted to any server other than the AI provider’s API endpoint. LLM Override does not proxy, store, or log your API requests.
7.3 How to Compile a Post
- Open any post or page in the WordPress editor.
- In the LLM Override metabox, click the “Compile with Copilot” button.
- The plugin sends the post’s raw HTML content to your configured AI provider with a system prompt that enforces strict 1:1 structural translation.
- The AI returns a Markdown document, which is stored as the compiled M2M payload for that post.
- From this point forward, whenever an AI crawler requests this post, it receives the compiled payload — not the automated conversion.
You can re-compile at any time by clicking the button again. Each compilation overwrites the previous compiled payload.
7.4 The System Prompt
The Precision Parser uses a tightly constrained system prompt that enforces three rules:
- 1:1 structural fidelity: Every heading, paragraph, list, and table in the HTML must appear in the Markdown output. Nothing is added, removed, or paraphrased.
- Facts-only operation: The AI is explicitly forbidden from generating any information not present in the source document.
- Hierarchy preservation: The heading structure (H1 → H2 → H3) must be preserved exactly. No heading level changes.
Dynamic placeholders:
System prompts support placeholders that are replaced at compile time with the actual post data:
{{post_title}} → The post's title
{{post_url}} → The post's canonical URL
{{post_content}} → The post's raw HTML content
{{site_manifest}} → The global Site Manifest text
{{terminology_map}} → The active terminology rules
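Placeholder substitution of this kind can be sketched in a few lines. The function name `render_prompt` and the choice to leave unknown placeholders untouched are assumptions for the example; the plugin's actual behavior for unknown names is not documented here.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def render_prompt(template: str, values: dict) -> str:
    """Replace each {{name}} with values[name] in a single pass.

    Unknown placeholders are left as-is (an assumption of this sketch).
    A single pass also means text inside a substituted value is never re-expanded.
    """
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)

prompt = render_prompt(
    "Translate the page titled {{post_title}} at {{post_url}}.",
    {"post_title": "Pricing", "post_url": "https://example.com/pricing/"},
)
```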
7.5 RAG JSON-LD Extraction
The Precision Parser includes an advanced capability: automated extraction of structured data from your page content into JSON-LD Schema.org markup, specifically engineered for Retrieval-Augmented Generation (RAG) systems.
When you compile a post, the Precision Parser doesn’t just create a Markdown translation — it also analyzes the content and generates JSON-LD structured data blocks that AI systems can use for precise entity extraction during query retrieval.
This feature is available directly in the per-post metabox editing interface. The generated JSON-LD is embedded within the compiled payload metadata, making it accessible to any AI system that ingests your M2M content.
7.6 Content Faithfulness Score
Every compiled payload is automatically scored using the Content Faithfulness Score, a Jaccard similarity metric that measures token-level overlap between the source HTML and the compiled Markdown.
The score ranges from 0 to 100:
| Score | Interpretation | Action Required |
|---|---|---|
| 80–100 | High faithfulness — exact structural match | None. Payload is compliant. |
| 60–79 | Moderate deviation — structural differences present | Review the compiled output for omissions or repositioned content. |
| Below 60 | Significant deviation — cloaking risk | Re-compile or use manual bypass. Investigate the cause of divergence. |
This score is displayed in the metabox after every compilation and is also visible in the Batch Compilation Engine dashboard (Block 8).
The Faithfulness Score is the single most important compliance metric in LLM Override. A score below 60 is a mathematical indicator that the AI-generated Markdown does not faithfully represent the human-visible content — which constitutes operational cloaking risk.
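Jaccard similarity is the size of the intersection of two token sets divided by the size of their union. The sketch below shows one way such a score could be computed on a 0 to 100 scale. The tokenizer, the tag stripping, and the function names are assumptions for illustration; the plugin's exact tokenization is not described in this document.

```python
import re
from html import unescape

def tokenize(text: str) -> set:
    """Lowercase alphanumeric word tokens (an assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def strip_tags(html: str) -> str:
    """Crude tag removal, sufficient for the sketch."""
    return unescape(re.sub(r"<[^>]+>", " ", html))

def faithfulness_score(source_html: str, markdown: str) -> int:
    a = tokenize(strip_tags(source_html))
    b = tokenize(markdown)
    if not a and not b:
        return 100  # two empty documents are trivially identical
    return round(100 * len(a & b) / len(a | b))

html_source = "<h1>Invoice Automation</h1><p>Acme automates invoices.</p>"
faithful = "# Invoice Automation\n\nAcme automates invoices."
padded = faithful + "\n\nBest in class pricing guaranteed."
```

In this example the faithful translation shares every token with the source, while the padded version adds five tokens that the source never contained, which halves the score and puts it in the "significant deviation" band.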
Batch Compilation Engine
This feature requires LLM Override Pro.
8.1 How the Batch Engine Works
The Batch Compilation Engine is a background processing system that lets you compile M2M payloads for your entire site — or any filtered subset of it — without manually clicking “Compile” on each post.
The engine is built on Action Scheduler, the same battle-tested asynchronous job queue that WooCommerce uses for background processing at scale. Because queued jobs are persisted in the database, batch jobs run reliably in the background, survive server restarts, and handle thousands of posts without timing out.
8.2 Running a Batch Job
Go to LLM Override → Batch Compile.
- Select the post types to include (posts, pages, custom post types).
- Optionally filter by status (published, draft) or by specific categories/tags.
- Click “Start Batch Compilation”.
The engine will queue every matching post and process them one by one in the background. Each post is sent to your configured AI provider (via BYOK) for Precision Parser compilation.
You can monitor progress in real time on the batch dashboard, which shows:
- Total posts queued
- Posts compiled
- Posts failed (with error details)
- Content Faithfulness Score distribution
- Estimated time remaining
8.3 Rate Limiting and Cost Control
The batch engine includes built-in rate limiting to prevent API quota exhaustion:
- Default rate: 1 compilation every 5 seconds.
- Configurable delay: Adjustable at LLM Override → Copilot → Batch Delay.
- Automatic retry: Failed compilations (API timeouts, rate limit errors) are automatically retried up to 3 times with exponential backoff.
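Retry with exponential backoff can be sketched as follows. The names `compile_with_retry` and `TransientError` are hypothetical, and the 5-second base delay that doubles on each retry is an assumption for the example; the plugin's actual backoff schedule is not specified here.

```python
import time

class TransientError(Exception):
    """Stands in for an API timeout or rate-limit error."""

def compile_with_retry(compile_fn, post_id, max_retries=3, base_delay=5.0, sleep=time.sleep):
    """Call compile_fn(post_id), retrying up to max_retries times on TransientError.

    The wait before retry n (0-based) is base_delay * 2**n, so 5s, 10s, 20s by default.
    """
    attempt = 0
    while True:
        try:
            return compile_fn(post_id)
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the failure to the caller
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Passing `sleep` as a parameter keeps the function easy to exercise without actually waiting, for example by handing it a list's `append` method to record the delays.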
Cost estimation: Before starting a batch job, the dashboard shows an estimated API cost based on your selected provider, the average token count of your content, and the number of posts to compile.
8.4 Autopilot Mode
Autopilot is a continuous synchronization mode that automatically recompiles M2M payloads whenever the source content changes.
When Autopilot is enabled:
- Every time you publish or update a post, the engine will automatically compile a fresh M2M payload within minutes.
- The engine uses Action Scheduler to queue the compilation asynchronously — the editor does not slow down.
- If the compilation fails, it is retried automatically.
Enable Autopilot at LLM Override → Batch Compile → Autopilot Mode.
Autopilot is designed for sites that update content frequently and need their M2M payloads to always reflect the latest published version without manual intervention.
GEO Analytics
This feature requires LLM Override Pro.
9.1 How GEO Analytics Differs from Shadow Analytics
Shadow Analytics Lite (included in the free plugin) records basic bot request data: which bot visited, which page, and when. GEO Analytics extends this with deep telemetry designed for B2B agencies and enterprise compliance teams.
| Capability | Shadow Analytics Lite | GEO Analytics (Pro) |
|---|---|---|
| Bot identification | ✅ | ✅ + fingerprinting |
| Request logging | ✅ (last 1,000) | ✅ (unlimited, configurable retention) |
| IP hashing | ❌ | ✅ (SHA-256, GDPR-compliant) |
| Content Faithfulness Score tracking | ❌ | ✅ |
| Entity injection detection | ❌ | ✅ |
| Client reporting | ❌ | ✅ (client-labeled reports) |
| Operational vs. Interception log split | ❌ | ✅ |
9.2 Bot Fingerprinting
GEO Analytics doesn’t just record the User-Agent string — it builds a behavioral fingerprint for each crawler based on request patterns, frequency, and content access sequences. This allows you to differentiate between:
- Genuine AI crawlers (GPTBot, ClaudeBot) and impersonators
- Training crawlers (bulk page fetches) and RAG crawlers (targeted, real-time queries)
- New, previously unseen crawlers that may require classification
9.3 Content Faithfulness Score Tracking
Every compiled M2M payload has a Faithfulness Score (see Block 7: M2M Precision Parser). GEO Analytics tracks this score over time, giving you a historical view of your site’s content compliance posture.
The dashboard shows:
- Site-wide average Faithfulness Score
- Score distribution (how many posts are above 80, between 60–79, below 60)
- Score changes after re-compilation
- Posts flagged as “cloaking risk” (score below 60)
9.4 Entity Injection Tracking
Entity injection tracking monitors whether AI models are inserting unauthorized entities (competitor names, incorrect product names, fabricated team members) into answers about your brand.
This module works by comparing your Site Manifest entities against the entities present in compiled payloads, detecting any additions that don’t originate from your verified content.
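At its core this comparison is a set difference. The sketch below assumes the entity names have already been extracted into lists, and that matching is case-insensitive; both points, and the function name `injected_entities`, are assumptions for illustration only.

```python
def injected_entities(manifest_entities, payload_entities):
    """Return payload entities that do not appear in the Site Manifest."""
    known = {name.casefold() for name in manifest_entities}
    return sorted({name for name in payload_entities if name.casefold() not in known})

flagged = injected_entities(
    ["Acme Corp", "Invoice Automation"],
    ["acme corp", "Globex", "Invoice Automation"],
)
```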
9.5 Client Reporting
For agencies managing multiple client sites, GEO Analytics includes a client reporting feature that generates white-label compliance reports.
Reports include:
- Bot activity summary (crawl volume by bot, by page)
- Content Faithfulness Score overview
- Terminology Standardization enforcement metrics
- GEO coverage percentage (pages with compiled payloads vs. total)
- Recommendations for compliance improvement
Master Fact Manifest
This feature requires LLM Override Pro.
10.1 What Is the Master Fact Manifest?
The Master Fact Manifest is an AI-generated comprehensive document that replaces the basic /llms.txt index with a deep, factually grounded description of your entire site. Instead of a simple list of URLs, the Master Fact Manifest is a multi-section document that gives AI crawlers immediate access to your company’s complete factual profile.
Think of it as the difference between a table of contents and an executive brief.
10.2 How It’s Generated
The Master Fact Manifest is compiled using the same BYOK AI provider configured for the Precision Parser. The generation process:
- The engine collects the Site Manifest, all compiled M2M payloads, and the Terminology Map.
- It sends this aggregated dataset to your AI provider with a system prompt that enforces factual summarization — no creative content, no marketing language, strictly facts.
- The AI generates a structured document covering: company overview, product/service descriptions, key differentiators, compliance information, and contact details — all grounded exclusively in your published content.
- The generated manifest is served at /llms.txt instead of the automated index.
10.3 Regeneration and Freshness
The Master Fact Manifest can be regenerated at any time from the LLM Override → llms.txt settings page.
When Autopilot Mode is enabled (Block 8), the Master Fact Manifest is automatically regenerated whenever the underlying content changes significantly — ensuring the document always reflects your latest published facts.
10.4 Fallback Behavior
If you have Pro installed but haven’t generated a Master Fact Manifest, the /llms.txt endpoint falls back to the standard automated index. There is no disruption — the endpoint always serves something useful.
Agency MCP Server
This feature requires LLM Override Pro (Agency).
11.1 What MCP Enables
The Agency MCP Server exposes a full Model Context Protocol (MCP) endpoint on your WordPress site. MCP is an open standard that allows AI agents (Claude Desktop, Cursor, custom automation pipelines) to interact with external systems programmatically.
With the MCP Server active, an AI agent can:
- Read your site’s GEO compliance status
- Read the compiled M2M payload for any page
- Read your Site Manifest
- Update your Site Manifest and per-post payloads
- Trigger batch compilations
This turns your WordPress site into a GEO compliance data source that external tools can query and update without human intervention.
11.2 Authentication
The MCP endpoint uses WordPress’s built-in Application Passwords for authentication. Every request must include a valid Application Password belonging to a user with the manage_options capability.
To set up authentication:
- Go to Users → Your Profile → Application Passwords.
- Create a new Application Password (name it “MCP Agent” or similar).
- Copy the generated password — it will only be shown once.
- Use it in your MCP client’s configuration.
Security note: Application Passwords are separate from your WordPress login password. They can be revoked individually at any time, and they only grant API access — they cannot be used to log into the WordPress dashboard.
11.3 Endpoint Discovery
The MCP endpoint is available at:
https://yoursite.com/wp-json/llm-override/v1/mcp
Example with cURL:
curl -X POST https://yoursite.com/wp-json/llm-override/v1/mcp \
-H "Content-Type: application/json" \
-u "admin:xxxx xxxx xxxx xxxx xxxx xxxx" \
-d '{"tool": "get_site_manifest"}'
11.4 Read Operations
Read operations retrieve your site’s GEO compliance data without modifying anything.
get_site_manifest
Returns your current Site Manifest text and Terminology Map. Use this to audit the semantic layer before making changes.
get_page_markdown
Returns the compiled M2M Markdown payload for a specific post or page. Parameters: post_id (integer).
Example response (abbreviated):
{
  "post_id": 42,
  "title": "How Invoice Automation Works",
  "markdown": "---\ntitle: How Invoice Automation Works\n---\n\n# How Invoice Automation Works\n\nAcme Corp is a B2B SaaS...",
  "source": "compiled",
  "faithfulness_score": 92,
  "last_compiled": "2026-03-15T10:30:00+00:00"
}
The source field tells you how the payload was generated: compiled (Precision Parser), bypass (manual override), or auto (automated conversion).
In a compatible MCP client (Claude Desktop, Cursor, or a custom agent using the MCP SDK), connect to the endpoint using your Application Password. The tools will appear automatically.
MCP Configuration Example:
{
  "mcpServers": {
    "llm-override": {
      "url": "https://yoursite.com/wp-json/llm-override/v1/mcp/sse",
      "headers": {
        "Authorization": "Basic base64(user:application_password)"
      }
    }
  }
}
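The Authorization value in the configuration above is shorthand: the actual header must carry the Base64 encoding of `user:application_password`, as in standard HTTP Basic authentication. A minimal sketch for producing that value follows; the function name and the sample credentials are placeholders.

```python
import base64

def basic_auth_header(username: str, application_password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{application_password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

# Placeholder credentials; substitute your own user and Application Password.
header_value = basic_auth_header("admin", "xxxx xxxx xxxx xxxx xxxx xxxx")
```

Paste the resulting string into the `Authorization` field of your MCP client configuration in place of the `base64(...)` shorthand.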
get_site_coverage
Returns the site-wide GEO coverage metrics: total posts, posts with compiled payloads, posts with bypasses, and posts using automated conversion. This gives you a single data point that tells you how much of your site has been explicitly optimized for AI systems.
11.5 Write Operations
Write operations modify your site’s GEO configuration programmatically.
set_site_manifest
Replaces your entire Site Manifest text. Parameters: manifest (string).
set_terminology_map
Replaces the entire Terminology Map. Parameters: terms (array of {term, replacement} objects).
set_page_bypass
Injects a fully custom M2M Markdown payload for a specific post, bypassing both the automated conversion and the Precision Parser. Parameters: post_id (integer), markdown (string).
11.6 Use Cases
Multi-site agency management
Use case: An orchestration agent monitors your sites for content updates, then automatically generates and deploys Precision Parser compilations. Example workflow:
Ask AI Buttons
This feature requires LLM Override Pro or Agency license.
11b.1 What Are Ask AI Buttons?
Ask AI Buttons is a Pro module that renders a visual component at the bottom of your post content, inviting visitors to ask AI platforms about your article. Each button opens the corresponding AI platform with a pre-built prompt that includes your M2M-optimized URL (?view=raw).
This creates a self-reinforcing GEO loop: your visitor reads your article, clicks “Ask ChatGPT,” and ChatGPT fetches your optimized M2M payload to answer the question.
Supported AI Platforms:
| Platform | Method |
|---|---|
| Perplexity | URL pre-fill |
| ChatGPT | URL pre-fill |
| Microsoft Copilot | URL pre-fill |
| Gemini | URL pre-fill |
| Claude | URL pre-fill |
| Grok | URL pre-fill |
All 6 providers use native URL-based prompt pre-filling — no clipboard hacks, no JavaScript clipboard API calls.
11b.2 Configuration
Go to LLM Override Pro → Ask AI Buttons to configure the module.
| Setting | Default | Description |
|---|---|---|
| Enable/Disable | Enabled | Master toggle for the entire module |
| Post Types | Posts, Pages | Which post types display the buttons |
| Providers | All 6 enabled | Choose which AI platforms to show |
| CTA Label | “Ask AI about this article:” | The heading text above the buttons |
| Prompt Template | See below | The prompt sent to AI platforms |
11b.3 Prompt Template
The prompt template supports dynamic placeholders resolved at render time:
| Placeholder | Resolves to |
|---|---|
| {title} | Post title |
| {url} | Canonical permalink |
| {m2m_url} | M2M endpoint URL (permalink?view=raw) |
| {site_name} | Your site’s name |
| {excerpt} | Post excerpt or auto-generated 200-char snippet |
| {author} | Post author display name |
The template deliberately points AI models to the {m2m_url} endpoint — ensuring they read your structured Markdown payload.
Prompt length limit: 800 characters maximum (enforced server-side).
11b.4 Visual Customization
Presets: outlined (default), solid, pill, ghost, minimal
Button sizes: sm, md (default), lg
Color scheme: Auto (follows prefers-color-scheme), Light, Dark
All colors are customizable via CSS Custom Properties under the .llm-override-ask-ai namespace.
11b.5 Branding
Pro license: “Powered by LLM Override” branding is always visible.
Agency license: Branding can be toggled off for white-label client deployments.
1. Agent reads get_site_coverage → identifies 12 new posts without compiled payloads.
2. For each post: Agent reads get_page_markdown → reviews the auto-generated output.
3. Agent sends post HTML to its own AI provider for optimal compilation.
4. Agent writes compiled Markdown back via set_page_bypass.
5. Agent logs the operation and moves to the next site.
CI/CD integration
Use case: When a CI/CD pipeline detects a merge to the main branch of a content repository, it automatically sends the updated Markdown to all WordPress sites via MCP.
Scaling across sites
Each site has its own Application Password. Your central agent stores these credentials securely and iterates through all sites, auditing compliance, updating manifests, and reporting exceptions — no human touches any WordPress dashboard.
Developer Reference
12.1 Plugin Architecture
LLM Override is built entirely on the WordPress Plugin API. It does not modify core files, does not write to the filesystem, and does not require custom database tables for its core operation (Shadow Analytics uses a custom table for log storage).
All integration points are standard WordPress hooks — filters and actions — that allow developers to extend or modify behavior without touching plugin code.
12.2 Filters (Content Modification Hooks)
Filters let you modify data as it passes through the M2M pipeline.
llm_override_markdown_output
Filters the final Markdown string after all processing stages (conversion, frontmatter, terminology standardization) but before delivery. Use this strictly for compliance metadata, legal disclaimers, or audit timestamps — not for injecting substantive content.
add_filter( 'llm_override_markdown_output', function( $output, $post ) {
    // Append a compliance verification timestamp
    $verified_date = get_the_modified_date( 'Y-m-d', $post );
    $output .= "\n\n---\n";
    $output .= "*Content verified as of {$verified_date}. ";
    $output .= "This document is subject to the terms at https://yoursite.com/terms/.*\n";
    return $output;
}, 10, 2 );
Parameters:
- $output (string): Complete Markdown payload ready for delivery.
- $post (WP_Post): The post object being served.
⚠️ Warning: Adding content that does not exist in the visible HTML page constitutes cloaking and violates search engine guidelines. Limit modifications to compliance timestamps, legal disclaimers, and audit metadata. See §12.6.
llm_override_frontmatter
Modify the YAML frontmatter array before it is serialized into the Markdown document. Use this to add custom metadata fields that AI crawlers should consume.
add_filter( 'llm_override_frontmatter', function( $frontmatter, $post ) {
    $frontmatter[] = 'industry: construction';
    $frontmatter[] = 'region: EU';
    return $frontmatter;
}, 10, 2 );
llm_override_bypass_markdown
Modify a manual bypass payload before delivery. This fires only when a post has a custom bypass — not during automated conversion.
llm_override_llms_txt_lines
Modify the lines of the /llms.txt output before it is sent to the crawler. Useful for adding custom URLs or sections.
llm_override_clean_special_chars
Controls whether the post-conversion Unicode sanitization stage runs. Return false to preserve BOM markers, Zero-Width Spaces, and other Unicode artifacts in the output.
12.3 Actions (Event Hooks)
Actions let you execute custom code when specific events occur in the M2M pipeline.
llm_override_bot_detected
Fires every time a bot is detected by any interception layer. Use this for operational notifications — alerting your team when specific AI crawlers access critical pages.
Parameters:
- $post_id (int): The post ID being accessed (0 if non-singular).
- $bot_slug (string): The matched bot identifier (e.g., ChatGPT-User, Stealth-Bot).
- $bot_type (string): The bot category (training, query, agent, or unknown).
- $is_singular (bool): Whether the request is for a singular post/page.
llm_override_intercept_request
Fires when a request explicitly uses ?view=raw. Useful for logging or triggering side effects on explicit M2M requests.
llm_override_serve_llms_txt / llm_override_serve_llms_full_txt
Fire immediately after the respective manifest endpoint is served.
12.4 Constants
LLM_OVERRIDE_VERSION: The current plugin version string (for example, '1.1.6'). Use this in your extensions to check compatibility.
12.5 Extension Pattern
The recommended pattern for building extensions is a standalone WordPress plugin that hooks into LLM Override’s public API. Example:
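if ( ! defined( 'LLM_OVERRIDE_VERSION' ) ) { return; } // LLM Override is not active.
add_action( 'llm_override_bot_detected', function( $post_id, $bot_slug, $bot_type, $is_singular ) {
    // MY_WEBHOOK_URL is a placeholder constant; the original webhook URL is not shown.
    wp_remote_post( MY_WEBHOOK_URL, [
        'body'     => wp_json_encode( [ 'text' => sprintf( '🤖 %s read %s', $bot_slug, get_permalink( $post_id ) ) ] ),
        'headers'  => [ 'Content-Type' => 'application/json' ],
        'blocking' => false,
    ] );
}, 10, 4 );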
The defined( 'LLM_OVERRIDE_VERSION' ) check ensures your extension does nothing if LLM Override is not active — preventing fatal errors and ensuring clean activation/deactivation.
12.6 Compliance & Responsible Use
LLM Override’s hooks are designed for legitimate extension: compliance disclaimers, operational notifications, metadata enrichment, CRM integrations, and translation layer customization.
Supported uses:
- Adding compliance timestamps, legal disclaimers, or audit metadata to payloads
- Injecting YAML frontmatter fields for internal taxonomy, content tier, or language metadata
- Extending the /llms.txt manifest with custom sections
- Triggering operational notifications (Slack, CRM, webhook) on bot detection events
The following uses constitute cloaking and violate search engine guidelines:
- Modifying the Markdown payload to include substantive content not present in the visible HTML
- Injecting keywords, claims, or promotional text into the M2M output that human visitors never see
- Serving materially different content to AI crawlers than what is served to human browsers
Automated guardrail: LLM Override Pro includes a Content Faithfulness Score (Jaccard similarity). A score below 90% is flagged as a cloaking risk in the GEO Analytics dashboard. Scores below 70% trigger a visual warning on the post’s metabox.
Compatibility & Hosting
13.1 Hosting Environments
LLM Override is verified compatible with the following major WordPress hosting platforms:
| Host | Status | Notes |
|---|---|---|
| Generic shared hosting | ✅ Verified | Works out of the box |
| WP Engine | ✅ Verified | Cache bypass confirmed |
| Kinsta | ✅ Verified | Works with Kinsta’s Nginx rules |
| Cloudways | ✅ Verified | Varnish bypass confirmed |
| WordPress VIP | ✅ Verified | No filesystem writes required |
| Pantheon | ✅ Verified | Works on both Live and Dev environments |
| Flywheel | ✅ Verified | Cache bypass confirmed |
| SiteGround | ✅ Verified | SG Optimizer cache bypass confirmed |
The plugin does not write to the filesystem, does not require custom server configuration, and does not depend on any server-side software beyond PHP and WordPress.
13.2 Caching Plugins
LLM Override includes automatic cache bypass for M2M requests. The following caching plugins are explicitly supported and tested:
- WP Rocket
- LiteSpeed Cache
- W3 Total Cache
- WP Super Cache
- Autoptimize (excluded from M2M endpoints)
If you use a caching plugin not listed here, it will likely work without configuration: the plugin sets the standard cache bypass signals (the Cache-Control: no-store header and the DONOTCACHEPAGE constant) that well-behaved caching plugins respect.
13.3 Page Builders
LLM Override works with all major page builders. The content pipeline uses recursive regex-based extraction to preserve shortcode inner content rather than stripping it — ensuring accurate Markdown even on pages built with shortcode-heavy builders:
| Builder | Status | Method |
|---|---|---|
| Gutenberg (Block Editor) | ✅ Verified | Native block content extraction |
| Classic Editor | ✅ Verified | Direct HTML conversion |
| Elementor / Elementor Pro | ✅ Verified | Shortcode recursive extraction |
| WPBakery Page Builder | ✅ Verified | Shortcode recursive extraction |
| Divi Builder | ✅ Verified | Shortcode recursive extraction |
| Beaver Builder | ✅ Verified | Shortcode recursive extraction |
13.4 SEO Plugins
LLM Override synchronizes with the following SEO plugins for noindex detection:
- Yoast SEO
- Rank Math
- SEOPress
- All in One SEO (AIOSEO)
Pages marked as noindex in any of these plugins are automatically excluded from the /llms.txt manifest and from M2M interception.
13.5 Multisite
LLM Override supports WordPress Multisite installations. Each site in the network operates independently with its own Site Manifest, Terminology Map, and analytics data. Network-wide configuration is not currently supported — each site is configured individually.
FAQ & Troubleshooting
14.1 Verifying the Plugin is Working
The fastest way to verify LLM Override is operational:
- Open any published post or page on your site.
- Add ?view=raw to the URL (e.g., https://yoursite.com/sample-page/?view=raw).
- You should receive a plain-text Markdown response with YAML frontmatter.
Alternatively, use cURL from the command line:
curl -s "https://yoursite.com/sample-page/?view=raw" | head -20
You should receive a plain-text Markdown response. If you receive HTML, the interceptor is not firing — check the Engine Status on the Dashboard.
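If you want to automate this check, the same test can be expressed as a small function applied to a response you have already fetched. This is a sketch: the function name is hypothetical, and the heuristics (a plain-text content type, a leading YAML frontmatter delimiter, no HTML document markers) are assumptions derived from the behavior described above.

```python
def looks_like_m2m_payload(body: str, content_type: str) -> bool:
    """Heuristic check that a ?view=raw response is Markdown, not rendered HTML."""
    media_type = content_type.split(";")[0].strip().lower()
    is_plain = media_type in ("text/plain", "text/markdown")
    has_frontmatter = body.lstrip().startswith("---")
    head = body.lstrip()[:1000].lower()
    looks_like_html = head.startswith("<!doctype") or "<html" in head
    return is_plain and has_frontmatter and not looks_like_html

sample_ok = "---\ntitle: Sample Page\n---\n\n# Sample Page\n"
sample_html = "<!DOCTYPE html><html><head></head><body></body></html>"
```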
14.2 Frequently Asked Questions
Q: I see HTML instead of Markdown when I visit ?view=raw.
A: The M2M interception is not active. Go to LLM Override → Settings and verify that Enable M2M Interception is checked. If it’s checked, flush your permalink structure at Settings → Permalinks → Save Changes.
Q: My caching plugin is serving a cached HTML version of the ?view=raw endpoint.
A: LLM Override sets cache bypass headers automatically, but some aggressive caching configurations may override them. Add a manual exclusion rule in your caching plugin for URLs containing ?view=raw.
Q: The Site Manifest doesn’t appear in my /llms.txt output.
A: Go to LLM Override → Semantic Rules and verify that the Site Manifest field contains text. An empty manifest is not injected. Note: the Site Manifest only appears in the /llms.txt and /llms-full.txt discovery endpoints, not in individual per-page payloads.
Q: My Terminology Standardization rules aren’t applying.
A: Verify your terminology entries at LLM Override → Semantic Rules → Terminology Standardization. Each entry needs both a source term and a replacement. Matching is case-insensitive.
Q: The /llms.txt endpoint returns a 404.
A: Flush your rewrite rules at Settings → Permalinks → Save Changes. The /llms.txt rewrite rule is registered on plugin activation, but some hosting environments require a manual flush.
Q: Does this affect my Google rankings?
A: No. All M2M responses include the X-Robots-Tag: noindex header, which instructs search engines not to index the Markdown responses. Your HTML pages, the ones Google actually ranks, are completely untouched.
Q: Is this cloaking?
A: No. Cloaking means showing fundamentally different content to search engines versus humans. LLM Override does the opposite: it ensures the AI receives the same factual content that humans see, translated into a format the AI can process without error. The Content Faithfulness Score mathematically verifies this parity.
Q: What happens if I deactivate the Pro addon but keep the Free plugin?
A: The Free plugin continues to work independently. Any compiled Precision Parser payloads will no longer be served (they require Pro to be active), but the automated HTML → Markdown conversion takes over automatically. No data is lost — when you reactivate Pro, the compiled payloads are available again.
14.3 Troubleshooting Checklist
| Symptom | Cause | Fix |
|---|---|---|
| ?view=raw returns HTML | Engine disabled or permalinks stale | Enable engine + flush permalinks |
| Cached HTML on ?view=raw | Aggressive page cache | Add ?view=raw exclusion rule |
| Empty Markdown body | Post content is shortcode-only | Verify recursive extraction is working |
| Stealth detection false positives | Dev/localhost environment | Automatic — localhost is excluded. Disable in settings if persistent |
| JSON-LD not injecting | Post excluded or SEO plugin conflict | Check per-post exclusion checkbox and collision prevention |
| /llms.txt returns 404 | Rewrite rules not flushed | Settings → Permalinks → Save Changes |
| Terminology map not applying | Cache not invalidated | Edit + save any post to trigger cache flush |
Changelog
LLM Override (Free)
1.1.7 — 2026-04-12
- Removed: Terminology Standardization engine. This feature introduced semantic divergence between HTML and M2M payloads, contradicting our core principle of content faithfulness. LLM Override now guarantees strict 1:1 parity between what humans read and what machines receive.
1.1.6 — 2026-04-11
- Compliance: Full WP.org Plugin Check pass — zero errors, zero warnings. Normalized all line endings to LF, enforced proper escaping, and removed prohibited files.
- Fix: Safe shortcode extraction via recursive regex preserves inner textual content from Divi, WPBakery, and Elementor shortcode structures.
- Enhancement: Intelligent 12-hour transient caching for /llms.txt and /llms-full.txt with automatic invalidation on post publish, update, or trash.
1.1.0
- Feature: Terminology Standardization Engine. M2M Engine now globally replaces legacy forbidden terms logic with a structured `{from → to}` Terminology Dictionary to ensure Content Faithfulness and compliance.
- Enhancement: Migrated global term filtering logic to comply with accurate Source Attribution guidelines.
- Tweak: Version bump for plugin parity and architectural refactoring ahead of Sprint 22.3.
1.0.5
- New: RAG JSON-LD Grounding Engine. Automatically injects semantic `TechArticle` schema markup into the HTML `<head>` containing the M2M translated content.
- Enhancement: Complete architectural refactoring of the Content Pipeline. HTML-to-Markdown conversion is now centralized natively inside `LLM_Override_Content_Pipeline::convert_to_markdown()`.
- Fix: Developer Experience (DX) bypass for Stealth Bot Detection. IDE headless browsers and Localhost environments (`127.0.0.1`, `.local`) will no longer trigger false positive M2M interceptions.
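The DX bypass above amounts to a host check before stealth detection runs. The sketch below shows one way such a check could look. Only `127.0.0.1` and `.local` come from the changelog; treating the literal name `localhost` and other loopback addresses (such as `::1`) as development hosts is an assumption, and the function name is hypothetical.

```python
import ipaddress


def is_dev_host(host: str) -> bool:
    """Return True for hosts that should bypass stealth bot detection."""
    hostname = host.strip().lower()
    # Remove an optional port: "[::1]:8080" or "mysite.local:8080".
    if hostname.startswith("["):
        hostname = hostname[1:].split("]", 1)[0]
    elif hostname.count(":") == 1:
        hostname = hostname.rsplit(":", 1)[0]
    # ".local" is from the changelog; "localhost" is an assumption.
    if hostname == "localhost" or hostname.endswith(".local"):
        return True
    try:
        # Covers 127.0.0.1 and, by assumption, every other loopback address.
        return ipaddress.ip_address(hostname).is_loopback
    except ValueError:
        return False  # an ordinary domain name
```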
1.0.4
- Fix: Added deep exclusions for performance auditing tools (`Chrome-Lighthouse`, `GTmetrix`, `PingdomPTST`) to prevent them from receiving Markdown.
- Fix: Added extended SEO bots exclusions (`AhrefsBot`, `SemrushBot`, `Applebot`, `DotBot`, `MJ12bot`) to the whitelist.
1.0.3
- Fix: Critical Indexing Hotfix. Excluded honest search engine crawlers (like Googlebot and Bingbot) from being falsely flagged by the Stealth Detection Engine.
1.0.2
- Fix: Changed the Content-Type header from `text/markdown` to `text/plain` to ensure strict AI URL ingesters (like Google NotebookLM) accept the M2M endpoints as valid sources.
- Tweak: Restored the `X-Robots-Tag: noindex` header to prevent search engine SERP pollution.
1.0.1
- New: Passive Yoast SEO Compatibility Checker. Intercepts `llms.txt` overriding rules and Bot Blocker restrictions from Yoast Premium.
- Fix: Added missing `Content-Type: text/markdown` header to the M2M payload response.
1.0.0 — Initial Release (March 2026)
- Active M2M Interceptor engine with structured HTML-to-Markdown conversion.
- Global Semantic Injection: Forbidden Terms and Corporate Manifest via YAML frontmatter.
- Dynamic `/llms.txt` and `/llms-full.txt` endpoint generation.
- Algorithmic Discoverability via `<link rel="alternate">` tag and `robots.txt` announcement.
- Native SEO integrations with Yoast SEO, Rank Math, SEOPress, and AIOSEO.
- Native per-post exclusion and payload override via WordPress editor metabox.
- Admin Dashboard with Shadow Analytics Lite (M2M bot hit counters, GDPR-compliant IP hashing).
- View as AI Admin Bar button for empirical M2M payload verification.
- Before vs. After live HTML-to-Markdown simulation in the Dashboard.
- Passive bot detection for 52 known AI crawlers across 3 behavioral categories.
- HTTP Content Negotiation support (`Accept: text/markdown` header).
- Enterprise Unicode sanitization (BOM, Zero-Width Spaces, Non-Breaking Spaces, Soft Hyphens).
- AJAX-driven Transient caching (12-hour TTL) for all M2M endpoints with manual flush.
- 14 documented action/filter hooks for developer extensibility.
- Full compliance with WordPress coding standards: 0 Plugin Check errors, 0 warnings.
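The Unicode sanitization item in the list above names four character classes. As a hedged illustration of what that cleanup might involve (the plugin's actual rules are not documented here), the sketch below removes the BOM, the zero-width space, and the soft hyphen, and turns non-breaking spaces into ordinary spaces. Mapping NBSP to a plain space rather than deleting it, and handling only U+200B among the zero-width characters, are assumptions.

```python
# Characters named in the changelog and what this sketch does with them.
_SANITIZE_TABLE = str.maketrans({
    "\ufeff": None,  # byte order mark: removed
    "\u200b": None,  # zero-width space: removed
    "\u00ad": None,  # soft hyphen: removed
    "\u00a0": " ",   # non-breaking space: replaced (assumption)
})


def sanitize_unicode(text: str) -> str:
    """Strip invisible characters that can confuse downstream tokenizers."""
    return text.translate(_SANITIZE_TABLE)


sample = "\ufeffPrice:\u00a010\u200b0 EUR co\u00adoperate"
print(sanitize_unicode(sample))  # Price: 100 EUR cooperate
```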
LLM Override Pro
1.1.7 — 2026-04-12
- Removed: Terminology Standardization engine and all associated UI panels to ensure absolute 1:1 content parity.
- Removed: Terminology and “terms purged” KPIs from the Shadow Analytics dashboard.
1.1.3 — 2026-04-06
- New: Ask AI Buttons Module. Renders AI provider buttons (Perplexity, ChatGPT, Copilot, Gemini, Claude, Grok) on singular pages with native URL-based prompt pre-filling.
- New: 5 visual presets (outlined, solid, pill, ghost, minimal), full color control via CSS Custom Properties, dark mode support.
- Fix: Critical meta key disconnection between Free metabox and Pro Copilot metabox resolved.
1.1.0 — 2026-03-23
- Refactored: Terminology Standardization Engine globally replaces legacy `forbidden_terms` logic with a structured `{from → to}` Terminology Dictionary.
- Refactored: Copilot Metabox payload compilation updated to enforce Content Faithfulness parameters under the new terminology map.
- Improved: Complete data migration to a relational table schema, securing semantic integrity.
- Improved: MCP Audit Tools and Actions refactored to support semantic parity checks over the new data structure.
1.0.4 — 2026-03-20
- Added: RAG JSON-LD Generator integrated into the AI Copilot (extracts FAQPage, Article, HowTo, etc.).
- Added: Pro JSON-LD Controller injecting high-value Schema autonomously into the frontend.
- Added: Support for `response_format: json` across OpenAI, DeepSeek, and OpenRouter for strict schema generation.
- Added: Tabbed interface in AI Copilot Metabox to separate M2M Content generation from RAG Schema payload.
- Improved: Batch processor prompt hydration now supports `{{BRAND_ENTITIES}}` and `{{FORBIDDEN_TERMS}}` variables.
1.0.3 — 2026-03-18
- Refactored: Unified GEO Analytics Dashboard — merged dual Client/Technical View into a single cohesive layout.
- Fixed: Race condition between competing change event listeners on the view toggle.
- Fixed: AJAX race condition when rapidly switching date periods — implemented sequence token to discard stale responses.
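The sequence-token fix for the date-period race can be pictured as follows. This is a language-neutral sketch written in Python, not the dashboard's code: each request takes a fresh token, and a response is applied only if its token is still the newest one issued. The class and method names are hypothetical.

```python
class LatestResponseGate:
    """Accept only the response belonging to the most recent request."""

    def __init__(self) -> None:
        self._latest = 0

    def next_token(self) -> int:
        """Call when a request is sent; returns its sequence token."""
        self._latest += 1
        return self._latest

    def is_current(self, token: int) -> bool:
        """Call when a response arrives; stale responses return False."""
        return token == self._latest


gate = LatestResponseGate()
week = gate.next_token()    # user selects "last 7 days"
month = gate.next_token()   # user quickly switches to "last 30 days"
print(gate.is_current(week), gate.is_current(month))  # False True
```

The slower "last 7 days" response is discarded whenever it arrives, so the dashboard always renders the period the user selected last.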
1.0.2 — 2026-03-16
- Added: Standard MCP JSON-RPC 2.0 Router implementing the Model Context Protocol specification 2025-03-26 (Streamable HTTP transport).
- Added: Configuration tabs for Claude Desktop and Cursor with correct JSON config formats.
- Fixed: `sanitize_title()` was silently converting underscores to hyphens in tool names. Replaced with `sanitize_key()`.
1.0.1 — 2026-03-13
- Added: Automated OTA Updates powered by the ArrayPress Lemon Squeezy Auto-Updater library.
- Added: Internal license synchronization between PRO architecture and the third-party updater payload.
- Improved: B2B Clean UI override (native updater row on `/wp-admin/plugins.php` is now intercepted and hidden).
