LLM Override Documentation

Complete technical reference for configuring and extending LLM Override — the M2M translation engine for WordPress.

Introduction

0.1 What is LLM Override?

LLM Override is a B2B trust infrastructure plugin for WordPress that makes your content cleanly accessible to AI systems.

When ChatGPT, Claude, or Perplexity answer a question about your brand, they don’t show your webpage — they synthesize it. They crawl raw HTML, which is a format built for humans, full of scripts, navigation menus, and layout tags. If they cannot parse the structure accurately, they hallucinate the gaps. Traditional SEO cannot fix this.

LLM Override provides a Machine-to-Machine (M2M) translation layer that runs before WordPress renders any HTML. It responds to compliant AI crawlers with a clean, structured Markdown document built from your visible content, without the UI noise.

0.2 The Problem: Why HTML Breaks AI Translation

HTML was designed for browsers, not for AI models.

When an AI crawler visits your site, it receives the same file a human browser does: cookie banners, JavaScript bundles, inline styles, and your actual content — all mixed together. The AI has to guess what matters. It makes that guess at scale, across thousands of pages, with no ability to ask for clarification.

The result is structural parsing failure. The AI fills gaps with the most statistically plausible text it has seen during training, which may be outdated, inaccurate, or lifted from a competitor’s page.

The three most common failure modes:

Outdated facts: The AI describes a product you discontinued two years ago.
Terminology mismatch: The AI uses outdated names for your features or pricing tiers.
Missing context: The AI doesn’t know what makes your infrastructure different, so it defaults to generic industry statements.

A sitemap only tells an AI where your URLs are. LLM Override addresses the root issue by standardizing the exact document the AI receives, which removes the parsing ambiguity that leads to hallucinated details.

0.3 GEO Compliance vs. SEO

Search Engine Optimization (SEO) is built around one assumption: a human will see the results.

Google crawls your page, ranks it, and shows a link. The entire system is designed to earn clicks. Generative Engine Optimization (GEO) operates under a different assumption: no human will see the raw results. An AI model reads your content, synthesizes an answer, and presents that answer directly to the user. Your page is never shown.

Dimension	SEO	GEO Compliance
Target	Googlebot, Bingbot	GPTBot, ClaudeBot, PerplexityBot
Output	A ranked link	A synthesized answer
Human sees	Your page (if they click)	The AI’s synthesized summary of your page
Control mechanism	Keywords, backlinks, metadata	Structured Markdown, Terminology Standardization
Failure mode	Low ranking	Brand hallucination / Fact corruption
Outcome	Traffic generation	M2M Content Faithfulness

SEO and GEO are not in conflict. They simply require different infrastructure. LLM Override is the GEO Compliance layer.

0.4 How M2M Translation Works

M2M stands for Machine-to-Machine. It describes the communication channel between an AI crawler and your server — a channel that exists entirely outside the browser, outside your theme, and outside any caching layer.

Here is the exact sequence of events when an AI crawler visits a page protected by LLM Override:

Step 1 — Content Negotiation Signal
LLM Override adds a <link rel="alternate" type="text/markdown"> tag to the <head> of every singular post and page on your site. This is a standard HTML alternate-representation signal that tells compliant AI crawlers: “A machine-readable version of this page exists at this URL.”

Step 2 — Crawler discovery
The AI crawler reads your page’s <head>, finds the alternate link, and follows it. This is automatic behavior for crawlers that honor rel="alternate" links.

Step 3 — Translation Request
The crawler requests the alternate URL, which is your page URL with ?view=raw appended. LLM Override handles this request at the WordPress routing layer, before any theme template loads and before any caching plugin serves a stale response.

Step 4 — M2M Translation Delivery
Instead of HTML, the crawler receives a clean Markdown document containing:

A YAML frontmatter block with your page title, canonical URL, and last modified date
Your Site Manifest — a block of verifiable brand facts
Your page content, stripped of scripts, ads, layout, and UI noise
Terminology Standardization already applied to that content, so naming stays consistent

Step 5 — Human experience unchanged
Human visitors follow none of this path. They never receive Markdown, and their experience — including page speed, theme rendering, and caching — is completely untouched.
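As a rough illustration of Step 3, the sketch below builds the ?view=raw URL for a page while preserving any query parameters already present. The function name raw_view_url is hypothetical and is not part of the plugin; only the view=raw parameter comes from this documentation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def raw_view_url(page_url: str) -> str:
    """Return page_url with view=raw set, keeping other query parameters."""
    parts = urlsplit(page_url)
    # Drop any existing "view" parameter so it is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "view"
    ]
    query.append(("view", "raw"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(raw_view_url("https://yoursite.com/how-it-works/"))
```

Fetching the resulting URL with any HTTP client should return the same document a crawler receives.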

0.5 The Two-Plugin Architecture

LLM Override uses a two-plugin model.

The Free plugin is a complete, standalone GEO engine. Every core feature — M2M translation, Markdown delivery, Site Manifest, Terminology Standardization, /llms.txt integration, bot detection, Shadow Analytics, and per-post controls — works without a license key.

The Pro addon is a separate plugin at llmoverride.com that extends the Free engine with capabilities designed for large B2B sites and agencies:

M2M Precision Parser (Copilot) — per-post strict 1:1 structural translations using an LLM to generate perfect Markdown, plus RAG JSON-LD extraction
Batch Compilation Engine — site-wide background compilation via Action Scheduler
GEO Analytics — GDPR-compliant IP hashing telemetry, bot fingerprinting, and Content Faithfulness Score checking
Autopilot llms.txt — AI-drafted site manifesto strictly grounded in your actual content
Agency MCP Server — expose a full Model Context Protocol endpoint for remote agent orchestration

0.6 What Does NOT Change for Human Visitors

Nothing.

LLM Override operates on a separate delivery channel that human browsers never trigger.

Specifically:

Your theme renders exactly as it does today
Your page speed is unaffected — M2M requests bypass your cache, but human requests continue to be served from cache normally
Your SEO rankings are unaffected — LLM Override adds X-Robots-Tag: noindex to all Markdown responses, telling search engine bots to ignore them
Your URLs do not change — there is no parallel site or redirect
Your existing plugins continue to work seamlessly

The only thing that changes is how precisely AI models can ingest your verified facts.

Getting Started

1.1 System Requirements

Before installing LLM Override, verify your environment meets these requirements:

Requirement	Minimum	Recommended
WordPress	6.0	Latest stable
PHP	7.4	8.1 or higher
MySQL / MariaDB	5.7 / 10.3	8.0 / 10.6
Hosting type	Any (shared, VIP, managed)	Any — including read-only filesystems
SSL	Recommended	Required for Agency MCP endpoints

LLM Override does not write to the filesystem. It uses only the WordPress Options API, Transients API, and standard Rewrite Rules, so it runs on enterprise hosting environments with restricted or read-only filesystems, including Kinsta, WP Engine, WordPress VIP, and Cloudways.

1.2 Installing the Free Plugin

The Free plugin is available directly from the WordPress.org Plugin Directory.

Option A — Install from the WordPress dashboard:

Go to Plugins → Add New Plugin in your WordPress admin.
Search for LLM Override.
Click Install Now, then Activate.

Option B — Manual upload:

Download the .zip file from wordpress.org/plugins/llm-override.
Go to Plugins → Add New Plugin → Upload Plugin.
Select the .zip file and click Install Now, then Activate.

No configuration is required after activation. The M2M translation engine starts working immediately to ensure content accessibility for AI systems.

1.3 Installing the Pro Addon

The Pro addon is a separate plugin available at llmoverride.com. It requires the Free plugin to be installed and active.

Purchase a Pro license at llmoverride.com.
Download the llm-override-pro.zip file from your account.
Go to Plugins → Add New Plugin → Upload Plugin.
Select the .zip file and click Install Now, then Activate.
Go to LLM Override → License and enter your license key.

Important: Do not deactivate or delete the Free plugin. The Pro addon hooks into the Free engine’s architecture as an extensibility layer. Both plugins must be active simultaneously.

1.4 First 5 Minutes: What to Do After Activation

The plugin works out of the box, but three crucial actions will establish your baseline GEO Compliance.

Action 1 — Verify the engine is running (30 seconds)

Go to LLM Override → Dashboard. You will see the engine status indicator. If it shows active, the M2M translation layer is functioning. If it shows a warning, the dashboard will tell you exactly what requires adjustment.

Action 2 — Define your Site Manifest (3 minutes)

Go to LLM Override → Semantic Rules. In the Site Manifest field, write 3–5 sentences establishing the indisputable facts of your brand: who you are, what infrastructure you provide, who you serve, and compliance data. This text will anchor every AI payload from this moment forward.

				
					Acme Corp is a B2B SaaS company founded in 2019. We build invoice automation software 
for construction companies with 10–200 employees. Our core product reduces invoice 
processing time from 4 days to 6 hours. We are headquartered in Madrid, Spain, and 
serve clients across the EU and Latin America. We are SOC2 Type II compliant.

Action 3 — Verify the M2M Payload (1 minute)

Open any post or page on your site. In the WordPress Admin Bar at the top, click View as AI. You will see the exact Markdown document that an AI crawler receives from that page, including the YAML frontmatter, your Site Manifest immediately after the H1 heading, and any Terminology Standardization rules applied.

That is your baseline. Everything from here is refinement.

1.5 The Dashboard: Understanding Your KPIs

The LLM Override Dashboard is your compliance command center. Here is what each metric means and why it matters.

Engine Status
A live health check confirming the M2M translation layer is active and the ?view=raw endpoint is correctly handling content negotiation.

Bot Hits (Total)
The cumulative number of M2M translations delivered since activation. Each hit represents an AI crawler reading your optimized Markdown instead of unstructured HTML.

Intercepted URLs
The number of distinct pages successfully processed and delivered to AI crawlers.

Before vs. After Simulation
A diagnostic tool that allows you to paste any URL from your site and compare the raw HTML an AI would receive without the plugin versus the strict, token-optimized Markdown it receives with the plugin active.

Health Checklist
A list of configuration items the plugin audits automatically:

Site Manifest: set or empty
Terminology Map: configured or not
/llms.txt: generation integrity
robots.txt directive: active or missing
Cache bypass: functioning for your hosting environment

Any item showing a warning has a direct link to the setting that resolves it.

M2M Interception Engine

2.1 How the Interceptor Works

The M2M Interception Engine is the core infrastructure of LLM Override. Everything else — Site Manifest, Terminology Standardization, /llms.txt, and GEO Analytics — sits on top of it.

Most tools for AI optimization generate a list of URLs that tell a crawler where your content is. The M2M engine defines how the crawler processes your content.

When an AI crawler requests a page on your site, LLM Override handles that request early in the WordPress request lifecycle (the template_redirect hook): before any template loads, before any database query runs for theme output, and before any page builder renders its layout logic. The HTML WordPress would normally generate is never produced for the machine. Instead, the engine responds directly with a structured Markdown translation, sets the HTTP headers described in section 2.3 (including Content-Type: text/markdown; charset=UTF-8), and cleanly exits the process.

The result: the crawler receives clean, dense, semantically structured facts. Your theme templates never render. Your server does less work, and the AI receives data it can parse reliably.

2.2 The Four-Layer Detection Cascade

LLM Override uses a four-layer detection cascade to identify AI crawler requests. Each layer operates independently — if any layer triggers, the engine serves the M2M payload.

Layer	Method	Description
1	`?view=raw`	Explicit M2M endpoint — advertised via `<link rel="alternate">` in `<head>`
2	User-Agent	52 known AI crawler strings across 3 categories: `training`, `query`, `agent`
3	Content Negotiation	`Accept: text/markdown` HTTP header
4	Stealth Fingerprinting	Detects browser-like UAs missing `Sec-Fetch-*` headers (localhost excluded)

Layer 1 — The ?view=raw parameter

LLM Override adds the following tag into the <head> of every singular post and page:

				
					<link rel="alternate" type="text/markdown" href="https://yoursite.com/your-page/?view=raw">

This is a standard HTML alternate-link signal. AI crawlers that follow it, including GPTBot, ClaudeBot, and PerplexityBot, read this tag, identify the alternate machine-readable version, and fetch it automatically. Human browsers ignore this tag entirely.
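To make the discovery step concrete, here is a minimal crawler-side sketch that scans a page's HTML for the alternate link and returns its href. The class and function names are hypothetical; only the rel and type values come from the tag shown above.

```python
from html.parser import HTMLParser
from typing import Optional


class AlternateLinkFinder(HTMLParser):
    """Remembers the first <link rel="alternate" type="text/markdown"> href."""

    def __init__(self) -> None:
        super().__init__()
        self.markdown_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.markdown_href is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        link_type = (attributes.get("type") or "").lower()
        if "alternate" in rels and link_type == "text/markdown":
            self.markdown_href = attributes.get("href")


def find_markdown_alternate(html: str) -> Optional[str]:
    """Return the Markdown alternate URL advertised by the page, if any."""
    finder = AlternateLinkFinder()
    finder.feed(html)
    return finder.markdown_href
```

A page without the tag returns None, which is the state you would expect after the kill switch removes the link.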

Layer 2 — Passive bot detection via User-Agent

For AI crawlers that don’t proactively follow Content Negotiation but are known to the plugin, LLM Override leverages safe User-Agent matching. When a request arrives from a recognized bot, the engine intercepts it automatically and delivers Markdown.

LLM Override includes a dictionary of 52 known AI crawler User-Agent strings, grouped here by behavior:

Category	Description	Examples
Training	Bots harvesting data for model training	CCBot (Common Crawl)
Query	Real-time RAG fetch during inference	GPTBot, ClaudeBot, PerplexityBot
Discovery	Crawlers mapping site structure and manifests	Amazonbot, Applebot
Scraping	Unclassified AI traffic	Bytespider, DataForSEO
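The first three layers of the cascade can be sketched as a single function that returns which layer fired. This is an illustration under assumptions, not the plugin's code: the crawler tuple is a tiny illustrative subset of the real dictionary, the evaluation order is assumed, and the return values reuse the trigger names listed in section 6.1. Layer 4 is left out here because section 6.5 describes it separately.

```python
from typing import Mapping, Optional

# Illustrative subset only; the plugin's real dictionary is not reproduced here.
KNOWN_AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")


def detect_trigger(query: Mapping[str, str], headers: Mapping[str, str]) -> Optional[str]:
    """Return the name of the first layer that matches, or None for a human request."""
    lowered = {name.lower(): value for name, value in headers.items()}

    # Layer 1: explicit M2M endpoint.
    if query.get("view") == "raw":
        return "view_raw"

    # Layer 2: known AI crawler substring in the User-Agent.
    user_agent = lowered.get("user-agent", "").lower()
    if any(bot.lower() in user_agent for bot in KNOWN_AI_AGENTS):
        return "user_agent"

    # Layer 3: the client explicitly asks for Markdown.
    if "text/markdown" in lowered.get("accept", "").lower():
        return "content_negotiation"

    return None
```

Because any single layer is enough to trigger the M2M payload, the order only affects which trigger name gets recorded.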

2.3 HTML → Markdown Translation

When the engine processes a request, it retrieves the raw content of the requested post or page from the WordPress database and runs it through a deterministic translation pipeline.

Stage 1 — Content extraction
The engine extracts the post content directly from the database, bypassing the theme. This isolates the factual editorial content from navigation, sidebars, footers, cookie banners, and visual UI components.

Stage 2 — Element stripping
Before conversion, the engine strips elements that add noise to an AI payload: <script>, <style>, <iframe>, <object>, <embed>, empty elements, BOM markers, and Zero-Width Spaces (U+200B), which can disrupt tokenization and text matching in downstream AI pipelines.

Stage 3 — Structural Markdown conversion
The cleaned HTML is processed by a parser that converts standard HTML elements to their Markdown equivalents: headings become # markers, bold text becomes **text**, lists become - bullets, links become [text](url). This is a strict 1:1 structural map.

Stage 4 — Data Standardization
The engine prepends a YAML frontmatter metadata block and your global Site Manifest, and then applies Terminology Standardization filtering to the core content to ensure all nomenclature matches your official brand dictionary.

Stage 5 — Delivery
The engine sets Content-Type: text/markdown; charset=UTF-8, disables all caching layers, adds X-Robots-Tag: noindex, and outputs the Markdown document.
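Stages 2 and 3 can be approximated in a few dozen lines. The sketch below is a deliberately small stand-in for the real parser: it handles only headings, paragraphs, bold text, links, and flat lists, and every name in it is hypothetical.

```python
from html.parser import HTMLParser

# Elements whose entire content is dropped. <embed> is a void element,
# so simply ignoring its tag is enough.
SKIP = {"script", "style", "iframe", "object"}
HEADINGS = {f"h{level}": "#" * level for level in range(1, 7)}


class MarkdownSketch(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.out = []        # output fragments, joined at the end
        self.skip_depth = 0  # greater than zero while inside a stripped element
        self.href = None     # href of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.out.append("\n\n" + HEADINGS[tag] + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("ul", "ol"):
            self.out.append("\n")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a" and self.href is not None:
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)


def html_to_markdown(html: str) -> str:
    # Remove BOM markers and zero-width spaces before parsing.
    html = html.replace("\ufeff", "").replace("\u200b", "")
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()
```

Tables, code blocks, images, and nested lists are out of scope for this sketch; a production converter needs explicit handling for each.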

2.4 YAML Frontmatter

Every M2M payload begins with a YAML frontmatter block — a structured metadata section placed at the very start of the Markdown document, delimited by ---.

A typical frontmatter block looks like this:

				
					---
title: How Invoice Automation Works
canonical_url: https://yoursite.com/how-it-works/
last_updated: 2026-03-01T14:22:00+00:00
plugin_version: 1.0.0
---

# How Invoice Automation Works

Acme Corp is a B2B SaaS company founded in 2019. We build invoice automation 
software for construction companies with 10–200 employees. Our core product 
reduces invoice processing time from 4 days to 6 hours.

Why YAML frontmatter matters

YAML frontmatter is a widely used convention for document metadata, and language models encounter it frequently in Markdown sources. Placing the frontmatter before the body puts your key context at the very start of the document, the first position the model reads.

Field	Source	Purpose
`title`	WordPress post title	Gives the AI the canonical name of the document
`canonical_url`	WordPress permalink	Anchors the content to a specific, citable source
`last_updated`	WordPress post modified date	Ensures verifiable payload freshness
`plugin_version`	Plugin version	Identifies the delivery infrastructure

The Site Manifest position
The Site Manifest does not sit inside the YAML frontmatter. It is positioned in the actual Markdown document immediately following the H1 heading. This ensures that the AI reads your indisputable brand facts as the very first sentences of every single page on your site.
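The layout described in this section (frontmatter, then the H1, then the Site Manifest, then the page body) can be sketched as a small assembly function. The function name and signature are hypothetical, and the YAML is written naively: a real emitter would need to quote values that contain colons or other special characters.

```python
def build_payload(title: str, canonical_url: str, last_updated: str,
                  plugin_version: str, manifest: str, body: str) -> str:
    """Assemble an M2M payload in the order described in section 2.4."""
    frontmatter = "\n".join([
        "---",
        f"title: {title}",  # naive: no YAML quoting or escaping
        f"canonical_url: {canonical_url}",
        f"last_updated: {last_updated}",
        f"plugin_version: {plugin_version}",
        "---",
    ])
    # The manifest follows the H1 rather than sitting inside the frontmatter.
    sections = [frontmatter, f"# {title}", manifest.strip(), body.strip()]
    return "\n\n".join(section for section in sections if section) + "\n"
```

An empty manifest is simply skipped, so the body follows the H1 directly.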

2.5 Cache Bypass System

Caching plugins serve pre-rendered HTML from disk or memory, bypassing WordPress’s routing layer entirely. LLM Override solves this by detecting active caching systems at request time and programmatically disabling them — exclusively for M2M requests.

System	Method used
WP Rocket	`DONOTCACHEPAGE` constant
LiteSpeed Cache	`LSCACHE_NO_CACHE` constant + `X-LiteSpeed-Cache-Control: no-cache` header
W3 Total Cache	`DONOTCACHEPAGE` constant
FastCGI / Nginx	`Cache-Control: no-store, no-cache` header
Varnish / Cloudflare	`Cache-Control: no-store, no-cache` header

Human visitors are never affected. Only M2M requests bypass the cache, which means crawlers always receive your latest published content.

2.6 `X-Robots-Tag: noindex`

Every Markdown response served by LLM Override includes the following HTTP header:

				
					X-Robots-Tag: noindex

This header tells Googlebot and Bingbot not to index the response. Without it, Google could potentially see your Markdown pages as duplicate content, conflicting with the HTML versions that are your actual SEO-ranked pages.

Important: This applies only to search engine bots. AI crawlers (ChatGPT, Claude, Perplexity) generally do not treat X-Robots-Tag: noindex as a reason to skip content. They read it regardless, which is the intended behavior. The header protects your SEO while enabling your GEO.
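One way to verify sections 2.3, 2.5, and 2.6 together is to fetch a ?view=raw URL with any HTTP client and inspect the response headers. The helper below is a hypothetical checker, not a plugin feature; it takes a plain mapping of header names to values and reports what looks wrong.

```python
from typing import List, Mapping


def check_m2m_headers(headers: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in an M2M response's headers."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    problems = []

    if not lowered.get("content-type", "").startswith("text/markdown"):
        problems.append("Content-Type is not text/markdown")

    if "noindex" not in lowered.get("x-robots-tag", ""):
        problems.append("X-Robots-Tag: noindex is missing")

    cache_control = lowered.get("cache-control", "")
    if "no-store" not in cache_control and "no-cache" not in cache_control:
        problems.append("Cache-Control does not disable caching")

    return problems
```

An empty list means the response carries the three signals this documentation describes.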

2.7 The Kill Switch

LLM Override includes a global kill switch that instantly disables the entire M2M translation layer.

Go to LLM Override → Settings → Enable M2M Interception and uncheck the checkbox. Instantly:

All ?view=raw requests will return standard HTML
The <link rel="alternate"> tag is removed from your pages’ <head>
The /llms.txt endpoint is deactivated
Shadow Analytics stops recording

Your WordPress site returns to standard behavior as if the plugin did not exist. Re-enabling the switch restores all functionality immediately — no data is lost.

Site Manifest

3.1 What the Site Manifest Does

The Site Manifest is a block of factual text that anchors your brand identity for AI crawlers. It is injected into the /llms.txt and /llms-full.txt discovery endpoints, the site-wide indexes that AI crawlers read to understand your entire content inventory, and, as described in section 2.4, it is placed immediately after the H1 heading of each per-page M2M payload. This ensures your verifiable brand facts are delivered at both the site level and the page level.

This is not a marketing tagline. This is a factual positioning instrument that anchors the AI’s understanding of your company before it processes the rest of the page content.

Think of it as a data commitment: you write exactly what is true, and the AI uses exactly that when answering questions about your brand.

3.2 How to Write an Effective Site Manifest

The Site Manifest is configured at LLM Override → Semantic Rules → Site Manifest.

A high-impact Site Manifest typically contains five pieces of factual information:

1. Who you are (legal entity + founding date)
This is the anchor. AI models will cite this when asked “Who is [company]?” — if you don’t provide it, the AI invents one from its training data.

2. What you do (specific, not generic)
Avoid “we help companies grow.” That is marketing copy. Instead: “We build invoice automation software for construction companies.” This gives the AI a precise retrieval token for RAG queries.

3. Who you serve (ICP)
This stops the AI from generating answers that position you in the wrong market. “We serve construction companies with 10–200 employees in the EU and Latin America” is far more useful than “we serve businesses worldwide.”

4. What you are not
This is the most underused tactic. If the AI is consistently confusing you with a competitor or a different type of product, state the negative clearly: “We are not a general-purpose ERP. We do not offer HR management features.” This creates a hard boundary in the AI’s reasoning.

5. Non-negotiable facts
Anything verifiable that, if stated incorrectly by the AI, would damage your credibility: compliance certifications (SOC2, ISO 27001), headquarters location, founding year, number of customers.

Full example:

				
					Acme Corp is a B2B SaaS company founded in 2019. We build invoice automation 
software for construction companies with 10–200 employees. Our core product 
reduces invoice processing time from 4 days to 6 hours. We are headquartered 
in Madrid, Spain, and serve clients across the EU and Latin America. We are 
SOC2 Type II compliant. We are not an ERP. We do not offer HR or payroll 
functionality.

Length: Keep it under 300 words. AI models have a finite context window, so a long Site Manifest dilutes the per-page content. Be factual, not poetic.

llms.txt Standard

4.1 What Is `/llms.txt`?

The /llms.txt file is a machine-readable site index specifically designed for AI crawlers. It is the equivalent of robots.txt for language models — a single endpoint that lists all pages on your site with contextual metadata, giving any AI crawler an immediate, structured overview of your entire content inventory.

LLM Override generates your /llms.txt automatically. No manual configuration is required.

Access it at: https://yoursite.com/llms.txt

Here is the typical structure:

				
					# Acme Corp

> Acme Corp is a B2B SaaS company founded in 2019.
> We build invoice automation software for construction companies.

## Pages

- [How Invoice Automation Works](https://yoursite.com/how-it-works/?view=raw)
- [Pricing Plans](https://yoursite.com/pricing/?view=raw)
- [About Acme Corp](https://yoursite.com/about/?view=raw)
- [Contact Sales](https://yoursite.com/contact/?view=raw)

Each URL in the manifest points directly to the M2M endpoint (?view=raw) of that page — meaning an AI can follow any link and receive the optimized Markdown version instantly.
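The structure above is simple enough to generate by hand. The sketch below reproduces it from a site name, a manifest, and a list of (title, permalink) pairs. The function name is hypothetical, and it assumes permalinks carry no query string of their own.

```python
from typing import Iterable, Tuple


def build_llms_txt(site_name: str, manifest: str,
                   pages: Iterable[Tuple[str, str]]) -> str:
    """Render an llms.txt document in the layout shown in section 4.1."""
    lines = [f"# {site_name}", ""]
    # Each manifest line becomes a blockquote line.
    lines += [f"> {line.strip()}" for line in manifest.strip().splitlines()]
    lines += ["", "## Pages", ""]
    # Assumes each permalink has no existing query string.
    lines += [f"- [{title}]({url}?view=raw)" for title, url in pages]
    return "\n".join(lines) + "\n"


print(build_llms_txt(
    "Acme Corp",
    "Acme Corp is a B2B SaaS company founded in 2019.",
    [("Pricing Plans", "https://yoursite.com/pricing/")],
))
```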

4.2 `robots.txt` Integration

LLM Override automatically adds the following line to your WordPress robots.txt output:

				
					# LLM Override - AI Content Index
Sitemap-LLM: https://yoursite.com/llms.txt

This is passive bot discovery: AI crawlers that check robots.txt before crawling your site will find the llms.txt reference and fetch it first, giving them a complete map of your content before visiting individual pages.
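From the crawler's side, discovery through robots.txt amounts to scanning for that one directive. The following sketch shows one way to do it; the function name is hypothetical, and Sitemap-LLM is taken from the snippet above rather than from any robots.txt standard.

```python
from typing import Optional


def find_llms_index(robots_txt: str) -> Optional[str]:
    """Return the URL from the first Sitemap-LLM line, or None if absent."""
    for raw_line in robots_txt.splitlines():
        # Ignore comments, then split the directive name from its value.
        line = raw_line.split("#", 1)[0].strip()
        key, separator, value = line.partition(":")
        if separator and key.strip().lower() == "sitemap-llm":
            return value.strip()
    return None
```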

4.3 `<meta name="llms">` Tag

In addition to robots.txt, LLM Override injects a <meta> tag into the <head> of every page:

				
					<meta name="llms" content="/llms.txt">

This tag serves a different purpose from robots.txt and llms.txt. While those are site-level discovery mechanisms, the <meta> tag is a page-level signal. An AI crawler processing your HTML will find this tag in the document it is already reading, and can immediately discover the /llms.txt endpoint without a separate robots.txt fetch.

4.4 Post Type Inclusion Rules

By default, LLM Override includes all public post types in your /llms.txt index: posts, pages, and any custom post types registered with 'public' => true.

You can control which post types are included at LLM Override → Settings → Post Type Selection. Unchecking a post type:

Removes it from /llms.txt
Disables M2M interception for that post type’s content
Hides the LLM Override metabox from that post type’s editor

4.5 SEO Plugin Synchronization

If a post is marked as noindex in your SEO plugin, LLM Override respects that configuration and automatically excludes it from the /llms.txt manifest and from M2M interception.

Supported SEO plugins:

Yoast SEO
Rank Math
SEOPress
All in One SEO (AIOSEO)

This synchronization is automatic. No additional configuration is needed. If a page shouldn’t appear in Google, it won’t appear in your AI manifest either.

Payload Precision

5.1 The Per-Post Metabox

Every post and page in WordPress has a dedicated LLM Override metabox in the editor sidebar. This is your per-page control center for the M2M payload.

The metabox provides the following controls:

Exclusion Controls (3 levels)

Checkbox	Scope	Effect
Exclude from LLM Override entirely	Master exclusion	Hides post from M2M delivery, `/llms.txt`, JSON-LD, and `<link rel="alternate">`
Exclude from `/llms.txt`	Discovery only	Post is still M2M-accessible via `?view=raw`, but doesn’t appear in the manifest
Exclude from JSON-LD	Schema only	Disables JSON-LD Semantic Enclosure for this post while keeping everything else active

Use the master exclusion for pages that contain no content relevant to AI (login pages, thank-you pages, internal dashboards, or pages with content you explicitly do not want synthesized).

M2M Payload Override (Bypass)
A full text area where you can write a complete custom Markdown payload for this specific post. When this field contains content, the M2M engine will serve exactly that content — bypassing the HTML → Markdown conversion engine entirely.

Important: When using a bypass, you are responsible for the content of the payload. The engine will still enforce Terminology Standardization on the bypass content, but the HTML → Markdown conversion is skipped entirely. Make sure your content is valid Markdown — including proper heading hierarchy — before saving.

5.2 How the Priority System Works

The M2M engine uses a strict priority hierarchy for each post when building the payload:

M2M Precision Parser (Copilot) output — If a Copilot-compiled payload exists (Pro only), it is used. This is the highest-fidelity output available.
Manual bypass — If a user has written custom Markdown in the bypass field, it is used.
Automated conversion — If neither of the above exists, the engine extracts post content and performs the standard HTML → Markdown translation.

This hierarchy means you can use the automated engine for most pages and override only the ones that require high-precision control.
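The priority order can be expressed as a short fall-through. In this sketch the function name, the source labels, and the convert callback are all hypothetical; only the ordering comes from the list above.

```python
from typing import Callable, Optional, Tuple


def select_payload(copilot_output: Optional[str],
                   manual_bypass: Optional[str],
                   convert: Callable[[], str]) -> Tuple[str, str]:
    """Return (source, markdown) following the documented priority order."""
    # 1. A compiled Precision Parser payload wins when present (Pro only).
    if copilot_output and copilot_output.strip():
        return "copilot", copilot_output
    # 2. Otherwise use custom Markdown from the bypass field.
    if manual_bypass and manual_bypass.strip():
        return "bypass", manual_bypass
    # 3. Fall back to the automated HTML-to-Markdown conversion.
    return "automated", convert()
```

Passing the conversion as a callback means it only runs when neither override exists.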

5.3 Customizing the llms-full.txt Excerpt

The automated /llms-full.txt endpoint includes a brief excerpt for each page in your site index. By default, this excerpt is generated automatically by stripping Markdown to plain text and truncating.

You can override this excerpt for any individual post in the metabox using the Custom llms.txt Description field. When this field contains text, the automated truncation is bypassed and your custom description appears in the full index instead.

Tip: Use the custom description for your highest-traffic pages — your homepage, pricing page, and core product pages. These are the pages most likely to appear in AI-generated summaries, and a precise excerpt in the full manifest gives the AI better context before it even visits the page.

5.4 The “View as AI” Button

The “View as AI” button is an empirical verification tool that shows you the exact Markdown document an AI crawler would receive for any page on your site.

You can access it in two ways:

Admin Bar: When viewing any post or page on the front end, the WordPress Admin Bar shows a “View as AI” button. Clicking it opens the ?view=raw endpoint in a new tab.
Post Editor: Inside the LLM Override metabox, a “Preview M2M Output” link opens the same endpoint.

What you see is exactly what the AI sees. There is no simulation or approximation. The button simply requests the same URL with the same parameter that an AI crawler would use.

Use this after every significant content edit to verify that:

Your Site Manifest appears correctly
Terminology Standardization rules are firing as expected
The heading hierarchy is clean
Code blocks, tables, and lists are properly formatted
No UI noise leaked into the payload

5.5 JSON-LD Semantic Enclosure

LLM Override automatically injects a JSON-LD <script type="application/ld+json"> block into the HTML <head> of every eligible post. This structured data enclosure contains the M2M-translated Markdown inside a Schema.org articleBody field — allowing discovery bots and RAG pipelines to ingest your clean content directly from the DOM without visiting the ?view=raw endpoint.

Schema type logic:

Condition	@type	Reason
Yoast SEO or Rank Math active	`TechArticle`	Avoids duplicate `Article` schema collision
No SEO plugin detected	`Article`	Standard schema type

Key properties injected:

headline — Post title
articleBody — Full Markdown content (truncated at 10,000 characters by default)
url — Canonical permalink
dateModified — Last modified timestamp in ISO 8601
author / creator — Post author display name

Controls:

Per-post: “Exclude from JSON-LD” checkbox in the metabox
Programmatic: llm_override_jsonld_enabled filter (return false to disable)
Max body length: llm_override_jsonld_max_length filter (default: 10,000)
Schema customization: llm_override_jsonld_schema filter
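For orientation, the following sketch builds a structure with the properties listed above, applies the documented @type rule and the 10,000-character default, and renders the script tag. The function names, the Person wrapper around the author, and the @context value are assumptions made for illustration; the plugin's actual output may differ.

```python
import json


def build_jsonld(title: str, markdown: str, url: str, modified_iso: str,
                 author: str, seo_plugin_active: bool,
                 max_length: int = 10_000) -> dict:
    """Build the schema dictionary described in section 5.5."""
    return {
        "@context": "https://schema.org",
        # TechArticle avoids colliding with an SEO plugin's Article schema.
        "@type": "TechArticle" if seo_plugin_active else "Article",
        "headline": title,
        "articleBody": markdown[:max_length],
        "url": url,
        "dateModified": modified_iso,
        "author": {"@type": "Person", "name": author},  # assumed shape
    }


def render_jsonld(schema: dict) -> str:
    """Serialize the schema into a script tag that is safe to embed in HTML."""
    # Escape "</" so content cannot close the script element early.
    body = json.dumps(schema, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'
```

The escaped form is still valid JSON, so consumers parse it back to the original text.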

Shadow Analytics Lite

6.1 What Shadow Analytics Tracks

Shadow Analytics Lite is a lightweight bot telemetry layer built into the free version of LLM Override. It records every M2M translation request served by your site — giving you empirical visibility into AI crawler activity without requiring any external analytics tools.

Every time an AI crawler triggers the M2M endpoint, the engine logs the following data in the WordPress database:

Field	Description
`bot_name`	Identified crawler (e.g., “GPTBot”, “ClaudeBot”, “Stealth-Bot”)
`url`	The page URL that was requested
`timestamp`	Exact date and time of the request
`user_agent`	Full User-Agent string from the request
`trigger`	Detection method: `view_raw`, `user_agent`, `content_negotiation`, or `stealth_fingerprint`
`bot_type`	Bot category: `training`, `query`, `agent`, or `unknown`

6.2 Viewing the Log

Go to LLM Override → Analytics to view the log. The table shows the most recent requests, sorted by timestamp.

Filtering: You can filter by bot name to isolate traffic from a specific crawler (e.g., show only GPTBot activity) or by date range to analyze trends.

Export: The log can be exported as CSV for external analysis or client reporting.

6.3 Log Retention

Shadow Analytics Lite stores the most recent 1,000 events by default. Older entries are automatically pruned to prevent database bloat.

This limit is configurable via the filter llm_override_analytics_retention:

				
					add_filter( 'llm_override_analytics_retention', function() {
    return 5000; // Keep the last 5,000 events
});
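Conceptually, pruning keeps the newest N events and discards the rest. The sketch below shows that logic on an in-memory list; it is an analogy only, since the plugin stores events in the WordPress database, and the function name and event shape are hypothetical.

```python
from typing import Dict, List


def prune_events(events: List[Dict], retention: int = 1000) -> List[Dict]:
    """Return the most recent `retention` events, newest first."""
    newest_first = sorted(events, key=lambda event: event["timestamp"], reverse=True)
    return newest_first[:retention]
```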

6.4 GDPR Compliance

Shadow Analytics Lite was designed from the beginning to be GDPR-compliant by default:

No personal data is collected. The system records bot identifiers and URLs, not human visitor data.
No data is transmitted externally. All analytics data stays in your WordPress database — no third-party services, no external API calls, no tracking pixels.
No cookies are set. The analytics layer operates entirely at the server level.
IP addresses are not stored. The engine identifies bots by their User-Agent string, not by IP.

For enterprise organizations that need granular privacy controls, the Pro addon’s GEO Analytics module offers GDPR-compliant IP hashing for deeper traffic analysis — but even that module uses irreversible hashing (SHA-256 with rotating salts), ensuring no raw IP is ever stored.
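The salted-hash idea can be illustrated in a few lines. This is a generic sketch of SHA-256 hashing with a salt, not the Pro addon's implementation: the function names, the 32-byte salt size, and the rotation policy are assumptions.

```python
import hashlib
import secrets


def new_salt() -> bytes:
    """Generate a fresh random salt, for example once per rotation period."""
    return secrets.token_bytes(32)


def hash_ip(ip: str, salt: bytes) -> str:
    """Return a hex digest that identifies an IP only within one salt period."""
    return hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()
```

Within one salt period the same IP always maps to the same digest, which allows counting distinct visitors. Once a salt is rotated out and discarded, digests from that period can no longer be linked to new traffic or checked against candidate addresses.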

6.5 Stealth Bot Detection

Beyond User-Agent matching, LLM Override includes a fingerprinting layer that detects AI crawlers disguised as regular browsers. Modern browsers send Sec-Fetch-* headers on page navigations, while bots impersonating browsers typically omit them.

Header	Expected in Real Browser	Missing = Bot Signal
`Sec-Fetch-Mode`	`navigate`	✅ Counted
`Sec-Fetch-Site`	`none` or `same-origin`	✅ Counted
`Sec-Fetch-Dest`	`document`	✅ Counted

A request with a browser-like User-Agent (Chrome, Firefox, Safari, Edge) that is missing 2 or more of these headers is classified as a stealth bot and served the M2M payload.

Safeguards:

Requests from 127.0.0.1, ::1, or hostnames ending in .local are always excluded from stealth detection to prevent false positives during local development.
Known search engine crawlers (Googlebot, Bingbot, Applebot, etc.) are whitelisted and never flagged regardless of their headers.
Stealth detection can be toggled off globally via LLM Override → Settings → Enable Stealth Bot Detection.
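
The rule and safeguards above can be sketched in a few lines of Python. This is a simplified model of the described behavior, not the plugin's implementation: the function name, the substring matching on the User-Agent, and the shortened whitelist are all assumptions.

```python
BROWSER_TOKENS = ("Chrome", "Firefox", "Safari", "Edge")
SEC_FETCH_HEADERS = ("sec-fetch-mode", "sec-fetch-site", "sec-fetch-dest")
LOCAL_ADDRESSES = {"127.0.0.1", "::1"}
# Assumption: a shortened whitelist, for illustration only.
SEARCH_CRAWLERS = ("Googlebot", "Bingbot", "Applebot")


def is_stealth_bot(user_agent: str, headers: dict, remote_addr: str, host: str) -> bool:
    """Hypothetical check: browser-like UA that lacks 2+ Sec-Fetch-* headers."""
    # Safeguard 1: never flag local development traffic.
    if remote_addr in LOCAL_ADDRESSES or host.endswith(".local"):
        return False
    # Safeguard 2: known search engine crawlers are never flagged.
    if any(name in user_agent for name in SEARCH_CRAWLERS):
        return False
    # Only browser-like User-Agents are candidates for stealth detection.
    if not any(token in user_agent for token in BROWSER_TOKENS):
        return False
    present = {name.lower() for name in headers}
    missing = sum(1 for name in SEC_FETCH_HEADERS if name not in present)
    return missing >= 2


chrome_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
flagged = is_stealth_bot(chrome_ua, {}, "198.51.100.4", "example.com")
```

A request that sends two of the three headers is missing only one, so it is not flagged under this rule.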

M2M Precision Parser

This feature requires LLM Override Pro.

7.1 What the Precision Parser Does

The M2M Precision Parser (formerly “AI Copilot”) is a per-post translation engine that uses an AI model to generate a strict 1:1 structural translation of your page content. Instead of the deterministic HTML → Markdown conversion performed by the free engine, the Precision Parser sends your content to a language model with a tightly constrained system prompt that enforces absolute factual faithfulness.

The key difference:

Engine	Translation Method	Best for
Free (Automated)	Deterministic HTML → Markdown conversion	Content that is already clean and well-structured
Pro (Precision Parser)	AI-driven structural translation with facts enforcement	Complex pages, visual builders, dynamic content

The Precision Parser does not add, remove, rewrite, or paraphrase any content. It performs a strict structural 1:1 translation: the same information, presented in perfect Markdown hierarchy, verified against the source document.

7.2 BYOK Architecture

The Precision Parser uses a Bring Your Own Key (BYOK) architecture. You provide your own API key from any supported AI provider, and LLM Override Pro sends translation requests directly to that provider’s API.

Supported providers:

Provider	Configuration location
OpenAI	LLM Override → Copilot → OpenAI API Key
Anthropic	LLM Override → Copilot → Anthropic API Key
DeepSeek	LLM Override → Copilot → DeepSeek API Key
OpenRouter	LLM Override → Copilot → OpenRouter API Key (supports 100+ models)

Your API key is encrypted at rest using WordPress’s AUTH_SALT constant and is never transmitted to any server other than the AI provider’s API endpoint. LLM Override does not proxy, store, or log your API requests.

7.3 How to Compile a Post

Open any post or page in the WordPress editor.
In the LLM Override metabox, click the “Compile with Copilot” button.
The plugin sends the post’s raw HTML content to your configured AI provider with a system prompt that enforces strict 1:1 structural translation.
The AI returns a Markdown document, which is stored as the compiled M2M payload for that post.
From this point forward, whenever an AI crawler requests this post, it receives the compiled payload — not the automated conversion.

You can re-compile at any time by clicking the button again. Each compilation overwrites the previous compiled payload.

7.4 The System Prompt

The Precision Parser uses a tightly constrained system prompt that enforces three rules:

1:1 structural fidelity: Every heading, paragraph, list, and table in the HTML must appear in the Markdown output. Nothing is added, removed, or paraphrased.
Facts-only operation: The AI is explicitly forbidden from generating any information not present in the source document.
Hierarchy preservation: The heading structure (H1 → H2 → H3) must be preserved exactly. No heading level changes.

Dynamic placeholders:

System prompts support placeholders that are replaced at compile time with the actual post data:

{{post_title}}       → The post's title
{{post_url}}         → The post's canonical URL
{{post_content}}     → The post's raw HTML content
{{site_manifest}}    → The global Site Manifest text
{{terminology_map}}  → The active terminology rules
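
To make the substitution step concrete, here is a minimal Python sketch of how `{{name}}` placeholders could be resolved. The `render_prompt` function and the sample template are hypothetical, and the plugin's actual handling of unknown placeholders is not documented here.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(template: str, values: dict) -> str:
    """Replace {{name}} placeholders with values (hypothetical helper)."""
    # Assumption: unknown placeholders are left untouched so that a typo
    # stays visible in the rendered prompt instead of vanishing.
    return PLACEHOLDER.sub(
        lambda match: str(values.get(match.group(1), match.group(0))),
        template,
    )


prompt = render_prompt(
    "Translate {{post_title}} ({{post_url}}) without adding facts.",
    {"post_title": "Pricing", "post_url": "https://yoursite.com/pricing/"},
)
```

The substitution runs in a single pass, so placeholder-like text inside a substituted value (for example inside the post content) is not expanded a second time.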

7.5 RAG JSON-LD Extraction

The Precision Parser includes an advanced capability: automated extraction of structured data from your page content into JSON-LD Schema.org markup, specifically engineered for Retrieval-Augmented Generation (RAG) systems.

When you compile a post, the Precision Parser doesn’t just create a Markdown translation — it also analyzes the content and generates JSON-LD structured data blocks that AI systems can use for precise entity extraction during query retrieval.

This feature is available directly in the per-post metabox editing interface. The generated JSON-LD is embedded within the compiled payload metadata, making it accessible to any AI system that ingests your M2M content.

7.6 Content Faithfulness Score

Every compiled payload is automatically scored using the Content Faithfulness Score, a Jaccard similarity metric that measures token-level overlap between the source HTML and the compiled Markdown.

The score ranges from 0 to 100:

Score	Interpretation	Action Required
80–100	High faithfulness — exact structural match	None. Payload is compliant.
60–79	Moderate deviation — structural differences present	Review the compiled output for omissions or repositioned content.
Below 60	Significant deviation — cloaking risk	Re-compile or use manual bypass. Investigate the cause of divergence.

This score is displayed in the metabox after every compilation and is also visible in the Batch Compilation Engine dashboard (Block 8).

The Faithfulness Score is the single most important compliance metric in LLM Override. A score below 60 is a mathematical indicator that the AI-generated Markdown does not faithfully represent the human-visible content — which constitutes operational cloaking risk.
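
For readers unfamiliar with Jaccard similarity, the Python sketch below shows the general idea: the score is the size of the intersection of two token sets divided by the size of their union, scaled to 0–100. The tokenization used here (strip HTML tags, lowercase, keep alphanumeric runs) is an assumption. The plugin's actual tokenizer is not described in this reference.

```python
import re


def tokens(text: str) -> set:
    """Crude tokenizer (assumption): drop HTML tags, lowercase, keep word runs."""
    text = re.sub(r"<[^>]+>", " ", text)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def faithfulness_score(source_html: str, markdown: str) -> int:
    """Jaccard similarity of the two token sets, scaled to 0-100."""
    a, b = tokens(source_html), tokens(markdown)
    if not a and not b:
        return 100  # two empty documents are trivially identical
    return round(100 * len(a & b) / len(a | b))


source = "<h1>Invoice Automation</h1><p>Acme processes invoices.</p>"
faithful = "# Invoice Automation\n\nAcme processes invoices."
padded = faithful + " Faster today."  # two words the source never says

score_faithful = faithfulness_score(source, faithful)
score_padded = faithfulness_score(source, padded)
```

In this toy example the faithful translation shares all five tokens with the source. Adding two words that the source does not contain grows the union to seven tokens and lowers the score, which is how added content shows up in the metric.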

Batch Compilation Engine

This feature requires LLM Override Pro.

8.1 How the Batch Engine Works

The Batch Compilation Engine is a background processing system that lets you compile M2M payloads for your entire site — or any filtered subset of it — without manually clicking “Compile” on each post.

The engine is built on Action Scheduler, the same battle-tested asynchronous job queue that WooCommerce uses to process orders at scale. This means batch jobs run reliably in the background, survive server restarts, and handle thousands of posts without timing out.

8.2 Running a Batch Job

Go to LLM Override → Batch Compile.

Select the post types to include (posts, pages, custom post types).
Optionally filter by status (published, draft) or by specific categories/tags.
Click “Start Batch Compilation”.

The engine will queue every matching post and process them one by one in the background. Each post is sent to your configured AI provider (via BYOK) for Precision Parser compilation.

You can monitor progress in real time on the batch dashboard, which shows:

Total posts queued
Posts compiled
Posts failed (with error details)
Content Faithfulness Score distribution
Estimated time remaining

8.3 Rate Limiting and Cost Control

The batch engine includes built-in rate limiting to prevent API quota exhaustion:

Default rate: 1 compilation every 5 seconds.
Configurable delay: Adjustable at LLM Override → Copilot → Batch Delay.
Automatic retry: Failed compilations (API timeouts, rate limit errors) are automatically retried up to 3 times with exponential backoff.
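
The retry behavior can be pictured with a short Python sketch of exponential backoff. It is illustrative only: the `TransientError` class, the function name, and the 5-second base delay are assumptions, since this reference states only the retry count and the backoff strategy.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for API timeouts and rate-limit responses."""


def compile_with_retry(compile_once, max_retries=3, base_delay=5.0, sleep=time.sleep):
    """Call compile_once(), retrying up to max_retries times with doubling delays."""
    attempt = 0
    while True:
        try:
            return compile_once()
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the failure
            # Assumed schedule: 5s, 10s, 20s with the defaults.
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Passing `sleep` as a parameter keeps the sketch testable: a test can record the delays instead of actually waiting.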

Cost estimation: Before starting a batch job, the dashboard shows an estimated API cost based on your selected provider, the average token count of your content, and the number of posts to compile.
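
The arithmetic behind such an estimate is straightforward. The Python sketch below shows one plausible way to compute it. The function name, the split into input and output tokens, and the per-million-token prices are assumptions for illustration, not the dashboard's actual formula or any provider's real pricing.

```python
def estimate_batch_cost(
    post_count: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Rough batch cost in the provider's currency (hypothetical formula)."""
    per_post = (
        avg_input_tokens * input_price_per_million
        + avg_output_tokens * output_price_per_million
    ) / 1_000_000
    return round(post_count * per_post, 2)


# 500 posts, 3,000 input and 1,500 output tokens each, made-up prices.
estimate = estimate_batch_cost(500, 3000, 1500, 1.00, 4.00)
```

Retries are not included in this sketch, so a batch with many failed compilations would cost more than the estimate.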

8.4 Autopilot Mode

Autopilot is a continuous synchronization mode that automatically recompiles M2M payloads whenever the source content changes.

When Autopilot is enabled:

Every time you publish or update a post, the engine will automatically compile a fresh M2M payload within minutes.
The engine uses Action Scheduler to queue the compilation asynchronously — the editor does not slow down.
If the compilation fails, it is retried automatically.

Enable Autopilot at LLM Override → Batch Compile → Autopilot Mode.

Autopilot is designed for sites that update content frequently and need their M2M payloads to always reflect the latest published version without manual intervention.

GEO Analytics

This feature requires LLM Override Pro.

9.1 How GEO Analytics Differs from Shadow Analytics

Shadow Analytics Lite (included in the free plugin) records basic bot request data: which bot visited, which page, and when. GEO Analytics extends this with deep telemetry designed for B2B agencies and enterprise compliance teams.

Capability	Shadow Analytics Lite	GEO Analytics (Pro)
Bot identification	✅	✅ + fingerprinting
Request logging	✅ (last 1,000)	✅ (unlimited, configurable retention)
IP hashing	❌	✅ (SHA-256, GDPR-compliant)
Content Faithfulness Score tracking	❌	✅
Entity injection detection	❌	✅
Client reporting	❌	✅ (client-labeled reports)
Operational vs. Interception log split	❌	✅

9.2 Bot Fingerprinting

GEO Analytics doesn’t just record the User-Agent string — it builds a behavioral fingerprint for each crawler based on request patterns, frequency, and content access sequences. This allows you to differentiate between:

Genuine AI crawlers (GPTBot, ClaudeBot) and impersonators
Training crawlers (bulk page fetches) and RAG crawlers (targeted, real-time queries)
New, previously unseen crawlers that may require classification

9.3 Content Faithfulness Score Tracking

Every compiled M2M payload has a Faithfulness Score (see Block 7: M2M Precision Parser). GEO Analytics tracks this score over time, giving you a historical view of your site’s content compliance posture.

The dashboard shows:

Site-wide average Faithfulness Score
Score distribution (how many posts are above 80, between 60–79, below 60)
Score changes after re-compilation
Posts flagged as “cloaking risk” (score below 60)
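
The dashboard figures listed above can be derived from per-post scores with a few lines of Python. This is a sketch using the score bands from section 7.6. The function names and the shape of the returned dictionary are hypothetical.

```python
from collections import Counter


def band(score: int) -> str:
    """Map a score to the bands used in section 7.6."""
    if score >= 80:
        return "high"
    if score >= 60:
        return "moderate"
    return "cloaking_risk"


def score_summary(scores_by_post: dict) -> dict:
    """Summarize {post_id: score} into average, distribution, and flagged posts."""
    scores = list(scores_by_post.values())
    return {
        "average": round(sum(scores) / len(scores), 1) if scores else None,
        "distribution": dict(Counter(band(s) for s in scores)),
        "flagged": sorted(pid for pid, s in scores_by_post.items() if s < 60),
    }


summary = score_summary({42: 92, 43: 71, 44: 55, 45: 80})
```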

9.4 Entity Injection Tracking

Entity injection tracking monitors whether AI models are inserting unauthorized entities (competitor names, incorrect product names, fabricated team members) into answers about your brand.

This module works by comparing your Site Manifest entities against the entities present in compiled payloads, detecting any additions that don’t originate from your verified content.

9.5 Client Reporting

For agencies managing multiple client sites, GEO Analytics includes a client reporting feature that generates white-label compliance reports.

Reports include:

Bot activity summary (crawl volume by bot, by page)
Content Faithfulness Score overview
Terminology Standardization enforcement metrics
GEO coverage percentage (pages with compiled payloads vs. total)
Recommendations for compliance improvement

Master Fact Manifest

This feature requires LLM Override Pro.

10.1 What Is the Master Fact Manifest?

The Master Fact Manifest is an AI-generated comprehensive document that replaces the basic /llms.txt index with a deep, factually grounded description of your entire site. Instead of a simple list of URLs, the Master Fact Manifest is a multi-section document that gives AI crawlers immediate access to your company’s complete factual profile.

Think of it as the difference between a table of contents and an executive brief.

10.2 How It’s Generated

The Master Fact Manifest is compiled using the same BYOK AI provider configured for the Precision Parser. The generation process:

The engine collects the Site Manifest, all compiled M2M payloads, and the Terminology Map.
It sends this aggregated dataset to your AI provider with a system prompt that enforces factual summarization — no creative content, no marketing language, strictly facts.
The AI generates a structured document covering: company overview, product/service descriptions, key differentiators, compliance information, and contact details — all grounded exclusively in your published content.
The generated manifest is served at /llms.txt instead of the automated index.

10.3 Regeneration and Freshness

The Master Fact Manifest can be regenerated at any time from the LLM Override → llms.txt settings page.

When Autopilot Mode is enabled (Block 8), the Master Fact Manifest is automatically regenerated whenever the underlying content changes significantly — ensuring the document always reflects your latest published facts.

10.4 Fallback Behavior

If you have Pro installed but haven’t generated a Master Fact Manifest, the /llms.txt endpoint falls back to the standard automated index. There is no disruption — the endpoint always serves something useful.

Agency MCP Server

This feature requires LLM Override Pro (Agency).

11.1 What MCP Enables

The Agency MCP Server exposes a full Model Context Protocol (MCP) endpoint on your WordPress site. MCP is an open standard that allows AI agents (Claude Desktop, Cursor, custom automation pipelines) to interact with external systems programmatically.

With the MCP Server active, an AI agent can:

Read your site’s GEO compliance status
Read the compiled M2M payload for any page
Read your Site Manifest
Update your Site Manifest and per-post payloads
Trigger batch compilations

This turns your WordPress site into a GEO compliance data source that external tools can query and update without human intervention.

11.2 Authentication

The MCP endpoint uses WordPress’s built-in Application Passwords for authentication. Every request must include a valid Application Password belonging to a user with the manage_options capability.

To set up authentication:

Go to Users → Your Profile → Application Passwords.
Create a new Application Password (name it “MCP Agent” or similar).
Copy the generated password — it will only be shown once.
Use it in your MCP client’s configuration.

Security note: Application Passwords are separate from your WordPress login password. They can be revoked individually at any time, and they only grant API access — they cannot be used to log into the WordPress dashboard.

11.3 Endpoint Discovery

The MCP endpoint is available at:

https://yoursite.com/wp-json/llm-override/v1/mcp

Example with cURL:

curl -X POST https://yoursite.com/wp-json/llm-override/v1/mcp \
  -H "Content-Type: application/json" \
  -u "admin:xxxx xxxx xxxx xxxx xxxx xxxx" \
  -d '{"tool": "get_site_manifest"}'
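
The `-u` flag in the cURL example sends HTTP Basic credentials. For clients that need an explicit Authorization header, the Python sketch below builds the equivalent request with the standard library. The `build_mcp_request` helper is hypothetical, and the request body simply mirrors the cURL example. How tool parameters such as post_id are passed is not shown here and is left out on purpose.

```python
import base64
import json
import urllib.request


def build_mcp_request(site_url: str, username: str, app_password: str, tool: str):
    """Build (but do not send) a POST request equivalent to the cURL example."""
    credentials = f"{username}:{app_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=f"{site_url.rstrip('/')}/wp-json/llm-override/v1/mcp",
        data=json.dumps({"tool": tool}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_mcp_request(
    "https://yoursite.com", "admin", "xxxx xxxx xxxx xxxx xxxx xxxx", "get_site_manifest"
)
# To send it: urllib.request.urlopen(request)
```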

11.4 Read Operations

Read operations retrieve your site’s GEO compliance data without modifying anything.

get_site_manifest

Returns your current Site Manifest text and Terminology Map. Use this to audit the semantic layer before making changes.

get_page_markdown

Returns the compiled M2M Markdown payload for a specific post or page. Parameters: post_id (integer).

Example response (abbreviated):

{
  "post_id": 42,
  "title": "How Invoice Automation Works",
  "markdown": "---\ntitle: How Invoice Automation Works\n---\n\n# How Invoice Automation Works\n\nAcme Corp is a B2B SaaS...",
  "source": "compiled",
  "faithfulness_score": 92,
  "last_compiled": "2026-03-15T10:30:00+00:00"
}

The source field tells you how the payload was generated: compiled (Precision Parser), bypass (manual override), or auto (automated conversion).
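
An agent consuming this response might use the source and faithfulness_score fields to decide which posts deserve attention. The Python sketch below shows one such policy. The `needs_review` function, the threshold of 80, and the choice to treat a missing score as "review" are assumptions of this example, not plugin behavior.

```python
import json


def needs_review(payload: dict, threshold: int = 80) -> bool:
    """Hypothetical policy: flag auto-converted or low-scoring payloads."""
    if payload.get("source") == "auto":
        return True  # never compiled and no manual bypass
    score = payload.get("faithfulness_score")
    # Assumption: a missing score is treated as needing review.
    return score is None or score < threshold


response_text = """{
  "post_id": 42,
  "title": "How Invoice Automation Works",
  "source": "compiled",
  "faithfulness_score": 92,
  "last_compiled": "2026-03-15T10:30:00+00:00"
}"""
review = needs_review(json.loads(response_text))
```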

get_site_coverage

Returns the site-wide GEO coverage metrics: total posts, posts with compiled payloads, posts with bypasses, and posts using automated conversion. This gives you a single data point that tells you how much of your site has been explicitly optimized for AI systems.

In a compatible MCP client (Claude Desktop, Cursor, or a custom agent using the MCP SDK), connect to the endpoint using your Application Password. The tools will appear automatically. In the example below, base64(user:application_password) is a placeholder: replace it with the Base64 encoding of your username and Application Password joined by a colon.

MCP Configuration Example:

{
  "mcpServers": {
    "llm-override": {
      "url": "https://yoursite.com/wp-json/llm-override/v1/mcp/sse",
      "headers": {
        "Authorization": "Basic base64(user:application_password)"
      }
    }
  }
}

11.5 Write Operations

Write operations modify your site’s GEO configuration programmatically.

set_site_manifest

Replaces your entire Site Manifest text. Parameters: manifest (string).

set_terminology_map

Replaces the entire Terminology Map. Parameters: terms (array of {term, replacement} objects).

set_page_bypass

Injects a fully custom M2M Markdown payload for a specific post, bypassing both the automated conversion and the Precision Parser. Parameters: post_id (integer), markdown (string).

11.6 Use Cases

Multi-site agency management

Use case: An orchestration agent monitors each client site for content updates, then generates and deploys compiled payloads automatically. Example workflow:

1. Agent reads get_site_coverage → identifies 12 new posts without compiled payloads.
2. For each post: Agent reads get_page_markdown → reviews the auto-generated output.
3. Agent sends post HTML to its own AI provider for optimal compilation.
4. Agent writes compiled Markdown back via set_page_bypass.
5. Agent logs the operation and moves to the next site.

CI/CD integration

Use case: A CI/CD pipeline that detects a merge to the main branch of a content repository automatically sends updated Markdown to all WordPress sites via MCP.

Scaling across sites

Each site has its own Application Password. Your central agent stores these credentials securely and iterates through all sites, auditing compliance, updating manifests, and reporting exceptions, so no human has to touch any WordPress dashboard.

Ask AI Buttons

This feature requires LLM Override Pro or Agency license.

11b.1 What Are Ask AI Buttons?

Ask AI Buttons is a Pro module that renders a visual component at the bottom of your post content, inviting visitors to ask AI platforms about your article. Each button opens the corresponding AI platform with a pre-built prompt that includes your M2M-optimized URL (?view=raw).

This creates a self-reinforcing GEO loop: your visitor reads your article, clicks “Ask ChatGPT,” and ChatGPT fetches your optimized M2M payload to answer the question.

Supported AI Platforms:

Platform	Method
Perplexity	URL pre-fill
ChatGPT	URL pre-fill
Microsoft Copilot	URL pre-fill
Gemini	URL pre-fill
Claude	URL pre-fill
Grok	URL pre-fill

All 6 providers use native URL-based prompt pre-filling: no clipboard hacks, no JavaScript clipboard API calls.

11b.2 Configuration

Go to LLM Override Pro → Ask AI Buttons to configure the module.

Setting	Default	Description
Enable/Disable	Enabled	Master toggle for the entire module
Post Types	Posts, Pages	Which post types display the buttons
Providers	All 6 enabled	Choose which AI platforms to show
CTA Label	“Ask AI about this article:”	The heading text above the buttons
Prompt Template	See below	The prompt sent to AI platforms

11b.3 Prompt Template

The prompt template supports dynamic placeholders resolved at render time:

Placeholder	Resolves to
`{title}`	Post title
`{url}`	Canonical permalink
`{m2m_url}`	M2M endpoint URL (`permalink?view=raw`)
`{site_name}`	Your site’s name
`{excerpt}`	Post excerpt or auto-generated 200-char snippet
`{author}`	Post author display name

The template deliberately points AI models to the {m2m_url} endpoint, ensuring they read your structured Markdown payload.

Prompt length limit: 800 characters maximum (enforced server-side).

11b.4 Visual Customization

Presets: outlined (default), solid, pill, ghost, minimal

Button sizes: sm, md (default), lg

Color scheme: Auto (follows prefers-color-scheme), Light, Dark

All colors are customizable via CSS Custom Properties under the .llm-override-ask-ai namespace.

11b.5 Branding

Pro license: “Powered by LLM Override” branding is always visible.

Agency license: Branding can be toggled off for white-label client deployments.

Developer Reference

12.1 Plugin Architecture

LLM Override is built entirely on the WordPress Plugin API. It does not modify core files and does not write to the filesystem. Its core operation requires no custom database tables. Shadow Analytics is the exception and stores its log in a custom table.

All integration points are standard WordPress hooks — filters and actions — that allow developers to extend or modify behavior without touching plugin code.

12.2 Filters (Content Modification Hooks)

Filters let you modify data as it passes through the M2M pipeline.

llm_override_markdown_output
Filters the final Markdown string after all processing stages (conversion, frontmatter, terminology standardization) but before delivery. Use this strictly for compliance metadata, legal disclaimers, or audit timestamps, not for injecting substantive content.

add_filter( 'llm_override_markdown_output', function( $output, $post ) {
    // Append a compliance verification timestamp
    $verified_date = get_the_modified_date( 'Y-m-d', $post );
    $output .= "\n\n---\n";
    $output .= "*Content verified as of {$verified_date}. ";
    $output .= "This document is subject to the terms at https://yoursite.com/terms/.*\n";
    return $output;
}, 10, 2 );

Parameters:

$output (string) — Complete Markdown payload ready for delivery.
$post (WP_Post) — The post object being served.

⚠️ Warning: Adding content that does not exist in the visible HTML page constitutes cloaking and violates search engine guidelines. Limit modifications to compliance timestamps, legal disclaimers, and audit metadata. See §12.6.

llm_override_frontmatter
Modify the YAML frontmatter array before it is serialized into the Markdown document. Use this to add custom metadata fields that AI crawlers should consume.

add_filter( 'llm_override_frontmatter', function( $frontmatter, $post ) {
    $frontmatter[] = 'industry: construction';
    $frontmatter[] = 'region: EU';
    return $frontmatter;
}, 10, 2 );

llm_override_bypass_markdown
Modify a manual bypass payload before delivery. This fires only when a post has a custom bypass — not during automated conversion.

llm_override_llms_txt_lines
Modify the lines of the /llms.txt output before it is sent to the crawler. Useful for adding custom URLs or sections.

llm_override_clean_special_chars
Controls whether the post-conversion Unicode sanitization stage runs. Return false to preserve BOM markers, Zero-Width Spaces, and other Unicode artifacts in the output.

12.3 Actions (Event Hooks)

Actions let you execute custom code when specific events occur in the M2M pipeline.

llm_override_bot_detected
Fires every time a bot is detected by any interception layer. Use this for operational notifications — alerting your team when specific AI crawlers access critical pages.

Parameters:

$post_id (int) — The post ID being accessed (0 if non-singular).
$bot_slug (string) — The matched bot identifier (e.g., ChatGPT-User, Stealth-Bot).
$bot_type (string) — The bot category: training, query, agent, or unknown.
$is_singular (bool) — Whether the request is for a singular post/page.

llm_override_intercept_request
Fires when a request explicitly uses ?view=raw. Useful for logging or triggering side effects on explicit M2M requests.

llm_override_serve_llms_txt / llm_override_serve_llms_full_txt
Fire immediately after the respective manifest endpoint is served.

12.4 Constants

LLM_OVERRIDE_VERSION — The current plugin version string ('1.1.6'). Use this in your extensions to check compatibility.

12.5 Extension Pattern

The recommended pattern for building extensions is a standalone WordPress plugin that hooks into LLM Override’s public API. Example:

<?php
/**
 * Plugin Name: My LLM Override Extension
 * Description: Slack notifications for AI bot activity.
 * Version: 1.0.0
 */

// Run the check on plugins_loaded so it does not depend on plugin load order.
add_action( 'plugins_loaded', function() {
    if ( ! defined( 'LLM_OVERRIDE_VERSION' ) ) {
        return; // LLM Override is not active
    }

    add_action( 'llm_override_bot_detected', function( $post_id, $bot_slug, $bot_type, $is_singular ) {
        // Only notify for query bots reading a singular post or page.
        if ( ! $is_singular || 'query' !== $bot_type ) {
            return;
        }
        // Non-blocking request: the M2M response is not delayed by Slack.
        wp_remote_post( 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL', [
            'body'     => wp_json_encode( [ 'text' => sprintf( '🤖 %s read %s', $bot_slug, get_permalink( $post_id ) ) ] ),
            'headers'  => [ 'Content-Type' => 'application/json' ],
            'blocking' => false,
        ] );
    }, 10, 4 );
} );

The defined( 'LLM_OVERRIDE_VERSION' ) check ensures your extension does nothing if LLM Override is not active — preventing fatal errors and ensuring clean activation/deactivation.

12.6 Compliance & Responsible Use

LLM Override’s hooks are designed for legitimate extension: compliance disclaimers, operational notifications, metadata enrichment, CRM integrations, and translation layer customization.

Supported uses:

Adding compliance timestamps, legal disclaimers, or audit metadata to payloads
Injecting YAML frontmatter fields for internal taxonomy, content tier, or language metadata
Extending the /llms.txt manifest with custom sections
Triggering operational notifications (Slack, CRM, webhook) on bot detection events

The following uses constitute cloaking and violate search engine guidelines:

Modifying the Markdown payload to include substantive content not present in the visible HTML
Injecting keywords, claims, or promotional text into the M2M output that human visitors never see
Serving materially different content to AI crawlers than what is served to human browsers

Automated guardrail: LLM Override Pro includes a Content Faithfulness Score (Jaccard similarity). A score below 60 is flagged as a cloaking risk in the GEO Analytics dashboard, and the score is shown in the post’s metabox after every compilation. See §7.6 for the score bands.

Compatibility & Hosting

13.1 Hosting Environments

LLM Override has been verified on the following WordPress hosting platforms:

Host	Status	Notes
Generic shared hosting	✅ Verified	Works out of the box
WP Engine	✅ Verified	Cache bypass confirmed
Kinsta	✅ Verified	Works with Kinsta’s Nginx rules
Cloudways	✅ Verified	Varnish bypass confirmed
WordPress VIP	✅ Verified	No filesystem writes required
Pantheon	✅ Verified	Works on both Live and Dev environments
Flywheel	✅ Verified	Cache bypass confirmed
SiteGround	✅ Verified	SG Optimizer cache bypass confirmed

The plugin does not write to the filesystem, does not require custom server configuration, and does not depend on any server-side software beyond PHP and WordPress.

13.2 Caching Plugins

LLM Override includes automatic cache bypass for M2M requests. The following caching plugins are explicitly supported and tested:

WP Rocket
LiteSpeed Cache
W3 Total Cache
WP Super Cache
Autoptimize (excluded from M2M endpoints)

If you use a caching plugin not listed here, it will likely work without configuration: the plugin sends a Cache-Control: no-store header and defines the DONOTCACHEPAGE constant, both of which most caching plugins respect.

13.3 Page Builders

LLM Override works with all major page builders. The content pipeline uses recursive regex-based extraction to preserve shortcode inner content rather than stripping it — ensuring accurate Markdown even on pages built with shortcode-heavy builders:

Builder	Status	Method
Gutenberg (Block Editor)	✅ Verified	Native block content extraction
Classic Editor	✅ Verified	Direct HTML conversion
Elementor / Elementor Pro	✅ Verified	Shortcode recursive extraction
WPBakery Page Builder	✅ Verified	Shortcode recursive extraction
Divi Builder	✅ Verified	Shortcode recursive extraction
Beaver Builder	✅ Verified	Shortcode recursive extraction

13.4 SEO Plugins

LLM Override synchronizes with the following SEO plugins for noindex detection:

Yoast SEO
Rank Math
SEOPress
All in One SEO (AIOSEO)

Pages marked as noindex in any of these plugins are automatically excluded from the /llms.txt manifest and from M2M interception.

13.5 Multisite

LLM Override supports WordPress Multisite installations. Each site in the network operates independently with its own Site Manifest, Terminology Map, and analytics data. Network-wide configuration is not currently supported — each site is configured individually.

FAQ & Troubleshooting

14.1 Verifying the Plugin is Working

The fastest way to verify LLM Override is operational:

Open any published post or page on your site.
Add ?view=raw to the URL (e.g., https://yoursite.com/sample-page/?view=raw).
You should receive a plain-text Markdown response with YAML frontmatter.

Alternatively, use cURL from the command line (the URL is quoted so the shell does not interpret the ? character):

curl -s "https://yoursite.com/sample-page/?view=raw" | head -20

You should receive a plain-text Markdown response. If you receive HTML, the interceptor is not firing — check the Engine Status on the Dashboard.
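
If you want to automate this check, the Python sketch below tests whether a response looks like an M2M payload. It relies on two details stated elsewhere in this reference: M2M responses are served as text/plain, and the Markdown begins with YAML frontmatter. The `looks_like_m2m_response` helper is hypothetical. It inspects a Content-Type value and a body that you have already fetched with any HTTP client.

```python
def looks_like_m2m_response(content_type: str, body: str) -> bool:
    """Heuristic check (assumption): text/plain body that starts with frontmatter."""
    media_type = content_type.split(";")[0].strip().lower()
    has_frontmatter = body.lstrip().startswith("---")
    return media_type == "text/plain" and has_frontmatter


ok = looks_like_m2m_response(
    "text/plain; charset=UTF-8",
    "---\ntitle: Sample Page\n---\n\n# Sample Page\n",
)
not_ok = looks_like_m2m_response(
    "text/html; charset=UTF-8",
    "<!DOCTYPE html><html><head></head><body></body></html>",
)
```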

14.2 Frequently Asked Questions

Q: I see HTML instead of Markdown when I visit ?view=raw.

A: The M2M interception is not active. Go to LLM Override → Settings and verify that Enable M2M Interception is checked. If it is already checked, flush your rewrite rules by going to Settings → Permalinks and clicking Save Changes.

Q: My caching plugin is serving a cached HTML version of the ?view=raw endpoint.

A: LLM Override sets cache bypass headers automatically, but some aggressive caching configurations may override them. Add a manual exclusion rule in your caching plugin for URLs containing ?view=raw.

Q: The Site Manifest doesn’t appear in my /llms.txt output.

A: Go to LLM Override → Semantic Rules and verify that the Site Manifest field contains text. An empty manifest is not injected. Note: the Site Manifest only appears in the /llms.txt and /llms-full.txt discovery endpoints, not in individual per-page payloads.

Q: My Terminology Standardization rules aren’t applying.

A: Verify your terminology entries at LLM Override → Semantic Rules → Terminology Standardization. Each entry needs both a source term and a replacement. Matching is case-insensitive.

Q: The /llms.txt endpoint returns a 404.

A: Flush your rewrite rules at Settings → Permalinks → Save Changes. The /llms.txt rewrite rule is registered on plugin activation, but some hosting environments require a manual flush.

Q: Does this affect my Google rankings?

A: No. All M2M responses include the X-Robots-Tag: noindex header, which instructs search engines not to index the Markdown response. Your HTML pages, the ones Google actually ranks, are completely untouched.

Q: Is this cloaking?

A: No. Cloaking means showing fundamentally different content to search engines versus humans. LLM Override does the opposite: it ensures the AI receives the same factual content that humans see, translated into a format the AI can process without error. The Content Faithfulness Score mathematically verifies this parity.

Q: What happens if I deactivate the Pro addon but keep the Free plugin?

A: The Free plugin continues to work independently. Any compiled Precision Parser payloads will no longer be served (they require Pro to be active), but the automated HTML → Markdown conversion takes over automatically. No data is lost — when you reactivate Pro, the compiled payloads are available again.

14.3 Troubleshooting Checklist

Symptom	Cause	Fix
`?view=raw` returns HTML	Engine disabled or permalinks stale	Enable engine + flush permalinks
Cached HTML on `?view=raw`	Aggressive page cache	Add `?view=raw` exclusion rule
Empty Markdown body	Post content is shortcode-only	Verify recursive extraction is working
Stealth detection false positives	Dev/localhost environment	Automatic — localhost is excluded. Disable in settings if persistent
JSON-LD not injecting	Post excluded or SEO plugin conflict	Check per-post exclusion checkbox and collision prevention
`/llms.txt` returns 404	Rewrite rules not flushed	Settings → Permalinks → Save Changes
Terminology map not applying	Cache not invalidated	Edit + save any post to trigger cache flush

Changelog

LLM Override (Free)

1.1.7 — 2026-04-12

Removed: Terminology Standardization engine. This feature introduced semantic divergence between HTML and M2M payloads, contradicting our core principle of content faithfulness. LLM Override now guarantees strict 1:1 parity between what humans read and what machines receive.

1.1.6 — 2026-04-11

Compliance: Full WP.org Plugin Check pass — zero errors, zero warnings. Normalized all line endings to LF, enforced proper escaping, and removed prohibited files.
Fix: Safe shortcode extraction via recursive regex preserves inner textual content from Divi, WPBakery, and Elementor shortcode structures.
Enhancement: Intelligent 12-hour transient caching for /llms.txt and /llms-full.txt with automatic invalidation on post publish, update, or trash.

1.1.0

Feature: Terminology Standardization Engine. M2M Engine now globally replaces legacy forbidden terms logic with a structured {from → to} Terminology Dictionary to ensure Content Faithfulness and compliance.
Enhancement: Migrated global term filtering logic to comply with accurate Source Attribution guidelines.
Tweak: Version bump for plugin parity and architectural refactoring ahead of Sprint 22.3.

1.0.5

New: RAG JSON-LD Grounding Engine. Automatically injects semantic TechArticle schema markup, containing the M2M-translated content, into the HTML <head>.
Enhancement: Complete architectural refactoring of the Content Pipeline. HTML-to-Markdown conversion is now centralized natively inside LLM_Override_Content_Pipeline::convert_to_markdown().
Fix: Developer Experience (DX) bypass for Stealth Bot Detection. IDE headless browsers and Localhost environments (127.0.0.1, .local) will no longer trigger false positive M2M interceptions.

1.0.4

Fix: Added deep exclusions for performance auditing tools (Chrome-Lighthouse, GTmetrix, PingdomPTST) to prevent them from receiving Markdown.
Fix: Added extended SEO bot exclusions (AhrefsBot, SemrushBot, Applebot, DotBot, MJ12bot) to the whitelist.

1.0.3

Fix: Critical Indexing Hotfix. Excluded honest search engine crawlers (like Googlebot and Bingbot) from being falsely flagged by the Stealth Detection Engine.

1.0.2

Fix: Changed the Content-Type header from text/markdown to text/plain to ensure strict AI URL ingesters (like Google NotebookLM) accept the M2M endpoints as valid sources.
Tweak: Restored the X-Robots-Tag: noindex header to prevent search engine SERP pollution.

1.0.1

New: Passive Yoast SEO Compatibility Checker. Intercepts llms.txt overriding rules and Bot Blocker restrictions from Yoast Premium.
Fix: Added missing Content-Type: text/markdown header to the M2M payload response.

1.0.0 — Initial Release (March 2026)

Active M2M Interceptor engine with structured HTML-to-Markdown conversion.
Global Semantic Injection: Forbidden Terms and Corporate Manifest via YAML frontmatter.
Dynamic /llms.txt and /llms-full.txt endpoint generation.
Algorithmic Discoverability via <link rel="alternate"> tag and robots.txt announcement.
Native SEO integrations with Yoast SEO, Rank Math, SEOPress, and AIOSEO.
Native per-post exclusion and payload override via WordPress editor metabox.
Admin Dashboard with Shadow Analytics Lite (M2M bot hit counters, GDPR-compliant IP hashing).
View as AI Admin Bar button for empirical M2M payload verification.
Before vs. After live HTML-to-Markdown simulation in the Dashboard.
Passive bot detection for 52 known AI crawlers across 3 behavioral categories.
HTTP Content Negotiation support (Accept: text/markdown header).
Enterprise Unicode sanitization (BOM, Zero-Width Spaces, Non-Breaking Spaces, Soft Hyphens).
AJAX-driven Transient caching (12-hour TTL) for all M2M endpoints with manual flush.
14 documented action/filter hooks for developer extensibility.
Full compliance with WordPress coding standards: 0 Plugin Check errors, 0 warnings.

LLM Override Pro

1.1.7 — 2026-04-12

Removed: Terminology Standardization engine and all associated UI panels to ensure absolute 1:1 content parity.
Removed: Terminology and “terms purged” KPIs from the Shadow Analytics dashboard.

1.1.3 — 2026-04-06

New: Ask AI Buttons Module. Renders AI provider buttons (Perplexity, ChatGPT, Copilot, Gemini, Claude, Grok) on singular pages with native URL-based prompt pre-filling.
New: 5 visual presets (outlined, solid, pill, ghost, minimal), full color control via CSS Custom Properties, dark mode support.
Fix: Resolved a critical meta key mismatch between the Free metabox and the Pro Copilot metabox.

1.1.0 — 2026-03-23

Refactored: Terminology Standardization Engine globally replaces legacy forbidden_terms logic with a structured {from → to} Terminology Dictionary.
Refactored: Copilot Metabox payload compilation updated to enforce Content Faithfulness parameters under the new terminology map.
Improved: Complete data migration to a relational table schema, securing semantic integrity.
Improved: MCP Audit Tools and Actions refactored to support semantic parity checks over the new data structure.

1.0.4 — 2026-03-20

Added: RAG JSON-LD Generator integrated into the AI Copilot (extracts FAQPage, Article, HowTo, etc.).
Added: Pro JSON-LD Controller injecting high-value Schema autonomously into the frontend.
Added: Support for response_format: json across OpenAI, DeepSeek, and OpenRouter for strict schema generation.
Added: Tabbed interface in AI Copilot Metabox to separate M2M Content generation from RAG Schema payload.
Improved: Batch processor prompt hydration now supports {{BRAND_ENTITIES}} and {{FORBIDDEN_TERMS}} variables.

1.0.3 — 2026-03-18

Refactored: Unified GEO Analytics Dashboard — merged dual Client/Technical View into a single cohesive layout.
Fixed: Race condition between competing change event listeners on the view toggle.
Fixed: AJAX race condition when rapidly switching date periods — implemented sequence token to discard stale responses.

1.0.2 — 2026-03-16

Added: Standard MCP JSON-RPC 2.0 Router implementing the Model Context Protocol specification 2025-03-26 (Streamable HTTP transport).
Added: Configuration tabs for Claude Desktop and Cursor with correct JSON config formats.
Fixed: sanitize_title() was silently converting underscores to hyphens in tool names. Replaced with sanitize_key().

1.0.1 — 2026-03-13

Added: Automated OTA Updates powered by the ArrayPress Lemon Squeezy Auto-Updater library.
Added: Internal license synchronization between PRO architecture and the third-party updater payload.
Improved: B2B Clean UI override (native updater row on /wp-admin/plugins.php is now intercepted and hidden).

