Traditional SEO Audits Miss Half the Picture
If you ran a technical SEO audit on your site last year, you probably checked the usual suspects: crawl errors, broken links, page speed, mobile friendliness, canonical tags, sitemap health. These factors still matter. But they represent only half of what determines whether your content reaches users in 2026.
The other half is AI readiness: a set of technical factors that determine whether AI platforms like ChatGPT, Perplexity, Gemini, Claude, DeepSeek, Grok, and Google AI Overviews can discover your content, parse it correctly, and cite it in their responses. Traditional audit tools do not check most of these factors because they were designed for a world where Google was the only search engine that mattered.
AI readiness is not a separate discipline from technical SEO. It is an extension of it. The same principles apply: make content discoverable, parsable, and trustworthy. But the specific requirements differ because AI crawlers have different capabilities, tolerances, and priorities than traditional search bots.
A site that loads perfectly in Chrome and ranks well on Google might be completely invisible to GPTBot because JavaScript rendering fails, robots.txt blocks the wrong user-agent, or the content structure lacks the semantic clarity that language models need.
We have organized AI readiness into 38 specific, testable factors across five categories: Crawlability, Structured Data, Content Structure, Entity Signals, and Performance. Each factor includes what it is, why it matters for AI, and what a passing result looks like.
See also: E-E-A-T and AI Visibility: Why Google's Quality Framework Matters for GEO
Category 1: Crawlability (8 Factors)
Crawlability is the foundation. If AI bots cannot access your pages, nothing else in this audit matters. These eight factors are pass/fail gates.
1. AI Crawler Access in robots.txt
Your robots.txt must explicitly allow major AI crawlers: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, and Applebot-Extended. If your file contains a blanket Disallow: / for wildcard user-agents without specific Allow rules for these bots, they cannot crawl your site. Check for accidental blocks.
Pass: Named AI crawlers are allowed or not mentioned (defaulting to allowed). Fail: Blanket blocks or specific Disallow rules for bots you want to reach.
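As a rough sketch of this check, Python's standard-library robots.txt parser can report which AI crawlers a given file allows. The bot list mirrors the crawlers named above; the domain and robots.txt bodies are illustrative.

```python
from urllib.robotparser import RobotFileParser

# User-agents of the major AI crawlers named above.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "OAI-SearchBot", "Applebot-Extended",
]

def ai_crawler_access(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {bot_name: allowed?} for the body of a robots.txt file."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in AI_BOTS}

# A blanket wildcard block with no bot-specific rules fails every AI crawler.
blanket = ai_crawler_access("User-agent: *\nDisallow: /")

# An explicit group for GPTBot restores access for that bot only.
carved = ai_crawler_access(
    "User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nAllow: /"
)
```

Running this against your live robots.txt body surfaces exactly the accidental blocks this factor warns about.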
2. llms.txt File Presence
llms.txt is an emerging standard that provides a structured summary of your brand, key pages, and content hierarchy for language models. Not all AI platforms use it yet, but early adoption signals AI-awareness and provides a clear content map for bots that do.
Pass: llms.txt exists at root, contains brand summary and key page links. Fail: No llms.txt file present.
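For reference, a minimal llms.txt follows the emerging convention of a Markdown file with an H1 brand name, a blockquote summary, and sections of annotated links. All names and URLs below are placeholders:

```markdown
# Example Co

> Example Co provides AI visibility monitoring for marketing teams.

## Key pages

- [Product overview](https://example.com/product): What the platform does
- [Pricing](https://example.com/pricing): Plans and tiers
- [Blog](https://example.com/blog): Guides on GEO and AI search
```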
3. XML Sitemap Completeness
Your sitemap must include every page you want AI crawlers to find. AI bots use sitemaps as discovery paths, especially for content that is not well-linked internally. Missing pages in your sitemap mean AI crawlers may never find them.
Pass: Sitemap includes all indexable pages, is referenced in robots.txt, and returns 200 status. Fail: Sitemap missing, incomplete, or returning errors.
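One way to sketch the completeness check: parse the sitemap XML and diff its URLs against the list of pages you consider indexable. The sample sitemap and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]

def missing_from_sitemap(indexable: set[str], xml_text: str) -> set[str]:
    """Pages you expect AI crawlers to find that the sitemap omits."""
    return indexable - set(sitemap_urls(xml_text))

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/post-a</loc></url>
</urlset>"""

gaps = missing_from_sitemap(
    {"https://example.com/", "https://example.com/blog/post-a",
     "https://example.com/blog/post-b"},
    SAMPLE,
)
```

Any URL in the result is a page AI crawlers may never discover.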
4. Server-Side Rendering (SSR) for Key Content
Most AI crawlers have limited JavaScript rendering capability. If your content relies on client-side JavaScript to load, AI bots see empty pages. Critical content must be available in the initial HTML response without JavaScript execution.
Pass: Key content visible in page source without JavaScript execution. Fail: Content loads only after JavaScript runs.
5. Crawl Response Time Under 2 Seconds
AI crawlers have timeout thresholds. If your server takes too long to respond, the bot moves on and your page is never processed. Keep server response time (TTFB) under 2 seconds for all content pages.
Pass: TTFB under 2 seconds for content pages. Fail: Responses exceeding 2 seconds on content-heavy pages.
6. No Soft 404s on Content Pages
A soft 404 returns a 200 status code but displays error or empty content. AI crawlers trust the status code. Serving empty pages with 200 status teaches AI models to associate your brand with low-quality content.
Pass: All content pages return actual content with 200 status. Error pages return proper 404 codes. Fail: Empty or error pages returning 200 status.
7. Canonical Tags Pointing to Correct URLs
Duplicate content confuses AI crawlers. Every page should have a self-referencing canonical tag or point to the preferred version. Conflicting versions may cause AI models to cite the wrong page or skip both.
Pass: Every page has a correct canonical tag. No orphaned or conflicting canonicals. Fail: Missing, conflicting, or incorrect canonical tags.
8. HTTPS Across All Pages
AI crawlers and the platforms they serve prefer secure connections. Mixed content (HTTP pages on an HTTPS site) can trigger crawl failures or reduced trust scoring. Every page, image, and resource should load over HTTPS.
Pass: Full HTTPS, no mixed content warnings, valid SSL certificate. Fail: HTTP pages, mixed content, or expired certificates.
If your site fails any of the 8 crawlability factors, fix them first. Everything else in this audit depends on AI bots being able to access and read your pages. A site with perfect structured data but blocked crawlers is invisible to AI.
Category 2: Structured Data (7 Factors)
Structured data helps AI platforms understand what your content is about at a machine-readable level. It is the difference between a bot reading your page as unstructured text and understanding that this page describes a product, this section answers a question, and this person is the author.
9. Organization Schema
Your site should have Organization schema markup on the homepage. This tells AI models your company name, logo, social profiles, and contact information. It anchors your brand as an entity that AI can reference consistently.
Pass: Valid Organization schema on homepage with name, logo, URL, and social links. Fail: No Organization schema or incomplete implementation.
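An illustrative Organization block in JSON-LD form, placed in a script tag of type application/ld+json on the homepage. Every name, URL, and handle here is a placeholder:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://x.com/exampleco"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer support",
    "email": "support@example.com"
  }
}
```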
10. WebSite Schema with SearchAction
WebSite schema tells AI crawlers that your domain is a website (not a random page collection) and provides a site search URL. This helps AI models understand your site as a cohesive entity.
Pass: WebSite schema on homepage with name, URL, and SearchAction. Fail: Missing WebSite schema.
11. Article/BlogPosting Schema on Content Pages
Every blog post and article should have Article or BlogPosting schema. This tells AI platforms the publication date, author, headline, and description in a structured, machine-readable format.
Pass: Article or BlogPosting schema on all content pages with headline, datePublished, author, and description. Fail: Content pages without article schema.
12. FAQ Schema for Question-Answer Content
If your page answers common questions, FAQ schema marks those Q&A pairs for direct AI consumption. Pages with properly marked-up FAQ content get cited more often in AI responses.
Pass: FAQ schema on pages with Q&A content. Questions and answers match visible page content. Fail: Q&A content without FAQ schema, or schema that does not match visible content.
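For illustration, a minimal FAQPage block marking up one Q&A pair. The answer text must match what is visible on the page:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is llms.txt?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "llms.txt is an emerging standard that gives language models a structured summary of a site's brand and key pages."
      }
    }
  ]
}
```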
13. BreadcrumbList Schema
Breadcrumb schema helps AI crawlers understand your site hierarchy and content relationships. It signals which category a page belongs to and how content is organized. This context influences how AI models categorize your content.
Pass: BreadcrumbList schema on all pages with accurate hierarchy. Fail: No breadcrumb schema or inaccurate hierarchy.
14. Product Schema (E-commerce Sites)
For e-commerce pages, Product schema provides name, price, availability, reviews, and descriptions in a format AI can parse directly. Without it, AI platforms must guess product details from unstructured page text.
Pass: Product schema on all product pages with name, price, availability, and description. Fail: Product pages without Product schema.
15. Schema Validation (No Errors)
Having schema markup is not enough. It must validate without errors. Invalid schema is worse than no schema because it signals low technical quality to AI systems. Use the Schema.org validator or Google Rich Results Test to check every schema type on your site.
Pass: All schema validates without errors or warnings. Fail: Schema with validation errors, missing required fields, or incorrect types.
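A lightweight pre-check before running the validators above: confirm each JSON-LD block at least carries the properties this audit expects for its type. The required-property sets below are illustrative, not the validators' authoritative rules; always confirm against the Schema.org validator or Rich Results Test.

```python
import json

# Illustrative minimum properties per type, mirroring the factors above.
# Consult the official validators for the authoritative requirements.
REQUIRED = {
    "Organization": {"name", "url", "logo"},
    "WebSite": {"name", "url"},
    "Article": {"headline", "datePublished", "author"},
    "BlogPosting": {"headline", "datePublished", "author"},
    "FAQPage": {"mainEntity"},
    "Product": {"name", "description", "offers"},
}

def missing_properties(jsonld: str) -> set[str]:
    """Return expected properties absent from a JSON-LD block."""
    data = json.loads(jsonld)
    required = REQUIRED.get(data.get("@type"), set())
    return required - data.keys()

incomplete = missing_properties(
    '{"@context": "https://schema.org", "@type": "Article", '
    '"headline": "GEO Audit"}'
)
```

A non-empty result means the block will fail this factor before you even reach the validator.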
Structured data is how you speak to AI in its native language. Unstructured HTML requires AI models to infer meaning. Structured data states it explicitly. Every schema type you add reduces the chance of your content being misinterpreted or overlooked.
Category 3: Content Structure (8 Factors)
Content structure determines how easily AI models can extract facts, definitions, and quotable passages from your pages. A well-structured page is a page that AI can read, parse, and cite. A poorly structured one gets skipped.
16. Clear H1-H2-H3 Hierarchy
Every page needs one H1 (the page title) and a logical H2/H3 hierarchy that breaks content into scannable sections. AI models use heading structure to understand topic boundaries and subtopic relationships. Skipped heading levels (H1 to H3 with no H2) break this logic.
Pass: One H1 per page, logical H2/H3 nesting, no skipped levels. Fail: Multiple H1s, skipped heading levels, or flat structure with no subheadings.
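The hierarchy rules above can be checked mechanically: collect heading levels in document order, then flag multiple H1s and skipped levels. The sample HTML strings are illustrative.

```python
import re
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Record h1-h6 levels in the order they appear."""
    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        match = re.fullmatch(r"h([1-6])", tag)
        if match:
            self.levels.append(int(match.group(1)))

def heading_issues(html: str) -> list[str]:
    collector = HeadingCollector()
    collector.feed(html)
    issues = []
    h1_count = collector.levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one h1, found {h1_count}")
    for prev, cur in zip(collector.levels, collector.levels[1:]):
        if cur > prev + 1:
            issues.append(f"skipped level: h{prev} followed by h{cur}")
    return issues

clean = heading_issues("<h1>T</h1><h2>A</h2><h3>B</h3><h2>C</h2>")
broken = heading_issues("<h1>T</h1><h3>A</h3>")
```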
17. Definition in First 2-3 Sentences
AI platforms frequently answer "What is X?" questions. They look for content that provides a clear, concise definition in the opening paragraph. Pages that bury the answer below multiple introductory paragraphs are less likely to be cited.
Pass: Key pages open with a direct definition or clear statement of what the page is about within the first 100 words. Fail: Opening with vague introductions, questions, or stories before stating the main point.
18. Quotable Content Blocks (134-167 Words)
AI-generated answers often include passages that closely match a source. The content that gets selected tends to appear in self-contained blocks of 134 to 167 words. Write paragraphs that can stand alone as complete answers. If a paragraph makes sense out of context, AI platforms can use it.
Pass: Key pages contain multiple self-contained paragraphs that answer specific questions completely. Fail: Content only makes sense when read sequentially with no standalone passages.
19. Lists and Tables for Comparative Data
When AI platforms answer comparison or "how to" queries, they prefer content formatted as lists or tables. Structured formats are easier to parse than prose paragraphs. Use numbered lists for processes, bullet lists for features, and tables for comparisons.
Pass: Appropriate use of lists and tables where content is comparative, procedural, or feature-based. Fail: All content in prose paragraphs with no structured formats.
20. Internal Linking Between Related Content
Internal links help AI crawlers discover related pages and understand content relationships. A page about "robots.txt for AI" should link to "AI crawler list" and "GEO strategy." These connections build a content graph that AI models can follow.
Pass: Each content page links to 3-5 related pages with descriptive anchor text. Fail: Orphaned pages with no internal links or generic anchor text.
21. Unique Meta Descriptions on Every Page
Meta descriptions serve as page summaries that AI crawlers read alongside the page content. Duplicate or missing meta descriptions force AI bots to generate their own summary, which may not represent your page accurately.
Pass: Every page has a unique meta description under 160 characters that accurately summarizes the content. Fail: Missing, duplicate, or auto-generated meta descriptions.
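A sketch of the audit step for this factor: given each page's meta description, flag the three failure modes (missing, over the length limit, duplicated). The URLs and descriptions are hypothetical.

```python
from collections import Counter

def meta_description_issues(pages: dict) -> dict:
    """Flag URLs whose meta description is missing, too long, or duplicated.
    `pages` maps URL -> description text (None when the tag is absent)."""
    counts = Counter(desc for desc in pages.values() if desc)
    issues = {}
    for url, desc in pages.items():
        if not desc:
            issues[url] = "missing"
        elif len(desc) > 160:
            issues[url] = "over 160 characters"
        elif counts[desc] > 1:
            issues[url] = "duplicate"
    return issues

report = meta_description_issues({
    "/": "AI readiness audit covering 38 technical factors.",
    "/blog/a": "Shared boilerplate description.",
    "/blog/b": "Shared boilerplate description.",
    "/blog/c": None,
})
```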
22. Image Alt Text with Descriptive Context
AI crawlers that process images rely on alt text to understand visual content. Even text-focused AI models use alt text as additional context for understanding a page. Descriptive alt text (not keyword-stuffed) improves content comprehension.
Pass: All content images have descriptive alt text that explains what the image shows. Fail: Missing alt text, generic placeholders, or keyword-stuffed alt attributes.
23. Content Freshness Signals
AI platforms prioritize recent, up-to-date content. Pages should display a visible publication date or last-updated date. Schema markup should include datePublished and dateModified fields. Stale content without freshness signals gets deprioritized.
Pass: Visible dates on content pages, datePublished and dateModified in schema, content updated within the last 12 months. Fail: No visible dates, missing date schema, or content that has not been updated in over a year.
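The 12-month freshness window is a simple date comparison against the page's dateModified value. The dates below are hypothetical.

```python
from datetime import date

def is_fresh(date_modified: str, today: date, max_age_days: int = 365) -> bool:
    """True when dateModified (an ISO 8601 date) falls inside the window."""
    modified = date.fromisoformat(date_modified)
    return (today - modified).days <= max_age_days

stale = is_fresh("2024-06-01", today=date(2026, 2, 1))   # older than a year
fresh = is_fresh("2025-11-15", today=date(2026, 2, 1))   # updated recently
```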
Content structure is where most sites have the biggest opportunity. The technical fixes (crawlability, schema) are binary. Content structure is a spectrum, and most pages can be improved with restructuring alone, no new content needed.
Category 4: Entity Signals (8 Factors)
Entity signals tell AI models who you are, what you do, and why you should be trusted. AI platforms do not just index pages. They build an understanding of brands as entities. Strong entity signals mean AI models recognize your brand and are more likely to cite it.
24. Consistent NAP (Name, Address, Phone) Across the Web
Your brand name, address, and phone number should be identical across your website, Google Business Profile, social media, directories, and third-party mentions. Inconsistencies confuse AI models about whether different mentions refer to the same entity.
Pass: NAP information is identical across all major web presences. Fail: Variations in brand name spelling, outdated addresses, or conflicting phone numbers.
25. Wikipedia or Wikidata Presence
AI models weight Wikipedia and Wikidata heavily when building entity knowledge. A Wikipedia page or Wikidata entry increases the chance that AI platforms recognize your brand as notable. Not every brand qualifies for Wikipedia, but Wikidata has a lower threshold.
Pass: Wikidata entry exists with accurate brand information. Wikipedia page if notability criteria are met. Fail: No Wikidata or Wikipedia presence.
26. Google Knowledge Panel
A Google Knowledge Panel indicates that Google recognizes your brand as a distinct entity. Since Google AI Overviews draws from the same knowledge graph, a Knowledge Panel strongly correlates with AI Overviews visibility.
Pass: Active Knowledge Panel with correct information. Fail: No Knowledge Panel or one with outdated/incorrect data.
27. Author Pages with Structured Bios
Content attributed to named authors with verifiable expertise gets cited more often by AI platforms. Each author should have a dedicated bio page on your site with credentials, photo, social links, and links to their published content. Author schema (Person type) should be implemented.
Pass: Named authors on all content, dedicated bio pages, Person schema for each author. Fail: Anonymous content, no author pages, or missing author schema.
28. Brand Mentions on Authoritative Sites
AI models learn about brands from the broader web. Mentions on industry publications, news sites, and review platforms build entity authority. More authoritative mentions mean higher citation probability.
Pass: Brand mentioned on multiple authoritative third-party sites relevant to your industry. Fail: Minimal or no third-party brand mentions.
29. Social Media Profile Consistency
Your social media profiles should be linked from your website (using sameAs schema), use consistent branding, and be actively maintained. AI platforms cross-reference social profiles when building entity understanding.
Pass: Active social profiles linked from website via sameAs schema, consistent branding across platforms. Fail: Inactive, unlinked, or inconsistently branded social profiles.
30. About Page with Clear Brand Definition
Your About page is often the first page AI models consult when building entity knowledge. It should clearly state what your company does, who it serves, and what makes it different. This is not marketing copy for humans only. It is entity definition for machines.
Pass: About page with clear one-sentence brand definition, founding story, team information, and mission. Fail: Vague or missing About page.
31. Consistent Brand Terminology
Use the same terms to describe your products and services across your entire site. If you call something "AI monitoring" on one page and "brand tracking" on another, AI models may not connect the two. Pick your terms and use them consistently everywhere.
Pass: Consistent product and feature terminology across all pages. Fail: Inconsistent or contradictory terminology for the same features.
Entity signals are the hardest to build and the most impactful once established. Technical fixes take hours. Content restructuring takes days. Building entity authority takes months. But once AI models recognize your brand as a trusted entity, that recognition compounds across every query where your brand is relevant.
Category 5: Performance (7 Factors)
Performance factors affect whether AI crawlers can process your pages efficiently and whether the data they collect is accurate and usable.
32. Core Web Vitals (LCP, INP, CLS)
Google uses Core Web Vitals as quality signals for both traditional search and AI Overviews. Poor metrics signal a low-quality page. Note that INP (Interaction to Next Paint) replaced FID as a Core Web Vital in March 2024. Targets: LCP under 2.5 seconds, INP under 200ms, CLS under 0.1.
Pass: All three Core Web Vitals in "Good" range. Fail: Any metric in "Poor" range.
33. Mobile Responsiveness
AI platforms serve users on all devices, and Google indexes mobile-first. If your content is not readable on mobile, it may receive lower quality signals that affect both search and AI citation probability.
Pass: Fully responsive design, readable content on all screen sizes, no horizontal scrolling. Fail: Non-responsive layout or content that breaks on mobile.
34. No Render-Blocking Resources for Content
Heavy CSS and JavaScript files that block initial page rendering can prevent AI crawlers from accessing content. Critical content should render without waiting for non-essential resources to load.
Pass: Critical content available in initial HTML. No render-blocking resources delaying content visibility. Fail: Content hidden behind render-blocking scripts or stylesheets.
35. Clean URL Structure
URLs should be readable, descriptive, and stable. AI models process URLs as signals about page content. A URL like /blog/ai-crawler-list-2026 tells the model more than /post?id=47382. Avoid URL changes without proper redirects, as broken URLs fragment your entity signals.
Pass: Descriptive, stable URLs. Proper 301 redirects for any changed URLs. Fail: Parameter-heavy URLs, frequent URL changes without redirects, or broken links.
36. Minimal Redirect Chains
Each redirect in a chain adds latency and increases the chance of a crawl failure. AI crawlers have lower patience for redirect chains than traditional search bots. Keep chains to a maximum of one redirect (original URL to final URL, no intermediaries).
Pass: No redirect chains longer than one hop. Fail: Chains of two or more hops (any intermediary URL between the original and final destination).
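Hop counting can be sketched as a walk over a redirect map (source URL to target URL), which is how you would summarize this from server logs or a crawl export. The URLs are placeholders.

```python
def redirect_hops(start: str, redirects: dict) -> int:
    """Count hops from start to the final URL in a {source: target} map."""
    url, hops, seen = start, 0, {start}
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
    return hops

redirects = {
    "http://example.com/old": "https://example.com/old",
    "https://example.com/old": "https://example.com/new",
}
chain = redirect_hops("http://example.com/old", redirects)    # two hops: fails
direct = redirect_hops("https://example.com/new", redirects)  # zero hops: passes
```

The classic offender is the pattern above: an HTTP-to-HTTPS hop chained onto a URL change, which a single combined redirect would avoid.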
37. Proper Error Handling (4xx and 5xx)
Monitor your site for 4xx and 5xx errors that AI crawlers encounter. A high error rate signals unreliability. AI platforms deprioritize sites that frequently return errors because unreliable sources produce unreliable citations.
Pass: Error rate below 1% for crawled URLs. 404 pages return proper status codes. No persistent 5xx errors. Fail: Error rate above 5%, or persistent 5xx errors on content pages.
38. CDN and Geographic Availability
If your target audience spans multiple regions, your content should be available globally through a CDN. AI crawlers operate from various geographic locations. A site that loads quickly in the US but times out from Europe may miss crawl cycles from non-US bot instances.
Pass: CDN in use, content accessible globally with consistent performance. Fail: Single-origin hosting with poor performance outside the primary region.
Performance is the silent killer of AI visibility. A slow site does not get an error message. It gets skipped. The crawler moves on to the next source, and you never know the visit happened or that it failed.
Prioritization: Where to Start
All 38 factors matter, but not equally. If you are starting from scratch, here is the order that produces the fastest impact:
Priority 1: Crawlability (Factors 1-8). These are binary gates. If AI bots cannot access your site, nothing else matters. Fix robots.txt, implement SSR, ensure fast response times. This is a one-time setup that unlocks everything downstream.
Priority 2: Structured Data (Factors 9-15). Schema markup gives AI platforms machine-readable context about your content. Start with Organization, Article, and FAQ schema. These three types cover the most common AI query patterns.
Priority 3: Content Structure (Factors 16-23). Restructuring existing content for AI readability is the highest-impact ongoing activity. Clear headings, quotable blocks, opening definitions, and internal linking improve how AI models parse and cite your pages.
Priority 4: Entity Signals (Factors 24-31). Entity building is a long-term investment. Start with consistency (NAP, terminology, social profiles) and work toward authority (Wikipedia, third-party mentions, Knowledge Panel). Results take months but compound over time.
Priority 5: Performance (Factors 32-38). Performance issues rarely cause total AI invisibility, but they reduce crawl efficiency and signal quality. Fix obvious problems (redirect chains, error pages) and monitor Core Web Vitals.
Run this audit quarterly. Track scores over time to measure progress.
Running the Audit
Run this audit with a combination of standard SEO tools and AI-specific checks. Google Search Console covers crawl errors, Core Web Vitals, and sitemap status. Schema validation tools handle structured data. Server logs reveal AI crawler access patterns.
The piece most traditional tools miss: checking whether AI platforms actually cite your brand after you make fixes, and tracking how competitor positions shift across AI platforms. That gap is where AI-specific monitoring matters.
Document each factor as pass or fail. Fix the failures in priority order. Retest quarterly.
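The pass/fail tracking above reduces to a per-category score you can compare quarter over quarter. A minimal sketch, using the category sizes defined in this article; the sample results are invented:

```python
# The audit's five categories and their factor counts, from the sections above.
CATEGORIES = {
    "Crawlability": 8,
    "Structured Data": 7,
    "Content Structure": 8,
    "Entity Signals": 8,
    "Performance": 7,
}

def category_scores(results: dict) -> dict:
    """Percentage of passing factors per category.
    `results` maps category name -> ordered list of pass/fail booleans."""
    scores = {}
    for category, expected in CATEGORIES.items():
        checks = results[category]
        if len(checks) != expected:
            raise ValueError(f"{category}: expected {expected} factors, "
                             f"got {len(checks)}")
        scores[category] = round(100 * sum(checks) / expected, 1)
    return scores

scores = category_scores({
    "Crawlability": [True] * 8,
    "Structured Data": [True] * 6 + [False],
    "Content Structure": [True] * 8,
    "Entity Signals": [False] * 8,
    "Performance": [True] * 7,
})
```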
See also: How to Build a GEO Strategy from Scratch (Step-by-Step)