Crawl budget is still real (and still misunderstood)
Google publicly downplays crawl budget for small sites, but anyone running an enterprise commerce platform knows it's the single biggest lever on indexation latency. Googlebot allocates crawl time based on site authority, server response speed and content freshness. If your origin takes 800ms to respond, Googlebot throttles itself and fetches fewer URLs per day, and your fresh inventory pages sit unindexed for days. In 2026, the fix is uncompromising: edge-cached HTML, a server-side rendered shell, and a clean sitemap that excludes parameterised, faceted and thin URLs. We've seen mid-size sites cut indexation lag from 11 days to 38 hours just by trimming 200k junk URLs out of the sitemap and serving HTTP 410 on the dead ones.
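Here is a minimal sketch of the sitemap-trimming step. It assumes you already have a crawl export that records each URL's last HTTP status and a main-content word count. The `PageRecord` type, the 250-word thinness threshold and the `/filter/` and `/facet/` path markers are hypothetical placeholders that you would tune to your own platform.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit
from xml.sax.saxutils import escape


@dataclass
class PageRecord:
    url: str
    status: int      # last observed HTTP status
    word_count: int  # main-content words, a crude thinness proxy


# Hypothetical thresholds and facet markers: illustrative, not Google guidance.
MIN_WORDS = 250
FACET_PATH_MARKERS = ("/filter/", "/facet/")


def sitemap_eligible(page: PageRecord) -> bool:
    if page.status != 200:
        return False  # dead (404/410) or redirected URLs stay out
    parts = urlsplit(page.url)
    if parts.query:
        return False  # parameterised URLs, including query-string facets
    if any(marker in parts.path for marker in FACET_PATH_MARKERS):
        return False  # path-based facets
    return page.word_count >= MIN_WORDS  # drop thin pages


def build_sitemap(pages) -> str:
    entries = "\n".join(
        f"  <url><loc>{escape(p.url)}</loc></url>"
        for p in pages
        if sitemap_eligible(p)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )
```

Serving the 410 itself happens at the origin or edge. This sketch only keeps those URLs out of the sitemap.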
Server response time: the silent killer
Time to First Byte (TTFB) is the metric every team underestimates. Every millisecond spent waiting for the first byte comes straight out of the 2.5s LCP budget, so a slow mobile TTFB can push LCP past the threshold no matter how optimised the rest of your page is. Audit your TTFB at the 75th percentile, not the median: a fast median hides a slow tail of database-bound pages that quietly suppress your rankings. Move dynamic logic to the edge, cache aggressively, and accept that some pages need to be statically pre-rendered even if your stack is fully dynamic. The crawler doesn't care about your architecture purity.
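To see why the median misleads, this small sketch computes the median and the 75th percentile from raw TTFB samples. The 75th percentile is also the level at which Core Web Vitals are assessed. The sketch then reports the LCP headroom left after the first byte arrives. The sample values are made up for illustration. The 2,500ms target is the "good" LCP threshold.

```python
import statistics

LCP_GOOD_MS = 2500  # Core Web Vitals "good" LCP threshold


def ttfb_summary(samples_ms):
    """Return (median, p75) of TTFB samples in milliseconds."""
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # samples as the full population, extremes included
    _, median, p75 = statistics.quantiles(samples_ms, n=4, method="inclusive")
    return median, p75


def lcp_headroom_ms(ttfb_ms, target_ms=LCP_GOOD_MS):
    """Milliseconds left to render the largest element after the first byte."""
    return target_ms - ttfb_ms


# Illustrative field data: mostly cached pages plus a tail of database-bound ones
samples = [120, 130, 140, 150, 160, 170, 180, 1100, 1200, 1300]
median, p75 = ttfb_summary(samples)
print(f"median {median:.0f}ms, p75 {p75:.0f}ms, p75 headroom {lcp_headroom_ms(p75):.0f}ms")
# prints "median 165ms, p75 870ms, p75 headroom 1630ms"
```

With these numbers the median looks healthy at 165ms. The p75 sits at 870ms, however, and leaves only 1,630ms for everything else on the page.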
Internal linking is the most under-used ranking lever
Most sites focus on backlinks and ignore internal linking, which is a mistake. Internal links distribute PageRank, communicate topical hierarchy and accelerate discovery of new content. The 2026 best practice is contextual linking — every new article links to 4-6 related pages with descriptive anchor text, and every hub page links back to its supporting articles. Skip the generic 'Read more' anchors. Use the actual target keyword, varied naturally across the site.
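A rough audit for this rule could parse an article body, then flag link counts outside the 4-6 range as well as generic anchors. This is a sketch under a few assumptions. You feed it only the article's main content, not the navigation or footer. Internal links are root-relative. The `GENERIC_ANCHORS` set is an illustrative starting list, not an exhaustive one.

```python
from html.parser import HTMLParser

# Illustrative set of non-descriptive anchors to flag
GENERIC_ANCHORS = {"read more", "click here", "learn more", "more", "here"}


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # text inside the open <a>, nested tags included

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = " ".join("".join(self._text).split())  # normalise whitespace
            self.links.append((self._href, anchor))
            self._href = None


def audit_article(html, min_links=4, max_links=6):
    parser = LinkCollector()
    parser.feed(html)
    # Root-relative links count as internal; protocol-relative "//host" does not
    internal = [
        (href, text)
        for href, text in parser.links
        if href and href.startswith("/") and not href.startswith("//")
    ]
    issues = []
    if not min_links <= len(internal) <= max_links:
        issues.append(f"{len(internal)} internal links (target {min_links}-{max_links})")
    for href, text in internal:
        if text.lower() in GENERIC_ANCHORS:
            issues.append(f"generic anchor '{text}' -> {href}")
    return issues
```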
Hreflang for international sites
If you serve multiple markets, hreflang is the single biggest source of indexation chaos. The rules: every alternate must be reciprocal, every URL self-references, language codes are ISO 639-1, region codes are ISO 3166-1 alpha-2. Ship hreflang in the HTML head or in the XML sitemap, not both. The two sets inevitably drift out of sync, and the crawler resolves the resulting conflicts unpredictably. We routinely find sites losing 30% of international visibility to broken hreflang they don't even know is broken.
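The reciprocity and self-reference rules are mechanical enough to check in code. This sketch assumes you have already crawled every alternate and collected its annotations into a dict keyed by page URL. The code-format check is deliberately simplified: a two-letter language, an optional two-letter region, plus the `x-default` value. It does not verify that a code actually exists in ISO 639-1 or ISO 3166-1, so `en-UK` would pass even though the correct region is `GB`.

```python
import re

# Simplified: two-letter language, optional two-letter region (case-insensitive).
# Script subtags such as zh-Hant are not handled in this sketch.
CODE_RE = re.compile(r"^[a-z]{2}(-[a-z]{2})?$", re.IGNORECASE)


def validate_hreflang(annotations):
    """annotations: {page_url: {hreflang_code: alternate_url}} as crawled."""
    errors = []
    for page, alternates in annotations.items():
        if page not in alternates.values():
            errors.append(f"{page}: missing self-reference")
        for code, alt in alternates.items():
            if code.lower() != "x-default" and not CODE_RE.match(code):
                errors.append(f"{page}: invalid code {code!r}")
            if alt == page:
                continue
            back = annotations.get(alt)
            if back is None:
                errors.append(f"{page}: alternate {alt} was not crawled")
            elif page not in back.values():
                errors.append(f"{page}: {alt} does not link back")
    return errors
```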
Indexation control: robots, noindex, canonical
The three indexation signals (robots.txt, meta noindex and canonical) must agree. One common mistake is canonicalising a noindex page to an indexed one. The two signals contradict each other, and Google may carry the noindex over to the canonical target. Another is blocking a noindexed page in robots.txt. Google can never fetch the page to see the noindex, so the URL can linger in the index for months. Audit conflicts with a crawler that simulates Googlebot, not a generic site spider.
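Here is a minimal conflict check for these two cases, built on Python's standard `urllib.robotparser`. Treat it as a sketch. The stdlib parser follows the original robots.txt prefix-matching rules rather than Google's wildcard and longest-match semantics, so it is only reliable for simple rule sets. The inputs are assumed to come from your own crawler: the raw robots.txt, the page's meta robots value and its canonical URL.

```python
from urllib.robotparser import RobotFileParser


def indexation_conflicts(url, robots_txt, meta_robots, canonical, user_agent="Googlebot"):
    """Flag noindex signals that contradict robots.txt or the canonical."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # stdlib prefix matching, no Google wildcards
    noindex = "noindex" in (meta_robots or "").lower()
    conflicts = []
    if noindex and not rp.can_fetch(user_agent, url):
        # Googlebot can't fetch the page, so it never sees the noindex
        conflicts.append("noindex is unreachable: URL is disallowed in robots.txt")
    if noindex and canonical and canonical != url:
        # Contradictory: "drop this page" plus "this page equals that one"
        conflicts.append(f"noindex page canonicalises to {canonical}")
    return conflicts
```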
Structured data: the rich result moat
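Structured data is no longer optional for competitive verticals. Product schema with Offer and AggregateRating, Article schema for editorial, FAQPage where Google still surfaces it, Event for ticketing: every type that triggers a rich result is a CTR multiplier. Validate it in CI. The Rich Results Test itself is an interactive tool without a public API, so run your own schema checks in the build, and use Search Console's URL Inspection API to confirm which rich result items Google detected on indexed URLs. A broken schema is a silent traffic leak that can take quarters to detect.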
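A CI step can at least catch missing required fields before any Google tooling runs. This sketch builds Product JSON-LD and checks it against a simplified reading of Google's product snippet rules: a `name`, plus at least one of `offers`, `aggregateRating` or `review`. The helper names and the default availability value are illustrative. Google's structured data documentation remains the source of truth.

```python
import json


def product_jsonld(name, price, currency, rating=None, review_count=None,
                   availability="https://schema.org/InStock"):
    """Build a schema.org Product with an Offer and optional AggregateRating."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": availability,
        },
    }
    if rating is not None and review_count:
        data["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        }
    return data


def missing_product_fields(data):
    """Simplified pre-flight check; not a substitute for Google's validators."""
    problems = []
    if data.get("@type") != "Product":
        problems.append("@type is not Product")
    if not data.get("name"):
        problems.append("name missing")
    if not any(key in data for key in ("offers", "aggregateRating", "review")):
        problems.append("needs offers, aggregateRating or review")
    offer = data.get("offers", {})
    if offer and not ("price" in offer and "priceCurrency" in offer):
        problems.append("offer lacks price or priceCurrency")
    return problems


def to_script_tag(data):
    # Escape "</" so a product name can't close the script element early
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```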
Mobile-first means mobile-only
In 2026, the mobile version of your site is the only version Google indexes. If your mobile templates omit content that the desktop version serves, that content is invisible to ranking. Collapsing it into tabs or accordions is fine as long as it is still in the mobile HTML. Audit parity between mobile and desktop HTML on every template. Responsive design is fine; conditional rendering is dangerous.
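One way to automate the parity audit is to diff the visible words in the HTML that each variant serves. This sketch assumes you fetch the same URL once with a desktop user agent and once with a smartphone user agent, then pass in the raw responses. It compares server-rendered HTML only. Content injected by client-side JavaScript would need a rendered DOM from a headless browser instead. Text collapsed into an accordion still counts, because it is present in the HTML.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping script, style and noscript contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.chunks.append(data)


def words(html):
    parser = TextExtractor()
    parser.feed(html)
    return set(re.findall(r"\w+", " ".join(parser.chunks).lower()))


def missing_on_mobile(desktop_html, mobile_html):
    """Words present in the desktop HTML but absent from the mobile HTML."""
    return sorted(words(desktop_html) - words(mobile_html))
```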
The audit cadence
Run a full technical audit quarterly, a focused crawl monthly, and continuous monitoring for the top 50 templates. Most issues get caught at the template level if you ship validation in CI. The audit is for finding what slips through.