Tutorial: dynamic sitemaps in Next.js and TanStack

Why dynamic sitemaps matter

A static sitemap.xml works for small sites. It breaks down once your content grows past a few hundred URLs or starts changing daily. A dynamic sitemap, generated on request from your database or CMS, stays in sync with your content without manual intervention. For commerce, editorial and any other site with frequently changing content, dynamic is the only sane option.

The sitemap structure

For sites with up to 50,000 URLs, ship a single sitemap.xml. For larger sites, use a sitemap index that references multiple child sitemaps, each capped at 50,000 URLs and 50MB uncompressed. Group child sitemaps by content type (products, articles, categories) so a regression in one type doesn't break the others.
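
The split described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the /sitemaps/<type>-<n>.xml path scheme and the function names are assumptions made up for the example.

```typescript
// Limit from the sitemap protocol: at most 50,000 URLs per sitemap file.
const MAX_URLS_PER_SITEMAP = 50_000;

type ChildSitemap = { loc: string; urls: string[] };

// Split a flat URL list into chunks that each fit in one child sitemap.
function chunkUrls(urls: string[], size: number = MAX_URLS_PER_SITEMAP): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Group by content type, then chunk each type independently.
// The /sitemaps/<type>-<n>.xml naming is a hypothetical convention.
function planChildSitemaps(
  baseUrl: string,
  urlsByType: Record<string, string[]>,
): ChildSitemap[] {
  const children: ChildSitemap[] = [];
  for (const [type, urls] of Object.entries(urlsByType)) {
    chunkUrls(urls).forEach((chunk, index) => {
      children.push({ loc: `${baseUrl}/sitemaps/${type}-${index}.xml`, urls: chunk });
    });
  }
  return children;
}

// Escape the characters that must not appear raw in XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build the sitemap index document that points at every child sitemap.
function buildSitemapIndex(children: ChildSitemap[]): string {
  const entries = children
    .map((child) => `  <sitemap><loc>${escapeXml(child.loc)}</loc></sitemap>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</sitemapindex>",
  ].join("\n");
}
```

Because each content type is chunked on its own, a bad row in the products query only affects the products-* files; the article and category sitemaps are built from separate lists.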

Next.js: the App Router pattern

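In Next.js 14+, ship sitemap.ts in your app directory. Export a default async function that returns an array of sitemap entries. The framework handles XML serialisation and the correct content-type header. For larger sites, export generateSitemaps from sitemap.ts to split the output into several child sitemaps, or place a sitemap.ts in each subdirectory so every route segment serves its own. Note that sitemap.ts only produces a URL list, and Next.js does not appear to generate a sitemap index for you, so serve the index from a separate route handler that lists the child sitemap URLs.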
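
A minimal sketch of app/sitemap.ts, assuming a hypothetical fetchArticles data source, an /articles/<slug> URL scheme and an example.com base URL. In a real project the return type would be MetadataRoute.Sitemap imported from "next"; a local type with the same fields is used here so the example stands alone.

```typescript
// app/sitemap.ts
// In a real Next.js project, prefer the framework type:
//   import type { MetadataRoute } from "next";
// and return MetadataRoute.Sitemap. This local type mirrors the fields used below.
type SitemapEntry = {
  url: string;
  lastModified?: string | Date;
};

// Hypothetical row shape; adapt to your own database or CMS.
type ArticleRow = { slug: string; updatedAt: Date | null };

const BASE_URL = "https://example.com"; // assumption: your canonical origin

// Stand-in for a real database or CMS query.
async function fetchArticles(): Promise<ArticleRow[]> {
  return [
    { slug: "hello-world", updatedAt: new Date("2024-05-01T12:00:00Z") },
    { slug: "undated-post", updatedAt: null },
  ];
}

// Map rows to sitemap entries; lastModified is left out when unknown.
function toEntries(rows: ArticleRow[]): SitemapEntry[] {
  return rows.map((row) => ({
    url: `${BASE_URL}/articles/${encodeURIComponent(row.slug)}`,
    ...(row.updatedAt ? { lastModified: row.updatedAt } : {}),
  }));
}

// Next.js calls the default export and serialises the result to XML.
export default async function sitemap(): Promise<SitemapEntry[]> {
  const rows = await fetchArticles();
  return [{ url: BASE_URL }, ...toEntries(rows)];
}
```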

TanStack Start: server routes

In TanStack Start, create src/routes/sitemap[.]xml.ts with a server handler that returns the XML response. The [.] in the filename escapes the dot, so the router treats it as a literal character rather than a route separator. Fetch your data inside the handler, build the XML string, and return it with a Content-Type of application/xml and a sensible Cache-Control header.
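
The body of such a handler might look like the sketch below. It deliberately uses only the standard Response object: the way a GET handler is registered on a server route has changed between TanStack Start releases, so that wiring is left to the documentation for your version. loadUrls and the SitemapUrl shape are hypothetical.

```typescript
type SitemapUrl = { loc: string; lastmod?: string };

// Escape the characters that must not appear raw in XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Serialise a list of URLs into a <urlset> document.
function buildSitemapXml(urls: SitemapUrl[]): string {
  const body = urls
    .map((u) => {
      const lastmod = u.lastmod ? `<lastmod>${escapeXml(u.lastmod)}</lastmod>` : "";
      return `  <url><loc>${escapeXml(u.loc)}</loc>${lastmod}</url>`;
    })
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    body,
    "</urlset>",
  ].join("\n");
}

// Stand-in for a real database or CMS query.
async function loadUrls(): Promise<SitemapUrl[]> {
  return [{ loc: "https://example.com/", lastmod: "2024-05-01" }];
}

// Use this as the GET handler of src/routes/sitemap[.]xml.ts.
async function handleSitemapRequest(): Promise<Response> {
  const xml = buildSitemapXml(await loadUrls());
  return new Response(xml, {
    headers: {
      "Content-Type": "application/xml; charset=utf-8",
      // One hour at any cache; tune to how often your content changes.
      "Cache-Control": "public, max-age=3600",
    },
  });
}
```

Escaping matters more than it looks: a single raw ampersand in a query string makes the whole document invalid XML.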

Performance: caching is mandatory

Generating a sitemap from a 100k-row database on every crawler request will melt your origin. Cache the sitemap at the edge for at least an hour, ideally 24 hours. If your content changes more frequently than that, use cache tag invalidation to refresh only when something actually changes.
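
One way to express that policy is through response headers, sketched below. s-maxage applies to shared caches such as a CDN, which is where crawler traffic should land. The tag header is CDN-specific (Cloudflare reads Cache-Tag, Fastly reads Surrogate-Key), so its name and value format are passed in rather than assumed; the option names are made up for this example.

```typescript
type EdgeCacheOptions = {
  // How long the CDN may serve the cached sitemap, in seconds.
  ttlSeconds: number;
  // Optional window in which a stale copy may be served while refreshing.
  staleWhileRevalidateSeconds?: number;
  // CDN-specific purge tag header, e.g. { name: "Cache-Tag", value: "sitemap" }.
  tagHeader?: { name: string; value: string };
};

// Build the caching headers for a sitemap response.
function sitemapCacheHeaders(options: EdgeCacheOptions): Record<string, string> {
  // max-age=0 keeps browsers from holding a copy; s-maxage governs the edge.
  const directives = ["public", "max-age=0", `s-maxage=${options.ttlSeconds}`];
  if (options.staleWhileRevalidateSeconds !== undefined) {
    directives.push(`stale-while-revalidate=${options.staleWhileRevalidateSeconds}`);
  }
  const headers: Record<string, string> = {
    "Cache-Control": directives.join(", "),
  };
  if (options.tagHeader) {
    headers[options.tagHeader.name] = options.tagHeader.value;
  }
  return headers;
}

// Example: cache for 24 hours, tagged so a content change can purge it early.
const exampleHeaders = sitemapCacheHeaders({
  ttlSeconds: 24 * 60 * 60,
  staleWhileRevalidateSeconds: 60 * 60,
  tagHeader: { name: "Cache-Tag", value: "sitemap" },
});
```

With a tag in place, the code that writes to your database can purge that tag after a change, so the long TTL never serves stale URLs for long.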

lastmod: get it right or omit it

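The lastmod field tells Google when a URL last changed. Take it from your database, typically the last_modified timestamp on the row, and write it as a W3C datetime such as 2024-05-01 or 2024-05-01T12:00:00Z. If you can't get an accurate lastmod, omit the field. Stamping every URL with the current date on every crawl is a known anti-pattern: Google uses lastmod only when it proves consistently accurate, so faked values teach it to disregard the field for your site.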
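
The omit-when-unknown rule is small enough to show directly. A minimal sketch, assuming the timestamp arrives as a Date or null from a hypothetical row; the function names are illustrative.

```typescript
// Return a <lastmod> element for a trustworthy timestamp, or an empty
// string when the timestamp is missing or not a valid date.
function lastmodElement(updatedAt: Date | null | undefined): string {
  if (!updatedAt || Number.isNaN(updatedAt.getTime())) {
    return "";
  }
  // toISOString() yields a UTC timestamp in W3C datetime form.
  return `<lastmod>${updatedAt.toISOString()}</lastmod>`;
}

// Build one <url> entry. `loc` is assumed to be XML-escaped already.
function urlElement(loc: string, updatedAt?: Date | null): string {
  return `<url><loc>${loc}</loc>${lastmodElement(updatedAt)}</url>`;
}
```

Note what is absent: there is no fallback to new Date(). An entry without lastmod is valid; an entry with a made-up one is misleading.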

Validation in CI

Ship sitemap validation in CI. The XML must validate against the sitemap schema. URLs must be absolute, escaped and reachable. Broken sitemaps don't error visibly — they just get ignored, and your indexation silently degrades. Catch the regression before it ships.
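
A CI check for the URL rules might start from the sketch below. It covers only the offline checks (count, absolute http(s) URLs, escaping, duplicates); schema validation and reachability need an XML validator and network access and are left out. The function name and error messages are made up for this example.

```typescript
const MAX_URLS_PER_SITEMAP = 50_000;

// Return a list of human-readable problems; an empty list means the
// URL set passed these offline checks.
function validateSitemapUrls(urls: string[]): string[] {
  const errors: string[] = [];
  if (urls.length > MAX_URLS_PER_SITEMAP) {
    errors.push(`too many URLs: ${urls.length} > ${MAX_URLS_PER_SITEMAP}`);
  }
  const seen = new Set<string>();
  for (const url of urls) {
    if (seen.has(url)) {
      errors.push(`duplicate URL: ${url}`);
      continue;
    }
    seen.add(url);
    // Spaces, control characters and non-ASCII must be percent-encoded.
    if (/[^\x21-\x7E]/.test(url)) {
      errors.push(`unescaped character in URL: ${url}`);
      continue;
    }
    let parsed: URL;
    try {
      // The URL constructor throws on relative or malformed input.
      parsed = new URL(url);
    } catch {
      errors.push(`not an absolute URL: ${url}`);
      continue;
    }
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      errors.push(`unsupported scheme: ${url}`);
    }
  }
  return errors;
}
```

In a CI script, fail the build when the returned list is non-empty and print each message so the offending row is easy to find.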

Linking from robots.txt

Add a Sitemap: directive in robots.txt pointing to the absolute URL of your sitemap (or sitemap index). It's the simplest discovery signal, and any crawler that reads robots.txt can pick it up. Skip it and you're relying on a manual submission in Search Console or on the crawler guessing /sitemap.xml, which no standard obliges it to do.
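
If robots.txt is generated rather than hand-written, the directive can be emitted alongside the sitemap itself. A minimal sketch, assuming an allow-all policy and a hypothetical buildRobotsTxt helper; adjust the rules to your own site.

```typescript
// Build a robots.txt body that allows all crawling and advertises
// one or more sitemaps. Sitemap URLs must be absolute.
function buildRobotsTxt(sitemapUrls: string[]): string {
  const lines = [
    "User-agent: *",
    "Allow: /",
    "",
    ...sitemapUrls.map((url) => `Sitemap: ${url}`),
  ];
  return lines.join("\n") + "\n";
}

// Example: point crawlers at a sitemap index on a placeholder domain.
const exampleRobotsTxt = buildRobotsTxt(["https://example.com/sitemap.xml"]);
```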