Guide

Pre-render cache headers - Cache-Control, Vary, invalidation

Cache-Control, Vary, Last-Modified, and invalidation patterns that make pre-rendered snapshots recrawl efficiently on any CDN.

14 min read · Procedure: 30 min setup · Intermediate · Updated

Introduction

Cache headers are the interface between your pre-rendering layer and the rest of the crawl pipeline. Tune them well and recrawl becomes cheap; tune them badly and crawlers fetch the same snapshot over and over. If the broader problem is discovery rather than freshness, start with crawl budget fundamentals.

This guide covers five steps (four response headers and a purge API) and the invalidation patterns that pair with each. The most important question is not just which headers you set, but which page types deserve which freshness policy.

What most teams get wrong is using one cache rule for every URL family. Product pages, docs, pricing pages, category pages, listings, and static support pages do not carry the same freshness risk. A cache policy that looks tidy in config often becomes the reason stale crawler HTML survives longer than it should.

Step-by-step

How to: Cache headers

  1. Set Cache-Control with max-age and stale-while-revalidate

    max-age controls freshness; stale-while-revalidate lets crawlers receive a stale copy while the render pool regenerates. This keeps recrawl responses fast even right after a TTL expiry.

    text
    Cache-Control: public, max-age=3600, stale-while-revalidate=86400
  2. Add Vary: User-Agent on bot-facing responses

    Snapshots are served only to bots; human traffic gets the SPA. Vary: User-Agent tells caches that the response depends on the requesting user agent, which prevents a cached bot snapshot from being served to a human visitor, or the SPA shell from being served to a bot.

  3. Emit Last-Modified from the snapshot timestamp

    Last-Modified lets crawlers send If-Modified-Since and receive 304 Not Modified, the cheapest possible response because it carries no body. Without a validator such as Last-Modified (or an ETag), every recrawl transfers the full HTML.

  4. Wire a purge API to content events

    CMS updates, price changes, and inventory changes should each fire a purge against the affected snapshot URLs. ostr.io's purge API accepts batches of 10,000 URLs per call.

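    cms-webhook.ts
    typescript
    import { purgeOstrio } from "@/lib/ostrio";

    // Assumed event shape: the CMS reports the site-relative paths it changed.
    interface CmsEvent {
      affectedPaths: string[];
    }

    // Purge every snapshot URL touched by a CMS update.
    export async function onCmsUpdate(event: CmsEvent): Promise<void> {
      const affectedUrls = event.affectedPaths.map(
        (p) => `https://yourdomain.com${p}`,
      );
      await purgeOstrio({ urls: affectedUrls });
    }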
  5. Set a noindex header on soft 404 or error pages

    Pre-rendered 404 or error pages should carry X-Robots-Tag: noindex in the response header. Google honours the header even if the page body looks indexable.
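Steps 1, 2, 3, and 5 can be combined into a single header-building function. The sketch below is an illustration under stated assumptions, not ostr.io's implementation: the `Snapshot` shape and the `buildSnapshotHeaders` and `isNotModified` helpers are hypothetical names, and the TTL values are simply the ones shown in step 1.

```typescript
// Hypothetical snapshot record; the field names are assumptions for illustration.
interface Snapshot {
  renderedAt: Date; // when the pre-rendered HTML was generated
  status: number;   // HTTP status the origin page resolved to
}

// Build response headers for a bot-facing snapshot (steps 1, 2, 3, and 5).
function buildSnapshotHeaders(snapshot: Snapshot): Record<string, string> {
  const headers: Record<string, string> = {
    "Cache-Control": "public, max-age=3600, stale-while-revalidate=86400",
    "Vary": "User-Agent",
    // toUTCString() produces the HTTP date form, e.g. "Mon, 01 Jan 2024 00:00:00 GMT".
    "Last-Modified": snapshot.renderedAt.toUTCString(),
  };
  if (snapshot.status >= 400) {
    // Error snapshots should not be indexed, whatever the body looks like.
    headers["X-Robots-Tag"] = "noindex";
  }
  return headers;
}

// True when the crawler's If-Modified-Since is at or after the snapshot time,
// in which case a body-less 304 Not Modified is a sufficient answer.
function isNotModified(snapshot: Snapshot, ifModifiedSince?: string): boolean {
  if (!ifModifiedSince) return false;
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return false;
  // HTTP dates have one-second resolution, so drop milliseconds before comparing.
  const renderedSec = Math.floor(snapshot.renderedAt.getTime() / 1000) * 1000;
  return renderedSec <= since;
}
```

A request handler would call `isNotModified` first and return 304 with no body when it is true; otherwise it would send the snapshot HTML with the headers from `buildSnapshotHeaders`.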

Cache policy should follow page type, not one global default

A single TTL across the whole site is almost always wrong. Product pages, category pages, docs, changelogs, static marketing pages, and inventory pages all carry different freshness risk and different recrawl value.

That is why cache policy should be grouped by template family. The same rule applies on large sites (100k+ pages), where tiered TTL becomes mandatory rather than optional.

A practical TTL model by template type

Use short TTLs where a stale field would create SEO or trust damage: product pages with price or stock, booking and availability pages, listings with status changes, or high-visibility category pages that change several times per day. Use medium TTLs for docs, changelogs, comparison pages, and evergreen use-case pages that need recrawl efficiency but not minute-by-minute freshness. Use long TTLs for static pages such as about, legal, privacy, and stable support content.

In practice, the safe starting model is: 1-4 hours for high-volatility templates, 24 hours for normal commercial content, and 72+ hours for stable pages. That is not a law of nature. It is a risk model. If one stale field can create a ranking, compliance, or trust problem, the TTL should be short and invalidation should be event-driven.
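One way to express this starting model is a small prefix-to-tier lookup that produces the Cache-Control value. This is a minimal sketch: the URL prefixes and the function names are hypothetical, and the high-volatility tier uses the low end of the 1-4 hour range.

```typescript
// Volatility tiers from the starting model above.
type Tier = "high" | "normal" | "stable";

const TTL_SECONDS: Record<Tier, number> = {
  high: 1 * 3600,    // low end of the 1-4 hour range
  normal: 24 * 3600, // normal commercial content
  stable: 72 * 3600, // about, legal, privacy, stable support content
};

// Hypothetical prefixes; the first matching prefix wins,
// so list more specific prefixes first.
const TIER_BY_PREFIX: Array<[string, Tier]> = [
  ["/products/", "high"],
  ["/listings/", "high"],
  ["/about", "stable"],
  ["/legal/", "stable"],
];

function tierForPath(path: string): Tier {
  const match = TIER_BY_PREFIX.find(([prefix]) => path.startsWith(prefix));
  return match ? match[1] : "normal"; // unmatched paths fall back to 24 hours
}

function cacheControlForPath(path: string): string {
  return `public, max-age=${TTL_SECONDS[tierForPath(path)]}`;
}
```

Keeping the tiers in one table makes it easy to review which template families sit in which risk bucket, independent of CDN configuration syntax.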

Invalidation should follow business events, not only deploys

A deploy is not the only event that changes crawler-facing truth. CMS saves, inventory syncs, price updates, sold status changes, review-count updates, schema changes, and editorial corrections should all be able to purge affected URLs. If the cache only refreshes on deploy, the site is implicitly accepting stale HTML between releases.

The catch is that different templates need different triggers. A product page might purge on stock or price change. A category page might purge when top inventory changes materially. A docs page might purge on a CMS publish event only. That is why invalidation design belongs in content architecture, not only in CDN configuration.
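The per-template triggers can be sketched as a mapping from business events to the paths they should purge, followed by batching before the purge call. The event names and payload shapes below are hypothetical, and purging every parent category on each product change is a simplification of "changes materially"; the 10,000 default is the per-call figure quoted in step 4.

```typescript
// Hypothetical business events; real systems will have their own names and payloads.
type ContentEvent =
  | { kind: "product-changed"; productPath: string; categoryPaths: string[] }
  | { kind: "cms-published"; path: string };

// Decide which site-relative paths a given event should purge.
function pathsToPurge(event: ContentEvent): string[] {
  switch (event.kind) {
    case "product-changed":
      // Simplification: a product change also purges the categories that list it.
      return [event.productPath, ...event.categoryPaths];
    case "cms-published":
      return [event.path];
  }
}

// Split a list into batches no larger than `size`.
function chunk<T>(items: T[], size = 10_000): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch returned by `chunk` would then be passed to whatever purge call the pre-rendering layer exposes.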

The real SEO risk is not caching - it is stale business-critical fields

Most teams do not get hurt because a page is cached. They get hurt because pricing, stock, availability, or structured data in the snapshot no longer matches the live page. That is a freshness and invalidation problem, not a blanket anti-cache argument.

If those fields are central to your organic traffic model, pair this guide with the ecommerce, bookings, or real-estate guide, depending on your inventory type.

Structured data drift is one of the least visible failure modes. Teams check the visible HTML, but forget that stale `Offer`, `AggregateRating`, `JobPosting`, or availability fields can survive in the snapshot even when the page looks normal to a human reviewer. That is why parity checks should inspect JSON-LD as carefully as visible content.
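A JSON-LD parity check can be sketched as: extract every JSON-LD block from the snapshot and from the live page, normalize key order, and compare. The function names below are hypothetical, and the regex extraction is an assumption made to keep the example short; a production check would more likely use a real HTML parser.

```typescript
// Pull every JSON-LD payload out of an HTML string.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: unknown[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(html)) !== null) {
    blocks.push(JSON.parse(m[1])); // throws on malformed JSON-LD
  }
  return blocks;
}

// Serialize with sorted object keys so key order does not cause false mismatches.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonical).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

// True when snapshot and live page carry the same JSON-LD blocks in the same order.
function jsonLdMatches(snapshotHtml: string, liveHtml: string): boolean {
  return (
    canonical(extractJsonLd(snapshotHtml)) === canonical(extractJsonLd(liveHtml))
  );
}
```

A stale `Offer` price in the snapshot makes `jsonLdMatches` return false even when the visible HTML of both pages is identical.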

Failure cases that break otherwise good cache policies

Failure case one: returning `200 OK` with a stale snapshot for a URL that should already be `404`, `410`, or `noindex`. Failure case two: forgetting `Vary: User-Agent`, which risks mixing crawler-facing HTML and user-facing responses in the wrong cache layer. Failure case three: using stale-while-revalidate as a bandage for missing purge triggers, which makes the cache look calm while important fields stay wrong.

Failure case four is more subtle: TTLs are correct, but the upstream app changes the meaning of the page without changing the invalidation signal. Teams see this on listings, jobs, bookings, and inventory-heavy pages. The page technically refreshed on schedule, but not when the business state changed. That gap is where SEO-visible staleness survives.
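The first three failure cases are mechanical enough to check automatically. The sketch below assumes a hypothetical `SnapshotCheck` record assembled by your own monitoring; none of these field names come from a real API. Failure case four is not covered, because it depends on business meaning rather than on headers.

```typescript
// Hypothetical summary of one crawler-facing response and its current origin state.
interface SnapshotCheck {
  servedStatus: number;         // status the cache returned to the bot
  originStatus: number;         // status the live page returns right now
  vary?: string;                // Vary header on the cached response, if any
  hasPurgeTrigger: boolean;     // whether any content event purges this URL
  staleWhileRevalidate: number; // seconds, 0 if the directive is absent
}

// Return a list of detected failure cases; an empty list means none were found.
function auditSnapshot(c: SnapshotCheck): string[] {
  const problems: string[] = [];

  // Case one: the cache still says 200 for a URL the origin has retired.
  if (c.servedStatus === 200 && (c.originStatus === 404 || c.originStatus === 410)) {
    problems.push("stale 200 served for a URL that is now gone");
  }

  // Case two: Vary is a comma-separated, case-insensitive list of header names.
  const varyTokens = (c.vary ?? "").split(",").map((t) => t.trim().toLowerCase());
  if (!varyTokens.includes("user-agent")) {
    problems.push("missing Vary: User-Agent");
  }

  // Case three: stale-while-revalidate papering over absent purge logic.
  if (c.staleWhileRevalidate > 0 && !c.hasPurgeTrigger) {
    problems.push("stale-while-revalidate with no purge trigger");
  }

  return problems;
}
```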

FAQ

Questions engineers ask about this guide

1-4 hours for freshness-critical content (product pricing, availability). 24 hours for typical content. 72+ hours for stable pages (about, privacy, terms). Tiered TTL per prefix is the pragmatic pattern.

No. A single TTL across all templates is usually a mistake. The right TTL depends on how risky stale HTML would be for that page type and how often the underlying business data changes.

Cloudflare, Vercel, Netlify, Fastly and Akamai all support stale-while-revalidate. ostr.io's edge also honours it. Some legacy CDNs ignore the directive; check vendor docs.

Any event that changes crawler-facing truth: CMS publishes, stock changes, price updates, sold status changes, schema changes, review-count updates, or important content edits. If a human would say the page materially changed, the snapshot should usually be purged.

Surrogate-Control is useful when you need different TTLs for the edge vs the browser. For bot-facing snapshots both targets are the same, so Cache-Control alone is usually enough.

Yes. Stale JSON-LD is easy to miss in QA because the page can still look correct visually. But outdated `Offer`, `AggregateRating`, `JobPosting`, or availability fields can create trust and eligibility problems in search results even when the visible snapshot looks fine.

stale-while-revalidate becomes dangerous when teams use it to hide missing purge logic. It is useful for fast recrawls and smoother regeneration, but it is not a substitute for event-driven invalidation on templates where business-critical fields change often.

Editorial trust

Written by the ostr.io engineering team. We build and run pre-rendering infrastructure for more than 200 engineering teams, which is where the numbers and code samples on this page come from.

Last updated . Editorial scope and review policy: About prerender.info.