
Guide

Pre-rendering at scale — large sites with 100k+ pages

Operational playbook for pre-rendering sites with 100,000+ URLs: sitemap sharding, tiered TTL, selective pre-rendering, and render-pool cost control.

18 min read · Procedure: 60 min plan · Advanced · Updated

Introduction

At 100,000+ URLs, pre-rendering everything is wasteful. The 80/20 rule usually holds: most organic traffic comes from a bounded set of canonical URLs, and the long tail is better served by canonical discipline than by rendering. If you are still establishing the baseline problem, start with crawl budget fundamentals.

This guide is the operational playbook for that scale: six steps, from auditing the crawlable surface to tuning the render pool. It pairs with JavaScript rendering cost, and with ostr.io vs Cloudflare when the question becomes build vs buy at scale.

Step-by-step

How to: Large sites 100k+

  1. Audit the crawlable surface by segment

    Group URLs by template (products, categories, filters, seller pages, blog, docs, etc.). Export page-level impressions from Search Console and sum them per segment. The top 3-5 segments usually cover about 90% of impressions.

  2. Classify URLs by pre-render priority

    Tier A: the top 20% of canonical URLs by traffic, plus pages with explicit priority (homepage, top categories, featured products). Tier B: the long tail with any measurable impressions. Tier C: everything else, canonicalised to a parent URL and not pre-rendered.

  3. Shard the sitemap to match the tiers

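    sitemap-a.xml contains Tier A, with lastmod updated whenever a page's content actually changes. sitemap-b.xml contains Tier B, whose lastmod values change less often. Tier C is left out of the sitemap entirely. A single sitemap file is limited to 50,000 URLs (and 50 MB uncompressed), so at this scale each tier usually spans several files tied together by a sitemap index. Google does not guarantee that one sitemap is crawled before another, but a separate Tier A shard with accurate lastmod values makes changed priority URLs easy to discover and lets you monitor their indexing separately in Search Console.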

  4. Set tiered TTL on the pre-render pool

    Tier A: 1-4 hour TTL. Tier B: 24-72 hour TTL. Tier C: not pre-rendered. Render-pool load scales with URL count ÷ TTL, because each snapshot is re-rendered roughly once per TTL window; long TTLs on the large Tier B are what keep cost manageable.

    ostrio-config.json

    ```json
    {
      "defaultTtlSeconds": 86400,
      "overrides": [
        { "prefix": "/", "ttlSeconds": 3600 },
        { "prefix": "/category/", "ttlSeconds": 7200 },
        { "prefix": "/p/", "ttlSeconds": 14400 },
        { "prefix": "/blog/", "ttlSeconds": 86400 },
        { "prefix": "/long-tail/", "ttlSeconds": 259200 }
      ]
    }
    ```

  5. Return rel=canonical on Tier C URLs

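    Filter permutations and near-duplicates should declare a rel=canonical pointing at their Tier A or Tier B parent. This consolidates indexing signals on the parent without hurting discoverability. A canonical is a hint rather than a crawl block, so crawlers still revisit these URLs, just less often over time. Treat deep paginated URLs with care: Google recommends a self-referencing canonical on each page of a paginated series, because pointing them at page 1 can hide the items they link to.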

  6. Monitor render-pool utilisation weekly

    Render-pool utilisation should stay under 70% during peak crawl windows. If it climbs higher, either add render capacity or move the bottom of Tier B into Tier C. Cost scales roughly linearly with render volume.
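The segment audit in step 1 can be sketched in a few lines of Python. This is a minimal sketch, assuming you already have page-level (path, impressions) rows exported from Search Console; the prefix-to-template map and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical prefix-to-template map; replace with your own URL scheme.
SEGMENT_PREFIXES = [
    ("/p/", "products"),
    ("/category/", "categories"),
    ("/blog/", "blog"),
    ("/docs/", "docs"),
]


def segment_of(path: str) -> str:
    """Return the template segment for a URL path (first matching prefix wins)."""
    for prefix, name in SEGMENT_PREFIXES:
        if path.startswith(prefix):
            return name
    return "other"


def segments_covering(rows, target=0.9):
    """Sum impressions per segment, then return the smallest set of top
    segments whose cumulative share of all impressions reaches `target`.

    rows: iterable of (url_path, impressions) pairs.
    """
    totals = defaultdict(int)
    for path, impressions in rows:
        totals[segment_of(path)] += impressions
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    chosen, covered = [], 0
    for name, impressions in ranked:
        chosen.append((name, impressions))
        covered += impressions
        if covered >= target * grand_total:
            break
    return chosen


rows = [
    ("/p/red-shoe", 500),
    ("/p/blue-shoe", 300),
    ("/category/shoes", 150),
    ("/blog/how-to-size", 40),
    ("/about", 10),
]
print(segments_covering(rows))  # [('products', 800), ('categories', 150)]
```

Here two segments already cover 95% of the sample impressions, so those are the templates worth examining first in step 2.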
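The sharding in step 3 can be sketched as follows, assuming tier assignments already exist as lists of (URL, lastmod) pairs. The `sitemap-<tier>-<n>.xml` naming extends the sitemap-a.xml / sitemap-b.xml convention and is hypothetical; the 50,000-URL cap per file comes from the sitemaps.org protocol. Tier C never appears because it is simply never passed in.

```python
from xml.sax.saxutils import escape

# The sitemaps.org protocol caps each sitemap file at 50,000 URLs.
MAX_URLS_PER_FILE = 50_000
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_urlset(entries):
    """entries: list of (absolute_url, lastmod) pairs, lastmod as a W3C date."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<urlset xmlns="{NS}">']
    for loc, lastmod in entries:
        lines.append(f"  <url><loc>{escape(loc)}</loc><lastmod>{lastmod}</lastmod></url>")
    lines.append("</urlset>")
    return "\n".join(lines)


def shard_tier(tier, entries):
    """Split one tier into sitemap-<tier>-<n>.xml files (hypothetical naming).

    Returns {filename: xml_string}.
    """
    files = {}
    for start in range(0, len(entries), MAX_URLS_PER_FILE):
        n = start // MAX_URLS_PER_FILE + 1
        chunk = entries[start:start + MAX_URLS_PER_FILE]
        files[f"sitemap-{tier}-{n}.xml"] = render_urlset(chunk)
    return files


def render_index(base_url, filenames):
    """Build a sitemap index that lists every shard file."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<sitemapindex xmlns="{NS}">']
    for name in sorted(filenames):
        loc = f"{base_url.rstrip('/')}/{name}"
        lines.append(f"  <sitemap><loc>{escape(loc)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


# Illustrative data only: 2 Tier A URLs and 60,000 Tier B URLs.
tier_a = [("https://example.com/", "2024-05-01"),
          ("https://example.com/category/shoes", "2024-05-01")]
tier_b = [(f"https://example.com/long-tail/{i}", "2024-04-01") for i in range(60_000)]

files = {**shard_tier("a", tier_a), **shard_tier("b", tier_b)}
index = render_index("https://example.com", files)
print(sorted(files))  # ['sitemap-a-1.xml', 'sitemap-b-1.xml', 'sitemap-b-2.xml']
```

Write each string to your web root and submit the index file; because Tier A lives in its own files, its lastmod values and indexing status can be tracked without the Tier B noise.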
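The TTL arithmetic in step 4 and the utilisation check in step 6 fit in one back-of-the-envelope model. It assumes steady state, where each Tier A and Tier B snapshot is re-rendered once per TTL, and it ignores on-demand renders for cache misses; the URL counts, the pool capacity, and `peak_factor` are illustrative assumptions, not measurements.

```python
SECONDS_PER_DAY = 86_400
UTILISATION_LIMIT = 0.70  # the 70% ceiling from step 6


def daily_renders(url_count: int, ttl_seconds: int) -> float:
    """Steady-state re-renders per day if every URL is refreshed once per TTL."""
    return url_count * SECONDS_PER_DAY / ttl_seconds


def peak_utilisation(tiers, capacity_per_sec: float, peak_factor: float) -> float:
    """tiers: list of (url_count, ttl_seconds) pairs.

    peak_factor is a hypothetical multiplier for how much busier the pool is
    during crawl peaks than on the daily average; measure it on your own logs.
    """
    renders_per_day = sum(daily_renders(n, ttl) for n, ttl in tiers)
    average_per_sec = renders_per_day / SECONDS_PER_DAY
    return average_per_sec * peak_factor / capacity_per_sec


# Illustrative numbers: 20k Tier A URLs at a 2 h TTL, 80k Tier B URLs at 72 h.
tiered = [(20_000, 7_200), (80_000, 259_200)]
flat = [(100_000, 3_600)]  # the same 100k URLs, all at a 1 h TTL

tiered_total = sum(daily_renders(n, t) for n, t in tiered)
flat_total = sum(daily_renders(n, t) for n, t in flat)
print(f"tiered: {tiered_total:,.0f} renders/day")  # tiered: 266,667 renders/day
print(f"flat 1 h: {flat_total:,.0f} renders/day")  # flat 1 h: 2,400,000 renders/day

u = peak_utilisation(tiered, capacity_per_sec=15, peak_factor=3)
if u > UTILISATION_LIMIT:
    print(f"{u:.0%}: add capacity or demote the bottom of Tier B to Tier C")
else:
    print(f"{u:.0%}: within budget")  # 62%: within budget
```

With these numbers, tiering cuts daily renders ninefold compared with a flat 1-hour TTL, and the step 6 check passes. Note that cost falls as TTL rises: doubling a tier's TTL halves its steady-state render volume.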
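For step 5, one way to emit the canonical for filter permutations is to strip filter and sort parameters from the URL. This is a minimal sketch, assuming filters live in the query string; the `FILTER_PARAMS` set is hypothetical, and pagination parameters such as `page` are deliberately left in place.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters that only filter or sort a listing.
FILTER_PARAMS = {"color", "size", "sort", "price_min", "price_max"}


def canonical_for(url: str) -> str:
    """Drop filter/sort parameters (and any fragment) so every permutation
    points at the same parent listing URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in FILTER_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """<link> element for the <head> of the pre-rendered page."""
    return f'<link rel="canonical" href="{html.escape(canonical_for(url))}">'


print(canonical_link_tag("https://example.com/category/shoes?color=red&sort=price"))
# <link rel="canonical" href="https://example.com/category/shoes">
```

Deriving the canonical from the URL itself keeps it consistent across every permutation, whichever order the parameters arrive in.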

The real work is building the priority matrix

The mistake most teams make at 100k+ URLs is thinking in terms of page types only. The better model is a priority matrix: template type, impressions, update frequency, conversion value, and crawl cost. That matrix tells you what belongs in Tier A, B, and C.
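The priority matrix can be collapsed into a single score per URL. The weights, the log damping, and the cost divisor below are hypothetical choices for illustration, not a recommended formula. The tier cut mirrors step 2: the top 20% of URLs with impressions go to Tier A, the rest with impressions to Tier B, and zero impressions to Tier C. Pages with explicit priority, such as the homepage, would still need to be pinned separately.

```python
import math
from dataclasses import dataclass


@dataclass
class UrlStats:
    template: str
    impressions_30d: int
    updates_per_week: float
    conversion_value: float  # e.g. average revenue per visit
    render_ms: float         # measured render time for this URL's template


# Hypothetical template weights; derive real ones from your own data.
TEMPLATE_WEIGHT = {"home": 3.0, "category": 2.0, "product": 1.5, "blog": 1.0}


def priority_score(s: UrlStats) -> float:
    """Higher means more valuable to keep fresh; render cost pushes it down."""
    demand = math.log10(1 + s.impressions_30d)      # dampen huge impression counts
    freshness = min(s.updates_per_week, 7.0) / 7.0  # 0..1, capped at daily updates
    value = math.log10(1 + s.conversion_value)
    cost = 1 + s.render_ms / 1000                   # 1 + seconds per render
    weight = TEMPLATE_WEIGHT.get(s.template, 0.5)
    return weight * (demand + freshness + value) / cost


def assign_tiers(stats: dict, tier_a_percent: int = 20) -> dict:
    """Top tier_a_percent of URLs with impressions -> A, the rest with
    impressions -> B, zero impressions -> C."""
    with_demand = [u for u, s in stats.items() if s.impressions_30d > 0]
    ranked = sorted(with_demand, key=lambda u: priority_score(stats[u]), reverse=True)
    cut = (len(ranked) * tier_a_percent + 99) // 100  # integer ceiling division
    tiers = {u: "C" for u in stats}
    for i, u in enumerate(ranked):
        tiers[u] = "A" if i < cut else "B"
    return tiers


# Illustrative sample data.
stats = {
    "/": UrlStats("home", 1000, 7, 50, 800),
    "/p/1": UrlStats("product", 100, 1, 20, 1200),
    "/p/2": UrlStats("product", 5, 0, 5, 1200),
    "/blog/x": UrlStats("blog", 20, 0.1, 0, 600),
    "/p/3": UrlStats("product", 2, 0, 1, 1200),
    "/p/old": UrlStats("product", 0, 0, 0, 1200),
}
print(assign_tiers(stats))
```

Dividing by render cost means two URLs with equal demand are not treated equally: the cheaper one to render is the better use of pool capacity.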

If the site behaves more like a listing marketplace or aggregator than a classic catalogue, compare this guide with travel aggregators and marketplaces.

The operational threshold matters as much as the traffic threshold

Two teams can both have 150k URLs and still need different architectures. The difference is whether they can own invalidation, observability, and render-pool capacity themselves. That is why scale decisions must be read together with headless browser overhead.

When the team cannot support that layer reliably, selective pre-rendering plus a managed service is usually cheaper than trying to perfect a DIY system.

FAQ

Questions engineers ask about this guide

Use a Search Console impression threshold. URLs with zero impressions over 30 days can usually be canonicalised to a parent URL. URLs with 1-9 impressions belong in Tier B; URLs with 10 or more belong in Tier A.

New URLs enter Tier B by default and get promoted to Tier A if they accumulate impressions in the first month. Automate this with a weekly cron that re-reads Search Console data.
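The impression thresholds and the new-URL rule can be expressed as a small weekly re-tiering function. This sketch assumes your cron job already has each URL's first-seen date and its last-30-day impression count from a Search Console export (fetching that data is out of scope here). It treats 10 or more impressions as Tier A and 1 to 9 as Tier B, and keeps new URLs in Tier B for their first 30 days unless they already qualify for Tier A.

```python
from datetime import date, timedelta

NEW_URL_GRACE = timedelta(days=30)


def tier_for_impressions(impressions_30d: int) -> str:
    """Thresholds: 0 -> C, 1-9 -> B, 10+ -> A."""
    if impressions_30d >= 10:
        return "A"
    if impressions_30d >= 1:
        return "B"
    return "C"


def weekly_retier(urls: dict, today: date) -> dict:
    """urls maps path -> (first_seen, impressions_last_30_days)."""
    tiers = {}
    for path, (first_seen, impressions) in urls.items():
        tier = tier_for_impressions(impressions)
        if tier == "C" and today - first_seen < NEW_URL_GRACE:
            tier = "B"  # new URLs default to Tier B until they have a month of data
        tiers[path] = tier
    return tiers


today = date(2024, 6, 1)
print(weekly_retier({
    "/p/new": (date(2024, 5, 25), 0),
    "/p/new-hit": (date(2024, 5, 25), 15),
    "/p/tail": (date(2024, 1, 2), 3),
    "/p/dead": (date(2024, 1, 2), 0),
}, today))
# {'/p/new': 'B', '/p/new-hit': 'A', '/p/tail': 'B', '/p/dead': 'C'}
```

Running this weekly and diffing the result against the previous run gives the list of URLs whose TTL override or sitemap shard needs to change.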

Pre-rendering does not slow down deploys, because it runs out of band: the render pool refreshes snapshots asynchronously. Deploys stay fast, and the snapshots catch up within minutes.

Above 1M URLs, selective pre-rendering is mandatory. Most teams pre-render 10-20% of URLs and canonicalise or noindex the remaining 80-90%. Render-pool cost dominates at this scale.

Editorial trust

Written by the ostr.io engineering team. We build and run pre-rendering infrastructure for more than 200 engineering teams, which is where the numbers and code samples on this page come from.

Last updated . Editorial scope and review policy: About prerender.info.