Guide
Pre-rendering at scale — large sites with 100k+ pages
Operational playbook for pre-rendering sites with 100,000+ URLs: sitemap sharding, tiered TTL, selective pre-rendering, and render-pool cost control.
Introduction
At 100,000+ URLs, pre-rendering everything is wasteful. The 80/20 rule holds: most organic traffic comes from a bounded canonical set, and the long tail is better served by canonical discipline than by rendering. If you are still proving the baseline problem, start with crawl budget fundamentals.
This guide is the operational playbook for that scale: six steps, from auditing the crawlable surface to tuning the render pool. It pairs directly with JavaScript rendering cost and ostr.io vs Cloudflare when the question becomes build vs buy at scale.
How to: Large sites 100k+
- 1
Audit the crawlable surface by segment
Group URLs by template (products, categories, filters, seller-pages, blog, docs, etc.). Pull impressions per segment from Search Console. The top 3-5 segments usually cover 90% of traffic.
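The audit in this step can be sketched in a few lines of Python. This is a minimal illustration, not a definitive tool: the prefix-to-segment map and the `(path, impressions)` row shape are assumptions, standing in for the site's real URL patterns and a Search Console export.

```python
from collections import defaultdict

# Hypothetical template prefixes; replace with the site's real URL patterns.
SEGMENT_PREFIXES = {
    "/p/": "products",
    "/category/": "categories",
    "/blog/": "blog",
}

def segment_of(path: str) -> str:
    """Map a URL path to a template segment by its prefix."""
    for prefix, name in SEGMENT_PREFIXES.items():
        if path.startswith(prefix):
            return name
    return "other"

def impressions_by_segment(rows):
    """rows: iterable of (path, impressions) pairs (assumed export shape).

    Returns (segment, impressions, cumulative_share) tuples,
    sorted by impressions in descending order.
    """
    totals = defaultdict(int)
    for path, impressions in rows:
        totals[segment_of(path)] += impressions
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero on empty input
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    report, running = [], 0
    for name, count in ranked:
        running += count
        report.append((name, count, running / grand_total))
    return report
```

Reading the cumulative share column from the top shows how many segments are needed to cover most of the traffic.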
- 2
Classify URLs by pre-render priority
Tier A: top 20% canonicals with traffic or explicit priority (homepage, top categories, featured products). Tier B: long tail with any measurable impressions. Tier C: everything else (canonicalised to a parent URL, not pre-rendered).
- 3
Shard the sitemap to match the tiers
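sitemap-a.xml contains Tier A URLs, with lastmod updated whenever the content changes. sitemap-b.xml contains Tier B URLs, whose lastmod changes less often. Tier C is left out of the sitemap. Sitemap files are fetched independently of each other, so a small, frequently updated Tier A shard is not held back by the long tail and can be picked up sooner.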
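A minimal sketch of the sharding step, using only the Python standard library. The `(url, lastmod, tier)` input shape is an assumption for illustration; the shard file names follow the ones used in this step.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """entries: iterable of (absolute_url, lastmod_iso_date). Returns the XML as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

def shard_by_tier(pages):
    """pages: iterable of (url, lastmod, tier), an assumed input shape.

    Tier A and Tier B each get their own shard; Tier C is left out entirely.
    """
    shards = {"sitemap-a.xml": [], "sitemap-b.xml": []}
    for url, lastmod, tier in pages:
        if tier == "A":
            shards["sitemap-a.xml"].append((url, lastmod))
        elif tier == "B":
            shards["sitemap-b.xml"].append((url, lastmod))
    return {name: build_sitemap(entries) for name, entries in shards.items()}
```

The sitemap protocol caps a single file at 50,000 URLs, so at this scale each tier will likely need to be split further (for example into numbered files behind a sitemap index).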
- 4
Set tiered TTL on the pre-render pool
Tier A: 1-4 hour TTL. Tier B: 24-72 hour TTL. Tier C: not pre-rendered. Render-pool cost scales with URL count ÷ TTL (the number of re-renders per unit of time), so short TTLs are affordable only on a small tier; tiering keeps cost manageable.
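To see why tiering matters, here is a rough back-of-the-envelope calculation. It assumes every cached URL is re-rendered once each time its TTL expires, and the tier sizes are illustrative assumptions for a 100,000-URL site, not measurements.

```python
SECONDS_PER_DAY = 86_400

def renders_per_day(url_count: int, ttl_seconds: int) -> float:
    """Steady-state re-renders per day if every URL is refreshed once per TTL."""
    return url_count * SECONDS_PER_DAY / ttl_seconds

# Illustrative split (assumed): 20k Tier A, 50k Tier B, 30k Tier C (never rendered).
tiers = {
    "A": (20_000, 4 * 3600),   # 4 hour TTL
    "B": (50_000, 72 * 3600),  # 72 hour TTL
}
tiered = sum(renders_per_day(count, ttl) for count, ttl in tiers.values())

# Baseline for comparison: all 100k URLs on the Tier A TTL.
flat = renders_per_day(100_000, 4 * 3600)
```

Under these assumptions the tiered setup needs about 136,667 renders per day against 600,000 for the flat one, roughly a quarter of the load.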
ostrio-config.json
```json
{
  "defaultTtlSeconds": 86400,
  "overrides": [
    { "prefix": "/", "ttlSeconds": 3600 },
    { "prefix": "/category/", "ttlSeconds": 7200 },
    { "prefix": "/p/", "ttlSeconds": 14400 },
    { "prefix": "/blog/", "ttlSeconds": 86400 },
    { "prefix": "/long-tail/", "ttlSeconds": 259200 }
  ]
}
```
- 5
Return rel=canonical on Tier C URLs
Filter permutations, deep paginated URLs, and near-duplicates should declare the Tier A or Tier B parent as their canonical. This consolidates them under the parent and shrinks the surface that needs indexing, without hurting discoverability.
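One way to derive the parent canonical for a filter or pagination permutation is to strip the query string and fragment. The sketch below assumes that all query parameters are non-canonical unless listed in `KEEP_PARAMS`, which is a hypothetical allow-list; sites that paginate in the path rather than the query string would need a different rule.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allow-list: query parameters that define distinct canonical content.
KEEP_PARAMS = frozenset()

def canonical_url(url: str) -> str:
    """Drop the fragment and every query parameter not in KEEP_PARAMS."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query) if key in KEEP_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_link_tag(url: str) -> str:
    """Render the <link rel="canonical"> element for the page at `url`."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

For example, `canonical_url("https://example.com/category/shoes?color=red&page=7")` yields `https://example.com/category/shoes`.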
- 6
Monitor render-pool utilisation weekly
Render-pool utilisation should stay under 70% during peak crawl windows. If it climbs higher, either add render capacity or move the bottom of Tier B to Tier C. Cost scales linearly with utilisation.
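The weekly check can be sketched as follows. The sample shape `(busy_workers, total_workers)` and the 10% demotion slice are assumptions for illustration; only the 70% limit comes from the step above.

```python
def peak_utilisation(samples):
    """samples: iterable of (busy_workers, total_workers) readings (assumed shape)
    taken during peak crawl windows. Returns the highest observed ratio."""
    return max(busy / total for busy, total in samples)

def demotion_candidates(tier_b, fraction=0.10):
    """tier_b: list of (url, impressions). Returns the lowest-impression slice,
    i.e. the bottom of Tier B that could move to Tier C. The 10% default is arbitrary."""
    count = int(len(tier_b) * fraction)
    ranked = sorted(tier_b, key=lambda row: row[1])
    return [url for url, _ in ranked[:count]]

def weekly_check(samples, tier_b, limit=0.70):
    """Flag the pool when peak utilisation reaches the limit."""
    peak = peak_utilisation(samples)
    if peak < limit:
        return {"peak": peak, "action": "none", "demote": []}
    return {
        "peak": peak,
        "action": "add capacity or demote",
        "demote": demotion_candidates(tier_b),
    }
```

The function only proposes candidates; whether to add capacity or demote URLs remains a cost decision for the team.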
The real work is building the priority matrix
The mistake most teams make at 100k+ URLs is thinking in terms of page types only. The better model is a priority matrix: template type, impressions, update frequency, conversion value, and crawl cost. That matrix tells you what belongs in Tier A, B, and C.
If the site behaves more like a listing marketplace or aggregator than a classic catalogue, compare this guide with travel aggregators and marketplaces.
The operational threshold matters as much as the traffic threshold
Two teams can both have 150k URLs and still need different architectures. The difference is whether they can own invalidation, observability, and render-pool capacity themselves. That is why scale decisions must be read together with headless browser overhead.
When the team cannot support that layer reliably, selective pre-rendering plus a managed service is usually cheaper than trying to optimize a perfect DIY system.
Questions engineers ask about this guide
Use the Search Console impression threshold. URLs with zero impressions over 30 days can usually be canonicalised to a parent URL. URLs with 1-10 impressions belong in Tier B. URLs with more than 10 belong in Tier A.
New URLs enter Tier B by default and get promoted to Tier A if they accumulate impressions in the first month. Automate this with a weekly cron that re-reads Search Console data.
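The tier rules above fit in one small function that a weekly job could apply to each URL. This is a sketch under stated assumptions: fetching the Search Console data is left out, the `age_days` input is hypothetical, and the boundary is read as "more than 10 impressions means Tier A".

```python
def assign_tier(impressions_30d: int, age_days: int) -> str:
    """Return "A", "B" or "C" from 30-day impressions and URL age in days.

    Assumption: exactly 10 impressions still counts as Tier B.
    """
    if impressions_30d > 10:
        return "A"
    if impressions_30d >= 1:
        return "B"
    # Zero impressions: new URLs stay in Tier B until they have had a full month.
    if age_days < 30:
        return "B"
    return "C"
```

Running it weekly over every URL promotes new pages that gain impressions and demotes old ones that have gone quiet.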
No, because pre-rendering runs out of band from the deploy. The render pool processes snapshots asynchronously. Deploys remain fast; the snapshot refresh catches up in minutes.
Above 1M URLs, selective pre-rendering is mandatory. Most teams pre-render 10-20% of URLs and canonicalise or noindex the remaining 80-90%. Render-pool cost dominates at this scale.
Related guides and deep dives
Guide — crawl budget 80 percent
Where budget gets lost on JS sites.
Guide — expand crawl budget
Systematic levers.
Guide — cache headers
TTL tuning per tier.
Use case — marketplaces
Typical 1M+ URL pattern.
Use case — travel aggregators
Deep fanout at 1M+ URLs.
Compare — vs Cloudflare
KV cost dominates at 100k+ URLs.
Editorial trust
Written by the ostr.io engineering team. We build and run pre-rendering infrastructure for more than 200 engineering teams, which is where the numbers and code samples on this page come from.
Last updated . Editorial scope and review policy: About prerender.info.