
Case Study: How a 500k-Page E-Commerce Site Recovered 52% of Wasted Crawl Budget

90-day case study deploying ostr.io prerendering on a 500k-product React SPA, recovering crawl budget from WAF and DOM issues.



This case study covers the implementation of ostr.io prerendering on a 500,000-product e-commerce catalog and the indexation outcomes over a 90-day period. The site had a React SPA architecture with client-side product data fetching and a Cloudflare WAF deployment with Bot Fight Mode enabled. At the start of the engagement, 18% of the catalog (94,500 of 523,000 sitemap URLs) sat in one of Google Search Console's two "not indexed" statuses, either discovered or crawled but not indexed.

The Problem: Two Compounding Issues

The catalog's indexation failure had two root causes that amplified each other.

Issue 1: Client-side product data fetching. The product page React components fetched inventory, pricing, and product attributes via useEffect hooks after initial mount. The initial HTML shell was a skeleton: a header, navigation, and an empty product container. Product content only appeared once Googlebot's second-wave JavaScript rendering ran, and that rendering is queued and can be delayed by days to weeks for a catalog of this size.

Issue 2: Cloudflare Bot Fight Mode blocking Googlebot. The site ran Cloudflare's Bot Fight Mode in blocking mode. Cloudflare's behavioral analysis was flagging cloud infrastructure IP ranges, and some Googlebot traffic fell inside the flagged ranges. In Cloudflare's WAF logs, approximately 12% of Googlebot requests were receiving 403 responses, failing silently before any rendering could occur.

The combination produced a compounding problem: 12% of crawl budget was wasted on blocked requests, and a significant fraction of the remaining 88% was consumed by second-wave JavaScript rendering queue overhead.

[Figure: technical flow diagram of delivery paths, caching, and crawler-facing HTML.]

Diagnosis

Google Search Console data at the start of the engagement:

| Metric | Value |
| --- | --- |
| Total pages in sitemap | 523,000 |
| Indexed pages | 428,500 |
| "Discovered — currently not indexed" | 63,200 |
| "Crawled — currently not indexed" | 31,300 |
| Avg crawl frequency (product pages) | 1.2× per month |
| Render type (crawl stats) | 78% JavaScript, 22% HTML |
| Googlebot 403 rate (WAF logs) | ~12% |

The 78% JavaScript rendering rate confirmed that most product pages were triggering Googlebot's second-wave rendering queue. The target was to shift this to near-0% JavaScript rendering by serving static HTML snapshots.

[Figure: comparison panel summarizing the architectural tradeoffs discussed in this case study.]

Implementation

Week 1: WAF configuration. ostr.io's dedicated IP ranges were added to Cloudflare's WAF allowlist as priority-1 allow rules. Additionally, Googlebot's known IP ranges (published by Google) were added as a secondary allowlist. The 403 rate in WAF logs dropped to 0% within 24 hours of the allowlist changes.
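An allowlist rule of this kind comes down to checking a client IP against a set of CIDR blocks. The following is a minimal sketch of that check for IPv4 only. The ranges in `EXAMPLE_ALLOWLIST` are RFC 5737 documentation placeholders, not the real ostr.io or Googlebot ranges, and the function names are illustrative rather than part of any Cloudflare API.

```typescript
// Sketch: test whether an IPv4 address falls inside any allowlisted CIDR block.
// EXAMPLE_ALLOWLIST holds documentation-only ranges (RFC 5737), NOT real crawler ranges.
const EXAMPLE_ALLOWLIST: string[] = ["192.0.2.0/24", "198.51.100.0/25"];

// Convert a dotted-quad IPv4 string to an unsigned integer; null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    // Multiplication instead of bit shifts avoids signed 32-bit overflow.
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` shares the network prefix described by `cidr` (for example "192.0.2.0/24").
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  if (slash < 0) return false;
  const bitsText = cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(bitsText)) return false;
  const bits = Number(bitsText);
  if (bits > 32) return false;
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(cidr.slice(0, slash));
  if (ipInt === null || baseInt === null) return false;
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

function isAllowlisted(ip: string, allowlist: string[] = EXAMPLE_ALLOWLIST): boolean {
  return allowlist.some((cidr) => inCidr(ip, cidr));
}
```

In practice the allowlist would be loaded from the ranges each crawler operator publishes and refreshed on a schedule, since those ranges change over time.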

Week 1–2: Prerendering middleware deployment. ostr.io middleware was deployed as a Cloudflare Worker sitting in front of the origin. Bot detection used User-Agent matching plus IP range verification against ostr.io's updated bot IP database.
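The routing decision described above can be sketched as a pair of pure functions. This is an illustration under stated assumptions, not the ostr.io middleware: the token list in `CRAWLER_TOKENS` is a short hypothetical sample, and `ipVerified` stands in for the result of a separate IP range check.

```typescript
// Hypothetical sample of crawler User-Agent tokens; a production list would be
// longer and kept up to date.
const CRAWLER_TOKENS: string[] = ["googlebot", "bingbot", "duckduckbot", "yandex"];

type Route = "prerender" | "origin";

// Case-insensitive substring match against the known crawler tokens.
function isCrawlerUserAgent(userAgent: string | null): boolean {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return CRAWLER_TOKENS.some((token) => ua.includes(token));
}

// A request goes to the prerendered snapshot only when it is a GET, the
// User-Agent looks like a crawler, AND the source IP passed verification.
// `ipVerified` is assumed to come from a separate IP range lookup.
function chooseRoute(userAgent: string | null, ipVerified: boolean, method: string): Route {
  if (method !== "GET") return "origin";
  return isCrawlerUserAgent(userAgent) && ipVerified ? "prerender" : "origin";
}
```

Requiring both signals means a spoofed Googlebot User-Agent from an unverified IP is served the normal origin response instead of a snapshot.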

TTL configuration:

  • Product pages: 15 minutes (dynamic pricing and inventory)
  • Category pages: 30 minutes (listing changes)
  • Static content pages: 4 hours
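The TTL tiers above can be expressed as a small lookup keyed by page type. The sketch below assumes a hypothetical URL scheme (`/product/...` and `/category/...`); the source does not describe the site's real paths or how the TTLs were set in ostr.io.

```typescript
type PageType = "product" | "category" | "static";

// TTLs from the list above, in seconds.
const TTL_SECONDS: Record<PageType, number> = {
  product: 15 * 60, // dynamic pricing and inventory
  category: 30 * 60, // listing changes
  static: 4 * 60 * 60, // static content pages
};

// Hypothetical URL scheme: adjust the prefixes to the site's real routes.
function classifyPath(pathname: string): PageType {
  if (pathname.startsWith("/product/")) return "product";
  if (pathname.startsWith("/category/")) return "category";
  return "static";
}

function ttlSecondsFor(pathname: string): number {
  return TTL_SECONDS[classifyPath(pathname)];
}
```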

Week 2: Cache Warming API integration. A webhook was connected to the product information management (PIM) system. When product attributes, pricing, or inventory status changed, the PIM webhook triggered a Cache Warming API call to ostr.io for the affected product URL. High-velocity products (seasonal promotions, flash sale items) were given high-priority warming.
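One way to picture the webhook logic is as a mapping from a PIM change event to a warming job. The sketch below is illustrative only: the event shape, field names, and URL pattern are assumptions, and it stops short of calling the Cache Warming API, whose request format is not described in this case study.

```typescript
// Hypothetical shape of a PIM change event.
interface PimChangeEvent {
  sku: string;
  changedFields: string[]; // for example ["price", "inventory"]
  highVelocity: boolean; // seasonal promotion or flash sale item
}

interface WarmingJob {
  url: string;
  priority: "high" | "normal";
}

// Only changes to these fields alter the rendered page, so only they trigger warming.
const WARM_TRIGGER_FIELDS = new Set<string>(["attributes", "price", "inventory"]);

// Returns a warming job for the affected product URL, or null when the change
// does not affect rendered content. The "/product/<sku>" path is an assumption.
function toWarmingJob(event: PimChangeEvent, baseUrl: string): WarmingJob | null {
  const relevant = event.changedFields.some((field) => WARM_TRIGGER_FIELDS.has(field));
  if (!relevant) return null;
  return {
    url: `${baseUrl}/product/${encodeURIComponent(event.sku)}`,
    priority: event.highVelocity ? "high" : "normal",
  };
}

// High-priority jobs run first; arrival order is preserved within each tier.
function orderJobs(jobs: WarmingJob[]): WarmingJob[] {
  const high = jobs.filter((job) => job.priority === "high");
  const normal = jobs.filter((job) => job.priority === "normal");
  return [...high, ...normal];
}
```

Filtering on changed fields keeps the warming queue from being flooded by PIM edits that never reach the rendered page.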

Week 3: DOM Consistency Score baseline. ostr.io's monitoring dashboard was configured for 200 sample product URLs — 100 from the "indexed" set and 100 from the "not indexed" set. The baseline DOM Consistency Score was 61% — alarmingly low. The cause: the product data fetching useEffect was adding significant content (full product descriptions, specifications, review counts) after the initial render snapshot was captured.
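The case study does not give the formula behind the DOM Consistency Score. As a rough illustration of the idea, the sketch below assumes a simple definition: the share of distinct visible-text tokens in the fully rendered DOM that are already present in the captured snapshot. This is a stand-in for intuition, not ostr.io's actual metric.

```typescript
// Extract distinct lowercase word tokens from the visible text of an HTML string.
// Regex-based tag stripping is a simplification; a real check would parse the DOM.
function visibleTokens(html: string): Set<string> {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .toLowerCase();
  return new Set(text.split(/[^a-z0-9]+/).filter((token) => token.length > 0));
}

// Assumed definition: fraction of rendered-DOM tokens already present in the snapshot.
// 1.0 means the snapshot contains everything the rendered page shows.
function consistencyScore(snapshotHtml: string, renderedHtml: string): number {
  const snapshot = visibleTokens(snapshotHtml);
  const rendered = visibleTokens(renderedHtml);
  if (rendered.size === 0) return 1;
  let shared = 0;
  rendered.forEach((token) => {
    if (snapshot.has(token)) shared += 1;
  });
  return shared / rendered.size;
}
```

Under this definition, a snapshot captured before a useEffect fetch resolves scores low because descriptions, specifications, and review counts exist only in the rendered DOM.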

Root cause fix: The product API calls were moved from useEffect to the Next.js App Router page.tsx Server Component using async/await data fetching. This populated the product data on the server before the HTML was sent, which removed most of the DOM inconsistency.
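The principle behind the refactor is to await the data before emitting any markup. The sketch below shows it without a framework so it runs on its own. In the Next.js App Router the same pattern would be an async Server Component in page.tsx; the `Product` shape, the `ProductFetcher` type, and `renderProductPage` are hypothetical names, not the site's real code.

```typescript
// Hypothetical product shape and data source signature.
interface Product {
  sku: string;
  name: string;
  price: string;
  description: string;
}
type ProductFetcher = (sku: string) => Promise<Product>;

// Minimal HTML escaping for text interpolated into markup.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Data is awaited BEFORE markup is produced, so the first HTML response already
// contains the product content instead of an empty container filled in later.
async function renderProductPage(sku: string, fetchProduct: ProductFetcher): Promise<string> {
  const product = await fetchProduct(sku);
  return [
    "<main>",
    `  <h1>${escapeHtml(product.name)}</h1>`,
    `  <p class="price">${escapeHtml(product.price)}</p>`,
    `  <section class="description">${escapeHtml(product.description)}</section>`,
    "</main>",
  ].join("\n");
}
```

Because the fetch completes on the server, a crawler's snapshot and a fully rendered browser DOM start from the same content.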

After the data fetching refactor: DOM Consistency Score improved to 97% for the same 200 URLs.

Results at 30, 60, and 90 Days

30-Day Results:

| Metric | Baseline | 30 Days | Change |
| --- | --- | --- | --- |
| Googlebot 403 rate | 12% | 0% | -100% |
| JavaScript rendering rate | 78% | 8% | -70pp |
| DOM Consistency Score (sample) | 61% | 97% | +36pp |
| Avg crawl frequency | 1.2×/month | 2.1×/month | +75% |

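At 30 days, Googlebot's crawl frequency on product pages had increased by 75%, from 1.2 to 2.1 crawls per month. This is consistent with Googlebot allocating more crawl budget to the domain once responses became faster and more reliable, although crawl frequency alone does not prove the cause.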

60-Day Results:

| Metric | Baseline | 60 Days | Change |
| --- | --- | --- | --- |
| Indexed pages | 428,500 | 471,200 | +42,700 (+10%) |
| "Not indexed" (combined) | 94,500 | 51,800 | -42,700 |
| Organic clicks (GSC, 28-day) | Baseline | | +18% |
| Cache hit rate (ostr.io) | | 84% | |

By day 60, 42,700 previously unindexed product pages had entered the index. These were primarily mid-tier products with specific attribute-matching search intent (size, color, material, model number).

90-Day Results:

| Metric | Baseline | 90 Days | Change |
| --- | --- | --- | --- |
| Indexed pages | 428,500 | 511,000 | +82,500 (+19%) |
| "Not indexed" (combined) | 94,500 | 12,000 | -87% |
| Crawl budget efficiency | 48% | 100% | +52pp |
| Organic revenue (GA, 28-day) | Baseline | | +31% |

The headline result: 52% crawl budget recovery. At baseline, 52% of Googlebot's crawl budget on this domain was consumed by WAF blocks (12%), failed second-wave renders (25%), and repeated crawls of pages that were returning inconsistent DOM output (15%). After prerendering, this waste was eliminated.
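The headline figures can be re-derived from the reported numbers. The sketch below takes efficiency to be 100 minus the three waste shares, which is how the 48% baseline appears to be computed; the function and field names are illustrative.

```typescript
// Share of crawl budget lost to each cause at baseline, in percentage points.
interface WasteBreakdown {
  wafBlocks: number;
  failedSecondWaveRenders: number;
  inconsistentDomRecrawls: number;
}

const BASELINE_WASTE: WasteBreakdown = {
  wafBlocks: 12,
  failedSecondWaveRenders: 25,
  inconsistentDomRecrawls: 15,
};

// Assumed definition: efficiency = 100 minus total wasted share.
function crawlEfficiency(waste: WasteBreakdown): number {
  return 100 - (waste.wafBlocks + waste.failedSecondWaveRenders + waste.inconsistentDomRecrawls);
}

// Relative change between two counts, as a percentage.
function percentChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

const baselineEfficiency = crawlEfficiency(BASELINE_WASTE); // 48
const indexedGrowth = percentChange(428_500, 511_000); // about 19.3
const notIndexedChange = percentChange(94_500, 12_000); // about -87.3
```

The three waste shares sum to 52 points, which matches both the 48% baseline efficiency and the +52pp change in the 90-day table.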

The 19% increase in indexed pages (82,500 new product pages) coincided with 31% organic revenue growth, driven largely by long-tail product searches that previously had no indexable product page to land on.

Key Lessons

The WAF issue was invisible without WAF log analysis. The 12% Googlebot 403 rate was not visible in Google Search Console — GSC does not report blocked requests separately. It only appeared in Cloudflare WAF logs when filtered for Googlebot User-Agent. WAF configuration should be the first thing checked before any other prerendering diagnosis.
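The log check described here amounts to filtering WAF events by crawler User-Agent and computing the share that returned 403. The sketch below assumes a simplified log entry shape with only `userAgent` and `status`; real Cloudflare log exports carry many more fields under different names.

```typescript
// Hypothetical, simplified WAF log entry.
interface WafLogEntry {
  userAgent: string;
  status: number;
}

// Fraction of requests from the given crawler that were answered with 403.
// Returns null when the log contains no requests from that crawler.
function crawler403Rate(entries: WafLogEntry[], crawlerToken: string = "googlebot"): number | null {
  const token = crawlerToken.toLowerCase();
  const crawlerRequests = entries.filter((entry) => entry.userAgent.toLowerCase().includes(token));
  if (crawlerRequests.length === 0) return null;
  const blocked = crawlerRequests.filter((entry) => entry.status === 403).length;
  return blocked / crawlerRequests.length;
}
```

A User-Agent filter alone also counts spoofed crawlers, so the result is more trustworthy when combined with an IP range check on the same entries.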

DOM Consistency Score of 61% explained the "not indexed" persistence. Google does not see this score, and it does not necessarily de-index pages whose initial HTML and rendered DOM diverge, but it appeared to deprioritize crawling them. The pattern was clear: pages with high DOM inconsistency were crawled once, not indexed, and then visited very infrequently. After the data fetching fix raised DOM Consistency Score to 97%, these pages received follow-up crawl visits within 3–4 weeks.

Cache Warming API was critical for the 15-minute TTL product pages. Without proactive warming, the 15-minute TTL meant Googlebot frequently encountered rendering-in-progress states. With the Cache Warming API triggered by PIM webhooks, the cache hit rate reached 84% within 60 days, so Googlebot received a pre-cached snapshot for most requests.


Frequently Asked Questions

Why was the Googlebot 403 rate not visible in Google Search Console?

GSC's "Pages" and "Crawl stats" reports do not break out blocked requests as a separate failure mode. Blocked Googlebot requests appear as missing crawl events, not as 4xx or 5xx errors. The 12% Googlebot 403 rate was only visible in Cloudflare's WAF event log, filtered for verified Googlebot User-Agents. Any team running Cloudflare, AWS WAF, or Akamai on a site with prerendering should grep WAF logs for crawler User-Agents before assuming GSC error counts represent the full picture.

How long did the cache hit rate take to stabilize?

Approximately 30 days. The first 7 days were dominated by initial cache population for the 500k-page catalog. Days 8 to 21 saw the warming queue catch up with the PIM webhook firing rate (roughly 12k product updates per day). By day 30 the hit rate stabilized at 84% and stayed within a 2-point band thereafter. The remaining 16% of misses were genuinely fresh URLs (new SKUs) and TTL expiries during low-traffic windows when proactive warming was deprioritized.

Did the revenue growth come from branded or long-tail queries?

Almost entirely long-tail. Branded queries were stable across the period because branded SERP positions were already strong and not bottlenecked by indexation. The 82,500 newly indexed product pages absorbed traffic from attribute-specific queries: model number, size and color combinations, technical-spec searches, and specific product comparisons. Average traffic per newly indexed page was small (5 to 20 monthly sessions), but at 82,500 pages the aggregate moved revenue meaningfully.

Does this approach apply to smaller catalogs?

The crawl-budget recovery mechanic still applies, but the absolute upside is smaller. Sites under 100k pages rarely hit Googlebot's render-budget ceiling unless they have severe DOM consistency or WAF issues. The first diagnostic is the same: pull WAF logs and measure DOM Consistency Score on a sample of 200 URLs. If both are clean, focus on content depth and internal linking before investing in prerendering. Prerendering ROI on smaller catalogs is positive when the issue is rendering-related, not when the issue is content quality or topical authority.

Related Reading

  • Cache Warming API and Freshness Signal
  • WAF Blocking Legitimate Bots: Cloudflare and AWS
  • DOM Consistency Check
  • Prerendering Cost Analysis

[Figure: matrix diagram of operational levers, risks, and validation checks for this case study.]

Editorial trust

Written by prerender Editorial · Engineering Team. We build and run pre-rendering infrastructure for more than 200 engineering teams, which is where the numbers and code samples on this page come from.

Editorial scope and review policy: About prerender.info.