WebPageTest and Real User Monitoring Setup

The Symptom

The team runs Lighthouse locally, sees a score of 94, and ships. Two weeks later, the product manager reports that conversion dropped on the checkout flow. A/B testing rules out content changes. The field data, once someone thinks to check it, shows p75 LCP degraded from 3.2s to 4.8s. The regression was invisible in lab data because the new code path only affected users on slower connections loading larger product catalogs.

The Cause

Lab tools test a fixed set of conditions: a single device profile, a single network speed, a single geographic location, and a single state of the application’s data. Real traffic includes users on 3G connections in rural areas, users on congested Wi-Fi in airports, users with 200 items in their shopping cart, and users whose browser extensions inject additional JavaScript.

No lab test covers all of these. Field data covers all of them by definition, because it is the actual measurement from actual users.

The gap between lab and field widens as the application grows. Early-stage products with small bundles and simple pages see lab and field data converge. The e-commerce platform with a 420KB JavaScript bundle, 47 product images on listing pages, and 3 third-party analytics scripts sees lab and field diverge by 2-3x on LCP.

The Baseline

WebPageTest results for the product listing page, tested from Dulles, Virginia, on a Moto G Power over 4G LTE:

Metric	First View	Repeat View
TTFB	680ms	320ms
Start Render	2.1s	1.4s
LCP	4.2s	2.1s
Total Blocking Time	1,800ms	420ms
CLS	0.18	0.02

The “First View” column is the one that matters for new visitors and search engine crawlers. “Repeat View” benefits from cached resources and shows what returning users experience, but optimizing only for repeat views ignores every new visitor.

The Fix: WebPageTest Scripted Tests

WebPageTest supports scripted tests that navigate through multi-page flows. For the e-commerce platform, the following scripted test covers the critical user journey:

// WebPageTest script for checkout flow
setEventName Homepage
navigate https://store.example.com/

setEventName ProductListing
navigate https://store.example.com/category/electronics

setEventName ProductDetail
execAndWait document.querySelector('[data-product-id="SKU-001"]').click()

setEventName AddToCart
execAndWait document.querySelector('[data-action="add-to-cart"]').click()

setEventName Checkout
navigate https://store.example.com/checkout

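Each setEventName creates a separate measurement in the results. You get individual waterfalls, filmstrips, and Core Web Vitals for each step. The “ProductDetail” step reveals whether the product image is the LCP element and how long it takes to render. The “Checkout” step captures the load metrics of the checkout page. Because this script only navigates to that page, it does not exercise the form; measuring INP there requires adding scripted interactions, and Total Blocking Time is the usual lab proxy until then.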
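A script like this can also be submitted programmatically. The sketch below builds request URLs for WebPageTest's runtest.php endpoint, one per test location. It assumes the script, location, runs, f, and k query parameters of the public HTTP API. The location IDs, the API key, and the helper name buildRunTestUrl are placeholders, so check them against your own WebPageTest instance before relying on them.

```typescript
// Sketch: build WebPageTest runtest.php request URLs for one script across
// several locations. Location IDs and the API key are placeholders.
interface RunTestOptions {
  server: string; // public instance or a private one
  apiKey: string; // placeholder; supply your own key
  script: string; // the multi-step script text
  location: string; // location ID as listed by your WebPageTest instance
  runs?: number;
}

function buildRunTestUrl(opts: RunTestOptions): string {
  const params: Record<string, string> = {
    script: opts.script,
    location: opts.location,
    runs: String(opts.runs ?? 3),
    f: "json", // ask for a JSON response
    k: opts.apiKey,
  };
  const query = Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(params[key])}`)
    .join("&");
  return `${opts.server}/runtest.php?${query}`;
}

// Shortened version of the checkout script, one command per line.
const checkoutScript = [
  "setEventName Homepage",
  "navigate https://store.example.com/",
  "setEventName Checkout",
  "navigate https://store.example.com/checkout",
].join("\n");

// Hypothetical location IDs; real IDs depend on the instance.
const locations = [
  "Dulles:Chrome.4G",
  "Frankfurt:Chrome.4G",
  "Mumbai:Chrome.4G",
  "Sydney:Chrome.4G",
];

const requestUrls = locations.map((location) =>
  buildRunTestUrl({
    server: "https://www.webpagetest.org",
    apiKey: "YOUR_API_KEY",
    script: checkoutScript,
    location,
  }),
);
```

Requesting each URL with any HTTP client submits one test per location.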

Run this script from multiple locations (Virginia, Frankfurt, Mumbai, Sydney) to see how geographic distance from your origin server affects TTFB. The e-commerce platform showed:

Location	TTFB	LCP
Virginia	680ms	4.2s
Frankfurt	420ms	3.6s
Mumbai	1,200ms	6.1s
Sydney	1,400ms	6.8s

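Mumbai and Sydney users see an LCP roughly 1.5 to 1.6 times that of Virginia users, and nearly double that of Frankfurt users. The TTFB difference (520ms to 720ms relative to Virginia) is the most direct geographic penalty, and the same added latency is likely to slow every later round trip as well. A CDN (Chapter 8) removes most of the geographic TTFB penalty. Without one, optimizing JavaScript and images saves a fixed amount of time that gets added to a variable and large TTFB.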
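The comparison between locations is simple arithmetic on the table above. The following sketch computes the LCP ratio and the extra TTFB and LCP of each location relative to a chosen baseline. The figures are copied from the table; the function and field names are illustrative only.

```typescript
// Figures copied from the location table above (milliseconds).
interface LocationResult {
  location: string;
  ttfbMs: number;
  lcpMs: number;
}

const results: LocationResult[] = [
  { location: "Virginia", ttfbMs: 680, lcpMs: 4200 },
  { location: "Frankfurt", ttfbMs: 420, lcpMs: 3600 },
  { location: "Mumbai", ttfbMs: 1200, lcpMs: 6100 },
  { location: "Sydney", ttfbMs: 1400, lcpMs: 6800 },
];

// Compare every location against one baseline location.
function compareTo(baselineName: string, rows: LocationResult[]) {
  const baseline = rows.filter((r) => r.location === baselineName)[0];
  if (!baseline) throw new Error(`Unknown baseline: ${baselineName}`);
  return rows.map((r) => ({
    location: r.location,
    lcpRatio: Number((r.lcpMs / baseline.lcpMs).toFixed(2)),
    extraTtfbMs: r.ttfbMs - baseline.ttfbMs,
    extraLcpMs: r.lcpMs - baseline.lcpMs,
  }));
}

const versusVirginia = compareTo("Virginia", results);
const versusFrankfurt = compareTo("Frankfurt", results);
// versusVirginia: Mumbai  -> lcpRatio 1.45, extraTtfbMs 520, extraLcpMs 1900
// versusVirginia: Sydney  -> lcpRatio 1.62, extraTtfbMs 720, extraLcpMs 2600
// versusFrankfurt: Sydney -> lcpRatio 1.89
```

Relative to Virginia, the extra TTFB is well under half of the extra LCP for both Mumbai and Sydney, which suggests that latency on later requests contributes to the gap as well.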

The Fix: RUM Pipeline

The web-vitals library from Chapter 1 sends individual metric reports. A production RUM pipeline aggregates these into percentile distributions.

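// Server-side aggregation logic behind the /api/vitals endpoint.
// The Express route handler that calls recordMetric is not shown here.
import type { Request, Response } from "express";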

interface VitalPayload {
  name: "LCP" | "INP" | "CLS";
  value: number;
  rating: "good" | "needs-improvement" | "poor";
  navigationType: string;
  pathname: string;
  connectionType: string;
  deviceMemory: number;
}

interface MetricBucket {
  values: number[];
  good: number;
  needsImprovement: number;
  poor: number;
}

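// In-memory store keyed by "metric:pathname". Values grow without bound and
// are lost on restart; a production service would persist or window them.
const metrics = new Map<string, MetricBucket>();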

function recordMetric(payload: VitalPayload): void {
  const key = `${payload.name}:${payload.pathname}`;
  const bucket = metrics.get(key) ?? {
    values: [],
    good: 0,
    needsImprovement: 0,
    poor: 0,
  };

  bucket.values.push(payload.value);
  bucket[
    payload.rating === "needs-improvement" ? "needsImprovement" : payload.rating
  ]++;
  metrics.set(key, bucket);
}

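// Nearest-rank percentile: the smallest sample with at least `percentile`
// percent of all samples at or below it. Expects a non-empty array.
function getPercentile(values: number[], percentile: number): number {
  const sorted = [...values].sort((a, b) => a - b); // copy so the bucket is not mutated
  const index = Math.ceil((percentile / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)];
}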

function getMetricSummary(
  metricName: string,
  pathname: string,
): { p50: number; p75: number; p90: number; sampleCount: number } | null {
  const key = `${metricName}:${pathname}`;
  const bucket = metrics.get(key);
  if (!bucket || bucket.values.length < 100) return null;

  return {
    p50: getPercentile(bucket.values, 50),
    p75: getPercentile(bucket.values, 75),
    p90: getPercentile(bucket.values, 90),
    sampleCount: bucket.values.length,
  };
}
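To make the percentile calculation concrete, here is a small worked example of the nearest-rank method that getPercentile uses. The function is a standalone copy with an added guard for empty input, and the sample values are made up for illustration.

```typescript
// Nearest-rank percentile, mirroring getPercentile above, with an explicit
// guard for empty input (returns null instead of undefined).
function nearestRank(values: number[], percentile: number): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.ceil((percentile / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)];
}

// Made-up LCP samples in milliseconds, deliberately unsorted.
const lcpSamples = [3100, 1200, 6200, 2500, 1800, 4800, 2100, 3900];
// Sorted: 1200, 1800, 2100, 2500, 3100, 3900, 4800, 6200

const p50 = nearestRank(lcpSamples, 50); // index ceil(4.0) - 1 = 3 -> 2500
const p75 = nearestRank(lcpSamples, 75); // index ceil(6.0) - 1 = 5 -> 3900
const p90 = nearestRank(lcpSamples, 90); // index ceil(7.2) - 1 = 7 -> 6200
```

With only eight samples, p90 is simply the maximum, which is one reason getMetricSummary withholds results for small buckets.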

The client-side collection adds device context that CrUX does not provide:

// Enhanced client-side collection
import { onLCP, onINP, onCLS, type Metric } from "web-vitals";

interface EnhancedVitalReport {
  name: string;
  value: number;
  rating: string;
  pathname: string;
  connectionType: string;
  deviceMemory: number;
  hardwareConcurrency: number;
  userAgent: string;
}

function getConnectionType(): string {
  const nav = navigator as Navigator & {
    connection?: { effectiveType?: string };
  };
  return nav.connection?.effectiveType ?? "unknown";
}

function getDeviceMemory(): number {
  const nav = navigator as Navigator & { deviceMemory?: number };
  return nav.deviceMemory ?? -1;
}

function reportVital(metric: Metric): void {
  const report: EnhancedVitalReport = {
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    pathname: window.location.pathname,
    connectionType: getConnectionType(),
    deviceMemory: getDeviceMemory(),
    hardwareConcurrency: navigator.hardwareConcurrency ?? -1,
    userAgent: navigator.userAgent,
  };

  navigator.sendBeacon("/api/vitals", JSON.stringify(report));
}

onLCP(reportVital);
onINP(reportVital);
onCLS(reportVital);
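One detail to handle on the server: when sendBeacon is called with a string, the browser sends the body as text/plain rather than application/json, so a JSON body parser that only accepts application/json will not populate the request body. The sketch below shows one way to parse and validate the raw text before handing it to recordMetric. The function name parseVitalBeacon and the validation rules are assumptions for illustration, not part of the web-vitals library.

```typescript
type VitalName = "LCP" | "INP" | "CLS";
type VitalRating = "good" | "needs-improvement" | "poor";

interface ParsedVital {
  name: VitalName;
  value: number;
  rating: VitalRating;
  pathname: string;
  connectionType: string;
  deviceMemory: number;
}

const VITAL_NAMES = ["LCP", "INP", "CLS"];
const VITAL_RATINGS = ["good", "needs-improvement", "poor"];

// Parse the raw beacon text; return null for anything malformed so that
// bad input never reaches the aggregation buckets.
function parseVitalBeacon(rawBody: string): ParsedVital | null {
  let data: unknown;
  try {
    data = JSON.parse(rawBody);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const r = data as Record<string, unknown>;

  if (typeof r.name !== "string" || VITAL_NAMES.indexOf(r.name) === -1) return null;
  if (typeof r.rating !== "string" || VITAL_RATINGS.indexOf(r.rating) === -1) return null;
  if (typeof r.value !== "number" || !isFinite(r.value) || r.value < 0) return null;
  if (typeof r.pathname !== "string" || r.pathname.charAt(0) !== "/") return null;

  return {
    name: r.name as VitalName,
    value: r.value,
    rating: r.rating as VitalRating,
    pathname: r.pathname,
    // Optional context fields fall back to the same sentinels the client uses.
    connectionType: typeof r.connectionType === "string" ? r.connectionType : "unknown",
    deviceMemory: typeof r.deviceMemory === "number" ? r.deviceMemory : -1,
  };
}

const parsed = parseVitalBeacon(
  '{"name":"LCP","value":2400,"rating":"good","pathname":"/checkout"}',
);
```

Rejecting unknown metric names and non-path values also keeps the in-memory key space from growing on junk input.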

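The connectionType and deviceMemory fields allow segmenting field data by device class. Both rely on APIs that are available mainly in Chromium-based browsers, so other browsers fall back to “unknown” and -1 in the code above, and navigator.deviceMemory reports a rounded value capped at 8, which makes the bucket boundaries approximate. On the e-commerce platform, segmenting by device memory revealed: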

Device Memory	p75 LCP	p75 INP	Sample %
≤ 4GB	5.2s	420ms	38%
4-8GB	3.1s	210ms	35%
> 8GB	1.8s	95ms	27%

Some 38% of users are on devices with 4GB of RAM or less. Their p75 LCP is nearly three times that of the highest-memory group (5.2s versus 1.8s). Optimizations that improve LCP by 500ms on an 8GB device might improve it by 1,200ms on a 4GB device, because on low-memory devices the bottleneck shifts from the network to CPU-bound script parsing and execution.
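Segmentation of this kind amounts to grouping samples by a key and computing a percentile per group. The following is a minimal sketch that segments by connection type; the same function works for device memory if the segmentOf callback maps each sample to a memory bucket. The FieldSample shape, the function names, and the sample values are all made up for illustration.

```typescript
interface FieldSample {
  metric: string;
  value: number;
  connectionType: string;
}

// Nearest-rank p75 of a non-empty array.
function p75Of(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.max(0, Math.ceil(0.75 * sorted.length) - 1)];
}

// Group samples of one metric by an arbitrary segment key and summarize each group.
function p75BySegment(
  samples: FieldSample[],
  metric: string,
  segmentOf: (sample: FieldSample) => string,
): Map<string, { p75: number; count: number }> {
  const grouped = new Map<string, number[]>();
  for (const sample of samples) {
    if (sample.metric !== metric) continue;
    const key = segmentOf(sample);
    const values = grouped.get(key) ?? [];
    values.push(sample.value);
    grouped.set(key, values);
  }
  const summary = new Map<string, { p75: number; count: number }>();
  grouped.forEach((values, key) => {
    summary.set(key, { p75: p75Of(values), count: values.length });
  });
  return summary;
}

// Illustrative samples only; real segments need far more data per group.
const samples: FieldSample[] = [
  { metric: "LCP", value: 5200, connectionType: "3g" },
  { metric: "LCP", value: 4800, connectionType: "3g" },
  { metric: "LCP", value: 6100, connectionType: "3g" },
  { metric: "LCP", value: 5500, connectionType: "3g" },
  { metric: "LCP", value: 2100, connectionType: "4g" },
  { metric: "LCP", value: 1800, connectionType: "4g" },
  { metric: "LCP", value: 2600, connectionType: "4g" },
  { metric: "LCP", value: 3000, connectionType: "4g" },
  { metric: "INP", value: 180, connectionType: "4g" }, // ignored for LCP
];

const lcpByConnection = p75BySegment(samples, "LCP", (s) => s.connectionType);
```

Each segment needs its own minimum sample count, so a page with enough traffic overall may still have segments that are too small to report.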

The Proof

After deploying the RUM pipeline and collecting two weeks of data from 180,000 page loads, the team:

Identified that 42% of poor LCP scores came from three product listing pages with unoptimized hero images (addressed in Chapter 4).
Identified that 68% of poor INP scores came from the checkout page coupon application (addressed in CH1-S1).
Identified that 91% of poor CLS scores came from font swap on first load (addressed in Chapter 4).
Identified that users in South Asia (14% of traffic) experienced 2.3x worse TTFB than users in Europe, leading to the CDN deployment decision (Chapter 8).

None of these insights were available from Lighthouse scores alone. The RUM pipeline cost: 1.8KB of additional JavaScript, ~200 bytes per beacon payload, and a lightweight aggregation service that processes ~50 requests per second on a single small instance.
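Attribution figures such as “42% of poor LCP scores came from three pages” come from counting poor-rated samples per page and dividing by the total number of poor samples for that metric. A minimal sketch, assuming each stored sample keeps its rating and pathname; the names and sample data here are illustrative, not the platform's actual records.

```typescript
type Rating = "good" | "needs-improvement" | "poor";

interface RatedSample {
  name: string;
  rating: Rating;
  pathname: string;
}

interface PoorShare {
  pathname: string;
  poorCount: number;
  shareOfPoor: number; // fraction of all poor samples for the metric
}

// Rank pages by their share of the poor-rated samples for one metric.
function poorShareByPath(samples: RatedSample[], metricName: string): PoorShare[] {
  const counts = new Map<string, number>();
  let totalPoor = 0;
  for (const s of samples) {
    if (s.name !== metricName || s.rating !== "poor") continue;
    counts.set(s.pathname, (counts.get(s.pathname) ?? 0) + 1);
    totalPoor++;
  }
  const rows: PoorShare[] = [];
  counts.forEach((poorCount, pathname) => {
    rows.push({ pathname, poorCount, shareOfPoor: poorCount / totalPoor });
  });
  return rows.sort((a, b) => b.poorCount - a.poorCount);
}

// Made-up samples for illustration.
const rated: RatedSample[] = [
  { name: "LCP", rating: "poor", pathname: "/category/electronics" },
  { name: "LCP", rating: "poor", pathname: "/category/electronics" },
  { name: "LCP", rating: "poor", pathname: "/category/electronics" },
  { name: "LCP", rating: "poor", pathname: "/checkout" },
  { name: "LCP", rating: "good", pathname: "/category/electronics" },
  { name: "INP", rating: "poor", pathname: "/checkout" },
];

const lcpOffenders = poorShareByPath(rated, "LCP");
// First row: /category/electronics with 3 of 4 poor LCP samples (share 0.75)
```

Sorting by share makes the few pages that dominate the poor ratings easy to spot.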

The Trade-off

RUM data requires traffic volume. A page with 10 daily visitors does not produce statistically meaningful p75 metrics. As a rule of thumb, collect at least 200 samples per page and metric before trusting a percentile; the getMetricSummary function above uses a lower floor of 100 and returns null below it. For low-traffic pages, lab data from WebPageTest with realistic device and network profiles is the fallback, with the understanding that it underestimates real-world variance.

The RUM beacon adds a network request on every page load and every qualifying interaction. On mobile connections, this is an additional ~200 bytes competing for bandwidth. In practice, sendBeacon queues the request asynchronously without blocking the page, so it should not affect page load metrics, but the responsible approach is to test this assumption on your own traffic with a controlled A/B experiment. On the e-commerce platform, the A/B test showed no statistically significant difference in LCP or INP between the instrumented and uninstrumented groups across 50,000 sessions.
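A controlled experiment of this kind needs a stable way to decide which sessions load the RUM script. One common approach, sketched below, hashes a session identifier into a bucket from 0 to 99 and instruments a fixed percentage of buckets. The session ID format, the function names, and the choice of an FNV-1a hash are assumptions for illustration; the source does not describe how the platform assigned its groups.

```typescript
// 32-bit FNV-1a hash over the UTF-16 code units of the input.
// (Equivalent to the byte-wise hash for ASCII-only identifiers.)
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

type ExperimentGroup = "instrumented" | "control";

// Deterministic assignment: the same session ID always lands in the same group.
function assignGroup(sessionId: string, instrumentedPercent = 50): ExperimentGroup {
  const bucket = fnv1a(sessionId) % 100;
  return bucket < instrumentedPercent ? "instrumented" : "control";
}

// Hypothetical session IDs.
const exampleGroups = ["session-0", "session-1", "session-2", "session-3",
  "session-4", "session-5", "session-6", "session-7"].map((id) => assignGroup(id));
```

Only sessions in the instrumented group load the web-vitals script. Comparing the two groups then needs a metric source that exists for both, such as CrUX or server-side timing, since the control group sends no beacons.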

Privacy considerations: the userAgent and deviceMemory fields can contribute to device fingerprinting. Your privacy policy must disclose this collection, and the data should be aggregated and anonymized within 30 days. The performance benefit of device segmentation does not override regulatory compliance.