Building the k6 Baseline for the Telemetry Platform

The Symptom

The team knows the system is slow under load. “Slow” is not actionable. Slow where? Slow for whom? Slow compared to what? Without a repeatable load test that models real traffic patterns, performance conversations devolve into anecdotes about that one incident last month.

The Cause

No load test exists. Performance is measured by the absence of alerts. When alerts fire, the team profiles one request in a development environment with 500 rows in the database, finds nothing wrong, and closes the ticket. The production database has 280 million documents. The dev environment has never simulated more than 5 concurrent connections.

The Benchmark

This section builds the complete k6 test suite that every chapter in this book references.

The Traffic Model

Real traffic to the telemetry platform is not uniform. During peak hours:

70% sensor ingestion traffic (high-throughput writes, 1-5 KB payloads)
20% activity feed reads (paginated queries with user-specific filters)
10% aggregation queries (time-windowed analytics, heavy on CPU and memory)

Sensors push data every second. They are relentless. Activity feed reads come in bursts when users open the dashboard. Aggregation queries are infrequent but expensive, scanning millions of documents when unoptimized.
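One way to turn this mix into per-scenario arrival rates is to split a target total by share. The sketch below is illustrative only: the total of 1,000 requests per second is an assumed figure, and `ratesFromMix` is a hypothetical helper, not part of k6.

```javascript
// Hypothetical helper: split a total request rate across scenarios by traffic share.
// Shares do not need to sum to 100; they are normalized against their own total.
function ratesFromMix(totalRps, mix) {
  const shareSum = Object.values(mix).reduce((sum, share) => sum + share, 0);
  const rates = {};
  for (const [name, share] of Object.entries(mix)) {
    rates[name] = Math.round((totalRps * share) / shareSum);
  }
  return rates;
}

// Peak-hour mix from the traffic model: 70% writes, 20% feed reads, 10% aggregations.
const mix = { sensor_writes: 70, feed_reads: 20, aggregations: 10 };

// Assumed total of 1,000 requests per second, chosen only for the example.
const rates = ratesFromMix(1000, mix);
console.log(rates);
```

Each resulting value could feed the `rate` field of a constant-arrival-rate scenario. Whether to preserve the exact mix or deliberately overweight one traffic class is a modelling decision for the team to record alongside the test.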

Complete k6 Test Suite

// k6/scenarios/sensor-ingestion.js
import http from 'k6/http';
import { check } from 'k6';
import { Trend, Rate, Counter } from 'k6/metrics';

const ingestDuration = new Trend('ingest_duration', true);
const ingestErrors = new Rate('ingest_errors');
const ingestCount = new Counter('ingest_total');

const SENSOR_IDS = Array.from({length: 10000}, (_, i) => `sensor-${String(i).padStart(5, '0')}`);
const BASE_URL = __ENV.BASE_URL || 'http://localhost:8080';

export function ingestBatch() {
  const sensorId = SENSOR_IDS[Math.floor(Math.random() * SENSOR_IDS.length)];
  const now = Date.now();

  const payload = JSON.stringify({
    sensorId: sensorId,
    timestamp: new Date(now).toISOString(),
    readings: {
      temperature: 18 + Math.random() * 20,
      humidity: 30 + Math.random() * 40,
      pressure: 1005 + Math.random() * 25,
      co2: 400 + Math.random() * 200,
      batteryLevel: 20 + Math.random() * 80,
    },
    metadata: {
      firmwareVersion: '2.4.1',
      signalStrength: -40 - Math.random() * 50,
    },
  });

  const res = http.post(`${BASE_URL}/api/telemetry/ingest`, payload, {
    headers: { 'Content-Type': 'application/json' },
    tags: { name: 'ingest' },
  });

  ingestDuration.add(res.timings.duration);
  ingestErrors.add(res.status !== 201);
  ingestCount.add(1);

  check(res, {
    'ingest returns 201': (r) => r.status === 201,
    'ingest under 500ms': (r) => r.timings.duration < 500,
  });
}

export function readActivityFeed() {
  const userId = `user-${Math.floor(Math.random() * 5000)}`;
  const page = Math.floor(Math.random() * 5);

  const res = http.get(
    `${BASE_URL}/api/activity/${userId}?limit=50&skip=${page * 50}`,
    { tags: { name: 'activity_feed' } }
  );

  check(res, {
    'feed returns 200': (r) => r.status === 200,
    'feed has results': (r) => {
      try { return JSON.parse(r.body).length > 0; }
      catch { return false; }
    },
  });
}

export function runTimeWindowAggregation() {
  // Only the first 100 sensor IDs are queried here, so aggregations hit a small subset of sensors.
  const sensorId = SENSOR_IDS[Math.floor(Math.random() * 100)];
  const windows = ['1h', '6h', '24h'];
  const window = windows[Math.floor(Math.random() * windows.length)];

  const res = http.get(
    `${BASE_URL}/api/telemetry/stats/${sensorId}?window=${window}`,
    { tags: { name: 'aggregation' } }
  );

  check(res, {
    'aggregation returns 200': (r) => r.status === 200,
    'aggregation under 2s': (r) => r.timings.duration < 2000,
  });
}
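The traffic model describes ingestion payloads of 1-5 KB, so it is worth checking what the generated payload actually weighs. The sketch below rebuilds a payload with the same shape as the one in ingestBatch and measures its UTF-8 size. `buildPayload` and `utf8ByteLength` are hypothetical helpers introduced for illustration; they avoid Node-specific APIs so that they should also run inside k6.

```javascript
// Builds a payload with the same shape as the one sent by ingestBatch.
function buildPayload(sensorId, nowMs) {
  return JSON.stringify({
    sensorId: sensorId,
    timestamp: new Date(nowMs).toISOString(),
    readings: {
      temperature: 18 + Math.random() * 20,
      humidity: 30 + Math.random() * 40,
      pressure: 1005 + Math.random() * 25,
      co2: 400 + Math.random() * 200,
      batteryLevel: 20 + Math.random() * 80,
    },
    metadata: {
      firmwareVersion: '2.4.1',
      signalStrength: -40 - Math.random() * 50,
    },
  });
}

// Counts UTF-8 bytes by code point, without Buffer or TextEncoder.
function utf8ByteLength(str) {
  let bytes = 0;
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

const sizeBytes = utf8ByteLength(buildPayload('sensor-00042', Date.now()));
console.log(`payload size: ${sizeBytes} bytes`);
```

A single-reading payload of this shape comes in well under 1 KB. If production payloads really are 1-5 KB, the test payload may need extra fields or batched readings to match; what the ingest endpoint accepts is not specified here, so treat any padding strategy as an assumption to validate against the real API.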

The Orchestrator

// k6/baseline.js
import { ingestBatch, readActivityFeed, runTimeWindowAggregation }
  from './scenarios/sensor-ingestion.js';

export { ingestBatch, readActivityFeed, runTimeWindowAggregation };

export const options = {
  scenarios: {
    sensor_writes: {
      executor: 'constant-arrival-rate',
      rate: 1000,
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 200,
      maxVUs: 500,
      exec: 'ingestBatch',
    },
    feed_reads: {
      executor: 'constant-arrival-rate',
      rate: 200,
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 50,
      maxVUs: 100,
      exec: 'readActivityFeed',
    },
    aggregations: {
      executor: 'constant-arrival-rate',
      rate: 20,
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 10,
      maxVUs: 30,
      exec: 'runTimeWindowAggregation',
    },
  },
  thresholds: {
    'ingest_duration': ['p(95)<500', 'p(99)<2000'],
    'ingest_errors': ['rate<0.01'],
    'http_req_duration{name:activity_feed}': ['p(95)<200', 'p(99)<1000'],
    'http_req_duration{name:aggregation}': ['p(95)<1000', 'p(99)<5000'],
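  },
  // Print the percentiles used in this chapter; k6's default summary stops at p(95).
  summaryTrendStats: ['avg', 'p(50)', 'p(95)', 'p(99)'],
};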
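The arrival rates and VU limits above imply a few numbers worth working out before the first run: the share of traffic each scenario receives, how many iterations to expect, and how slow the system can get before a scenario runs out of VUs. The sketch below applies Little's law (concurrency is roughly arrival rate times iteration time). The helper names are hypothetical, and the 67 ms figure in the usage line is only an example input.

```javascript
// Rates, durations, and VU limits copied from the options object in k6/baseline.js.
const scenarios = {
  sensor_writes: { rate: 1000, durationSec: 300, maxVUs: 500 },
  feed_reads: { rate: 200, durationSec: 300, maxVUs: 100 },
  aggregations: { rate: 20, durationSec: 300, maxVUs: 30 },
};

// Little's law: concurrent iterations ~= arrival rate * time per iteration.
function vusNeeded(ratePerSec, iterationMs) {
  return Math.ceil((ratePerSec * iterationMs) / 1000);
}

// Longest sustained iteration time a scenario can absorb before every VU is busy.
function maxSustainableLatencyMs(ratePerSec, maxVUs) {
  return (maxVUs / ratePerSec) * 1000;
}

const totalRate = Object.values(scenarios).reduce((sum, s) => sum + s.rate, 0);

for (const [name, s] of Object.entries(scenarios)) {
  const share = ((s.rate / totalRate) * 100).toFixed(1);
  const iterations = s.rate * s.durationSec;
  const ceilingMs = maxSustainableLatencyMs(s.rate, s.maxVUs);
  console.log(`${name}: ${share}% of traffic, ${iterations} iterations, VU ceiling at ${ceilingMs}ms`);
}

// Example input: an average iteration time of 67 ms at 1,000 iterations per second.
console.log(`VUs needed: ${vusNeeded(1000, 67)}`);
```

With these settings the configured mix works out to about 82% writes, 16% feed reads, and 2% aggregations, which is heavier on writes than the 70/20/10 peak-hour model. The write scenario can absorb a sustained iteration time of about 500 ms before all 500 VUs are busy; beyond that point k6 cannot start iterations on schedule and counts them in its dropped_iterations metric.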

Running the Baseline

# Start the platform
docker compose up -d mongodb app

# Seed test data: 10M telemetry documents
java -jar tools/data-seeder.jar \
  --mongodb-uri="mongodb://localhost:27017/telemetry" \
  --sensors=10000 \
  --readings-per-sensor=1000

# Run the baseline (create the output directory first so the JSON file can be written)
mkdir -p results
k6 run --out json=results/baseline-$(date +%Y%m%d).json k6/baseline.js
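The JSON output above records one line per metric sample, which is useful for deep analysis but large and awkward to compare between runs. A compact alternative is k6's handleSummary hook, which receives the aggregated end-of-test data. The sketch below is one possible shape: `extractBaseline` and the output file name are hypothetical, and the sample object only imitates the structure k6 passes in.

```javascript
// Extracts a few headline numbers from the object k6 passes to handleSummary().
// Returns null for any metric or statistic that is missing from the data.
function extractBaseline(data) {
  const pick = (metric, stat) => {
    const m = data.metrics[metric];
    return m && m.values[stat] !== undefined ? m.values[stat] : null;
  };
  return {
    ingest_p95_ms: pick('ingest_duration', 'p(95)'),
    ingest_p99_ms: pick('ingest_duration', 'p(99)'),
    ingest_error_rate: pick('ingest_errors', 'rate'),
    feed_p95_ms: pick('http_req_duration{name:activity_feed}', 'p(95)'),
    aggregation_p95_ms: pick('http_req_duration{name:aggregation}', 'p(95)'),
  };
}

// In k6/baseline.js this could be wired up as (file name is an assumption):
// export function handleSummary(data) {
//   return {
//     'results/baseline-summary.json': JSON.stringify(extractBaseline(data), null, 2),
//   };
// }

// Stand-in for the real summary data, for trying the helper outside k6.
const sample = {
  metrics: {
    ingest_duration: { values: { 'p(95)': 420, 'p(99)': 3800 } },
    ingest_errors: { values: { rate: 0.0042 } },
  },
};
console.log(extractBaseline(sample));
```

Two caveats apply. Defining handleSummary replaces the default end-of-test summary unless the returned object also includes a stdout entry. A statistic such as p(99) is only present in the summary data when the summaryTrendStats option lists it.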

The Proof

The baseline run produces:

scenarios: (100.00%) 3 scenarios, 630 max VUs, 5m30s max duration

     ✓ ingest returns 201........: 99.58%
     ✓ ingest under 500ms........: 95.12%
     ✓ feed returns 200..........: 99.91%
     ✓ feed has results..........: 98.74%
     ✓ aggregation returns 200...: 99.23%
     ✗ aggregation under 2s......: 87.44%

     ingest_duration..........: avg=67ms   p(50)=12ms  p(95)=420ms  p(99)=3800ms
     ingest_errors............: 0.42%
     ingest_total.............: 300000

     http_req_duration........: avg=52ms   p(50)=9ms   p(95)=280ms  p(99)=1890ms
       { name:activity_feed }.: avg=45ms   p(50)=8ms   p(95)=180ms  p(99)=1200ms
       { name:aggregation }...: avg=890ms  p(50)=340ms p(95)=2800ms p(99)=6200ms

     http_reqs................: 366000  1220/s
     vus_max..................: 512

Three problems are visible:

Ingestion p99 (3,800ms) is roughly 317x slower than p50 (12ms). Something catastrophic happens to 1% of writes.
Activity feed p99 (1,200ms) is 150x slower than p50 (8ms). Pagination queries degrade under concurrent load.
Aggregation p95 (2,800ms) is nearly three times its 1,000ms threshold, and p99 (6,200ms) misses its 5,000ms limit. 12.6% of aggregation queries take longer than the 2-second check allows.

These numbers are the baseline. Every chapter that follows attacks one of these problems, and a fix counts as proven only when a rerun of this suite shows the numbers improve.
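The comparison between the reported percentiles and the configured thresholds can be reproduced mechanically, which is handy when later runs need to be judged against the same limits. The sketch below copies the numbers from the baseline output and the thresholds block; `report` is a hypothetical helper, and the tail ratio is simply p99 divided by p50.

```javascript
// Percentiles copied from the baseline output above (milliseconds).
const observed = {
  ingest: { p50: 12, p95: 420, p99: 3800 },
  activity_feed: { p50: 8, p95: 180, p99: 1200 },
  aggregation: { p50: 340, p95: 2800, p99: 6200 },
};

// Limits copied from the thresholds block in k6/baseline.js (milliseconds).
const limits = {
  ingest: { p95: 500, p99: 2000 },
  activity_feed: { p95: 200, p99: 1000 },
  aggregation: { p95: 1000, p99: 5000 },
};

// For each traffic class, lists the percentiles that miss their limit
// and the ratio between the tail (p99) and the median (p50).
function report(observedByName, limitsByName) {
  const rows = [];
  for (const [name, o] of Object.entries(observedByName)) {
    const failed = Object.entries(limitsByName[name])
      .filter(([stat, limit]) => o[stat] >= limit)
      .map(([stat]) => stat);
    rows.push({ name, tailRatio: Math.round(o.p99 / o.p50), failed });
  }
  return rows;
}

for (const row of report(observed, limits)) {
  const verdict = row.failed.length ? `fails ${row.failed.join(', ')}` : 'passes';
  console.log(`${row.name}: p99/p50 = ${row.tailRatio}x, ${verdict}`);
}
```

On the baseline numbers, ingestion and the activity feed pass at p95 but fail at p99, while aggregation fails both limits. A very high tail ratio next to a healthy median usually points at an intermittent stall rather than uniformly slow requests.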

The Trade-off

Running k6 with the constant-arrival-rate executor at a combined 1,220 requests per second requires significant client-side resources. The k6 process itself consumes 2-4 GB of RAM at 500 VUs. Run load tests from a separate machine, never from the same host as the application under test. Network latency between the load generator and the target introduces measurement noise; keep them in the same network segment or account for the network overhead in your thresholds.
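If the load generator cannot sit next to the target, one simple way to account for network overhead is to widen each latency threshold by a measured round-trip time. The sketch below assumes a round trip of 3 ms, a made-up figure that would need to be measured in the real environment. `widenThreshold` is a hypothetical helper that handles only percentile expressions of the form p(N)<limit.

```javascript
// Hypothetical helper: adds a measured round-trip time to a "p(N)<limit" threshold.
// Throws on any other expression (for example "rate<0.01") rather than guessing.
function widenThreshold(expr, rttMs) {
  const match = /^(p\(\d+(?:\.\d+)?\))<(\d+(?:\.\d+)?)$/.exec(expr);
  if (!match) {
    throw new Error(`unsupported threshold expression: ${expr}`);
  }
  return `${match[1]}<${Number(match[2]) + rttMs}`;
}

// Assumed round-trip time between load generator and target, in milliseconds.
const RTT_MS = 3;

const adjusted = ['p(95)<500', 'p(99)<2000'].map((t) => widenThreshold(t, RTT_MS));
console.log(adjusted);
```

Adding a constant is a rough correction: it shifts the limit but does nothing about jitter, so co-locating the generator and the target remains the better option when it is available.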