Optimizing Multi-Subnet Kubernetes Networking with Tailscale and Cilium eBPF

An Iximiuz Cluster of Clusters with Tailscale and Cilium

Engineer Adam Leskis built a multi-subnet Kubernetes cluster using Tailscale for cross-playground connectivity and Cilium for eBPF-based observability. The project joined 12 nodes across 5 subnets to test the limits of overlay-on-overlay networking.

Why This Matters

The project exposes the performance degradation inherent in nested overlay networks, specifically Cilium VXLAN tunnels encapsulated within Tailscale tunnels. Real-world constraints, such as nodes in different subnets sharing a public IP, forced Tailscale into relayed connections, which introduced enough jitter to break KEDA scaling and metrics collection. The lesson is that a flat network abstraction is only as reliable as the transport beneath it: latency and jitter in the underlying layer can invalidate high-level orchestration features such as the Horizontal Pod Autoscaler (HPA) and service mesh telemetry.
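
To make the jitter argument concrete, here is a minimal Python sketch of how one might quantify it from round-trip-time samples. It borrows the 1/16 smoothing gain from RFC 3550's interarrival jitter estimator, applied here to RTT samples rather than RTP transit times, which is a simplification. The function names and the idea of comparing samples against a scrape timeout are illustrative assumptions, not something taken from the original project.

```python
from typing import Iterable, Optional


def smoothed_jitter(rtts_ms: Iterable[float], gain: float = 1 / 16) -> float:
    """Running jitter estimate over successive RTT samples (milliseconds).

    Uses the RFC 3550 smoothing form J += (|D| - J) * gain, where D is the
    difference between consecutive samples. Hypothetical helper for illustration.
    """
    jitter = 0.0
    previous: Optional[float] = None
    for rtt in rtts_ms:
        if previous is not None:
            delta = abs(rtt - previous)
            jitter += (delta - jitter) * gain
        previous = rtt
    return jitter


def timeout_fraction(rtts_ms: Iterable[float], timeout_ms: float) -> float:
    """Fraction of samples that exceed a given timeout (for example, a metrics scrape)."""
    samples = list(rtts_ms)
    if not samples:
        return 0.0
    return sum(1 for rtt in samples if rtt > timeout_ms) / len(samples)
```

Feeding both functions the same ping series for a direct path and a relayed path gives a rough, comparable measure of how much less predictable the relayed path is, and how often it would blow through a fixed timeout.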

Key Insights

Cilium’s eBPF-based service mesh was selected over Istio or Envoy to minimize resource usage on iximiuz Labs VMs, which are limited to 4 CPUs and 8 GiB of RAM.
Nested overlay networking (VXLAN over Tailscale) across 5 subnets caused intermittent connectivity failures due to relayed Tailscale connections and high jitter.
The hubble-gazer tool was developed using Go and React to consume Hubble-Observatory data via Server-Sent Events (SSE) rather than WebSockets for live traffic visualization.
Network stability was achieved only by consolidating all worker nodes into a single subnet and restricting Tailscale to control-plane-to-worker communication.
The environment utilized iximiuz Labs playgrounds, which impose an 8-hour maximum lifetime, requiring automated scripting for cluster bootstrapping.
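
The SSE choice mentioned above is easier to appreciate once the wire format is visible: each event is a few plain-text `field: value` lines ended by a blank line, sent over an ordinary HTTP response. The following Python sketch formats and parses such frames. It is not the hubble-gazer implementation (which the source says is written in Go and React), and the `flow` event name and the `source`/`destination` payload keys are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines: List[str] = []
    if event:
        lines.append(f"event: {event}")
    # Each line of the payload gets its own "data:" field.
    for line in json.dumps(data).splitlines():
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of stream lines."""
    event, data = "message", []  # "message" is the default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

Because the stream is one-directional (server to browser) and rides on plain HTTP, this format suits a read-only traffic view, where the bidirectional channel that WebSockets provide is not needed.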

Practical Applications

Monitoring L4/L7 traffic and DNS queries in real-time using Cilium Hubble and a custom SSE-based frontend like hubble-gazer.
Pitfall: Double encapsulation (VXLAN over WireGuard/Tailscale) in high-latency environments leads to metrics server timeouts and HPA failures.
Deploying a distributed K8s control plane across disparate networks using Tailscale hostnames for node addressing.
Pitfall: Relying on relayed Tailscale connections across overlapping public IP space introduces excessive latency for cluster-internal traffic.
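
One way to catch the relayed-connection pitfall before it affects the cluster is to inspect peer state from `tailscale status --json`. The sketch below assumes the JSON contains a `Peer` map whose entries carry `HostName`, `Online`, and `CurAddr` fields, with `CurAddr` populated only when a direct path is in use. Treat that field layout as an assumption to verify against your Tailscale version. Note also that an idle peer can show an empty `CurAddr`, so the second bucket is named accordingly.

```python
import json
import subprocess
from typing import Dict, List


def classify_peers(status: dict) -> Dict[str, List[str]]:
    """Group peers by connection type from parsed `tailscale status --json` output.

    Assumes each peer entry has HostName, Online, and CurAddr fields.
    """
    result: Dict[str, List[str]] = {"direct": [], "relayed_or_idle": [], "offline": []}
    for peer in (status.get("Peer") or {}).values():
        name = peer.get("HostName", "unknown")
        if not peer.get("Online", False):
            result["offline"].append(name)
        elif peer.get("CurAddr"):
            result["direct"].append(name)  # a direct endpoint is in use
        else:
            result["relayed_or_idle"].append(name)  # no direct endpoint recorded
    return result


def read_tailscale_status() -> dict:
    """Run the Tailscale CLI and return its JSON status (requires tailscale on PATH)."""
    completed = subprocess.run(
        ["tailscale", "status", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

Calling `classify_peers(read_tailscale_status())` on a node, ideally after sending some traffic to each peer, gives a quick list of workers whose traffic is likely going through a relay and are therefore poor candidates for carrying VXLAN traffic.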

References:

https://dev.to/lpmi13/an-iximiuz-cluster-of-clusters-with-tailscale-and-cilium-43d4
