Virtual Threads and Structured Concurrency: Where They Help and Where They Hurt
Virtual threads are not faster threads. They are cheaper threads. A platform thread maps 1:1 to an OS thread, consuming ~1MB of stack memory and requiring a kernel scheduling slot. A virtual thread consumes ~1KB initially, runs on a carrier thread from a shared pool, and unmounts from that carrier thread whenever it blocks on I/O.
This distinction matters. Virtual threads do not make computation faster. They make waiting cheaper. If your workload spends 90% of its time waiting for database responses and HTTP calls, virtual threads let you run 10,000 concurrent tasks on a handful of carrier threads instead of needing 10,000 OS threads. If your workload spends 90% of its time computing, virtual threads provide zero benefit and can make things worse.
The content platform has both workload types. The article fetcher fans out to 20 upstream sources, waits for HTTP responses, and parses the results. Virtual threads turn this from a thread pool sizing problem into a non-problem. The image processor crunches pixel data with zero I/O after the initial file read. Virtual threads add overhead here with no benefit.
The Mechanical Difference
A platform thread:
- Allocated via pthread_create on Linux
- Gets a 1MB stack from the OS (configurable with -Xss)
- Scheduled by the kernel scheduler
- When blocked (I/O, sleep, lock wait), the OS marks it as not runnable
- Context switches involve saving/restoring CPU registers, TLB flushes on some architectures
A virtual thread:
- Allocated as a Java object on the heap
- Gets a ~1KB initial stack that grows as needed (stored as stack chunk objects on the heap)
- Scheduled by a ForkJoinPool of carrier threads (sized to the number of available processors by default)
- When blocked on a supported operation, the JVM unmounts the virtual thread from its carrier, freeing the carrier to run another virtual thread
- No OS context switch for mount/unmount. The carrier thread continues running. Only the stack and continuation state change.
The diagram shows two operating modes. The top panel depicts normal virtual thread scheduling: two carrier threads service seven virtual threads by mounting and unmounting them as I/O operations block and complete. Thousands of virtual threads wait in the queue, each consuming only ~1KB. The bottom panel shows the pinning problem: a virtual thread that enters a synchronized block cannot unmount when it blocks on I/O inside that block, holding the carrier thread hostage and starving all other virtual threads waiting for that carrier.
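The "waiting is cheap" claim can be checked with a small experiment. The sketch below is illustrative only: the class name CheapWaitingDemo and the task counts are assumptions, and Thread.sleep stands in for a blocking I/O call. It starts one virtual thread per task and counts how many complete.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class CheapWaitingDemo {

    // Starts `tasks` virtual threads that each block for `sleepMillis`,
    // then returns how many of them ran to completion.
    static int runSleepers(int tasks, long sleepMillis) {
        AtomicInteger completed = new AtomicInteger();
        // close() at the end of the try block waits for all submitted tasks.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    // The virtual thread unmounts from its carrier while sleeping.
                    Thread.sleep(Duration.ofMillis(sleepMillis));
                    return completed.incrementAndGet();
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runSleepers(10_000, 100);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks finished in " + elapsedMs + " ms");
    }
}
```

If the tasks ran one after another, 10,000 sleeps of 100ms would take over 16 minutes. Because sleeping virtual threads do not occupy a carrier, the elapsed time should stay close to a single sleep plus scheduling overhead.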
Where Virtual Threads Help
The content platform’s article fetcher retrieves content from 20 upstream RSS feeds, parses each feed, and stores the articles. With platform threads:
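// SLOW: Platform threads, limited by thread pool size
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Article is the platform's domain type (not shown here)
public class PlatformArticleFetcher {
    private final ExecutorService pool = Executors.newFixedThreadPool(64);
    private final HttpClient httpClient = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(5))
        .build();

    public List<Article> fetchAll(List<String> feedUrls) throws Exception {
        // Submit every feed; at most 64 run at once, the rest queue up
        List<Future<List<Article>>> futures = new ArrayList<>();
        for (String url : feedUrls) {
            futures.add(pool.submit(() -> fetchFeed(url)));
        }
        List<Article> articles = new ArrayList<>();
        for (Future<List<Article>> future : futures) {
            articles.addAll(future.get(30, TimeUnit.SECONDS));
        }
        return articles;
    }

    private List<Article> fetchFeed(String url) throws Exception {
        // Blocks a pooled platform thread for the whole network round trip
        HttpResponse<String> response = httpClient.send(
            HttpRequest.newBuilder().uri(URI.create(url)).build(),
            HttpResponse.BodyHandlers.ofString()
        );
        return parseFeed(response.body());
    }

    private List<Article> parseFeed(String xml) {
        // Parse RSS/Atom XML into Article objects
        // ~2ms CPU time per feed
        return List.of(); // placeholder
    }
}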
This works for 20 feeds. What happens when the platform grows to 2,000 feeds? The thread pool has 64 threads. Each feed takes 200-500ms of network wait plus 2ms of parsing. At 64 threads with 300ms average wait, the pool processes 64 / 0.302 = 212 feeds per second. Processing 2,000 feeds takes ~9.4 seconds.
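The estimate above is plain arithmetic and is easy to reproduce for other pool sizes. The helper below is a back-of-the-envelope sketch, not part of the fetcher; the class and method names are made up for illustration, and it assumes every task holds a thread for its full wait plus compute time.

```java
class PoolThroughputEstimate {

    // Tasks per second a fixed pool can sustain when each task occupies
    // one thread for (waitSeconds + cpuSeconds).
    static double feedsPerSecond(int poolSize, double waitSeconds, double cpuSeconds) {
        return poolSize / (waitSeconds + cpuSeconds);
    }

    // Wall-clock seconds needed to push `feeds` tasks through the pool.
    static double secondsToProcess(int feeds, int poolSize, double waitSeconds, double cpuSeconds) {
        return feeds / feedsPerSecond(poolSize, waitSeconds, cpuSeconds);
    }

    public static void main(String[] args) {
        // 64 threads, 300ms average wait, 2ms parsing, 2,000 feeds
        System.out.printf("%.0f feeds/sec%n", feedsPerSecond(64, 0.300, 0.002));
        System.out.printf("%.1f seconds%n", secondsToProcess(2000, 64, 0.300, 0.002));
    }
}
```

With the numbers from the text this gives roughly 212 feeds per second and about 9.4 seconds for 2,000 feeds.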
Increasing the pool to 512 threads improves throughput but reserves up to 512MB of stack space (at the default 1MB per thread) and adds kernel scheduling pressure. With virtual threads:
// FAST: Virtual threads, one per feed, no pool sizing needed
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class VirtualArticleFetcher {
    private final HttpClient httpClient = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(5))
        .build();

    public List<Article> fetchAll(List<String> feedUrls) throws Exception {
        // One new virtual thread per submitted task; close() waits for all of them
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<List<Article>>> futures = new ArrayList<>();
            for (String url : feedUrls) {
                futures.add(executor.submit(() -> fetchFeed(url)));
            }
            List<Article> articles = new ArrayList<>();
            for (Future<List<Article>> future : futures) {
                articles.addAll(future.get(30, TimeUnit.SECONDS));
            }
            return articles;
        }
    }

    private List<Article> fetchFeed(String url) throws Exception {
        // Blocking here unmounts the virtual thread instead of holding an OS thread
        HttpResponse<String> response = httpClient.send(
            HttpRequest.newBuilder().uri(URI.create(url)).build(),
            HttpResponse.BodyHandlers.ofString()
        );
        return parseFeed(response.body());
    }

    private List<Article> parseFeed(String xml) {
        return List.of(); // placeholder
    }
}
2,000 virtual threads, each consuming ~1KB. Total overhead: ~2MB. All 2,000 feeds fetch concurrently, limited only by network bandwidth and upstream server capacity. On an 8-core machine with 8 carrier threads, the JVM unmounts each virtual thread when httpClient.send() blocks on the socket read, freeing the carrier to run another virtual thread’s parse step.
Result: 2,000 feeds processed in ~500ms (the slowest individual feed response time) instead of 9.4 seconds.
Where Virtual Threads Hurt
CPU-Bound Work
// Virtual threads add overhead for CPU-bound work
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 3, time = 2)
@Measurement(iterations = 5, time = 5)
@Fork(2)
@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class CpuBoundComparison {
    private static final int TASKS = 1000;

    @Benchmark
    public long platformThreads() throws Exception {
        // One platform thread per core
        try (var executor = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors())) {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < TASKS; i++) {
                futures.add(executor.submit(CpuBoundComparison::cpuWork));
            }
            long total = 0;
            for (var f : futures) {
                total += f.get();
            }
            return total;
        }
    }

    @Benchmark
    public long virtualThreads() throws Exception {
        // One virtual thread per task, multiplexed onto the carrier pool
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < TASKS; i++) {
                futures.add(executor.submit(CpuBoundComparison::cpuWork));
            }
            long total = 0;
            for (var f : futures) {
                total += f.get();
            }
            return total;
        }
    }

    // Pure computation: no blocking call, so a virtual thread never unmounts here
    private static long cpuWork() {
        long hash = 0;
        for (int i = 0; i < 1_000_000; i++) {
            hash ^= ThreadLocalRandom.current().nextLong();
        }
        return hash;
    }
}
Results on an 8-core machine:
| Implementation | Throughput (tasks/sec) | CPU Utilization |
|---|---|---|
| Platform (8 threads) | 12,400 | 99% |
| Virtual threads | 11,800 | 99% |
Virtual threads are about 5% slower (11,800 vs. 12,400 tasks/sec). The ForkJoinPool carrier thread scheduler adds overhead compared to a plain ThreadPoolExecutor. For CPU-bound work, the virtual thread abstraction provides no benefit (there is no I/O to unmount on) and adds scheduling cost.
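When one request mixes both kinds of work, as the content platform does, one option is to keep the waiting on virtual threads and hand the compute step to a fixed pool sized to the core count. The following is a minimal sketch under stated assumptions: MixedWorkloadSketch, simulatedFetch, and cpuWork are hypothetical stand-ins, with Thread.sleep in place of real I/O and a toy loop in place of pixel processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MixedWorkloadSketch {

    // Stand-in for a CPU-heavy step; deterministic so results can be checked.
    static long cpuWork(int seed) {
        long hash = seed;
        for (int i = 0; i < 100_000; i++) {
            hash = hash * 31 + i;
        }
        return hash;
    }

    // Stand-in for a blocking I/O call.
    static int simulatedFetch(int id) throws InterruptedException {
        Thread.sleep(10);
        return id;
    }

    static List<Long> process(int tasks) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // Resources close in reverse order: the virtual-thread executor first,
        // then the CPU pool it depends on.
        try (ExecutorService cpuPool = Executors.newFixedThreadPool(cores);
             ExecutorService ioExecutor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                futures.add(ioExecutor.submit(() -> {
                    int fetched = simulatedFetch(id); // waiting happens on a virtual thread
                    Callable<Long> cpuStep = () -> cpuWork(fetched);
                    // The virtual thread parks on get() while a platform thread computes.
                    return cpuPool.submit(cpuStep).get();
                }));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get());
            }
            return results;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(process(100).size() + " tasks processed");
    }
}
```

The fixed pool bounds how many compute steps run at once, so a burst of thousands of requests cannot oversubscribe the cores. Whether the hand-off pays for itself depends on how heavy the compute step is, so measure before adopting it.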
The Pinning Problem
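Virtual threads unmount from carrier threads when they block on operations that the JVM recognizes: Thread.sleep, socket I/O, ReentrantLock.lock, LockSupport.park. On JDK 21 through 23, they do not unmount when they block inside a synchronized block or method. This is pinning. JDK 24 (JEP 491) removes this limitation for synchronized, although blocking inside a native frame still pins.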
// MOSTLY HARMLESS: synchronized guards only in-memory state, no blocking call inside
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

public class PinnedDatabasePool {
    private final List<Connection> connections;
    private int nextIndex = 0;

    public PinnedDatabasePool(List<Connection> connections) {
        this.connections = connections;
    }

    public ResultSet query(String sql) throws SQLException {
        Connection conn;
        synchronized (this) { // Pinned only for this short, non-blocking section
            conn = connections.get(nextIndex);
            nextIndex = (nextIndex + 1) % connections.size();
        }
        // The query's I/O runs outside the monitor, so the virtual thread can unmount.
        // If this call were INSIDE synchronized, the carrier would be blocked.
        return conn.createStatement().executeQuery(sql);
    }
}
The real danger is synchronized blocks that contain I/O:
// DANGEROUS: synchronized + I/O = carrier thread blocked
// (cache, httpClient, and request are fields of the enclosing class, not shown)
public synchronized String fetchWithCache(String key)
        throws IOException, InterruptedException {
    String cached = cache.get(key);
    if (cached != null) {
        return cached;
    }
    // This HTTP call blocks the carrier thread, not just the virtual thread
    String result = httpClient.send(request, BodyHandlers.ofString()).body();
    cache.put(key, result);
    return result;
}
When a virtual thread enters a synchronized block, it acquires the monitor. If it then blocks on I/O inside that block, the JVM (on JDK 21 through 23) cannot unmount it, because monitor ownership is recorded against the carrier thread rather than the virtual thread. The carrier thread is blocked for the entire duration of the I/O operation. With 8 carrier threads and 8 pinned virtual threads, all carrier threads are blocked. The remaining 9,992 virtual threads cannot run.
The fix is straightforward: replace synchronized with ReentrantLock.
// FAST: ReentrantLock allows unmounting during I/O
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class UnpinnedCachedFetcher {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, String> cache = new HashMap<>();
    private final HttpClient httpClient = HttpClient.newHttpClient();

    public String fetchWithCache(String key) throws Exception {
        lock.lock();
        try {
            String cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
            // Virtual thread unmounts here, carrier is freed
            String result = httpClient.send(
                HttpRequest.newBuilder()
                    .uri(URI.create("https://api.example.com/" + key))
                    .build(),
                HttpResponse.BodyHandlers.ofString()
            ).body();
            cache.put(key, result);
            return result;
        } finally {
            lock.unlock(); // Always release, even if the request throws
        }
    }
}
ReentrantLock uses LockSupport.park internally, which the virtual thread scheduler recognizes. When the virtual thread blocks on httpClient.send(), it unmounts from the carrier, which then runs another virtual thread.
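Detect pinning with the jdk.tracePinnedThreads system property (available on JDK 21 through 23; it was removed in JDK 24):
# Print a stack trace whenever a virtual thread blocks while pinned
java -Djdk.tracePinnedThreads=short -jar app.jar
This prints a stack trace each time a virtual thread blocks while pinned. The short setting limits the output to the problematic frames; full prints the complete stack trace and highlights native frames and frames holding monitors. In production, use JFR:
java -XX:StartFlightRecording=filename=recording.jfr,settings=profile -jar app.jar
Look for jdk.VirtualThreadPinned events in the recording. The event is enabled by default with a 20ms threshold.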
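A recording can also be inspected programmatically with the JFR consumer API instead of a GUI. The sketch below assumes a recording file already exists at the given path; the class name PinnedEventCounter is hypothetical and the default file name simply mirrors the command above.

```java
import java.io.IOException;
import java.nio.file.Path;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

class PinnedEventCounter {

    // Streams through a JFR file and counts jdk.VirtualThreadPinned events.
    static long countPinnedEvents(Path recordingFile) throws IOException {
        long count = 0;
        try (RecordingFile file = new RecordingFile(recordingFile)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                if ("jdk.VirtualThreadPinned".equals(event.getEventType().getName())) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "recording.jfr");
        System.out.println("Pinned events: " + countPinnedEvents(file));
    }
}
```

A count of zero on a busy service suggests pinning is not the bottleneck. A non-zero count is a cue to open the recording and read the stack traces attached to those events.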
Thread-Local Variables and Virtual Threads
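Thread-local variables work with virtual threads, but they scale poorly. Each virtual thread has its own copy of every ThreadLocal; the values are not shared with the carrier, and a virtual thread sees the same value no matter which carrier it remounts on. The cost shows up at scale: a per-thread cache that was cheap with a few hundred pooled platform threads is allocated once per task when every task gets its own virtual thread, and because virtual threads are never reused, nothing is amortized. Thread-locals are also mutable and live until the thread ends or remove() is called.
For request-scoped context, prefer ScopedValue (a preview API in JDK 21 through 24, final in JDK 25):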
// COSTLY at scale: every virtual thread gets its own mutable copy,
// which lives until the thread ends or remove() is called
private static final ThreadLocal<RequestContext> LEGACY_CONTEXT =
    new ThreadLocal<>();

// BETTER: ScopedValue is immutable and bound only for the duration of run()
private static final ScopedValue<RequestContext> CONTEXT =
    ScopedValue.newInstance();

public void handleRequest(RequestContext ctx) {
    // where(...).run(...) works on JDK 21 and later;
    // the static runWhere shortcut was removed in JDK 24
    ScopedValue.where(CONTEXT, ctx).run(() -> {
        processRequest();
    });
}

private void processRequest() {
    RequestContext ctx = CONTEXT.get(); // The value bound by the enclosing run()
    // ...
}
ScopedValue binds a value for the duration of a Runnable, and threads forked inside that scope with StructuredTaskScope inherit the binding. The binding is immutable and disappears when the run completes, so there is nothing to clean up and nothing that outlives the request.
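How ThreadLocal behaves on virtual threads is easy to verify directly. In the sketch below (the class name and task count are illustrative assumptions), each virtual thread stores its own id, blocks so that it may resume on a different carrier, and then reads the value back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadLocalCheck {

    private static final ThreadLocal<Integer> ID = new ThreadLocal<>();

    // Returns true if every task reads back the value it stored
    // after blocking in between.
    static boolean allTasksSeeOwnValue(int tasks) throws Exception {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                futures.add(executor.submit(() -> {
                    ID.set(id);          // stored on this virtual thread only
                    Thread.sleep(5);     // unmount; may remount on another carrier
                    return ID.get() == id;
                }));
            }
            for (Future<Boolean> f : futures) {
                if (!f.get()) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Own value preserved: " + allTasksSeeOwnValue(1_000));
    }
}
```

The check passing means correctness is not the issue with ThreadLocal here. The argument for ScopedValue is about memory footprint and bounded lifetime when there is one thread per task.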
Performance Guidelines for Virtual Threads
Use virtual threads when:
- Tasks are I/O-bound (wait/compute ratio > 3:1)
- You need high concurrency (hundreds or thousands of concurrent operations)
- Thread pool sizing has been a recurring problem
- You are doing fan-out operations (fetch from multiple sources concurrently)
Do not use virtual threads when:
- Tasks are CPU-bound (wait/compute ratio < 1:1)
- You use synchronized blocks that contain I/O on JDK 21 through 23 (refactor to ReentrantLock first)
- You rely heavily on ThreadLocal, especially for per-thread caches of expensive objects (migrate request-scoped state to ScopedValue first)
- You use native code via JNI inside blocking sections (native frames cannot be unmounted)
Do not:
- Pool virtual threads. Executors.newVirtualThreadPerTaskExecutor() creates one per task. Pooling defeats the purpose.
- Use Thread.sleep for rate limiting. It works, but Semaphore is more explicit about intent and plays well with structured concurrency.
- Assume virtual threads fix thread-safety bugs. They share the same memory model as platform threads. Data races are still data races.
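Since virtual threads are not pooled, a Semaphore is the usual way to cap how many of them reach a scarce resource at once. The sketch below is an assumption-laden illustration: BoundedFanOut is a hypothetical name, Thread.sleep stands in for the upstream call, and the method reports the peak concurrency it observed so the cap can be checked.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedFanOut {

    // Runs `tasks` virtual threads but admits at most `maxConcurrent` into the
    // guarded section at a time; returns the highest concurrency observed.
    static int peakConcurrency(int tasks, int maxConcurrent) {
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    permits.acquire(); // parks the virtual thread, not the carrier
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // stand-in for the upstream call
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                    return null;
                });
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("Peak concurrency: " + peakConcurrency(1_000, 10));
    }
}
```

Note that this caps concurrency rather than requests per second; a strict rate limit would need a time-based mechanism on top.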