TornadoVM 2.0 Brings Automatic GPU Acceleration and LLM Support to Java
These articles are AI-generated summaries. Please check the original sources for full details.
TornadoVM 2.0: Heterogeneous Hardware Runtime for Java
The TornadoVM project has released version 2.0, an open-source runtime designed to automatically accelerate Java programs on CPUs, GPUs, and FPGAs. This release is especially relevant for developers building Large Language Model (LLM) solutions on the Java Virtual Machine (JVM).
While existing JVMs excel at portability and safety, they often struggle to fully utilize the potential of heterogeneous hardware. TornadoVM bridges this gap by offloading Java code to accelerators, managing memory transfers, and executing compute kernels, enabling significant performance gains for suitable workloads and reducing the cost of compute-intensive tasks.
Key Insights
- Runtime Compilation: TornadoVM acts as a Just-In-Time (JIT) compiler, translating Java bytecode to OpenCL C, NVIDIA CUDA PTX, or SPIR-V binary.
- Parallelism Models: Offers both a simple Loop Parallel API using annotations (@Parallel, @Reduce) and a more explicit Kernel API for GPU-style programming.
- LLM Inference Library: Includes GPULlama3.java, a pure Java library for LLM inference on GPUs, removing external dependencies and simplifying setup.
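To make the difference between the two parallelism models concrete, here is a plain-Java emulation. It is not TornadoVM's Kernel API: the names `KernelStyleSketch`, `mulKernel`, and `launch` are hypothetical. In kernel-style code the body is written for a single work-item identified by its global index, and the runtime decides how many work-items to launch. A parallel stream stands in for that launch here.

```java
import java.util.Arrays;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

public class KernelStyleSketch {

    // Kernel body: computes exactly one output element, selected by globalId.
    public static void mulKernel(int globalId, float[] a, float[] b, float[] out) {
        out[globalId] = a[globalId] * b[globalId];
    }

    // Stand-in for a device launch: invokes the kernel once per index in [0, globalSize).
    public static void launch(int globalSize, IntConsumer kernel) {
        IntStream.range(0, globalSize).parallel().forEach(kernel);
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {5f, 6f, 7f, 8f};
        float[] out = new float[a.length];
        launch(out.length, gid -> mulKernel(gid, a, b, out));
        System.out.println(Arrays.toString(out)); // prints [5.0, 12.0, 21.0, 32.0]
    }
}
```

With a loop-style API the developer writes the whole loop and marks it as parallel. With a kernel-style API the loop disappears and only the per-index body remains.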
Working Example
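import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.types.arrays.FloatArray;

// Element-wise multiply; @Parallel marks the loop iterations as independent.
// Assumed to live in a class named Example, as the method reference below implies.
public static void vectorMul(FloatArray a, FloatArray b, FloatArray result) {
    for (@Parallel int i = 0; i < result.getSize(); i++) {
        result.set(i, a.get(i) * b.get(i));
    }
}

// Copy the inputs to the device on first execution; copy the result back after every run.
var taskGraph = new TaskGraph("multiply")
    .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)
    .task("vectorMul", Example::vectorMul, a, b, result)
    .transferToHost(DataTransferMode.EVERY_EXECUTION, result);
// Freeze the graph, then build an execution plan from the snapshot and run it.
var snapshot = taskGraph.snapshot();
new TornadoExecutionPlan(snapshot).execute();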
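One common way to gain confidence in an offloaded task is to compare its output against a sequential reference computed on the host. The sketch below uses plain `float[]` arrays rather than TornadoVM's `FloatArray` so that it runs on any JVM. `VectorMulCheck`, `reference`, and `closeEnough` are hypothetical names, and the tolerance is an assumption to tune per workload.

```java
public class VectorMulCheck {

    // Sequential reference with the same semantics as the vectorMul task.
    public static float[] reference(float[] a, float[] b) {
        float[] out = new float[a.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] * b[i];
        }
        return out;
    }

    // True when the lengths match and every element differs by at most tol.
    // The negated comparison also rejects NaN differences.
    public static boolean closeEnough(float[] expected, float[] actual, float tol) {
        if (expected.length != actual.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (!(Math.abs(expected[i] - actual[i]) <= tol)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        float[] a = {1.5f, 2f, -3f};
        float[] b = {2f, 0.5f, 4f};
        float[] expected = reference(a, b);
        // In a real run, 'actual' would be copied out of the task's result array.
        float[] actual = {3f, 1f, -12f};
        System.out.println(closeEnough(expected, actual, 1e-6f)); // prints true
    }
}
```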
Practical Applications
- LLM Inference: GPULlama3.java enables running LLMs like Llama 3 and Qwen3 directly within Java applications on GPUs.
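- Pitfall: Workloads with loop-carried dependencies, where one iteration needs the result of a previous one, may not benefit from TornadoVM’s acceleration, because the iterations cannot be run independently in parallel; careful analysis of code structure is required.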
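The role of loop dependencies can be seen in plain Java, independent of TornadoVM. In this sketch (class and method names are hypothetical), `scale` has iterations that touch only their own index and so can run in any order, while `runningSum` reads the previous iteration's output and therefore has to run in order as written.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class LoopDependencySketch {

    // Independent iterations: out[i] depends only on in[i], so order does not matter.
    public static int[] scale(int[] in, int factor) {
        int[] out = new int[in.length];
        IntStream.range(0, in.length).parallel().forEach(i -> out[i] = in[i] * factor);
        return out;
    }

    // Loop-carried dependency: out[i] needs out[i - 1], so iterations must run in order.
    public static int[] runningSum(int[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (i == 0 ? 0 : out[i - 1]) + in[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(Arrays.toString(scale(data, 2)));    // prints [2, 4, 6, 8]
        System.out.println(Arrays.toString(runningSum(data)));  // prints [1, 3, 6, 10]
    }
}
```

A loop shaped like `runningSum` generally needs restructuring, for example as a parallel scan, before it maps well onto an accelerator.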