TinyLoRA: Achieving 91.8% GSM8K Accuracy with Only 13 Parameters

This AI Paper Introduces TinyLoRA, A 13-Parameter Fine-Tuning Method That Reaches 91.8 Percent GSM8K on Qwen2.5-7B

Researchers from FAIR, Cornell, and CMU have unveiled TinyLoRA, a method that scales model updates down to as little as a single parameter. Using this approach, a Qwen2.5-7B-Instruct model achieved 91.8% accuracy on the GSM8K benchmark with a total update size of only 26 bytes.

Why This Matters

Standard LoRA implementations for models like Llama3-8B have a minimum update size of approximately 3 million parameters, creating a significant lower bound for on-device or ultra-low-bandwidth adaptation. TinyLoRA breaks this constraint by using random projections and weight tying, demonstrating that LLMs can be programmed with extreme precision.

This research proves that reinforcement learning is fundamentally more efficient than supervised finetuning in low-capacity regimes. While SFT forces models to absorb stylistic noise, RL provides a sparser, reward-focused signal that allows models to learn complex reasoning with updates 1,000 times smaller than traditional methods.

Key Insights

TinyLoRA circumvents standard LoRA scaling limits by projecting a low-dimensional trainable vector through a fixed random tensor (FAIR/Cornell/CMU, 2026).
Reinforcement Learning (RL) via Group Relative Policy Optimization (GRPO) is 100 to 1,000 times more efficient than SFT in low-parameter regimes.
A frozen SVD rank of r=2 was found to be optimal for tiny updates, as higher ranks introduce too many degrees of freedom for micro-vectors.
Tiling, where nearby modules of similar depth share parameters, outperformed structured sharing based on specific module types like Query or Key.
In bit-constrained regimes, storing parameters in fp32 precision is more bit-efficient than using half-precision formats like bf16 or fp16.

Practical Applications

Use Case: Adapting Qwen2.5-7B for math reasoning using 13 parameters to minimize storage footprint. Pitfall: Using SFT instead of RL, which requires 100x more parameters to reach equivalent performance.
Use Case: Ultra-low-bandwidth on-device model personalization via TinyLoRA vectors. Pitfall: Setting frozen rank higher than r=2, which complicates the optimization of the extremely small trainable vector.
Use Case: Trillion-scale model ‘programming’ where complex tasks are tuned using just a few bytes. Pitfall: Relying on structured parameter sharing which is less effective than depth-based tiling.

References:

https://www.marktechpost.com/2026/03/24/this-ai-paper-introduces-tinylora-a-13-parameter-fine-tuning-method-that-reaches-91-8-percent-gsm8k-on-qwen2-5-7b/

On This Page

This AI Paper Introduces TinyLoRA, A 13-Parameter Fine-Tuning Method That Reaches 91.8 Percent GSM8K on Qwen2.5-7B

Why This Matters

Key Insights

Practical Applications

Continue reading

Related Content

FunctionGemma: Google AI’s 270M Parameter Function Calling Specialist for Edge Workloads

Liquid AI Releases LFM2-ColBERT-350M: A Compact Late Interaction Model for Multilingual Cross-Lingual Retrieval

Google Health AI Releases MedASR: A Conformer-Based Medical Speech-to-Text Model