TinyLoRA: Achieving 91.8% GSM8K Accuracy with Only 13 Parameters
These articles are AI-generated summaries. Please check the original sources for full details.
This AI Paper Introduces TinyLoRA, A 13-Parameter Fine-Tuning Method That Reaches 91.8 Percent GSM8K on Qwen2.5-7B
Researchers from FAIR, Cornell, and CMU have unveiled TinyLoRA, a method that scales model updates down to as little as a single parameter. Using this approach, a Qwen2.5-7B-Instruct model achieved 91.8% accuracy on the GSM8K benchmark with a total update size of only 26 bytes.
Why This Matters
Standard LoRA implementations for models like Llama3-8B have a minimum update size of approximately 3 million parameters, creating a significant lower bound for on-device or ultra-low-bandwidth adaptation. TinyLoRA breaks this constraint by using random projections and weight tying, demonstrating that LLMs can be programmed with extreme precision.
This research proves that reinforcement learning is fundamentally more efficient than supervised finetuning in low-capacity regimes. While SFT forces models to absorb stylistic noise, RL provides a sparser, reward-focused signal that allows models to learn complex reasoning with updates 1,000 times smaller than traditional methods.
Key Insights
- TinyLoRA circumvents standard LoRA scaling limits by projecting a low-dimensional trainable vector through a fixed random tensor (FAIR/Cornell/CMU, 2026).
- Reinforcement Learning (RL) via Group Relative Policy Optimization (GRPO) is 100 to 1,000 times more efficient than SFT in low-parameter regimes.
- A frozen SVD rank of r=2 was found to be optimal for tiny updates, as higher ranks introduce too many degrees of freedom for micro-vectors.
- Tiling, where nearby modules of similar depth share parameters, outperformed structured sharing based on specific module types like Query or Key.
- In bit-constrained regimes, storing parameters in fp32 precision is more bit-efficient than using half-precision formats like bf16 or fp16.
Practical Applications
- Use Case: Adapting Qwen2.5-7B for math reasoning using 13 parameters to minimize storage footprint. Pitfall: Using SFT instead of RL, which requires 100x more parameters to reach equivalent performance.
- Use Case: Ultra-low-bandwidth on-device model personalization via TinyLoRA vectors. Pitfall: Setting frozen rank higher than r=2, which complicates the optimization of the extremely small trainable vector.
- Use Case: Trillion-scale model ‘programming’ where complex tasks are tuned using just a few bytes. Pitfall: Relying on structured parameter sharing which is less effective than depth-based tiling.
References:
Continue reading
Next article
7 Steps to Mastering Memory in Agentic AI Systems
Related Content
FunctionGemma: Google AI’s 270M Parameter Function Calling Specialist for Edge Workloads
Google released FunctionGemma, a compact 270M parameter model achieving 85% accuracy on the Mobile Actions benchmark after fine-tuning.
Liquid AI Releases LFM2-ColBERT-350M: A Compact Late Interaction Model for Multilingual Cross-Lingual Retrieval
Liquid AI introduces LFM2-ColBERT-350M, a 350M-parameter late interaction retriever optimized for multilingual and cross-lingual search, offering high accuracy and fast inference speeds.
Google Health AI Releases MedASR: A Conformer-Based Medical Speech-to-Text Model
Google released MedASR, a 105M parameter medical speech-to-text model, achieving up to 4.6% word error rate in radiology dictation with a language model.