Large Language Model
54 articles in this category
Sakana AI and NVIDIA Introduce TwELL: 20.5% Faster LLM Inference via Unstructured Sparsity
Sakana AI and NVIDIA introduced TwELL and custom CUDA kernels that exploit activation sparsity in LLMs, speeding up inference by 20.5% and training by 21.9%.
Mastering LLM Distillation: Soft-Label, Hard-Label, and Co-distillation Strategies
LLM distillation uses a teacher-student setup to transfer reasoning capabilities, reducing costs while maintaining performance through techniques such as soft-label distillation, hard-label distillation, and co-distillation.
Anthropic Introduces Natural Language Autoencoders to Decode Claude's Internal Activations
Anthropic’s Natural Language Autoencoders (NLAs) convert model activations into readable text, detecting evaluation awareness in up to 26% of benchmark transcripts.
NVIDIA NeMo RL Accelerates LLM Post-Training with Lossless Speculative Decoding
NVIDIA Research integrates speculative decoding into NeMo RL v0.6.0, achieving a 1.8x rollout generation speedup at 8B scale and projecting a 2.5x end-to-end training speedup for 235B models.
Talkie-1930: A 13B Vintage LLM Trained Exclusively on Pre-1931 Data
Researchers released Talkie-1930, a 13B-parameter open-weight LLM trained on 260 billion tokens of pre-1931 text to eliminate benchmark contamination and to support research on historical reasoning.
How to Build a Fully Searchable AI Knowledge Base with OpenKB, OpenRouter, and Llama
Learn to build a local AI knowledge base with OpenKB and Llama 3.3 that features automated wiki synthesis and programmatic graph analysis for structured information retrieval.
DeepSeek-V4: 1M-Token Contexts via Compressed Sparse Attention and Hybrid Architecture
DeepSeek-AI releases DeepSeek-V4, featuring hybrid CSA/HCA attention that reduces KV cache size to 10% of that of previous models while supporting one-million-token contexts.
MiniMax M2.7: Open-Source Self-Evolving Model Matches GPT-5.3-Codex on SWE-Pro
MiniMax open-sources M2.7, a self-evolving MoE model achieving 56.22% on SWE-Pro and 57.0% on Terminal Bench 2, matching GPT-5.3-Codex in production-level software engineering.
Alibaba Releases Qwen3.5-Omni: A Native Multimodal Model for Real-Time Audio and Video Interaction
Alibaba's Qwen Team unveils Qwen3.5-Omni, a native multimodal model that achieves state-of-the-art results on 215 subtasks while supporting 256k long-context audio-visual inputs.