MaxToki: A 1B-Parameter Temporal Foundation Model for Cellular Aging Trajectories

These articles are AI-generated summaries. Please check the original sources for full details.

Meet MaxToki: The AI That Predicts How Your Cells Age — and What to Do About It

MaxToki is a transformer decoder model designed to predict temporal shifts in gene network states across the human lifespan. It was trained on nearly 1 trillion gene tokens across 175 million single-cell transcriptomes to overcome the “snapshot” limitation of current biological foundation models.

Why This Matters

Most foundation models in biology treat single-cell transcriptomes as frozen snapshots, failing to account for the slow, progressive shifts in gene network states that drive age-related diseases like Alzheimer’s and pulmonary fibrosis over decades. This blind spot means researchers can see a cell's current state but not where it is headed.

MaxToki addresses this with a temporal prompting strategy and continuous numerical tokenization, which let the model reason across trajectories. Scaled to 1 billion parameters, with RoPE-based context extension to 16,384 tokens, it achieves a median prediction error of 87 months for held-out ages, roughly half the 178-month error of a linear regression baseline.
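To make the headline metric concrete, here is a minimal sketch of how a median absolute age error in months could be computed, and how the two reported figures compare. The function name and the toy ages are illustrative assumptions, not taken from the MaxToki evaluation code.

```python
import numpy as np

def median_abs_error_months(true_ages, predicted_ages):
    """Median absolute difference between predicted and true ages (months)."""
    true_ages = np.asarray(true_ages, dtype=float)
    predicted_ages = np.asarray(predicted_ages, dtype=float)
    return float(np.median(np.abs(predicted_ages - true_ages)))

# Toy held-out donors (hypothetical ages in months, for illustration only).
toy_true = [240, 480, 720]
toy_pred = [250, 450, 800]
toy_error = median_abs_error_months(toy_true, toy_pred)  # errors: 10, 30, 80

# Figures reported in the article.
reported_model = 87.0      # MaxToki median error, months
reported_baseline = 178.0  # SGDRegressor baseline median error, months
ratio = reported_baseline / reported_model  # about 2.05

print(toy_error, round(ratio, 2))
```

The median is less sensitive to a few badly mispredicted cells than the mean, which may be why it is the figure quoted here.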

Key Insights

  • Fact: MaxToki reduced the median prediction error for held-out cellular ages to 87 months, compared to 178 months for standard SGDRegressor baselines (2026).
  • Concept: Rank value encoding orders genes by relative expression within a cell to amplify transcription factors and reduce technical batch effects.
  • Tool: FlashAttention-2 via the NVIDIA BioNeMo stack enabled a 5x improvement in training throughput on H100 80GB GPUs.
  • Fact: The model inferred 15 years of age acceleration in lung fibroblasts from patients with pulmonary fibrosis, despite being trained only on healthy donors.
  • Concept: In-context learning allows the model to infer trajectory context (cell type and donor sex) from cell states without explicit labels.
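The rank value encoding idea from the list above can be sketched as follows. This is a minimal illustration, not MaxToki's tokenizer: it assumes each gene's count is scaled by a per-gene reference median before ranking (an assumption borrowed from common rank-based encodings), and the gene names, counts, and medians are toy values chosen for the example.

```python
import numpy as np

def rank_value_encode(counts, gene_ids, gene_medians, max_len=None):
    """Return gene ids ordered by relative expression within one cell.

    counts:       raw transcript counts for one cell
    gene_ids:     gene identifiers, aligned with counts
    gene_medians: assumed per-gene reference medians across a corpus
    """
    counts = np.asarray(counts, dtype=float)
    gene_medians = np.asarray(gene_medians, dtype=float)
    total = counts.sum()
    if total == 0:
        return []
    # Normalise for sequencing depth, then divide by the gene's typical level
    # so that ubiquitously high genes do not dominate the ranking.
    scaled = (counts / total) / gene_medians
    expressed = np.flatnonzero(counts > 0)  # drop undetected genes
    order = expressed[np.argsort(-scaled[expressed], kind="stable")]
    if max_len is not None:
        order = order[:max_len]
    return [gene_ids[i] for i in order]

# Toy cell: two housekeeping genes with high counts, one transcription factor.
genes = ["ACTB", "GAPDH", "FOXO3", "TP53"]
cell_counts = [500, 300, 10, 0]
medians = [400, 250, 2, 5]  # hypothetical reference medians
tokens = rank_value_encode(cell_counts, genes, medians)
print(tokens)
```

In this toy cell, FOXO3 has by far the lowest raw count but ranks first, because its count is high relative to its assumed typical level. Ranking on raw counts would instead put ACTB and GAPDH first.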

Practical Applications

  • In Silico Screening for Longevity: Researchers used MaxToki to nominate pro-aging drivers in cardiac cells, which were later validated in vivo to cause measurable cardiac dysfunction in mice. Pitfall: Relying on raw transcript counts instead of rank encoding, which biases models toward ubiquitous housekeeping genes.
  • Alzheimer’s Pathology Analysis: The model distinguished symptomatic Alzheimer’s patients, whose microglia showed 3-year age acceleration, from resilient individuals, who showed no acceleration. Pitfall: Treating elapsed time as discrete categories rather than a numerical continuum, which significantly degrades prediction accuracy.
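One intuition for the second pitfall, sketched under stated assumptions: if time gaps are encoded as unordered categories (one-hot vectors here), every pair of distinct gaps looks equally far apart, so the encoding itself carries no notion that 12 months is closer to 60 months than to 480. The bins below are hypothetical and this does not reproduce MaxToki's numerical tokenization. It only illustrates what a categorical treatment throws away.

```python
import numpy as np

def one_hot(index, n_bins):
    """Categorical encoding: a vector with a single 1 at the bin's position."""
    vec = np.zeros(n_bins)
    vec[index] = 1.0
    return vec

gaps_months = [12, 60, 120, 480]  # hypothetical time-gap bins
n_bins = len(gaps_months)

# Categorical view: distance between the 12-month bin and the others.
cat_near = np.linalg.norm(one_hot(0, n_bins) - one_hot(1, n_bins))  # 12 vs 60
cat_far = np.linalg.norm(one_hot(0, n_bins) - one_hot(3, n_bins))   # 12 vs 480

# Numerical view: distance on the actual month scale.
num_near = abs(gaps_months[0] - gaps_months[1])
num_far = abs(gaps_months[0] - gaps_months[3])

print(cat_near, cat_far)  # identical: ordering information is lost
print(num_near, num_far)  # different: ordering and magnitude are kept
```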
