InstaDeep Introduces Nucleotide Transformer v3 (NTv3): A New Multi-Species Genomics Foundation Model

InstaDeep has released Nucleotide Transformer v3 (NTv3), a multi-species genomics foundation model that processes 1 Mb genomic windows at single-nucleotide resolution. The model unifies representation learning, functional track prediction, genome annotation, and controllable sequence generation within a single architecture.

Current genomic models struggle to connect local sequence motifs with large-scale regulatory context across multiple organisms, which limits both prediction accuracy and sequence design. Existing methods often lack the context length needed to capture long-range dependencies, which reduces predictive power and increases experimental validation costs.

Key Insights

9 trillion base pairs: The amount of data from the OpenGenome2 resource on which NTv3 is pre-trained.
U-Net architecture: Enables processing of very long genomic windows while maintaining single-base resolution.
Masked diffusion language modeling: Enables controllable sequence generation. Generated sequences were validated in STARR-seq assays, with a reported 2x improvement in promoter specificity.
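
To make the U-Net point concrete, here is a toy sketch of the length bookkeeping: the window is repeatedly halved so that each position covers more context, then restored to full length with skip connections so the output still has one value per base. The functions downsample, upsample, and toy_unet are hypothetical stand-ins written for this illustration, not NTv3's actual layers, and the input values are made up.

```python
# Toy illustration only: these pooling/upsampling functions are hypothetical
# stand-ins, not NTv3's actual layers.

def downsample(track):
    """Halve the length by averaging adjacent pairs (assumes even length)."""
    return [(track[i] + track[i + 1]) / 2 for i in range(0, len(track), 2)]


def upsample(track):
    """Double the length by repeating every value once."""
    return [value for value in track for _ in range(2)]


def toy_unet(track, depth):
    """Run `depth` down stages, then `depth` up stages with skip connections."""
    skips = []
    for _ in range(depth):
        skips.append(track)        # keep the higher-resolution signal
        track = downsample(track)  # coarser view, wider context per position
    bottleneck_length = len(track)
    for _ in range(depth):
        skip = skips.pop()
        coarse = upsample(track)
        # Skip connection: merge coarse context with the saved fine detail.
        track = [c + s for c, s in zip(coarse, skip)]
    return track, bottleneck_length


# One numeric feature per base for a 16-base window (made-up values).
per_base_signal = [float(i % 4) for i in range(16)]
output, bottleneck_length = toy_unet(per_base_signal, depth=3)

print(len(per_base_signal), bottleneck_length, len(output))
# prints: 16 2 16
```

The output has the same length as the input even though the middle of the network works on a sequence 8 times shorter, which is the property that lets this family of architectures handle long windows without giving up per-base outputs.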

Working Example

# Conceptual example of character-level tokenization in the style of NTv3.
# Note: this does not load the actual NTv3 tokenizer or model, and the
# special-token names below are illustrative placeholders.

sequence = "ATGCGTAGCTAGCTAGCT"
tokens = list(sequence)  # Character-level tokenization: one token per base
# Append special tokens such as <bbox>, <cls>, <mask>, etc. as needed
tokens.append("<bbox>")
tokens.append("<cls>")

print(tokens)
# Expected output (example): ['A', 'T', 'G', 'C', 'G', 'T', 'A', 'G', 'C', 'T', 'A', 'G', 'C', 'T', 'A', 'G', 'C', 'T', '<bbox>', '<cls>']
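
The same character-level tokens can be used to sketch the corruption step behind masked language modeling. In one common formulation of masked diffusion, the fraction of hidden tokens varies from lightly masked to fully masked, and a model learns to recover the hidden bases. The source does not describe NTv3's actual masking schedule or training objective, so the mask_tokens helper, the mask rates, and the "<mask>" token name below are assumptions made for illustration.

```python
import random

MASK_TOKEN = "<mask>"  # placeholder name, not confirmed NTv3 vocabulary


def mask_tokens(tokens, mask_rate, rng):
    """Replace a random subset of tokens with MASK_TOKEN.

    Returns the corrupted sequence and the sorted masked positions.
    """
    num_to_mask = round(mask_rate * len(tokens))
    positions = sorted(rng.sample(range(len(tokens)), num_to_mask))
    masked_set = set(positions)
    corrupted = [
        MASK_TOKEN if i in masked_set else tok for i, tok in enumerate(tokens)
    ]
    return corrupted, positions


rng = random.Random(0)  # fixed seed so the run is repeatable
tokens = list("ATGCGTAGCTAGCTAGCT")

# Higher mask rates hide more of the sequence; a rate of 1.0 hides all of it.
for mask_rate in (0.25, 0.5, 1.0):
    corrupted, positions = mask_tokens(tokens, mask_rate, rng)
    # Show masked positions as "_" to keep the printout compact.
    print(mask_rate, "".join("_" if t == MASK_TOKEN else t for t in corrupted))
```

Generation in this family of models typically runs the process in reverse, starting from a heavily masked sequence and filling in bases over several steps, which is what makes conditioning on a desired property possible.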

Practical Applications

Drug Discovery: Designing enhancers to improve gene expression for therapeutic targets.
Pitfall: Relying on single-species models can lead to poor generalization and inaccurate predictions when applying findings to different organisms.

References:

https://www.marktechpost.com/2025/12/23/instadeep-introduces-nucleotide-transformer-v3-ntv3-a-new-multi-species-genomics-foundation-model-designed-for-1-mb-context-lengths-at-single-nucleotide-esolution/
