Basecamp Research, a frontier AI lab for biological design, has announced the launch of the Trillion Gene Atlas, a landmark scientific initiative to generate and model biological data at the trillion-gene scale.
Launched in collaboration with Anthropic, Ultima Genomics and PacBio, and powered by NVIDIA AI infrastructure, the Trillion Gene Atlas aims to expand known evolutionary genetic diversity 100-fold by collecting genomic data from more than 100 million species across thousands of sites worldwide.
Today's biological AI models are trained on a narrow slice of life on Earth. The Trillion Gene Atlas is designed to expand the known genetic universe by orders of magnitude beyond what is available in public databases. Training models at this scale is intended to establish a new paradigm for programmable therapeutic design.
Addressing the Biological Data Bottleneck
As model size and computing power have grown sharply, diverse data has become a critical enabler of progress in AI drug development. All current sequence-based foundation models rely on variants of the same public repositories, with 80% trained on a public database containing fewer than 250 million sequences.
Basecamp Research's EDEN foundation models bypass the industry's evolutionary data wall by training entirely on BaseData, a proprietary genomic database that is currently more than 10 times larger than all public resources combined. By learning from an unprecedented 10 billion new-to-science genes across 1 million newly discovered species, EDEN unlocked critical new scaling laws for AI in biology.
Real-World Results
In wet-lab validation, EDEN demonstrated zero-shot activity in primary human T cells without requiring any human or clinical data. The model has generated hits across multiple frontier modalities, notably pioneering AI-Programmable Gene Insertion to insert healthy genes and designing targeted antimicrobial peptides with a 97% hit rate against priority pathogens.