Generative AI has revolutionized software development with prompt-based code generation — protein design is next.
EvolutionaryScale today announced the release of its ESM3 model, the third-generation ESM model, which simultaneously reasons over the sequence, structure and functions of proteins, giving protein discovery engineers a programmable platform.
The startup, which emerged from the Meta FAIR (Fundamental AI Research) unit, recently landed funding led by Lux Capital, Nat Friedman and Daniel Gross, with investment from NVIDIA.
At the forefront of programmable biology, EvolutionaryScale can assist researchers in engineering proteins that can help target cancer cells, find alternatives to harmful plastics, drive environmental mitigations and more.
EvolutionaryScale scaled out model development for ESM3 on NVIDIA H100 Tensor Core GPUs, putting more compute into it than into any previous biological foundation model. The 98-billion-parameter ESM3 model uses roughly 25x more FLOPs and 60x more data than its predecessor, ESM2.
The company, which developed a database of more than 2 billion protein sequences to train its AI model, offers drug discovery researchers technology that can provide clues applicable to drug development, disease eradication and, as its name suggests, how humans have evolved at scale as a species.
Accelerating In Silico Biological Research With ESM3
With leaps in training data, EvolutionaryScale aims to accelerate protein discovery with ESM3.
The model was trained on almost 2.8 billion protein sequences sampled from organisms and biomes, allowing scientists to prompt the model to identify and validate new proteins with increasing levels of accuracy.
ESM3 offers significant updates over previous versions. The model is natively generative, and it is an “all to all” model, meaning structure and function annotations can be provided as input rather than just as output.
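To make the "all to all" idea concrete, here is a minimal sketch in Python of what a multi-track prompt could look like. It is an illustration only and not EvolutionaryScale's API: the `ProteinPrompt` class, the `MASK` marker and the track names are all hypothetical. The point it shows is that any track (sequence, structure or function) can be supplied as input, left partly masked, or omitted entirely and left for the model to produce.

```python
from dataclasses import dataclass
from typing import List, Optional

MASK = "_"  # hypothetical marker for a position the model is asked to fill in


@dataclass
class ProteinPrompt:
    """Illustrative multi-track prompt. Not EvolutionaryScale's actual API."""

    length: int
    sequence: Optional[str] = None        # amino-acid letters, MASK where unknown
    structure: Optional[str] = None       # per-residue labels (e.g. H/E/C), MASK where unknown
    function: Optional[List[str]] = None  # free-text function keywords

    def __post_init__(self) -> None:
        # Per-residue tracks must line up with the protein length.
        for name in ("sequence", "structure"):
            track = getattr(self, name)
            if track is not None and len(track) != self.length:
                raise ValueError(f"{name} track must have length {self.length}")

    def conditioning_tracks(self) -> List[str]:
        """Tracks that carry at least some user-supplied information."""
        tracks = []
        if self.sequence is not None and any(c != MASK for c in self.sequence):
            tracks.append("sequence")
        if self.structure is not None and any(c != MASK for c in self.structure):
            tracks.append("structure")
        if self.function:
            tracks.append("function")
        return tracks

    def tracks_to_generate(self) -> List[str]:
        """Tracks that are missing or still contain masked positions."""
        tracks = []
        if self.sequence is None or MASK in self.sequence:
            tracks.append("sequence")
        if self.structure is None or MASK in self.structure:
            tracks.append("structure")
        if not self.function:
            tracks.append("function")
        return tracks


# A partial sequence plus a function keyword act as input;
# the masked residues and the whole structure track are left to the model.
prompt = ProteinPrompt(length=5, sequence="M___K", function=["fluorescent"])
print(prompt.conditioning_tracks())
print(prompt.tracks_to_generate())
```

In this sketch a track can appear in both lists: a partly masked sequence conditions the model and is also completed by it, which is the sense in which inputs and outputs are interchangeable.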
Once it’s made publicly available, scientists can fine-tune this base model to construct purpose-built models based on their own proprietary data. The boost in protein engineering capabilities due to ESM3’s large-scale generative training across enormous amounts of data offers a time-traveling machine for in silico biological research.
Driving the Next Big Breakthroughs With NVIDIA BioNeMo
ESM3 provides biologists and protein designers with a generative AI boost, helping improve their engineering and understanding of proteins. With simple prompts, it can generate new proteins with a provided scaffold, self-improve its protein design based on feedback and design proteins based on the functionality that the user indicates. These capabilities can be combined in any order to provide chain-of-thought protein design. The experience is like messaging a researcher who has memorized the intricate three-dimensional meaning of every protein sequence known to humans and learned the language fluently, so users can iterate back and forth.
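The generate, get feedback, refine loop described above can be sketched as follows. This is a toy under stated assumptions, not ESM3 itself: `toy_generate` is a stand-in that fills masked positions at random, and `score` is whatever feedback function the user supplies. All function names here are hypothetical. What the sketch does show is the shape of the iteration: fixed scaffold positions are never changed, and each round re-masks part of the best design so the next round can revise it.

```python
import random
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "_"  # hypothetical marker for a designable position


def toy_generate(prompt: str, rng: random.Random, n: int = 8) -> List[str]:
    """Stand-in for a generative model: fills MASK positions at random."""
    return [
        "".join(rng.choice(AMINO_ACIDS) if c == MASK else c for c in prompt)
        for _ in range(n)
    ]


def remask(best: str, scaffold: str, rng: random.Random, fraction: float = 0.3) -> str:
    """Re-open a random subset of designable positions; scaffold residues stay fixed."""
    return "".join(
        MASK if s == MASK and rng.random() < fraction else b
        for b, s in zip(best, scaffold)
    )


def iterate_design(
    scaffold: str,
    score: Callable[[str], float],
    rounds: int = 5,
    seed: int = 0,
) -> Tuple[str, List[float]]:
    """Generate candidates, keep the best-scoring one, and feed it back as the next prompt."""
    rng = random.Random(seed)
    prompt = scaffold
    best, best_score = "", float("-inf")
    history: List[float] = []
    for _ in range(rounds):
        for candidate in toy_generate(prompt, rng):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        history.append(best_score)
        prompt = remask(best, scaffold, rng)
    return best, history


# Example feedback signal (purely illustrative): reward tryptophan content.
design, scores = iterate_design("MK____GH", score=lambda seq: seq.count("W"))
print(design, scores)
```

Because the best design is carried forward from round to round, the recorded score never decreases. A real workflow would replace the random filler with a model call and the toy score with an experimental or computational measure.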
“In our internal testing we’ve been impressed by the ability of ESM3 to creatively respond to a variety of complex prompts,” said Tom Sercu, co-founder and VP of engineering at EvolutionaryScale. “It was able to solve an extremely hard protein design problem to create a novel Green Fluorescent Protein. We expect ESM3 will help scientists accelerate their work and open up new possibilities — we’re looking forward to seeing how it will contribute to future research in the life sciences.”
EvolutionaryScale is opening an API for closed beta today, and code and weights are available for a small open version of ESM3 for non-commercial use. This version is coming soon to NVIDIA BioNeMo, a generative AI platform for drug discovery. The full ESM3 family of models will soon be available to select customers as an NVIDIA NIM microservice, run-time optimized in collaboration with NVIDIA, and supported by an NVIDIA AI Enterprise software license for testing at ai.nvidia.com.
The computing power required to train these models is growing exponentially. ESM3 was trained using the Andromeda cluster, which uses NVIDIA H100 GPUs and NVIDIA Quantum-2 InfiniBand networking.
The ESM3 model will be available on select partner platforms and NVIDIA BioNeMo.
See notice regarding software product information.