The essential role of natural language in representing biology

  • samrodriques
  • Dec 21, 2024
  • 3 min read

At the time of writing, it is December 2024, and I am at NeurIPS. The word of the day, at least in the AI for Biology community, is foundation models. Everyone wants bigger data on more things to throw into bigger models. Virtual cell models will enable us to predict how cell states will change in response to chemical perturbations. Protein language models will enable us to identify better enzymes for degrading plastics or protein binders that have more drug-like properties. The future is bright.


Real biology discoveries look somewhat different, though, and I think it is telling that there are not very many actual biologists at these conferences. Contrast these dreams of foundation models with the latest table of contents from Science or Nature:





I struggle to imagine how any of these discoveries could fall out of a multimodal biology foundation model. This is not intended to be a straw man argument. Surely, a foundation model could potentially identify the lncRNA from the first paper, but I am not sure how such a foundation model would associate it with chromatin remodeling. A multimodal foundation model with enough data could also potentially identify metabolic changes associated with melanoma cells subjected to certain kinds of treatments, but I don’t see how that foundation model could identify the effect of those metabolites in preventing CD8+ T cell activation. Indeed, I do not think that any of the foundation models that are being developed today would be capable of generating rich new biological insights of the kind described in these papers. And yet, these are the kinds of insights that new therapies are made from.


The issue, I think, is that machine learning models work extremely well on structured data, and so all the foundation models that are being built are highly structured. Take a protein sequence as input and produce a protein sequence as output. Take a cell state and a chemical perturbation as input and produce a new cell state as output. Biology, however, is poorly structured. The lncRNA insight is a case in point: what structured representation can we use for the action of the lncRNA in modulating chromatin architecture? Protein models cannot represent it; DNA models cannot represent it; virtual cell models cannot represent it. Perhaps a model that incorporates RNA expression and 3D genome state could represent it, but then how would that model represent the lipid modulation of the monocytes? I worry that every discovery may need its own representation space. Indeed, the nature of biology is such that there is likely no representation, short of an atomic-resolution real-space model of the entire organism, that is sufficient to represent the diversity of biological phenomena that are relevant for disease.


Except, of course, for natural language, which has evolved to represent all concepts that humans are capable of contemplating. Indeed, I think natural language is ultimately unavoidable for discovery in biology, insofar as it is the only medium we know of that is sufficiently structured for machine learning and sufficiently flexible to represent the full diversity of biological concepts. At FutureHouse, we work on language agents, which is one way of combining language and biology, but this is not the only way. Models that combine natural language with protein, DNA, transcriptomics, and so on will also be extremely productive, provided the addition of the structured datatypes does not restrict their ability to represent unstructured concepts.

 

The history of biology is built on tools that we have found in nature to study biological phenomena. As all biologists know, trying to engineer things from scratch (almost) never works; what works is finding things in nature and repurposing them. It will be aesthetically pleasing if it turns out that our engineered representations are yet again insufficient for studying biology, and that natural language is simply another such tool that we have found in nature that must be applied instead.





 
 
 


©2024 by Sam Rodriques
