
The Humanity Project

By Sam Rodriques

Disclaimer: I run Edison Scientific, a company building an AI Scientist to automate scientific research and development. Our AI Scientist supports partners in biotech and pharma across the entire pipeline of drug development, including hypothesis generation, target identification, and target validation (as discussed here), among other things.


GLP-1 agonists are revolutionizing humanity’s approach to weight loss. The odds are that many other GLP-1-like treatments -- extremely effective, with minimal side effects -- are waiting to be found for other major diseases, from cancer to aging. Within a few years, we will likely have AI agents that are better than humans at making scientific discoveries, and which may thus be capable of finding such treatments. However, in order to turn these discoveries into technological and medical breakthroughs, we will need to greatly increase the rate at which we can test novel hypotheses in humans.


Here, I propose the Humanity Project, a project with the goal of curing all diseases that affect humanity. The Humanity Project would run 2000 drug programs testing novel, AI-generated therapeutic hypotheses through clinical proof of concept over 5 years. The Project would cost between $50B and $200B -- equivalent to the amount allocated for the California High-Speed Rail, or the 5-year budget for the NIH -- would increase the number of first-in-class drug trials by 400% to 800% over that period, and would benefit from various advantages of AI in proposing and prosecuting clinical trials. The logistical challenges associated with such a project are intense, so if we are serious about curing all diseases by mid-century and maintaining US leadership in biotechnology and medicine, we should begin now.


We cannot cure diseases today because we don’t understand biology:

Usually, when we cannot cure a disease today, it is for one of two reasons: either we do not know how the disease works (as in the case of schizophrenia, for example), or we think we know how the disease works but lack the technology to act on that mechanism (as in Huntington’s, for example). New drugs that either test a new hypothesis about how a disease works (i.e., engage a new drug target) or use a new technology (or “modality”) are thus generally responsible for the largest medical benefits. This category of “first-in-class” drugs includes revolutionary examples like GLP-1 agonists [1], CAR-T cells, and checkpoint inhibitors, which increased median life expectancy for cancers such as melanoma by more than 500% [2]. A recent survey of anticancer drugs found that only 22.2% of next-in-class anticancer drugs established a survival benefit over the first-in-class drug, suggesting that first-in-class drugs capture most of the life-expectancy benefit of novel drugs [3].


However, it does not always pay to test new hypotheses:

Despite this, 96% of clinical trials today interrogate established drug targets with established modalities [4]. This phenomenon, called target herding, is a rational response to profit incentives that likely slows the overall pace of medical progress significantly [5]. There are two key reasons why pharmaceutical companies do not pursue more first-in-class drugs:


  1. Good first-in-class hypotheses are hard to find: It is often said that hypotheses are abundant in drug development; however, high-quality first-in-class drug hypotheses, with enough evidence to support development into a drug program, are scarce. The thing we hear most often as we talk to pharma companies at Edison Scientific is that they want more high-quality drug targets or modalities [6]. I refer to this as the supply problem.

  2. The first mover does not always capture the value: The risks associated with first-in-class programs are higher, and if the next-in-class drug is better, even marginally so, it can end up capturing the majority of the market share while taking on much lower technical risk. This can be observed in the dynamics around GLP-1s today, and is a particular challenge for drug targets (as opposed to novel modalities), since drug targets cannot be patented [7]. I refer to this as the value-capture problem.


Value captured as a function of launch order and therapeutic advantage. Second-in-class drugs with a slight advantage, or even that are simply non-inferior, can capture 88% of the value of the first-in-class drug. Reproduced from https://www.nature.com/articles/nrd4035.


AI agents will help generate high-quality hypotheses about how to cure diseases:

At its heart, the supply problem boils down to data. To understand why, it is helpful to understand what makes a hypothesis high-quality in the first place.

A first-in-class drug hypothesis is high-quality if it has a high probability of technical success, i.e., if there is strong evidence to support the claim that the hypothesis will be borne out in humans specifically. This evidence must either be derived from human patients directly, or must come from a disease model with high predictive validity, i.e., a model whose predictions are very likely to be valid in humans [8]. The highest quality drug targets almost universally come from genetics [9]. Individuals with mutations in the APOE gene are much more likely to develop Alzheimer’s disease; thus, APOE is a strong target for Alzheimer’s. On the other hand, individuals with mutations in the PCSK9 gene are much less likely to develop coronary heart disease; and, indeed, drugging PCSK9 leads to massive reductions in LDL cholesterol. The use of genetic evidence in nominating drug targets has been credited with significantly increasing the probability of success (and reducing the cost) of developing new drugs [10].


There are likely far more high-quality drug targets to be discovered than are known today. There are up to ~7000 genes in the genome that are plausible drug targets [11], but approved drugs currently target only 700-800 of them. The first bottleneck in identifying more targets is data generation. Existing human genetic datasets are largely underpowered, with sample sizes in genetic cohorts being small relative to disease heterogeneity [12]. Collecting additional human genetic data is very expensive, and even large pharma companies often cannot justify spending tens of millions of dollars on preclinical target identification. The second bottleneck is data analysis. Even within existing genetic datasets, it is likely that there are many mechanisms still to be discovered by applying novel analytical techniques or by elucidating the biology underlying known variants -- mechanisms that have not been found simply because no one has looked for them.


These two bottlenecks -- data generation and data analysis -- can be resolved with money and with the help of AI agents. On data analysis, we have already shown how Kosmos, our AI Scientist, was able to identify a plausible biological mechanism underlying a previously unexplained genetic variant for type 2 diabetes [13]. Similar results have been obtained by Google and others [14]. The fact that we can do so even at this early stage of development clearly indicates that there are disease-relevant biological insights (“$100 bills”) lying in plain sight, and gives a preview of how AI agents will automate the process of finding them. On data generation, AI agents will help to identify the highest-priority experiments and plan them optimally to answer the largest number of questions possible within a given budget and timeline. Once targets have been identified, AI will also help to identify and orchestrate the experiments necessary to validate those targets, such as establishing models and assays and developing tool compounds. Within the next few years, we will be inundated with novel high-quality therapeutic hypotheses where the data exists, and with high-quality proposals for the experiments that should be run to identify such hypotheses where it does not.


To be clear: there is not yet any conclusive evidence that AI agents will produce hypotheses that are superior to those generated by humans (see the FAQ for a discussion). There is a long history of AI overpromising and underdelivering in drug discovery. It may turn out that AI agents only generate hypotheses similar to those that humans would otherwise have generated, or they may generate worse ones. However, AI agents can consider far more information than a human can, and it seems likely that this attribute alone (even in the absence of superintelligence) would allow them to propose hypotheses that are superior to human-proposed hypotheses. If we realize some form of superintelligence, it seems likely that it would also result in better decision-making and resource allocation, which could further improve the quality of hypotheses entering the clinic. And AI is improving rapidly across a very wide range of tasks (e.g., software engineering, below).


The task duration of software engineering tasks that language models can complete is increasing exponentially. We have experienced a similar increase in capabilities for scientific research tasks, such as literature search, data analysis, and hypothesis generation. We need to scale up our ability to run clinical trials accordingly. Reproduced from https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/.


We need a massive, centralized program to test those hypotheses:

Once we have those hypotheses and experimental proposals in hand, we will need to massively scale up clinical and preclinical work to match. Here, we encounter the second problem enumerated above: the value-capture problem. Drug targets are poorly defensible, so it is generally challenging to justify investing in the preclinical work needed to discover them or the clinical work needed to establish them.


To solve the value-capture problem, centralization is necessary. I therefore propose that we should initiate a massive, centralized project, the Humanity Project, to run first-in-class clinical trials testing AI-generated hypotheses, with an eye towards curing all disease by mid-century [15]. The Humanity Project would operate drug programs with high-risk, first-in-class targets or modalities through Phase II, which is clinical proof-of-concept. It would own all the resulting intellectual property, would publish all data, and would out-license successful assets through public auction [16].


For the sake of concreteness, the Humanity Project would initially be aimed at running 2000 first-in-class clinical programs from initiation through Phase II. These programs would be distributed across the top 200 diseases, as counted by aggregated diagnostic codes, that collectively account for ~99% of human disease burden by cost [17]. Clinical programs generally have a success rate of about 10% through Phase II (although the success rate for first-in-class trials may be lower), so running 10 trials per disease would yield one successful Phase II proof-of-concept on average.
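To make the portfolio arithmetic concrete, here is a minimal sketch in Python (the inputs are the assumptions stated above, not data). One nuance worth noting: an expected value of one success per disease still leaves a roughly one-in-three chance that any given disease gets no success at all.

```python
# Portfolio arithmetic for the Humanity Project, using the assumptions
# stated above: 2000 programs spread across 200 diseases, each with a
# ~10% chance of reaching Phase II proof-of-concept.
n_diseases = 200
programs_per_disease = 2000 // n_diseases  # 10 programs per disease
p_success = 0.10                           # assumed success rate through Phase II

expected_hits = programs_per_disease * p_success
print(f"Expected Phase II successes per disease: {expected_hits:.1f}")  # 1.0

# Binomial tail: probability that a given disease sees at least one success.
p_at_least_one = 1 - (1 - p_success) ** programs_per_disease
print(f"P(at least one success per disease): {p_at_least_one:.0%}")     # ~65%
```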


AI would play a central role in the Humanity Project. AI agents would be responsible for analyzing data to generate therapeutic hypotheses and for every stage of experimental planning, both for preclinical experiments and clinical trials, with humans remaining in the loop as a check. As an added benefit of centralization, trial design and resource allocation would be heavily optimized: it is likely that there would be multiple trials evaluating the same target, for example (possibly for different diseases), so those trials could be combined. When many therapeutic hypotheses were present for a single disease, the relevant experiments or trials would be planned to minimize correlation and maximize information gain.
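To illustrate what “minimize correlation and maximize information gain” could mean in practice, here is a minimal, hypothetical sketch (all names and numbers are invented): a greedy selection that scores each candidate trial by the information its binary outcome would reveal and penalizes overlap with hypotheses already in the portfolio. A real planner would model correlations between trial outcomes explicitly and optimize jointly across diseases and budgets; this only shows the shape of the objective.

```python
import math

def bernoulli_entropy(p: float) -> float:
    """Information (in bits) revealed by a trial with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def pick_portfolio(trials, correlation, budget, redundancy_weight=1.0):
    """Greedily select trials, maximizing entropy minus a correlation penalty.

    trials:      dict of trial name -> assumed probability of success
    correlation: dict of frozenset({a, b}) -> assumed hypothesis overlap (0-1)
    budget:      number of trials we can afford to run
    """
    chosen = []
    while len(chosen) < budget:
        def score(name):
            overlap = max((correlation.get(frozenset({name, c}), 0.0)
                           for c in chosen), default=0.0)
            return bernoulli_entropy(trials[name]) - redundancy_weight * overlap
        best = max((t for t in trials if t not in chosen), key=score)
        chosen.append(best)
    return chosen

# Hypothetical inputs: trials A1 and A2 test nearly the same mechanism,
# so with a budget of two, the planner picks one from each cluster.
trials = {"A1": 0.40, "A2": 0.45, "B": 0.20}
correlation = {frozenset({"A1", "A2"}): 0.9}
print(pick_portfolio(trials, correlation, budget=2))  # ['A2', 'B']
```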


Rough estimates based on public pharma R&D spending and reports in Nature Reviews Drug Discovery put the average cost of a drug program through a Phase II trial at $25M to $100M [18]. In an organization that performs many such trials, the marginal cost may be significantly lower, but this suggests a total cost of $50B to $200B for the Humanity Project. Moving from a drug target through Phase II in the US ordinarily takes 4-6 years, but some estimates suggest that China can cover the same ground roughly 2.5x as fast, in 18-24 months [19]. Thus, from the time the first programs initiate, a middle estimate of 48 months seems reasonable [20].


The Humanity Project would constitute only a small fraction of total spending on pharmaceutical R&D. Drug developers spend $300B/yr and initiate roughly 5000 clinical trials per year [21], versus $10B-$40B and 800 to 1600 per year for the Humanity Project [22]. However, given that 96% of clinical trials initiated by drug developers pursue established targets and modalities, the Humanity Project would increase the number of first-in-class trials by 400% to 800%, which we anticipate would significantly accelerate medical progress.
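The budget and trial-count figures in the preceding two paragraphs follow from simple arithmetic. A minimal sketch, using only the estimates quoted above and the 2-4 trials-per-program assumption from note [22]:

```python
# Budget and trial-count arithmetic, using only the estimates quoted above.
n_programs = 2000
years = 5
cost_low, cost_high = 25e6, 100e6          # assumed cost per program through Phase II

print(f"Total cost: ${n_programs * cost_low / 1e9:.0f}B-"
      f"${n_programs * cost_high / 1e9:.0f}B")                      # $50B-$200B
print(f"Annual cost: ${n_programs * cost_low / years / 1e9:.0f}B-"
      f"${n_programs * cost_high / years / 1e9:.0f}B/yr")           # $10B-$40B/yr

trials_low = n_programs * 2 / years        # 2-4 clinical trials per program [22]
trials_high = n_programs * 4 / years
print(f"Trials initiated: {trials_low:.0f}-{trials_high:.0f}/yr")   # 800-1600/yr

# Industry initiates ~5000 trials/yr, of which only ~4% are first-in-class:
baseline_first_in_class = 5000 * 0.04      # ~200 trials/yr
print(f"First-in-class increase: +{trials_low / baseline_first_in_class:.0%}"
      f" to +{trials_high / baseline_first_in_class:.0%}")          # +400% to +800%
```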


Moreover, the Humanity Project, although massive, would not overwhelm infrastructure for clinical trials. The Humanity Project would constitute 10%-20% of the total number of interventional clinical trials, but would likely only need 3%-7% of the global supply of patients for interventional clinical trials, because it is focused on earlier-stage trials (see FAQ). For similar reasons, we expect that manufacturing would not be a major constraint in most cases (earlier trials require smaller doses), and there would be significant opportunities for improving efficiency through platform-level CMC (chemistry, manufacturing, and controls) rather than asset-specific CMC.


We need to get started soon:

Biotech intellectual property is finite. Geopolitical adversaries such as China have proven extremely agile at leveraging public funding to accelerate sectors of national importance, and at moving rapidly to develop intellectual property and run clinical trials. Unless we launch a program of the sort described here, China may win the race for core medical IP, and become a toll collector on all future drugs. Thus, it is essential that the Humanity Project launch soon, within the next few years, while the US retains an advantage in AI and AI-for-science.


The funding scale ($10B-$40B/yr) is significant but achievable at the national level. It is similar in scope to the amount budgeted for the California High-Speed Rail project, for example, and to the NIH’s annual budget. Thus, although it would take a dedicated act of Congress, the US Government could certainly serve as an anchor funder if it wanted. It is possible that private funders (institutional investors and pharma companies) could contribute some fraction of the funding as well, if there were sufficient public backing. The logistical lift would also be herculean, which is yet another reason to start now.



FAQ:


Q: Don’t you have a conflict-of-interest in proposing this?

A: Yes. However, the fact that we are at the frontier is also what gives us perspective on the possibilities. Executing on the Humanity Project will take a massive joint effort, and we would likely be one of many players involved.


Q: Is the AI ready to go now?

A: Not quite. Before the Humanity Project can launch and command the level of capital described here, we must clearly establish the following things:

  1. We must demonstrate that AI agents are calibrated, i.e., that they are capable of estimating the probability that a given experiment succeeds or fails (or, for non-binary experiments, what the outcome is) with an accuracy that is similar to or better than that of human experts on average (see the calibration sketch after this list).

  2. We must demonstrate that AI agents can select next experiments in a way that maximizes information gain under some resource constraints, i.e., that AI agents can plan experiments efficiently. It may not be possible in practice to establish this robustly in the general case, because the set of all possible experiments is not well-parameterized. At the very least, we should demonstrate that humans believe that AI-planned experiments are efficient, and that within constrained environments in which the information gain can be measured robustly (Wordle is one simple example; see the sketch after this list), the agents choose experiments that are at least close to information-maximizing. GPT-5.2 and Opus 4.5 are currently very bad at Wordle. If a language model cannot select the best next guess in Wordle, it seems unlikely to be able to select the best next experiment for a drug discovery program [23].

  3. We must demonstrate that AI agents can generate therapeutic hypotheses and propose experiments that are regarded highly by human experts. In the long run, knowing that an AI agent is as well-calibrated as or better-calibrated than humans should be sufficient for humans simply to trust the proposals the agent makes. However, until the success rate of AI-proposed drug programs is well-established, humans are unlikely to sign off on an AI-proposed drug program unless it aligns with their own intuitions.

  4. We must demonstrate that AI agents can reliably perform the kinds of routine scientific tasks that humans must perform in the course of a drug discovery campaign, from target selection and validation to clinical biostatistics, at least with human supervision.
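Points 1 and 2 are concrete enough to sketch. Below are two minimal, self-contained Python illustrations, using synthetic data and a tiny made-up word list; neither is a claim about any specific model. The first measures calibration as expected calibration error: bin the predicted success probabilities and compare each bin’s average prediction to its observed success rate.

```python
import numpy as np

# Expected calibration error (ECE) on synthetic data: bin the predicted
# probabilities and compare each bin's mean prediction to the observed
# success rate. A well-calibrated agent has ECE near zero.
rng = np.random.default_rng(0)
predicted = rng.uniform(0, 1, 1000)      # agent's predicted P(experiment succeeds)
outcomes = rng.random(1000) < predicted  # synthetic outcomes from a calibrated world

bin_edges = np.linspace(0, 1, 11)
ece = 0.0
for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
    mask = (predicted >= lo) & (predicted < hi)
    if mask.any():
        gap = abs(predicted[mask].mean() - outcomes[mask].mean())
        ece += mask.mean() * gap  # weight each bin by its share of predictions
print(f"Expected calibration error: {ece:.3f}")
```

The second scores Wordle guesses by the entropy of their feedback distribution over the remaining candidate answers, which is what “information-maximizing” means in that setting:

```python
from collections import Counter
import math

def feedback(guess: str, answer: str) -> tuple:
    """Wordle-style feedback: 2 = green, 1 = yellow, 0 = gray."""
    marks = [0] * len(guess)
    counts = Counter(answer)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:                            # greens first, consuming letter counts
            marks[i] = 2
            counts[g] -= 1
    for i, g in enumerate(guess):
        if marks[i] == 0 and counts[g] > 0:   # yellows from remaining letters
            marks[i] = 1
            counts[g] -= 1
    return tuple(marks)

def expected_information(guess: str, candidates: list) -> float:
    """Entropy (bits) of the feedback distribution, assuming a uniform answer."""
    patterns = Counter(feedback(guess, answer) for answer in candidates)
    n = len(candidates)
    return -sum((c / n) * math.log2(c / n) for c in patterns.values())

# Tiny made-up candidate list; a real solver would use the full dictionary.
words = ["crane", "slate", "crate", "trace", "caret", "brine", "stone"]
best = max(words, key=lambda w: expected_information(w, words))
print(best, f"({expected_information(best, words):.2f} bits)")
```

An agent that reliably picks high-entropy guesses in settings like this would meet the “close to information-maximizing” bar described above.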

Based on the current rate of progress, I anticipate that we will be in a position to launch the Humanity Project within 1-3 years.


Q: Pharma spends $300B on R&D per year, so the idea that we can get major breakthroughs for $50B-$200B seems intrinsically wrong.

A: Pharma does indeed spend $300B on R&D per year, but 96% of clinical trials are next-in-class, rather than first-in-class. Thus, we anticipate that running 2000 first-in-class drug programs over 5 years would increase the number of first-in-class trials by 400%-800%, which could lead to substantial breakthroughs (20-40 years’ worth of research condensed into 5 years).


Q: Would there be enough patients or manufacturing capacity to support these trials?

A: Patient availability is a major constraint on clinical trials. Whether there are enough patients to support clinical trials depends heavily on the disease. However, the Humanity Project would likely constitute at most 10%-20% of the total number of drug-developer-initiated interventional trials. Moreover, the Humanity Project would be focused primarily on smaller trials (Phase I and Phase II), which use 5x-10x fewer patients than Phase III trials. Thus, we anticipate that the Humanity Project would need only ~3%-7% of the total supply of patients for interventional clinical trials [24]. In most disease areas, this should be achievable, especially because the Humanity Project would be focused on higher prevalence diseases. For some specific diseases, however, a lack of patients may be an issue.

Similar considerations apply for manufacturing. 20% of CDMOs today operate below 40% capacity due to pipeline delays [25]. It seems highly likely, therefore, that a centralized program would be able to secure enough CMC capacity and achieve operational efficiencies through pipeline optimization.


Notes

[1] Some subtlety is required here. The first-in-class GLP-1 agonist for weight loss, liraglutide, was a commercial flop. Semaglutide, the second-in-class molecule behind Wegovy, was a success, due in part to reducing the number of injections needed from daily to weekly. Semaglutide is therefore not truly "first-in-class," but it illustrates why next-in-class drugs can sometimes include significant (perhaps transformative) advances and can capture significant market share. Indeed, as in the case of the GLP-1 agonists, it may not even be apparent until commercialization how transformative an advance actually is.


In the course of the Humanity Project, this nuance will need to be taken into consideration. There may be times when the highest leverage point for the Humanity Project is a next-in-class trial that includes some significant pharmacological modification, or that alters the modality in some way. That is fine in principle. The Humanity Project should focus primarily on drug programs that test high risk novel therapeutic hypotheses, and does not need to be dogmatic.

[5] See also blog.jck.bio/p/creating-therapeutic-abundance. In many cases, next-in-class or “me-too” drugs do show significant improvements over first-in-class drugs. However, it remains the case that 0-to-1 step-changes, as in the case of GLP-1s, usually result from novel mechanisms or novel modalities.

[6] This obviously depends significantly on therapeutic area. Some areas, like oncology, are generally target-rich, while others, like neuroscience, are generally target-poor.

[7] In Amgen v. Sanofi (2023), for example, the Supreme Court held, in essence, that targets that would be drugged with an antibody cannot be patented, because disclosing one or more antibodies that can drug a target is not sufficient to enable every such antibody to drug that target. This further depresses the ROI for being the first to identify a new drug target or modality.

[8] For more on predictive validity, see e.g. https://www.nature.com/articles/s41573-022-00552-x

[9] For simplicity of argument, I am mostly ignoring other forms of non-genetic evidence here, and I am also largely ignoring new modalities. However, the same considerations apply in both cases: there is a bottleneck in planning and executing the experiments, and there is a bottleneck in drawing conclusions from the data. AI will help with both. The development of new modalities has been responsible for many of the greatest breakthroughs in medicine, and should not be discounted.

[10] Specifically, the use of genetic evidence in drug target selection coincided with the breaking of Eroom’s law, the previously observed trend that the cost of developing a new drug was doubling every decade in real terms. https://www.nature.com/articles/d41573-020-00059-3

[11] I.e., that are involved in signaling, transcription, immunity, and other disease-associated functions. Of these, only a fraction, likely around 3000, can be targeted with existing modalities.

[12] For example, a GWAS study or meta-analysis will often include ~100,000 participants. A disease subtype representing 10% of cases would then be represented by 10,000 participants, within which rare but impactful genotypes at the single-digit-percentage level would still be largely undetectable.

[15] I am generally very pro-capitalism and believe that competition and profit motives improve resource allocation in general. In this case, however, if we have or expect to have an AI system that we believe is likely better than humans at allocating resources, then there would be significant efficiency gains from consolidating the market, at least for the early-stage, profit-poor portion of the pipeline.

[16] Note: the Humanity Project would not cure diseases itself. It would simply identify mechanisms of action that could be exploited by cures. For-profit companies could then compete to provide the best cure, as is currently the situation in the case of GLP-1s.

[17] Estimates vary. One estimate from the GBD Studies attributes most of the loss of disability-adjusted life years to ~369 diseases (https://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2820%2930925-9/fulltext), rather than 200. Note, as discussed below, that conditions with very different underlying etiologies can sometimes be lumped under a single diagnostic code. Thus, it is unreasonable to expect that we will come up with a single cure for an entire diagnostic code in general. Nonetheless, we use the figure here as an approximation.

[20] Additional time may be needed to set up logistics and relationships.

[22] Assuming the Humanity Project is carried out over 5 years, and that each drug program consists of 2 to 4 clinical trials.

[23] The issues language models have with Wordle may be due at least in part to bad tokenization. On at least one occasion, Opus 4.5 got confused about which letter was in which position in a word. However, even when the models seem to understand which position each letter is in, they still make information-theoretically suboptimal guesses according to WordleBot.

[24] Roughly ⅓ of clinical trials are Phase III/IV, and ⅔ are Phase I/II. If Phase III/IV trials require 5x more patients on average than Phase I/II, then roughly ⅔ of all patients enrolled in interventional clinical trials would be enrolled in Phase III studies. Thus, if the Humanity Project constitutes 10%-20% of the total number of interventional trials, but is focused exclusively on Phase I and Phase II, then it would require 3%-7% of the total number of patients.
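For the curious, this estimate can be reproduced in a few lines of Python, generalizing the 5x figure to the 5x-10x range quoted in the answer above; the resulting envelope (roughly 2.5%-9%) brackets the cited ~3%-7%:

```python
from fractions import Fraction

# Note [24]'s arithmetic, spelled out with the post's assumptions:
# 1/3 of trials are Phase III/IV; a late-phase trial enrolls r times as
# many patients as an early-phase (Phase I/II) trial, with r ~ 5-10.
f_late, f_early = Fraction(1, 3), Fraction(2, 3)

for r in (5, 10):
    # Share of all enrolled patients who are in early-phase trials:
    early_patient_share = (f_early * 1) / (f_late * r + f_early * 1)
    # Patients per early-phase trial, relative to the all-trial average:
    early_per_trial = early_patient_share / f_early
    for trial_share in (Fraction(1, 10), Fraction(1, 5)):
        need = float(trial_share * early_per_trial)
        print(f"r={r}, trial share={float(trial_share):.0%}: "
              f"{need:.1%} of all trial patients")
# Output spans roughly 2.5%-8.6%, bracketing the ~3%-7% cited above.
```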

