I often hear proposals to build new foundation models for biology. Here is the list of questions I ask. I rarely get past question 1.
What is the core task the model performs, i.e., the core thing it predicts? On a single iteration of the model, what is the input and what is the output?
For LLMs, the answer is "you give it N words and it produces a probability distribution over word N+1."
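That input/output contract can be made concrete with a toy sketch. The "model" below is just a random projection, not a real LLM; the vocabulary size, dimensions, and function names are all illustrative assumptions. The point is only the shape of a single iteration: N token ids in, one distribution over the vocabulary out.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Toy stand-in for a language model: random parameters, no training.
VOCAB = 50_000
rng = np.random.default_rng(0)
embedding = rng.normal(size=(VOCAB, 64))
readout = rng.normal(size=(64, VOCAB))

def predict_next(token_ids):
    """One iteration: N token ids in, a distribution over token N+1 out."""
    context = embedding[token_ids].mean(axis=0)  # crude context summary
    return softmax(context @ readout)

probs = predict_next([101, 7, 42])  # "N words" in
# probs has shape (VOCAB,) and sums to 1: a distribution over word N+1
```

If a proposal can state its equivalent of `predict_next` this crisply, it has passed question 1.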
My experience is that 90% of proposals fail at this stage.
What is the loss metric on the core task?
For LLMs, the answer is cross-entropy (or, equivalently, perplexity).
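The equivalence is worth spelling out: perplexity is just the exponential of the mean per-token cross-entropy. A minimal sketch, with made-up loss values:

```python
import numpy as np

def cross_entropy(probs, target_id):
    """Negative log-probability the model assigned to the true next token."""
    return -np.log(probs[target_id])

# Perplexity = exp(mean per-token cross-entropy) over held-out text.
per_token_losses = np.array([2.3, 1.9, 2.7, 2.1])  # illustrative values
perplexity = np.exp(per_token_losses.mean())
```

A perplexity of, say, 9.5 reads as "on average the model was as uncertain as a uniform choice over ~9.5 tokens."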
What emergent behavior do you think the model will have?
For LLMs, the answer is "natural language processing." For a biology model, it might be "predicting enzyme functionality" or something similar.
What evals will you use to check whether this emergent behavior is actually emerging?
For LLMs, e.g. Winogrande
Is there evidence that performance on the evals increases with scale? How expensive is it to get that evidence?
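The cheap way to gather that evidence is to train a ladder of small models and check whether eval error falls as a power law in compute. A sketch of the fit, on hypothetical numbers (the compute budgets and accuracies below are invented for illustration):

```python
import numpy as np

# Hypothetical eval results from a ladder of small training runs.
compute = np.array([1e18, 1e19, 1e20, 1e21])   # training FLOPs
error = np.array([0.45, 0.32, 0.23, 0.16])     # 1 - eval accuracy

# Fit error ~ exp(intercept) * compute ** slope in log-log space.
slope, intercept = np.polyfit(np.log(compute), np.log(error), 1)
# slope < 0 means the eval improves as you scale up compute,
# which is the evidence you want before committing to a big run.
```

If the fitted slope is flat, more scale probably will not rescue the proposal, and you learned that at small-model prices.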
Questions about datasets.
Do they exist?
How many tokens?
Are they high quality? How does the proposer know whether they are high quality?
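For the token-count question, a back-of-envelope estimate is usually enough to tell whether the dataset is even in the right regime. The bytes-per-token ratio below is a rough rule of thumb for English text; biological sequence data will tokenize differently, so treat every number here as an assumption:

```python
# Rough token budget for a corpus.
corpus_bytes = 2 * 10**12   # hypothetical 2 TB corpus
BYTES_PER_TOKEN = 4         # rule-of-thumb for English text; domain-dependent

approx_tokens = corpus_bytes // BYTES_PER_TOKEN
# ~5e11 tokens; compare this against the token counts that
# published scaling analyses suggest for your target model size.
```

If the estimate comes out orders of magnitude short, the dataset questions are answered before anyone argues about quality.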
More technical questions.
What is the scaling method?
Architecture questions. Dense? Mixture of experts? Multiple read heads? etc.
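The dense-vs-MoE question has a concrete arithmetic behind it: a mixture of experts stores many copies of the feed-forward block but routes each token through only a few, so stored parameters and per-token FLOPs come apart. A back-of-envelope sketch, with illustrative dimensions not taken from any specific model:

```python
# Parameter counts for one feed-forward (FFN) block, dense vs MoE.
def dense_ffn_params(d_model, d_ff):
    return 2 * d_model * d_ff  # up-projection + down-projection

def moe_ffn_params(d_model, d_ff, n_experts, top_k):
    total = n_experts * dense_ffn_params(d_model, d_ff)   # stored
    active = top_k * dense_ffn_params(d_model, d_ff)      # used per token
    return total, active

total, active = moe_ffn_params(d_model=4096, d_ff=16384,
                               n_experts=8, top_k=2)
# Memory scales with `total`; per-token compute scales with `active`.
```

That trade-off, capacity per unit of inference compute, is what the "Dense? Mixture of experts?" question is really asking about.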