I want to start by telling a story about a young scientist. This young scientist had everything that you would hope for in someone just starting out their career: ambitious, hardworking, plenty of ideas. And this young scientist had a great hypothesis to start out a project. The only problem though with this hypothesis is that it was totally wrong. There was nothing there. The scientist ended up spending almost 12 months of their scientific life trying to test the idea. The details aren't important because it really was a great hypothesis; just totally wrong.
How did we arrive at a place where it's totally normal for graduate student and postdoc lives to be sacrificed at the altar of bad hypotheses? We’ve all been trained that the hypothesis is the bedrock of science. You come up with an idea, a hypothesis if you will, and then you set out to test it. If you gain evidence for the hypothesis, then you conclude that the hypothesis was correct. If you don’t gain evidence, then the hypothesis was incorrect. Simple. This fundamental part of the scientific method is called null hypothesis significance testing. Do the data support the idea? Is the difference significant? But perhaps we’ve been led astray and put too much emphasis on the null hypothesis. Or maybe too much emphasis on the hypothesis itself.
An alternative approach is called exploratory data analysis. In this approach, no hypotheses are formed in advance; instead, the investigator will 'roll around in the data', or in some cases, the literature. Reading the literature is the bedrock of science and the way that we all share our findings. Reading is a type of exploratory data analysis. We set out to learn something, not really sure what we will learn, and then we are delighted with surprise and knowledge. But formal exploratory data analysis gets a bad rap, primarily because we are all biased creatures. No one can read papers or analyze data without overlaying their own biases onto them. The problem with exploratory data analysis is that confirmation bias ensures that we first see the patterns that support our preconceived notions. And if these notions have no basis in reality, we might not see that. This is the problem with biomedical research: there is a lot of garbage, and scientists need to sift through it all to find the truth.
So how do we separate the bad ideas from the good? The balance lies in moving between exploratory data analysis and null hypothesis significance testing. These are two distinct processes that should be taught distinctly and carried out separately. Exploring a data set, the literature, or something in between should be done with acknowledgment and appreciation of our bias.
For example, no one else has read the same combination of papers, done the same sets of experiments, or had the same life experiences as I have. Therefore, I carry all of those experiences with me as I approach new information. What I see in a data set or the connections I make between papers will be uniquely mine. This is the power of bias. Daniel Kahneman said it best:
biases are pattern recognition of a fast-thinking mind.
Our biases are the patterns that we see in the data or in the literature. Importantly, it’s these patterns that can then lead to scientific discovery. If we acknowledge our biases, what would a parallel universe look like in which the scientific method is overhauled to accelerate scientific discovery?
One analogy for a new type of science is captured in a recent project run by my friend Itai Yanai called night science. In this thought experiment, Itai and his colleague Martin Lercher explore two distinct modes of science. The first mode, which they describe as night science, is where scientists explore ideas, dream up new possibilities, and imagine patterns that could be real. The second mode, called day science, is the classic scientific approach of hypothesis testing. Balancing the activities of day and night is what powers scientific discovery.
This dichotomy is reminiscent of two modes of thinking: the exploratory (or diffuse) mode and the evaluative (or convergent) mode. When trying to come up with an idea, the exploratory mode is the hardest but also the most important. Allowing your mind to make connections between ideas is how new ideas are formed. Humans have a tendency to jump quickly to the evaluative mode, determining whether that new idea is a good one or not. But research studies on creativity describe how thinkers need to stay in the exploratory mode longer than is comfortable in order to come up with new and creative ideas. This is what Itai described as night science. After time spent in the exploratory mode, the mind switches to evaluating those ideas. Prioritizing the ideas then leads to the best next steps. In the case of science, this would be taking those ideas forward to test experimentally. The balance between these two modes of thinking, and these two scientific approaches, is not explicit enough in the modern scientific approach.
How does a scientist explore new ideas? The canonical way is to read and think, and wax hypothetical about possible connections. Could X be related to Y? Perhaps take notes or scribble ideas in the margins. Alternatively, scientists sit in a dimly lit conference room while another scientist tells them about their most recent paper. Perhaps not the most efficient use of time, but hopefully entertaining. Where are the patterns? Unfortunately, this approach is untenable. More papers are written today than ever before. In 1988, approximately 1,000 scientific papers were published each day. Today, it's almost three times that amount, with over 25 million total papers cited in the National Library of Medicine.
How then does a scientist keep up with this literature? The short answer is they don’t. There is no way to read all of the papers in a field, even if the field is niche. So scientists read the papers from the top journals, or perhaps from the labs that they know and respect; everything else is filtered out.
Clearly, scientists need a different approach. Imagine a future where instead of going to the literature to develop a hypothesis, scientists go to data to develop hypotheses. If the data suggest that X interacts with Y, either physically or functionally, and that interaction is not known, then a scientist could design a very specific experiment to test it. If the experimental results come back as null, then the scientist moves on to the next data-driven hypothesis. Is this data-driven approach possible? The Cambrian explosion in scientific literature has been accompanied by a concomitant flourishing of scientific data.
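To make the data-driven loop a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the function name `novel_candidates`, the toy association scores, the list of "known" interactions, and the 0.8 cutoff are all assumptions, not a real data set or an established method. It simply keeps the pairs that the data support strongly, drops any that are already known, and ranks what is left so the most promising idea gets tested first.

```python
def novel_candidates(scored_pairs, known_pairs, threshold):
    """Return strongly supported pairs that are not already known, best first.

    scored_pairs: iterable of (a, b, score) tuples from some data set
    known_pairs:  iterable of (a, b) tuples already described in the literature
    threshold:    minimum score for a pair to count as "suggested by the data"
    """
    # Interactions are treated as unordered, so (Z, W) and (W, Z) match.
    known = {frozenset(pair) for pair in known_pairs}
    hits = [
        (a, b, score)
        for a, b, score in scored_pairs
        if score >= threshold and frozenset((a, b)) not in known
    ]
    # Strongest association first: this is the order in which to test them.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical toy data, not real measurements.
scored = [("X", "Y", 0.91), ("X", "Z", 0.40), ("Y", "W", 0.85), ("W", "Z", 0.95)]
known = [("Z", "W")]  # assumed to be already published

candidates = novel_candidates(scored, known, threshold=0.8)
for a, b, score in candidates:
    print(f"Test next: does {a} interact with {b}? (score {score:.2f})")
```

If the experiment on the first candidate comes back null, the scientist simply moves to the next item in the list rather than spending another year on the same idea.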
Before the internet, librarians were information curators. Working with an antiquated Dewey Decimal system, their job was to organize information to make it searchable (indexed) and retrievable (user experience). After the advent of the internet and a similar explosion in information, search engines assumed the role of librarian. Today, we Google (or ask Alexa or Siri) for the information we need. Scientists cannot yet Google "what does my favorite gene do?". No one has yet organized biomedical data into a uniform or searchable resource. But this is the future, and this will undoubtedly save graduate students time and heartache from failed lines of investigation, ultimately accelerating scientific discovery. What then of the young graduate student that started this tale? Rest assured that they published a paper from graduate school, went on to have a successful postdoc, and now runs a lab that enables him to write, and think, and share his ideas with you all today.