Recursion, a tech-enabled biopharma company combining automated, experimental biology with artificial intelligence to discover novel treatments that extend beyond our collective understanding of biology, today announced the winners of a Kaggle machine learning competition leveraging RxRx1, an open-source dataset drawn from Recursion’s proprietary biological image dataset, the largest of its kind in the world.
More than 860 teams of up to five competitors each participated in the months-long biology machine learning competition, which was announced in May of this year and run through Kaggle. The RxRx1 dataset is composed of microscopy images of a variety of human cells under more than 1,000 different induced genetic contexts, with the images produced weeks and months apart. Three teams were identified as the winners of the competition [see Kaggle leaderboard for team names] based on their ability to train deep learning models that accurately identify these contexts in held-out test experiments. Full results of the competition will be presented at NeurIPS 2019, taking place December 8-14 in Vancouver, Canada.
The RxRx1 dataset open-sourced by Recursion represents just a fraction, less than one percent, of Recursion’s weekly data generation. To date, the company has generated more than 3.5 petabytes of biological image data, built for the purpose of applying machine learning to drug discovery and generating complex representations of biology.
The goal of the RxRx1 competition was to inspire machine learning researchers to use this first-ever open-source dataset to develop new algorithms that would lead to greatly improved drug discovery applications and insights.
“This was the first time we decided to open-source a slice of our dataset to the world,” said Berton Earnshaw, Machine Learning Fellow at Recursion. “Many participants did extremely well in this competition, demonstrating how incredibly consistent the biological signal is in our images across time and batches and different genetic contexts. In fact, the top teams achieved near-perfect classification! This is highly validating given skepticism in our field around the ability to control, relate and use this type of dataset to drive major improvements in the pace and scale of drug discovery. We’ve been really encouraged by the results of the Kaggle competition and will do similar competitions and open-source more data in the future.”
RxRx1 is the first in a planned series of open-source biological and chemistry data releases for the machine learning community. Details of RxRx2 will be shared in early 2020.
“The benefits of a ‘miracle’ drug are so profound on society and the individuals and families it affects, yet the pace of such discoveries and the cost to achieve them are not sustainable today,” said Chris Gibson, Ph.D., co-founder and CEO, Recursion. “We must explore new ways to improve the likelihood of success and scale of discovery and — eventually — drive down the cost of discovering new medicines. What all these amazing teams have helped validate here is that machine learning, when applied to rigorously designed and controlled image-based experiments, can achieve levels of relatability and signal that rival any other omics data. What’s more, because the cost to generate biological images data is orders of magnitude less than any other comparably rich set of data, these images can serve as the foundation for techniques that truly scale drug discovery. Congrats to the winners for your passion, creativity and collaboration, and to the entire machine learning community that contributed to this competition. We are excited to build this new era of software-enabled drug discovery alongside you.”
For more information on Recursion’s unique approach to applying artificial intelligence and machine learning to drug discovery and development, visit www.recursionpharma.com.
Recursion is a tech-enabled biopharma company combining automated, experimental biology with artificial intelligence to discover novel medicines that extend beyond our collective understanding of biology. Recursion’s rich, relatable database of more than 3.5 petabytes of biological images generated in-house on the company’s robotics platform enables advanced machine learning approaches to reveal drug candidates, mechanisms of action and potential toxicity, with the eventual goal of decoding biology and advancing new therapeutics that radically improve people’s lives. Recursion is proudly headquartered in Salt Lake City and in 2019 was designated a Fast Company “Most Innovative Company.” Learn more at www.recursionpharma.com, or connect on Twitter and LinkedIn.