This project is no longer accepting applications. Subscribe to our newsletter to be notified of new projects!
Develop a model to predict bacterial resistance to antibiotics from genomic data using an end-to-end data science approach including data exploration, feature transformation, hyperparameter tuning and statistical analysis.
Antimicrobial resistance (AMR) poses a significant global threat to our ability to treat common bacterial infections and save lives. Hidden in bacterial genomic DNA is the key to understanding the causes of resistance, and hence to improving our ability to treat these infections. In this Build Project, you'll wear the hat of a Data Scientist and build a simple machine learning pipeline that turns raw data into an AMR prediction. Under the supervision of an experienced industry expert, you'll be exposed to genomic data processing approaches, featurization for machine learning, training machine learning models, and how to fairly evaluate and analyze your model’s performance. You'll become familiar with common industry tools and approaches such as Python, Git & GitHub, data science packages (pandas, scikit-learn), cross-validation and hyperparameter tuning.
Get to know the Project Leader and other students, ask questions about the project requirements, prepare your workspace and environment.
Explore publicly available data sources for bacterial genomes and AMR and understand more about how genomes are structured.
Analyze and visualize the data (EDA), uncover interesting features and correlations and propose questions/hypotheses.
Using your data exploration, convert the data into machine learning features and build a baseline model to provide a benchmark for future performance.
Explore cross-validation and data splitting strategies to ensure models are generalizable.
Build a final model and tune its hyperparameters to achieve optimal predictive performance.
Evaluate your models, report performance results, compare them to the baseline and quantify your level of certainty in your results.
Wrap up and present your results and learnings, organize and (optionally) push all your code to GitHub.
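To give a flavor of the modeling sessions above, here is a minimal sketch of what such a pipeline could look like. It is an illustration under stated assumptions, not the project's actual code: the toy sequences, the planted "GGGGGG" motif, the choice of 3-mer frequencies as features, and the helper names (`kmer_features`, `make_toy_dataset`) are all hypothetical. Only the scikit-learn and NumPy calls are real library functions.

```python
from itertools import product

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

K = 3  # k-mer length (an assumption chosen to keep the example small)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_features(sequence, k=K):
    """Turn a DNA string into a vector of normalised k-mer frequencies."""
    counts = np.zeros(len(KMERS))
    for i in range(len(sequence) - k + 1):
        idx = KMER_INDEX.get(sequence[i:i + k])
        if idx is not None:  # skip k-mers containing ambiguous bases such as N
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def make_toy_dataset(n_per_class=30, length=200, seed=0):
    """Hypothetical stand-in for real genomes: random DNA, with a made-up
    marker motif planted in the 'resistant' (label 1) sequences."""
    rng = np.random.default_rng(seed)
    sequences, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            seq = "".join(rng.choice(list("ACGT"), size=length))
            if label == 1:
                pos = int(rng.integers(0, length - 6))
                seq = seq[:pos] + "GGGGGG" + seq[pos + 6:]
            sequences.append(seq)
            labels.append(label)
    return sequences, np.array(labels)


# Featurization: one row of k-mer frequencies per sequence.
sequences, y = make_toy_dataset()
X = np.vstack([kmer_features(s) for s in sequences])

# Stratified folds keep the resistant/susceptible ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Baseline: always predict the most frequent class.
baseline_scores = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, cv=cv
)

# Hyperparameter tuning: grid search over the regularisation strength C.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

print(f"Baseline accuracy: {baseline_scores.mean():.2f} +/- {baseline_scores.std():.2f}")
print(f"Best C: {search.best_params_['C']}, tuned CV accuracy: {search.best_score_:.2f}")
```

Note that the tuned score reported by the grid search is computed on the same folds used to pick `C`, so it tends to be optimistic. A held-out test set or nested cross-validation would give a fairer estimate.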
Hayden is a Senior Data Scientist currently working at Day Zero Diagnostics, a biotech startup aiming to provide rapid infectious disease diagnostics using whole genome sequencing from pathogens.
He started his career in Data Science at the Ministry of Justice in 2015 after graduating from the University of Bath. Hayden continued to build his Data Science experience by completing a Master's in Data Science at Harvard University in 2020, which included an internship in Spotify's Personalization team.
At Day Zero Diagnostics, Hayden applies a combination of analytics, visualization and machine learning research to further the field of AMR predictive modeling.