Exploring the Bacterial Genome Using Data Science
Hayden Sansum

Develop a model to predict bacterial resistance to antibiotics from genomic data using an end-to-end data science approach including data exploration, feature transformation, hyperparameter tuning and statistical analysis.

Fridays at 2:00 P.M. ET / 11:00 A.M. PT
8 weeks, 2-3 hours per week
Intermediate

Description

Antimicrobial resistance (AMR) poses a significant global threat to our ability to treat common bacterial infections and save lives. Hidden in bacterial genomic DNA is the key to understanding the causes of resistance, and hence to improving our ability to treat these infections. In this Build Project, you'll wear the hat of a Data Scientist and build a simple machine learning pipeline that turns raw data into an AMR prediction. Under the supervision of an experienced industry expert, you'll be exposed to genomic data processing approaches, featurization for machine learning, model training, and how to fairly evaluate and analyze your model’s performance. You'll become familiar with common industry tools and approaches like Python, Git & GitHub, data science packages (pandas, scikit-learn), cross-validation and hyperparameter tuning, amongst others.
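As a rough illustration of what "featurization" of genomic data can mean, the sketch below turns DNA sequences into fixed-length vectors of overlapping k-mer counts, one common way to make genomes usable by tabular models. This is an assumption for illustration only: the project description does not say which featurization the workshops use, and `kmer_vocabulary` and `kmer_features` are hypothetical helpers, not part of pandas, scikit-learn or any other library.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
VALID_BASES = set(ALPHABET)


def kmer_vocabulary(k: int) -> list[str]:
    """All 4**k possible k-mers over the DNA alphabet, in a fixed lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_features(sequence: str, k: int = 3) -> list[int]:
    """Count overlapping k-mers and return them as a fixed-length vector.

    Hypothetical helper: k-mers containing any character other than A, C, G, T
    (e.g. N for an ambiguous base call) are skipped rather than counted.
    """
    seq = sequence.upper()
    counts = Counter(
        seq[i : i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i : i + k]) <= VALID_BASES
    )
    # Every sequence maps onto the same vocabulary, so all vectors share one column order
    return [counts[kmer] for kmer in kmer_vocabulary(k)]


# Hypothetical toy sequences; real bacterial genomes are millions of bases long
genomes = {"isolate_1": "ACGTACGTNN", "isolate_2": "aaaaccc"}
k = 2
matrix = [kmer_features(seq, k) for seq in genomes.values()]
print(kmer_vocabulary(k)[:4])  # ['AA', 'AC', 'AG', 'AT']
print(matrix[1][:6])  # [3, 1, 0, 0, 0, 2]
```

Each row of `matrix` could then become one row of a pandas DataFrame (one isolate per row) before model training. The choice of k is a trade-off: larger k captures more specific sequence context, but the feature space grows as 4^k.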

Session timeline

  • Applications open
    August 1, 2024
  • Application deadline
    August 25, 2024
  • Project start date
    Week of September 9, 2024
  • Project end date
    Week of

What you will learn

  • Tackle a data science (predictive modeling) problem from start to finish (e.g., a take-home interview exercise)
  • Set up a reproducible Python environment and understand how to structure code for analysis
  • Process tabular data and sequencing data (bacterial genomes) into machine learning features
  • Train a well-fit machine learning model and a baseline model for comparison
  • Analyze machine learning models and understand how to use cross-validation and statistical tests to compare them
  • At the end of each workshop, there will be an opportunity to push notebooks to GitHub and tag Hayden (or a fellow student) for review. GitHub is an optional part of this project; it can be completed using just the course folders.
Build Projects are 8-week experiences that operate on a rolling basis. Selected participants engage in weekly live workshops with a Build Fellow and 2-15 other students.
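The learning outcome about using cross-validation and statistical tests to compare a model against a baseline can be sketched as follows. This is one possible approach, not necessarily the one taught in the workshops: given per-fold scores for two models evaluated on the same cross-validation folds, an exact paired sign-flip (permutation) test asks how often randomly swapping the two models' scores within each fold yields a mean difference at least as large as the observed one. `paired_sign_flip_test` is a hypothetical helper, and the scores below are made-up placeholders, not project results.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(
    model_scores: list[float], baseline_scores: list[float]
) -> tuple[float, float]:
    """Exact two-sided paired sign-flip test on per-fold score differences.

    Returns (observed mean difference, p-value). It enumerates all 2**n sign
    assignments, so it is only practical for a small number of folds (e.g. 5-10).
    """
    if len(model_scores) != len(baseline_scores) or not model_scores:
        raise ValueError("need the same, non-zero number of scores for both models")
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    observed = mean(diffs)
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        # Flipping a sign is equivalent to swapping the two models' scores on that fold
        permuted = mean(s * d for s, d in zip(signs, diffs))
        # Small tolerance guards against floating-point ties with the observed value
        if abs(permuted) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total


# Hypothetical per-fold accuracies from 5-fold cross-validation on the same folds
model = [0.82, 0.79, 0.85, 0.81, 0.84]
baseline = [0.70, 0.72, 0.69, 0.74, 0.71]
diff, p = paired_sign_flip_test(model, baseline)
print(f"mean difference = {diff:.3f}, p = {p:.4f}")  # mean difference = 0.110, p = 0.0625
```

Two caveats are worth keeping in mind with this design. With only 5 folds there are 32 sign assignments, so the smallest achievable two-sided p-value is 2/32 = 0.0625, even when the model wins on every fold. Also, cross-validation folds share training data, so their scores are not fully independent and p-values from tests like this are approximate.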

Project workshops

  1. Project Introduction & Setup
  2. Genomic Data
  3. Data Analysis
  4. Featurization & Baseline Modeling
  5. Model Training Approaches
  6. Model Tuning
  7. Performance Evaluation
  8. Results Presentation & Wrap-up

Prerequisites

  • Knowledge of basic programming principles (variables, documentation, debugging)
  • Experience with writing Python code (loops, functions, libraries)
  • Experience with setting up and running a Python environment
  • Understanding of basic visualization approaches (histograms, boxplots) and libraries (Matplotlib/Seaborn)

About the expert
Hayden Sansum

Hayden is a Senior Data Scientist currently working at Day Zero Diagnostics, a biotech startup aiming to provide rapid infectious disease diagnostics using whole-genome sequencing of pathogens.

He started his career in Data Science at the Ministry of Justice in 2015 after graduating from the University of Bath. Hayden continued to build his Data Science experience by completing a Master's in Data Science at Harvard University in 2020, which included an internship on Spotify's Personalization team.

At Day Zero Diagnostics, Hayden applies a combination of analytics, visualization and machine learning research to advance the field of AMR predictive modeling.