GitHub Gems: Driving Open-Source Investments With Data
Ed Sioufi

Build a data analytics pipeline that surfaces trending open-source projects on GitHub, helping venture capital (VC) analysts make smarter investment decisions.

Wednesdays at 3:00 P.M. ET / 12:00 P.M. PT
8 weeks, 2-3 hours per week
Beginner
No experience required

Description

As companies depend on ever-larger volumes of data for strategic decisions, analysts must manage an overwhelming array of tables and data sources that are not tailored to their needs. The result is repetitive work and inconsistent analyses, because each analyst applies a different methodology. In this Build Project, you'll take on the role of a Data Engineer and streamline this process for VC analysts, using GitHub data to spotlight emerging open-source technology trends that could signal lucrative investment opportunities. Under the supervision of an experienced industry expert, you'll develop, configure, and maintain a system that periodically fetches, stores, and refines GitHub data so that it is readily accessible and useful to data analysts. The project mirrors the real-world operations of a data-driven organization, giving you practical experience with the challenges of managing and analyzing large datasets for strategic decision-making.

Session timeline

  • Applications open
    August 1, 2024
  • Application deadline
    August 25, 2024
  • Project start date
    Week of September 9, 2024
  • Project end date
    Week of

What you will learn

  • Design and implement efficient data pipelines that streamline analytics, significantly reducing manual effort and enabling deeper insights.
  • Use dbt (data build tool) for data transformation, applying software engineering best practices such as modularity, version control, and testing to ensure scalability and reliability.
  • Write advanced SQL transformation logic to refine data, meet complex analytical requirements, and prepare for real-world data engineering challenges.
  • Apply dimensional modeling techniques and create aggregated models to restructure data for efficient querying and analysis, optimizing performance for various analytical needs.
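
To make the idea of an aggregated model concrete, here is a minimal sketch that rolls raw star events up into a weekly table and ranks repositories by week-over-week growth. It uses Python's built-in sqlite3 module rather than dbt or the project's actual warehouse, and the table names (stg_star_events, agg_weekly_stars), the repository names, and the sample data are all hypothetical.

```python
import sqlite3

# Hypothetical sample data: one row per star given to a repository.
STAR_EVENTS = [
    ("acme/fastdb", "2024-09-02"),
    ("acme/fastdb", "2024-09-03"),
    ("acme/fastdb", "2024-09-10"),
    ("acme/fastdb", "2024-09-11"),
    ("acme/fastdb", "2024-09-12"),
    ("zeta/webkit", "2024-09-02"),
    ("zeta/webkit", "2024-09-03"),
    ("zeta/webkit", "2024-09-04"),
    ("zeta/webkit", "2024-09-10"),
]


def build_weekly_stars(conn, events):
    """Load events into a staging table, then build a weekly aggregate."""
    conn.execute("CREATE TABLE stg_star_events (repo TEXT, starred_on TEXT)")
    conn.executemany("INSERT INTO stg_star_events VALUES (?, ?)", events)
    # One row per repository and week, so analysts never scan raw events.
    conn.execute(
        """
        CREATE TABLE agg_weekly_stars AS
        SELECT repo,
               strftime('%Y-%W', starred_on) AS year_week,
               COUNT(*) AS new_stars
        FROM stg_star_events
        GROUP BY repo, strftime('%Y-%W', starred_on)
        """
    )


def top_growth(conn, current_week, previous_week):
    """Rank repositories by the change in new stars between two weeks."""
    rows = conn.execute(
        """
        SELECT cur.repo,
               cur.new_stars - COALESCE(prev.new_stars, 0) AS growth
        FROM agg_weekly_stars AS cur
        LEFT JOIN agg_weekly_stars AS prev
          ON prev.repo = cur.repo AND prev.year_week = ?
        WHERE cur.year_week = ?
        ORDER BY growth DESC, cur.repo
        """,
        (previous_week, current_week),
    )
    return rows.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_weekly_stars(connection, STAR_EVENTS)
    for repo, growth in top_growth(connection, "2024-37", "2024-36"):
        print(repo, growth)
```

Precomputing the weekly counts means the ranking query touches only a small summary table, which is the usual motivation for aggregated models.
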
Build Projects are 8-week experiences that operate on a rolling basis. Selected participants engage in weekly live workshops with a Build Fellow and 2-15 other students.

Project workshops

  1. Introductions, Deliverables, and Initial Setup
  2. User Requirements
  3. Data Ingestion
  4. Staging Layer
  5. Optimizing and Refactoring Transformations
  6. Data Loading Automation with Airflow
  7. Incremental Data Loading
  8. Documentation and Presentation
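
As a rough illustration of the incremental loading topic in the workshop list, the sketch below uses a high-water mark: each run loads only the rows whose timestamp is newer than the last one already loaded. This is a plain-Python sketch under stated assumptions, not the approach taught in the workshop; the function name incremental_load, the row fields (repo, stars, updated_at), and the in-memory dictionary standing in for a warehouse table are all hypothetical.

```python
def incremental_load(source_rows, target, watermark):
    """Upsert rows newer than `watermark` into `target`.

    `source_rows` is a list of dicts with "repo" and "updated_at" keys,
    where "updated_at" is an ISO 8601 UTC string (these sort correctly
    as plain strings). Returns (new_watermark, rows_loaded).
    """
    new_rows = [row for row in source_rows if row["updated_at"] > watermark]
    for row in new_rows:
        # Key by repository so a newer row replaces the older one.
        target[row["repo"]] = row
    new_watermark = max(
        (row["updated_at"] for row in new_rows), default=watermark
    )
    return new_watermark, len(new_rows)


if __name__ == "__main__":
    source = [
        {"repo": "acme/fastdb", "stars": 10, "updated_at": "2024-09-01T00:00:00Z"},
        {"repo": "zeta/webkit", "stars": 5, "updated_at": "2024-09-02T00:00:00Z"},
    ]
    table = {}
    mark, loaded = incremental_load(source, table, "")
    print(mark, loaded)
```

Running the function again with the returned watermark and an unchanged source loads nothing, which is what keeps repeated scheduled runs cheap.
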

Prerequisites

  • Interest in data engineering and big data challenges: Eagerness to dig into the complexities of managing and analyzing large datasets effectively.
  • Familiarity with Python syntax, loops, conditions, functions, and the ability to write simple scripts.
  • Fundamental grasp of SQL and relational databases: Ability to construct basic queries to select, filter, join, insert, and update data in a relational database, along with a basic understanding of database design principles and normalization.
  • Introductory knowledge of data manipulation and analysis using pandas: Comfort with loading, inspecting, and manipulating datasets in pandas.
  • Initial exposure to data visualization techniques: Some experience with using libraries like matplotlib or seaborn to create basic plots and charts.
  • Soft skills: Curiosity, willingness to learn, and the ability to work collaboratively in a team environment.

About the expert

Ed Sioufi

Edouard Sioufi is a Software Development Fellow at Open Avenues Foundation based in San Francisco, California.

He currently works as CTO/CPO at Big Little Robots and has been writing code since he was 13 years old. Originally from Lebanon, Ed holds advanced degrees in Computer Engineering. In his free time, he enjoys jazz music and reading philosophy.