Uncovering Hidden Trends in Open Source
Ed Sioufi

Build a data analytics pipeline to unveil trending open-source projects on GitHub, providing deep insights into project health, community engagement, and technology adoption patterns.

Wednesdays at 3:00 P.M. ET / 12:00 P.M. PT
8 weeks, 2-3 hours per week
Intermediate

Description

As organizations increasingly rely on open-source software, understanding project health and community trends becomes crucial yet challenging due to vast amounts of unstructured data. In this Build Project, you'll take on the role of a Data Engineer, creating a systematic approach to analyze GitHub's rich dataset. You'll develop a robust pipeline that periodically fetches, stores, and refines GitHub data to uncover insights about repository activity, community engagement, and technology trends. Under the supervision of an experienced industry expert, you'll build a system that transforms raw data into meaningful metrics that can guide technical decisions and open-source strategy. This project mirrors real-world scenarios where organizations need to process and analyze large datasets to understand technology trends and community dynamics.
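
The fetch, store, and refine loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline: the snapshot records are made up, the table name `raw_repo_snapshots` and the helpers `store` and `star_growth` are hypothetical, and the fetch step is replaced by hard-coded data whose field names (`full_name`, `stargazers_count`, `open_issues_count`) follow the GitHub REST API's repository objects.

```python
import sqlite3

# Hypothetical snapshots standing in for periodic API fetches.
# Repository names and numbers are invented for illustration only.
SNAPSHOTS = [
    {"fetched_on": "2025-01-01", "full_name": "example/alpha",
     "stargazers_count": 100, "open_issues_count": 10},
    {"fetched_on": "2025-01-08", "full_name": "example/alpha",
     "stargazers_count": 150, "open_issues_count": 12},
    {"fetched_on": "2025-01-01", "full_name": "example/beta",
     "stargazers_count": 900, "open_issues_count": 40},
    {"fetched_on": "2025-01-08", "full_name": "example/beta",
     "stargazers_count": 910, "open_issues_count": 41},
]


def store(conn, snapshots):
    """Store step: append raw snapshots to a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_repo_snapshots ("
        "fetched_on TEXT, full_name TEXT, "
        "stargazers_count INTEGER, open_issues_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_repo_snapshots VALUES "
        "(:fetched_on, :full_name, :stargazers_count, :open_issues_count)",
        snapshots,
    )


def star_growth(conn):
    """Refine step: stars gained between each repo's first and last snapshot."""
    rows = conn.execute(
        "SELECT full_name, stargazers_count FROM raw_repo_snapshots "
        "ORDER BY full_name, fetched_on"
    ).fetchall()
    first, last = {}, {}
    for name, stars in rows:
        first.setdefault(name, stars)  # keep the earliest value per repo
        last[name] = stars             # overwrite until the latest value
    growth = {name: last[name] - first[name] for name in first}
    # Highest growth first: a simple "trending" ranking.
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)


conn = sqlite3.connect(":memory:")
store(conn, SNAPSHOTS)
trending = star_growth(conn)
print(trending)  # [('example/alpha', 50), ('example/beta', 10)]
```

Ranking by growth rather than by total stars is what surfaces the smaller but faster-moving repository first, which is the kind of signal a trend-focused metric is after.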

Session timeline

  • Applications open
    February 13, 2025
  • Application deadline
    March 13, 2025
  • Project start date
    Week of April 7, 2025
  • Project end date
    Week of

What you will learn

  • Design and implement efficient data pipelines that streamline analytics, significantly reducing manual effort and enabling deeper insights.
  • Use dbt (data build tool) for data transformation, applying software engineering best practices such as modularity, version control, and testing to keep models scalable and reliable.
  • Write advanced SQL transformation logic to refine data for complex analytical requirements, preparing you for real-world data engineering challenges.
  • Apply dimensional modeling techniques and create aggregated models to restructure data for efficient querying and analysis, optimizing performance for various analytical needs.
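
As a rough illustration of the dimensional modeling and aggregated models mentioned in the list above, the sketch below builds a tiny star schema in an in-memory SQLite database. It is an assumption-laden example, not the project's schema: the table names `dim_repo`, `fact_repo_activity`, and `agg_language_activity`, their columns, and all sample rows are hypothetical. In the project itself, this kind of SQL would more likely live in dbt models; SQLite is used here only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: one row per repository (descriptive attributes).
    CREATE TABLE dim_repo (
        repo_id   INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL,
        language  TEXT
    );

    -- Fact: one row per repository per day (numeric measures).
    CREATE TABLE fact_repo_activity (
        repo_id       INTEGER REFERENCES dim_repo(repo_id),
        activity_date TEXT,
        commits       INTEGER,
        new_stars     INTEGER
    );

    -- Made-up sample rows for illustration only.
    INSERT INTO dim_repo VALUES
        (1, 'example/alpha', 'Python'),
        (2, 'example/beta',  'Rust'),
        (3, 'example/gamma', 'Python');

    INSERT INTO fact_repo_activity VALUES
        (1, '2025-01-01', 5, 20),
        (1, '2025-01-02', 3, 30),
        (2, '2025-01-01', 8, 10),
        (3, '2025-01-02', 2, 5);

    -- Aggregated model: activity rolled up to the language level.
    CREATE VIEW agg_language_activity AS
    SELECT d.language,
           COUNT(DISTINCT f.repo_id) AS active_repos,
           SUM(f.commits)            AS total_commits,
           SUM(f.new_stars)          AS total_new_stars
    FROM fact_repo_activity AS f
    JOIN dim_repo AS d ON d.repo_id = f.repo_id
    GROUP BY d.language;
""")

rows = conn.execute(
    "SELECT * FROM agg_language_activity ORDER BY total_new_stars DESC"
).fetchall()
print(rows)  # [('Python', 2, 10, 55), ('Rust', 1, 8, 10)]
```

Keeping descriptive attributes in the dimension table and measures in the fact table means a new rollup, such as activity per week, only needs another aggregated view rather than a change to the stored data.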

Project workshops

1. Introductions, Deliverables, and Initial Setup
2. User Requirements & Pipeline Design
3. Data Ingestion Fundamentals
4. Building the Staging Layer
5. Core dbt Transformation
6. Advanced Transformations & Optimization
7. Data Visualization & Documentation
8. Final Presentations & Project Showcase

Prerequisites

  • Interest in data engineering and big data challenges, and eagerness to dig into the complexities of managing and analyzing large datasets.
  • Familiarity with Python syntax, loops, conditions, functions, and the ability to write simple scripts.
  • Fundamental grasp of SQL and relational databases: Ability to construct basic queries to select, filter, join, insert, and update data in a relational database, along with a basic understanding of database design principles and normalization.
  • Introductory knowledge of data manipulation and analysis using pandas: Comfort with loading, inspecting, and manipulating datasets in pandas.
  • Initial exposure to data visualization techniques: Some experience with using libraries like matplotlib or seaborn to create basic plots and charts.
  • Soft skills: Curiosity, willingness to learn, and the ability to work collaboratively in a team environment.

About the expert

Ed Sioufi

Software Development Fellow
Open Avenues Foundation

Edouard Sioufi is a Software Development Fellow at Open Avenues Foundation based in San Francisco, California.

He currently works as CTO/CPO at Big Little Robots and has been writing code since he was 13 years old. Originally from Lebanon, Ed holds advanced degrees in Computer Engineering. In his free time, he enjoys jazz music and reading philosophy.
