This project is no longer accepting applications. Subscribe to our newsletter to be notified of new projects!
Build a data analytics pipeline that surfaces trending open-source projects on GitHub, providing insight into project health, community engagement, and technology adoption patterns.
As organizations increasingly rely on open-source software, understanding project health and community trends becomes crucial yet challenging due to vast amounts of unstructured data. In this Build Project, you'll take on the role of a Data Engineer, creating a systematic approach to analyze GitHub's rich dataset. You'll develop a robust pipeline that periodically fetches, stores, and refines GitHub data to uncover insights about repository activity, community engagement, and technology trends. Under the supervision of an experienced industry expert, you'll build a system that transforms raw data into meaningful metrics that can guide technical decisions and open-source strategy. This project mirrors real-world scenarios where organizations need to process and analyze large datasets to understand technology trends and community dynamics.
Meet your Project Leader and peers while exploring the fundamental components of data pipelines. Learn why each component matters and set up your development environment for the journey ahead.
Explore how business needs drive data engineering decisions. Learn to gather requirements from analysts, understand their data needs, and translate these into initial pipeline designs while considering technical constraints.
Tackle real-world challenges in data ingestion and staging. Get hands-on experience loading data into your database while learning common pitfalls and best practices.
Practice essential data cleaning techniques hands-on. Create your first dbt models and learn to handle common data quality issues at the staging layer.
Dive into SQL-based data transformations. Build core dbt models that align with your user requirements, learning how to structure transformations effectively.
Learn to improve pipeline performance through intermediate models and macros. Apply dimensional modeling concepts and optimize your transformations following industry best practices.
Transform your data models into compelling visualizations. Learn to create informative charts and enhance your GitHub repository with professional documentation.
Present your end-to-end data pipeline to stakeholders. Demonstrate your technical achievements through visualizations and share key learnings from your project journey.
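The sessions above follow a classic ingest, stage, transform flow. As a rough sketch of that shape, the minimal Python example below uses a hard-coded sample standing in for a live GitHub API response (a real pipeline would fetch from the GitHub REST API), loads it into a SQLite staging table, and runs a simple SQL transformation ranking repositories by stars as a toy "trending" metric. The table and column names (`stg_repos`, `stars`) and the sample values are illustrative assumptions, not part of the project materials.

```python
import sqlite3

# Stand-in for one page of results from the GitHub search API;
# a real pipeline would fetch this from api.github.com periodically.
sample_repos = [
    {"full_name": "octocat/Hello-World", "stargazers_count": 2500, "open_issues": 120},
    {"full_name": "example/data-tools",  "stargazers_count": 830,  "open_issues": 14},
    {"full_name": "example/viz-kit",     "stargazers_count": 8300, "open_issues": 45},
]

# Staging: load the raw records into a database table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stg_repos (full_name TEXT, stars INTEGER, open_issues INTEGER)"
)
conn.executemany(
    "INSERT INTO stg_repos VALUES (:full_name, :stargazers_count, :open_issues)",
    sample_repos,
)

# Transformation: a SQL query over the staging table, much as a
# dbt model would select from a staging model.
rows = conn.execute(
    "SELECT full_name, stars FROM stg_repos ORDER BY stars DESC"
).fetchall()
for name, stars in rows:
    print(f"{name}: {stars}")
```

In the project itself, the transformation layer lives in dbt models rather than inline queries, which adds testing, documentation, and dependency management on top of the same basic SQL.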
We'll notify you when projects reopen. In the meantime, you can explore our resources and learn more about our Fellows.
Edouard Sioufi is a Software Development Fellow at Open Avenues Foundation based in San Francisco, California.
He currently works as CTO/CPO at Big Little Robots and has been writing code since he was 13 years old. Originally from Lebanon, Ed holds advanced degrees in Computer Engineering. In his free time, he enjoys jazz music and reading philosophy.