Build a data pipeline that aggregates and analyzes e-commerce and social media data to provide actionable insights for AI startups, empowering them to enhance customer engagement and drive strategic growth.
As e-commerce and social media become increasingly intertwined, startups face the challenge of extracting meaningful insights from vast amounts of data to inform their strategies. In this Build Project, you'll step into the role of a Data Engineer, tasked with building a comprehensive data pipeline that aggregates and analyzes customer behavior and market trends. Under the guidance of an experienced industry expert, you’ll develop Python scripts for data extraction, transform data for analysis, and load it into a structured database. You’ll become familiar with essential industry tools and methodologies like SQL and data visualization techniques, all within an environment that simulates the operations of a real data analytics team. This hands-on experience will equip you with valuable skills that are highly sought after in today’s data-driven landscape, making you a standout candidate for future roles in data engineering and analytics.
Introduce the project, discuss the significance of e-commerce and social media data, and review the main deliverables. Each participant will brainstorm potential data sources.
Identify and evaluate relevant data sources for e-commerce and social media analytics. You will choose the datasets for your project, such as Amazon Customer Behavior & Product Reviews, the Social Media Influencers Dataset, and E-commerce Behavior Data from Multi-Category Store.
Learn how to extract data using APIs and web scraping methods. You'll write your first scripts to collect data from selected sources.
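For illustration, here is a minimal extraction sketch using the requests and BeautifulSoup libraries. The API endpoint, page URL, and CSS selectors are placeholders rather than part of the project materials; you would swap in the sources you selected in the previous step.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical API endpoint and product page -- replace with your chosen sources.
API_URL = "https://api.example.com/v1/reviews"
PRODUCT_PAGE = "https://www.example.com/products/widget-123"

def fetch_reviews_from_api(page: int = 1) -> list[dict]:
    """Pull one page of review records from a (placeholder) REST API."""
    response = requests.get(API_URL, params={"page": page}, timeout=30)
    response.raise_for_status()
    return response.json().get("results", [])

def scrape_product_prices(url: str) -> list[dict]:
    """Scrape product names and prices from a (placeholder) listing page."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.product-card"):  # selector is an assumption
        rows.append({
            "name": card.select_one("h2.title").get_text(strip=True),
            "price": card.select_one("span.price").get_text(strip=True),
        })
    return rows

if __name__ == "__main__":
    reviews = fetch_reviews_from_api()
    prices = scrape_product_prices(PRODUCT_PAGE)
    print(f"Collected {len(reviews)} reviews and {len(prices)} price records")
```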
Use Python and Pandas to clean and preprocess your extracted data. You'll prepare the data for analysis, ensuring it's in the correct format.
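A cleaning pass might look like the sketch below; the file and column names (review_date, rating, price) are assumptions about what a raw extract could contain, not a prescribed format.

```python
import pandas as pd

# Load the raw extract produced in the previous step (file name is illustrative).
raw = pd.read_csv("raw_reviews.csv")

# Drop exact duplicates and rows missing the fields needed downstream.
clean = raw.drop_duplicates().dropna(subset=["review_date", "rating"])

# Normalize types: dates to datetime, ratings to numeric, prices to floats.
clean["review_date"] = pd.to_datetime(clean["review_date"], errors="coerce")
clean["rating"] = pd.to_numeric(clean["rating"], errors="coerce")
clean["price"] = (
    clean["price"].astype(str).str.replace(r"[$,]", "", regex=True).astype(float)
)

# Keep only plausible ratings and write out an analysis-ready file.
clean = clean[clean["rating"].between(1, 5)]
clean.to_csv("clean_reviews.csv", index=False)
```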
Set up a relational database using SQL to store your cleaned data. You'll design the database schema and load your data into it.
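As a sketch, the schema design and load step could look like the following. SQLite is used only to keep the example self-contained (the project can target any relational database), and the table and column names are illustrative assumptions.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("ecommerce.db")

# Illustrative schema: one table for customers, one for their reviews.
conn.executescript("""
CREATE TABLE IF NOT EXISTS customers (
    customer_id INTEGER PRIMARY KEY,
    signup_date TEXT
);
CREATE TABLE IF NOT EXISTS reviews (
    review_id   INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    product     TEXT,
    rating      REAL,
    review_date TEXT
);
""")

# Load the cleaned extract; the column names are assumptions about your data.
clean = pd.read_csv("clean_reviews.csv")
clean[["review_id", "customer_id", "product", "rating", "review_date"]].to_sql(
    "reviews", conn, if_exists="append", index=False
)
conn.commit()
conn.close()
```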
Write SQL queries to analyze customer behavior and product trends. You’ll generate insights and begin visualizing your findings.
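A behavior and trend query against the illustrative schema above might look like this; the bar chart is just one simple way to begin visualizing the result.

```python
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt

conn = sqlite3.connect("ecommerce.db")

# Review volume and average rating per product, most-reviewed first.
query = """
SELECT product,
       COUNT(*)              AS num_reviews,
       ROUND(AVG(rating), 2) AS avg_rating
FROM reviews
GROUP BY product
ORDER BY num_reviews DESC
LIMIT 10;
"""
top_products = pd.read_sql_query(query, conn)
conn.close()

# Quick visualization of the query result.
top_products.plot(kind="bar", x="product", y="num_reviews", legend=False)
plt.ylabel("Number of reviews")
plt.tight_layout()
plt.savefig("top_products.png")
```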
Create a complete ETL pipeline that automates the extraction, transformation, and loading processes. You’ll ensure the pipeline is efficient and reliable.
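One minimal way to wire the stages together is an orchestration script like the sketch below, which chains hypothetical extract, transform, and load functions with basic logging; in practice you might schedule it with cron or an orchestrator such as Airflow.

```python
import logging
import sqlite3
import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")

def extract() -> pd.DataFrame:
    # Placeholder: in the project this would call your API/scraping scripts.
    return pd.read_csv("raw_reviews.csv")

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    clean = raw.drop_duplicates().dropna(subset=["rating"])
    clean["rating"] = pd.to_numeric(clean["rating"], errors="coerce")
    return clean[clean["rating"].between(1, 5)]

def load(clean: pd.DataFrame) -> None:
    with sqlite3.connect("ecommerce.db") as conn:
        clean.to_sql("reviews", conn, if_exists="append", index=False)

def run_pipeline() -> None:
    try:
        log.info("Starting ETL run")
        raw = extract()
        clean = transform(raw)
        load(clean)
        log.info("Loaded %d rows", len(clean))
    except Exception:
        log.exception("ETL run failed")
        raise

if __name__ == "__main__":
    run_pipeline()  # schedule this entry point (cron, Airflow, etc.) to automate the pipeline
```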
Prepare and deliver a presentation summarizing your project and key insights. You'll showcase your GitHub repository and discuss your findings with peers and industry experts.
Shubham is a Data Science Build Fellow at Open Avenues, where he leads student projects in data engineering. Shubham is a data engineer at Covetrus, where he focuses on designing and implementing scalable data pipelines, optimizing ETL processes, and developing event-driven data management systems. He works extensively with technologies like Apache Kafka, DBT, and cloud platforms to ensure efficient data flow and analysis across the organization. Shubham has more than four years of experience in data engineering. His career has taken him through impactful roles at organizations such as Pfizer and NJIT, where he honed his skills in data optimization and analytics. He's particularly passionate about leveraging data to improve efficiency and support strategic decision-making. He holds a master's in computer science from the New Jersey Institute of Technology. A fun fact about Shubham: he's an avid Formula 1 fan who loves to travel to different countries to watch races live, combining his passion for data with the thrill of high-speed motorsports.