This project is no longer accepting applications. Subscribe to our newsletter to be notified of new projects!
Learn how the web uses HTTP to transfer HTML pages and how to build a scraper that collects structured information from any public website.
Many companies now publish information on their websites that changes daily, and collecting it by hand every day is tedious. Knowing how to build a scraper lets you pull valuable data from both static and dynamic websites. In this Build Project, you will gather and structure data from a public website without relying on external data APIs, which makes time series analysis and monitoring of recent information possible. Data ownership is crucial, so we will also cover the role of robots.txt in preventing legal issues.
Get to know the Build Fellow and other students, ask questions about the project requirements, and prepare your workspace.
Study the basics of JavaScript and Node.js, including basic syntax and the core concepts behind both. We will use VSCode to write simple code blocks and deliver a simple runnable Node.js script.
Learn the HTTP protocol and how web documents are requested and returned. We will deliver a simple scraper for a static website.
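As a rough illustration of the request/response step, the sketch below sends a GET request and pulls the page title out of the returned HTML. It assumes Node.js 18 or later (which ships a global `fetch`); the function names `extractTitle` and `scrapeTitle` and the User-Agent string are hypothetical, not part of the project materials.

```javascript
// Minimal static-page scraper sketch (assumes Node.js 18+ for the global fetch).

// Pull the text of the first <title> element out of an HTML string.
// A regular expression is enough for this one tag; a fuller scraper
// would normally use a proper HTML parser instead.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Send a GET request and return the status code plus the page title.
// fetchImpl is injectable so the function can be tried without network access.
async function scrapeTitle(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    // Hypothetical User-Agent value, used only for illustration.
    headers: { "User-Agent": "build-project-scraper/0.1" },
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  const html = await response.text();
  return { status: response.status, title: extractTitle(html) };
}

// Example usage (performs a real request when uncommented):
// scrapeTitle("https://example.com").then(console.log);
```

Keeping the parsing step (`extractTitle`) separate from the network step makes it easy to test the extraction logic against saved HTML.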
Learn how to define a simple data model and convert the scraped data into structured records. We will deliver structured data extracted from the static website scraped earlier.
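One way to picture a data model is a function that turns loosely formatted scraped strings into records with a fixed set of typed fields. The sketch below is only an example: the field names (`name`, `price`, `url`) and the sample rows are assumptions made for illustration.

```javascript
// Hypothetical data model: one record per scraped item with fixed fields.
function toItem(raw) {
  // Strip currency symbols and thousands separators before parsing the number.
  const price = Number.parseFloat(String(raw.price).replace(/[^0-9.]/g, ""));
  return {
    name: String(raw.name).trim(),
    price: Number.isNaN(price) ? null : price, // null marks an unparseable price
    url: raw.url ?? null, // missing links become an explicit null
  };
}

// Made-up raw rows, shaped like text pulled straight out of an HTML page.
const rawRows = [
  { name: "  Blue Widget ", price: "$1,234.50", url: "/items/1" },
  { name: "Red Widget", price: "n/a" },
];

const items = rawRows.map(toItem);
console.log(items);
```

Normalizing every row through one function keeps later steps, such as writing a CSV file, free of special cases for missing or malformed values.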
Use an external library to save data to a CSV file. Learn what the file system is and how to write files to it. We will deliver a CSV file containing the structured data from the previous step.
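The project uses an external library for this step; since the library is not named here, the sketch below swaps in Node's built-in `fs` module with hand-written escaping to show what writing CSV to the file system involves. The helper names, the sample records, and the output file name are all hypothetical.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Quote a field when it contains a comma, double quote, or line break,
// doubling any embedded quotes (the usual CSV convention).
function escapeCsvField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text: a header row followed by one row per record.
function toCsv(records, columns) {
  const header = columns.map(escapeCsvField).join(",");
  const rows = records.map((record) =>
    columns.map((column) => escapeCsvField(record[column])).join(",")
  );
  return [header, ...rows].join("\n") + "\n";
}

// Made-up records for illustration.
const records = [
  { name: "Blue Widget", price: 1234.5 },
  { name: 'Widget, "deluxe"', price: 99 },
];

const csv = toCsv(records, ["name", "price"]);

// Write into the OS temp directory so the example does not clutter the project.
const outputPath = path.join(os.tmpdir(), "scraped-items.csv");
fs.writeFileSync(outputPath, csv, "utf8");
console.log(`Wrote ${records.length} rows to ${outputPath}`);
```

A dedicated CSV library handles more edge cases (encodings, streaming large files), which is a reasonable motivation for using one in the actual project.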
Study the JSON format and how to locate the JSON data that a website loads. We will deliver a scraper for a dynamic website.
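Dynamic pages often load their content as JSON, so once the JSON response is located, scraping largely comes down to parsing it and picking out fields. The sketch below works on a hard-coded sample body; the payload shape (`{ items: [...] }`) and the field names are assumptions, since every site structures its JSON differently.

```javascript
// Sample response body; the { items: [...] } shape is a hypothetical example.
const responseBody =
  '{"items":[{"id":1,"title":"First post","views":"120"},' +
  '{"id":2,"title":"Second post","views":"45"}]}';

// Parse the JSON text and map each entry to a small structured record.
function parseItems(jsonText) {
  const payload = JSON.parse(jsonText);
  if (!Array.isArray(payload?.items)) {
    throw new Error("Unexpected payload shape: missing items array");
  }
  return payload.items.map((item) => ({
    id: item.id,
    title: item.title,
    views: Number(item.views), // numbers sometimes arrive as strings
  }));
}

const posts = parseItems(responseBody);
console.log(posts);
```

Checking the payload shape up front gives a clear error when the site changes its response format, instead of silently producing empty data.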
Learn what restrictions apply to scraping and how to avoid running into problems. We will check robots.txt and deliver a program that avoids the paths it disallows. If time permits, we will also learn how to send crawled data to Slack.
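A simplified sketch of a robots.txt check is shown below. It only reads `Disallow` rules in groups addressed to every crawler (`User-agent: *`) and matches them as plain path prefixes; `Allow` rules, wildcards, and crawler-specific groups are deliberately ignored, so treat it as an illustration rather than a complete parser. The function names and the sample file are hypothetical.

```javascript
// Collect the Disallow path prefixes that apply to every crawler ("User-agent: *").
function parseDisallowRules(robotsTxt) {
  const rules = [];
  let appliesToAll = false;
  let lastWasUserAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments and whitespace
    if (line === "") continue;
    const separator = line.indexOf(":");
    if (separator === -1) continue;
    const field = line.slice(0, separator).trim().toLowerCase();
    const value = line.slice(separator + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise a new group starts.
      appliesToAll = lastWasUserAgent ? appliesToAll || value === "*" : value === "*";
      lastWasUserAgent = true;
    } else {
      lastWasUserAgent = false;
      if (field === "disallow" && appliesToAll && value !== "") {
        rules.push(value);
      }
    }
  }
  return rules;
}

// A path is allowed when no disallowed prefix matches it.
function isAllowed(path, disallowRules) {
  return !disallowRules.some((prefix) => path.startsWith(prefix));
}

// Made-up robots.txt content for illustration.
const sampleRobotsTxt = [
  "User-agent: *",
  "Disallow: /private/",
  "Disallow: /tmp/ # scratch space",
  "",
  "User-agent: examplebot",
  "Disallow: /",
].join("\n");

const rules = parseDisallowRules(sampleRobotsTxt);
console.log(rules);
console.log(isAllowed("/blog/post", rules), isAllowed("/private/data", rules));
```

A scraper would call `isAllowed` before each request and skip any URL whose path is disallowed.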
Polish your project deliverables and present them to the Build Fellow and other students in the final group session.
I am a seasoned software engineering professional with extensive experience in backend and frontend development. My expertise spans across various platforms and languages, including Kotlin, Node.js, Kubernetes (K8S), React, ReactNative, and Flutter. Throughout my career, I have successfully led engineering teams, overseen large-scale projects, and developed innovative applications for both mobile and web environments. Notably, I was appointed CTO during a pivotal spinoff from Skelter Labs, where I managed and expanded key products such as Kyte and Playwings, enhancing their functionality and user engagement.
In my role as CTO, I was responsible for writing design documents, developing systems to meet specific requirements, and collaborating closely with Marketing, Sales, and Customer Support teams. I supervised and structured engineering teams of over 20 developers, optimizing resource allocation and estimating project deadlines to ensure timely delivery. Prior to my tenure as CTO, I served as a Senior Software Engineer at Skelter Labs, where I led the Kyte Team to achieve significant milestones, including over 200k downloads across mobile platforms, and spearheaded the development of IoT applications and a blockchain-based dApp. My experience with cutting-edge technologies and leadership in diverse tech domains has equipped me with a robust skill set to drive innovation and excellence in software engineering.