
This project is no longer accepting applications. Subscribe to our newsletter to be notified of new projects!

Build Your Scraper: Collect Reusable Data From Any Website
Dennis Lim

Learn how the web uses the HTTP protocol to transfer HTML pages, and how to build a scraper that collects structured information from any website.

Tuesdays at 5:00 P.M. ET / 2:00 P.M. PT
8 weeks, 2-3 hours per week
Intermediate

Description

These days, many companies publish frequently changing information on their websites, but collecting it by hand every day is tedious. Knowing how to build a scraper lets you extract valuable data from both static and dynamic websites. In this Build Project, you will gather and structure data from any public website without relying on external data APIs, enabling time-series analysis or monitoring of recent information. Because data ownership matters, we will also cover how robots.txt works and why respecting it helps you avoid legal issues.
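To make the idea concrete, here is a minimal sketch of a static-page scraper in JavaScript for Node.js. It assumes Node.js 18 or later (for the built-in `fetch`); the target elements (`<title>` and `<h2>`) and the helper names `extractHeadings`, `toRecord`, and `scrape` are hypothetical choices for illustration, not part of the course material. The regex-based parsing is a deliberate simplification; a real scraper would normally use an HTML parser.

```javascript
// Minimal static-page scraper sketch (assumes Node.js 18+, where fetch is global).
// Regex parsing is a simplification for illustration only.

/**
 * Hypothetical helper: pull the <title> text and all <h2> headings
 * out of an HTML string.
 */
function extractHeadings(html) {
  const titleMatch = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const headings = [...html.matchAll(/<h2[^>]*>([\s\S]*?)<\/h2>/gi)].map((m) =>
    m[1].replace(/<[^>]+>/g, "").trim() // strip nested tags such as <a> or <span>
  );
  return { title: titleMatch ? titleMatch[1].trim() : null, headings };
}

/**
 * Hypothetical helper: wrap the extracted data in a timestamped record so
 * that repeated runs can be stored and compared over time (a time series).
 */
function toRecord(url, html, now = new Date()) {
  return { url, scrapedAt: now.toISOString(), ...extractHeadings(html) };
}

/** Fetch one page and return one record. Requires network access. */
async function scrape(url) {
  const res = await fetch(url, { headers: { "User-Agent": "build-scraper-demo" } });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return toRecord(url, await res.text());
}
```

For example, `scrape("https://example.com").then(console.log)` would print one timestamped record; appending one such record per run to a file yields a simple time series.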

Session timeline

  • Applications open
    May 27, 2024
  • Application deadline
    June 23, 2024
  • Project start date
    Week of July 8, 2024
  • Project end date
    Week of

What you will learn

  • Run a Node.js program and learn basic JavaScript syntax
  • Build a simple scraper for static and dynamic websites
  • Inspect websites with Chrome and understand the web request/response process
  • Understand how robots.txt works
  • Send a summary to Slack as a notification
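As an illustration of the robots.txt item, the following is a simplified checker loosely based on RFC 9309, where the longest matching `Allow`/`Disallow` path wins and `Allow` wins a tie. It is a sketch under stated assumptions: it ignores the `*` and `$` wildcards, matches user agents by exact (case-insensitive) name, and the function names `parseRobots` and `isAllowed` are hypothetical.

```javascript
// Simplified robots.txt checker, loosely following RFC 9309.
// Assumptions: prefix matching only (no "*" or "$" wildcards), and user-agent
// groups are matched by exact, case-insensitive name.

function parseRobots(text) {
  const groups = []; // each group: { agents: [...], rules: [{ allow, path }] }
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const idx = line.indexOf(":");
    if (idx === -1) continue; // blank or malformed line
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share a single group.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    // An empty Disallow value means "nothing is disallowed", so it adds no rule.
    if ((key === "allow" || key === "disallow") && current && value !== "") {
      current.rules.push({ allow: key === "allow", path: value });
    }
  }
  return groups;
}

function isAllowed(robotsText, userAgent, path) {
  const groups = parseRobots(robotsText);
  const ua = userAgent.toLowerCase();
  // Use groups that name this agent; otherwise fall back to the "*" group(s).
  let matched = groups.filter((g) => g.agents.includes(ua));
  if (matched.length === 0) matched = groups.filter((g) => g.agents.includes("*"));
  let best = null;
  for (const rule of matched.flatMap((g) => g.rules)) {
    if (!path.startsWith(rule.path)) continue;
    const longer = best === null || rule.path.length > best.path.length;
    const tieAllow = best !== null && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best === null ? true : best.allow; // no matching rule means allowed
}
```

In practice a scraper would first download `/robots.txt` from the target site and call `isAllowed` before requesting each path.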

Project workshops

  1. Introduction
  2. Basic Syntax
  3. Static Website Scraper
  4. Structured Data
  5. File System
  6. Dynamic Website Scraper
  7. Regulation
  8. Wrap-up

Prerequisites

  • JavaScript syntax and how the interpreter runs code
  • Basic knowledge of Node.js and the differences between Node.js (backend) and browser JavaScript (frontend)
  • Understanding of the HTTP request/response cycle
  • How to read robots.txt and why websites provide it
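The HTTP request/response cycle in the prerequisites can be observed without touching a real website. This sketch assumes Node.js 18 or later: it starts a throwaway local server with the built-in `http` module and sends one `GET` request to it with `fetch`. The path `/products?page=1`, the HTML body, and the function names `startServer` and `requestOnce` are made up for the example.

```javascript
// One HTTP request/response cycle, entirely on localhost (assumes Node.js 18+).
const http = require("node:http");

// The "website": answers every request with a tiny HTML page.
function startServer() {
  const server = http.createServer((req, res) => {
    // Request side: the client sent a method, a path, and headers.
    console.log(`server got ${req.method} ${req.url}`);
    // Response side: status code and headers first, then the body.
    res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
    res.end("<html><head><title>Demo</title></head></html>");
  });
  // Port 0 lets the operating system pick a free port.
  return new Promise((resolve) => server.listen(0, "127.0.0.1", () => resolve(server)));
}

// The "scraper": sends a GET request and reads the response.
async function requestOnce() {
  const server = await startServer();
  const { port } = server.address();
  try {
    const res = await fetch(`http://127.0.0.1:${port}/products?page=1`);
    return {
      status: res.status,
      contentType: res.headers.get("content-type"),
      body: await res.text(),
    };
  } finally {
    server.close();
    server.closeIdleConnections?.(); // release keep-alive sockets where supported
  }
}
```

Running `requestOnce().then(console.log)` shows both halves of the cycle: the server logs the incoming request line, and the client receives the status code, headers, and HTML body.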

About the expert

Dennis Lim

Software Development Fellow
Open Avenues Foundation

I am a seasoned software engineering professional with extensive experience in backend and frontend development. My expertise spans various platforms and languages, including Kotlin, Node.js, Kubernetes (K8s), React, React Native, and Flutter. Throughout my career, I have successfully led engineering teams, overseen large-scale projects, and developed innovative applications for both mobile and web environments. Notably, I was appointed CTO during a pivotal spinoff from Skelter Labs, where I managed and expanded key products such as Kyte and Playwings, improving their functionality and user engagement. In my role as CTO, I was responsible for writing design documents, developing systems to meet specific requirements, and collaborating closely with Marketing, Sales, and Customer Support teams. I supervised and structured engineering teams of over 20 developers, optimizing resource allocation and estimating project deadlines to ensure timely delivery. Before becoming CTO, I served as a Senior Software Engineer at Skelter Labs, where I led the Kyte team to significant milestones, including over 200k downloads across mobile platforms, and spearheaded the development of IoT applications and a blockchain-based dApp. My experience with cutting-edge technologies and leadership across diverse tech domains has given me a robust skill set for driving innovation and excellence in software engineering.
