How to build a serverless web crawler

Click for: original source

James Beswick wrote this piece about using serverless to scale an old concept for the modern era. It describes a client project that required crawling a large media site to generate a list of URLs and site assets.

For Node users, there’s a package called Website Scraper that does this elegantly. It is mostly configuration-driven, with plenty of options for controlling what gets crawled and saved, plus a lot of other features.
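As a rough illustration of that configuration-driven style, a minimal sketch with the website-scraper package might look like the following (the URL and output directory are placeholders, and option names should be checked against the docs for the installed version):

```js
const scrape = require('website-scraper');

// Everything about the crawl is expressed as configuration.
const options = {
  urls: ['https://example.com/'],          // placeholder start URL
  directory: './site-copy',                // placeholder output directory
  recursive: true,                         // follow links on downloaded pages
  maxRecursiveDepth: 2,                    // limit how deep the crawl goes
  urlFilter: (url) => url.startsWith('https://example.com'), // stay on one site
};

scrape(options).then((resources) => {
  console.log(`Saved ${resources.length} resources`);
});
```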

The article is split into:

  • Let’s crawl before we run
  • Serverless Crawler — Version 1.0
  • Serverless Web Crawler 2.0 — now slower!
  • Show me the code!
  • Serverless Web Crawler 3.0

There are many reasons to crawl a website, and crawling is different from scraping. The article describes the evolution of an AWS Lambda-based solution, with detailed experience and lessons learned along the way. Great!
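The article's own Lambda code isn't reproduced here, but the general shape of one serverless crawl step, a handler that fetches a page, extracts links, and queues new URLs for later steps, might look roughly like this sketch (the queue URL, environment variable, and link extraction are illustrative assumptions, not taken from the article):

```js
// Sketch only: one crawl step as an AWS Lambda handler triggered by SQS.
const https = require('https');
const { SQSClient, SendMessageCommand } = require('@aws-sdk/client-sqs');

const sqs = new SQSClient({});
const QUEUE_URL = process.env.QUEUE_URL; // assumed: queue of URLs still to crawl

// Fetch a page body as a string.
const fetchPage = (url) =>
  new Promise((resolve, reject) => {
    https
      .get(url, (res) => {
        let body = '';
        res.on('data', (chunk) => (body += chunk));
        res.on('end', () => resolve(body));
      })
      .on('error', reject);
  });

exports.handler = async (event) => {
  // Each SQS record carries one URL to crawl.
  for (const record of event.Records) {
    const html = await fetchPage(record.body);

    // Naive link extraction; a real crawler would use a proper HTML parser.
    const links = [...html.matchAll(/href="(https?:\/\/[^"]+)"/g)].map((m) => m[1]);

    // Queue each discovered link for a later crawl step.
    for (const link of links) {
      await sqs.send(new SendMessageCommand({ QueueUrl: QUEUE_URL, MessageBody: link }));
    }
  }
};
```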

[Read More]

Tags serverless containers cicd web-development