What is web scraping / web crawling?

Description

Web scraping, or web crawling, is the process of downloading part or all of the content of a website. In practice, it is almost always performed by automated bots, which can download the entire content of a small website in seconds.

How do bots take content?

A scraper bot sends a series of HTTP GET requests to the web server. It then copies the information the server sends back in response and saves it to a database (MySQL, MongoDB, etc.).
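The request-and-save loop can be sketched in Python with only the standard library. This is a minimal illustration, not a production scraper: SQLite stands in for MySQL or MongoDB, and the table name `pages`, the database file `pages.db`, and the User-Agent string are hypothetical.

```python
import sqlite3
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> str:
    """Send one HTTP GET request and return the response body as text."""
    # Hypothetical User-Agent string identifying the bot.
    req = urllib.request.Request(url, headers={"User-Agent": "example-bot/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def save(conn: sqlite3.Connection, url: str, body: str) -> None:
    """Store the raw response; SQLite stands in for MySQL/MongoDB here."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)", (url, body)
    )
    conn.commit()


def scrape(urls: list[str], db_path: str = "pages.db") -> None:
    """Fetch each URL in turn and save whatever the server sent back."""
    conn = sqlite3.connect(db_path)
    try:
        for url in urls:
            save(conn, url, fetch(url))
    finally:
        conn.close()
```

Calling `scrape(["https://example.com/"])` would leave one row per URL in the `pages` table, keyed by URL so that re-running the bot overwrites stale copies.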

More sophisticated bots can execute JavaScript, for example to fill out a form on a website and then download content that is otherwise closed to ordinary visitors.
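Executing JavaScript requires a headless browser, which is too large for a short example. The sketch below instead replays what a filled-out form usually sends to the server, an HTTP POST with the form fields, and keeps the session cookie so that later requests can reach the closed pages. It is a simplified stand-in, not a JavaScript-capable bot, and the URLs and field names (`user`, `password`) are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session() -> urllib.request.OpenerDirector:
    """Build an opener that stores and resends cookies, like a browser session."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_form_request(url: str, fields: dict[str, str]) -> urllib.request.Request:
    """Encode fields the way a browser submits an HTML <form method="post">."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_closed_page(login_url: str, page_url: str, fields: dict[str, str]) -> str:
    """Submit a (hypothetical) login form, then download a page behind it."""
    session = make_session()
    # Any Set-Cookie header in the response is captured by the cookie jar...
    with session.open(build_form_request(login_url, fields), timeout=10) as resp:
        resp.read()
    # ...and sent back automatically with the next request.
    with session.open(page_url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Sites that build their forms or tokens in JavaScript will not work with this approach, which is why the more complex bots described above drive a real browser engine instead.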

Of course, anyone can copy or download an entire website by hand, but a bot can crawl and download a site's content in seconds. For larger sites, such as an e-commerce site with hundreds of pages, the download can take up to a few minutes.
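Crawling a whole site usually means following links breadth-first from a start page and skipping pages already visited. The following is a minimal sequential sketch under these assumptions: `fetch` is any function that downloads a URL and returns its HTML, only links on the same host are followed, and the `max_pages` cap is a hypothetical safety limit. Real crawlers typically fetch many pages in parallel, which is a large part of why they are so fast.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 500) -> dict[str, str]:
    """Breadth-first crawl of one host; returns {url: html} for every page reached."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is fetched once.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing `fetch` in as a parameter keeps the link-following logic separate from the networking, so the same crawler can be driven by a plain HTTP client or by a browser-based one.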

What is the purpose of downloading content from a website?

Download text

This means scanning websites for contact information, such as phone numbers or email addresses. Email-collection bots are aimed specifically at retrieving email addresses, usually to find new targets for spam.
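A rough sketch of how such a bot might scan downloaded text, using Python regular expressions. The patterns are simplified assumptions: they miss some valid addresses, and the phone pattern can also match other digit runs such as dates.

```python
import re

# Simplified pattern: local part, "@", domain, dot, top-level domain of 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", then at least 9 characters of digits, spaces, dots or dashes,
# starting and ending with a digit.
PHONE_RE = re.compile(r"\+?\d[\d .-]{7,}\d")


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Return the unique email addresses and phone numbers found in text."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    # Collapse repeated whitespace so the same number is not stored twice.
    phones = sorted({" ".join(p.split()) for p in PHONE_RE.findall(text)})
    return {"emails": emails, "phones": phones}
```

Run over every page a crawler has downloaded, a function like this turns a website into a list of addresses within seconds, which is why many sites obscure or avoid publishing plain-text email addresses.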

Download prices from an e-shop

This is when a company downloads all the price information from a competitor's website so that it can adjust its own prices accordingly.
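A minimal sketch of the extraction step, assuming the competitor's product pages use markup like `<li data-sku="A100"><span class="price">19.99</span></li>`. The `data-sku` attribute, the `price` class, and the use of "." as the decimal separator are all hypothetical; every shop's markup differs, so a real scraper is tailored to each target site.

```python
from decimal import Decimal
from html.parser import HTMLParser
from typing import Optional


class PriceParser(HTMLParser):
    """Collect {sku: price} from product markup (hypothetical structure, see above)."""

    def __init__(self) -> None:
        super().__init__()
        self.prices: dict[str, Decimal] = {}
        self._sku: Optional[str] = None
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-sku" in attrs:
            self._sku = attrs["data-sku"]
        if "price" in (attrs.get("class") or "").split():
            self._in_price = True

    def handle_endtag(self, tag):
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and self._sku:
            # Keep digits and the decimal point only, e.g. "$19.99" -> "19.99".
            number = "".join(c for c in data if c.isdigit() or c == ".")
            if number:
                self.prices[self._sku] = Decimal(number)


def parse_prices(html: str) -> dict[str, Decimal]:
    """Extract all product prices from one downloaded page."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


def price_gaps(own: dict[str, Decimal], competitor: dict[str, Decimal]) -> dict[str, Decimal]:
    """For products both shops sell: how much more (+) or less (-) we charge."""
    return {sku: own[sku] - competitor[sku] for sku in own.keys() & competitor.keys()}
```

`Decimal` is used instead of `float` so that price differences such as 21.00 - 19.99 come out exact. How the company then reacts to the gaps (matching, undercutting, ignoring) is a business decision outside the scraper itself.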
