IdeaCrawler is a blazing fast, flexible client-server crawling framework written in Go. We built it to perform certain tasks better than existing crawlers. In the tests we conducted, it outperformed Scrapy and several other alternatives many times over.

In addition to the features you would expect from a regular crawling library, IdeaCrawler makes it easy to scale across machines, crawl through VPNs, use cookies, and use Chrome as a crawling backend. It was written to act as a layer on top of a regular crawling library, with a lot of pre-written glue code.

This framework also allows users to isolate crawling to a dedicated cluster, while still being controlled from a central location.

Contributor
Suresh Nakkiran
