Hello fellow hackers,

I am trying to build a tool for myself that crawls a few websites I regularly visit, so I can compare the price of the same item across them. They have no APIs.

A couple questions:

1. Is this LEGAL?
2. If I am crawling, what is the best way to approach this? Does each website's crawler have to be written by hand since every site is unique, or is there some strategy for scale if I need to expand the number of sites I crawl in the future?

Thank you!

-F75


Yes, it’s legal. Just don’t check the price every 2 seconds from 800 locations.

A simple way would be to use a headless browser [1].

But there are also hosted tools that work like a website builder.

The best way is: keep it simple and hold back (check once an hour or once a day, not every minute).
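To make "hold back" concrete, here is a minimal sketch of a per-URL throttle in Python. The class name `PoliteScheduler`, its methods, and the one-hour default are my own assumptions for illustration, not something from a particular library.

```python
import time


class PoliteScheduler:
    """Remembers when each URL was last fetched and enforces a minimum gap.

    Hypothetical helper: the name and the one-hour default are assumptions.
    """

    def __init__(self, min_interval_seconds=3600.0, clock=time.monotonic):
        self.min_interval = min_interval_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._last_fetch = {}       # url -> clock reading at last fetch

    def is_due(self, url):
        """Return True if the URL was never fetched or the gap has elapsed."""
        last = self._last_fetch.get(url)
        return last is None or self.clock() - last >= self.min_interval

    def mark_fetched(self, url):
        """Record that the URL was fetched just now."""
        self._last_fetch[url] = self.clock()
```

The crawler loop would call `is_due(url)` before every request and `mark_fetched(url)` after it, so a bug elsewhere in the loop cannot turn into a request flood.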

Many shops use Schema.org markup. If a shop supports it, you can read the price from that markup and don’t have to write a separate extractor for every site.
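As a rough sketch of the Schema.org route, the Python below pulls `Product` entries out of JSON-LD `<script>` blocks using only the standard library. It assumes the shop embeds its markup as JSON-LD (some use microdata instead, which this does not handle) and that the price sits in `offers.price`; the function and class names are my own.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the crawl


def find_products(node):
    """Yield every dict whose @type is (or includes) "Product"."""
    if isinstance(node, dict):
        types = node.get("@type")
        if types == "Product" or (isinstance(types, list) and "Product" in types):
            yield node
        for value in node.values():
            yield from find_products(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_products(item)


def extract_prices(html):
    """Return name, price and currency for each Product found in the page."""
    collector = JsonLdCollector()
    collector.feed(html)
    results = []
    for block in collector.blocks:
        for product in find_products(block):
            offers = product.get("offers")
            if isinstance(offers, list):
                offers = offers[0] if offers else None  # first offer only
            if not isinstance(offers, dict):
                offers = {}
            results.append({
                "name": product.get("name"),
                "price": offers.get("price"),
                "currency": offers.get("priceCurrency"),
            })
    return results
```

The same `extract_prices` function can be pointed at any shop that publishes this markup, which is what makes it attractive when the list of sites grows.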

You could also use a library that works on the raw HTML. Then you could just use CSS selectors for extraction.
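For sites without such markup, one way to keep the per-site work small is a table with one selector per site and a single shared extraction function. The sketch below is a standard-library stand-in that only understands selectors of the form `tag.class`; a real HTML library with full CSS selector support would replace `ClassTextExtractor`. The site names and selectors are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical per-site configuration: adding a site means adding one line.
SITE_SELECTORS = {
    "shop-a.example": "span.price",
    "shop-b.example": "div.product-cost",
}

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collects the text of the first element matching a `tag.class` selector."""

    def __init__(self, selector):
        super().__init__()
        self.tag, self.css_class = selector.split(".", 1)
        self._depth = 0     # > 0 while inside the matched element
        self._parts = []
        self.text = None    # stays None when nothing matches

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.text is None and tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.text = "".join(self._parts).strip()

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def extract_price_text(site, html):
    """Look up the site's selector and return the matched text, or None."""
    parser = ClassTextExtractor(SITE_SELECTORS[site])
    parser.feed(html)
    return parser.text
```

With this layout only the selector table is site-specific; fetching, throttling and price parsing stay shared.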

[1] https://www.atlantbh.com/building-a-dynamic-crawler-with-pup...
