showerst
A point orthogonal to this: consider whether you need browser automation at all.

If a website isn't using Cloudflare or a JS-only design, it's generally better to skip Playwright. All the major AIs understand BeautifulSoup pretty well, and they're likely to write you a faster, less brittle scraper.
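As a rough illustration of the no-browser route, here is a minimal sketch that parses a static HTML table with BeautifulSoup. The sample markup, the `permits` table id, and the `parse_permits` helper are all made up for the example; in a real scraper the HTML would come from a plain HTTP request rather than an inline string.

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML, standing in for a page fetched over plain HTTP.
SAMPLE_HTML = """
<table id="permits">
  <tr><th>Permit</th><th>Issued</th></tr>
  <tr><td>A-101</td><td>2024-01-05</td></tr>
  <tr><td>B-202</td><td>2024-02-11</td></tr>
</table>
"""


def parse_permits(html: str) -> list[dict[str, str]]:
    """Turn the rows of the (assumed) #permits table into dicts keyed by header."""
    # "html.parser" is the stdlib backend, so no extra parser install is needed.
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select("table#permits tr")
    if not rows:
        return []
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Skip malformed rows instead of guessing at missing columns.
        if len(cells) == len(header):
            records.append(dict(zip(header, cells)))
    return records


permits = parse_permits(SAMPLE_HTML)
```

No browser process is started here: the work is one HTTP fetch plus an in-memory parse, which is where the speed difference comes from.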


Etheryte
The vast majority of the modern internet falls into one of those two buckets though, no?

showerst OP
I mostly scrape government data, so the sites are a little 'behind' on that trend, but no. Even JS-heavy sites are almost always pulling from a JSON or GraphQL source under the hood.

At scale, dropping the heavier dependencies and network traffic of a browser is meaningful.
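To make the "JSON under the hood" idea concrete, here is a small sketch using only the standard library. The payload shape (`results`, `id`, `name`) and the helper names are assumptions for illustration; the real endpoint and fields are whatever shows up in the browser's network tab for the site in question.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch a JSON document over plain HTTP, without starting a browser."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def extract_records(payload: dict) -> list[dict]:
    """Keep only the fields we care about from an assumed payload shape."""
    return [
        {"id": item["id"], "name": item["name"]}
        for item in payload.get("results", [])
    ]


# Hypothetical response body, standing in for what the site's own JS would fetch.
SAMPLE_PAYLOAD = json.loads(
    '{"results": [{"id": 1, "name": "Alpha", "extra": null},'
    ' {"id": 2, "name": "Beta"}]}'
)

records = extract_records(SAMPLE_PAYLOAD)
```

In practice you would pass the endpoint URL you found to `fetch_json` and feed the result to `extract_records`. Each request is then a few kilobytes of JSON instead of a full page load with scripts, styles, and images.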

suchintan
Yeah, reverse engineering APIs is another fantastic approach. They aren't enough if you are dealing with wizards (e.g. Typeform), but they can work really well.

suchintan
IF you can use crawlers, definitely do.

They aren't enough for anything that's login-protected or that requires interacting with wizards (e.g. JS, downloading files, etc.)

pavel_lishin
If.