Comment by showerst - Hacker Neue

A point orthogonal to this: consider whether you need browser automation at all.

If a website isn't using Cloudflare or a JS-only design, it's generally better to skip Playwright. All the major AIs understand BeautifulSoup pretty well, and they're likely to write you a faster, less brittle scraper.
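
For anyone who hasn't gone the BeautifulSoup route, here is a minimal sketch of what parsing server-rendered HTML without a browser can look like. The markup, the `permits` table id, and the `parse_rows` helper are all made up for illustration; on a real site the HTML string would come from an ordinary HTTP GET rather than an inline constant.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a server-rendered page.
SAMPLE_HTML = """
<table id="permits">
  <tr><th>Permit</th><th>Issued</th></tr>
  <tr><td>A-100</td><td>2024-01-05</td></tr>
  <tr><td>A-101</td><td>2024-01-09</td></tr>
</table>
"""


def parse_rows(html: str) -> list[dict[str, str]]:
    """Turn the table with id="permits" into a list of dicts keyed by header."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="permits")
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # The first row holds the column names.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for tr in rows[1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Skip malformed rows instead of guessing at missing cells.
        if len(cells) == len(headers):
            records.append(dict(zip(headers, cells)))
    return records


if __name__ == "__main__":
    for record in parse_rows(SAMPLE_HTML):
        print(record)
```

No browser process, no waiting on page loads, just one request and one parse per page.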

Etheryte 2 days ago

The vast majority of the modern internet falls into one of those two buckets though, no?

showerst OP 2 days ago

I mostly scrape government data so the sites are a little 'behind' on that trend, but no. Even JS-heavy sites are almost always pulling from a JSON or GraphQL source under the hood.

At scale, dropping the heavier dependencies and network traffic of a browser is meaningful.
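
A rough sketch of what hitting the underlying JSON source directly might look like, using only the standard library. The `results`/`title` payload shape and both helper names are assumptions for illustration; the real endpoint and field names are whatever shows up in the browser's network tab for the site in question.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the response body as JSON."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_titles(payload: dict) -> list[str]:
    """Pull 'title' from each item of a hypothetical {'results': [...]} payload."""
    return [item["title"] for item in payload.get("results", []) if "title" in item]


# Hypothetical payload, shaped like what such an endpoint might return.
SAMPLE = json.loads(
    '{"results": [{"title": "Budget 2024"}, {"title": "Roads"}, {"id": 3}]}'
)

if __name__ == "__main__":
    # In real use: payload = fetch_json(<endpoint found in the network tab>)
    print(extract_titles(SAMPLE))
```

The response is already structured data, so there is no HTML parsing step at all, and each request is a small fraction of the traffic a full page render would pull down.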

suchintan 2 days ago

Yeah, reverse engineering APIs is another fantastic approach. They aren't enough if you are dealing with wizards (e.g. Typeform), but they can work really well.

suchintan 2 days ago

IF you can use crawlers, definitely do.

They aren't enough for anything that's login-protected, or requires interacting with wizards (e.g. JS, downloading files, etc.).

pavel_lishin 2 days ago

If.
