Comment by wild_egg - Hacker Neue

wild_egg 4 days ago parent

Not to pull a "why should I use Dropbox when I have rsync" but why should we use this over adding a Playwright MCP to Claude Desktop or similar?

Does having access to Chromium internals give you any superpowers over connecting via the Chrome DevTools Protocol?

felarof 4 days ago

Yes, eventually we think there is more value in owning the entire stack than in just being an MCP connector.

Few ideas we were thinking of: integrating a small LLM, building an MCP store into the browser, building a more AI friendly DOM, etc.

Even today, we use Chrome's accessibility tree (a better representation of the DOM for LLMs), which is not exposed via Chrome extension APIs.

pickpuck 4 days ago

> building a more AI friendly DOM

You might consider the Accessibility Tree and its semantics. Plain divs are basically filtered out so you're left with interactive objects and some structural/layout cues.
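To make the filtering idea concrete, here is a minimal sketch in Python. It uses a toy tree, not Chrome's real accessibility tree: the `Node` type, the `GENERIC_ROLES` set, and the `prune` and `outline` helpers are all hypothetical names for illustration, and the rule "drop unnamed generic wrappers, hoist their children" is a simplifying assumption rather than the browser's actual algorithm.

```python
from dataclasses import dataclass, field
from typing import List

# Roles treated as uninteresting wrappers in this toy model.
# This set is an assumption, not Chrome's actual filtering rule.
GENERIC_ROLES = {"generic", "none", "presentation"}


@dataclass
class Node:
    role: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def prune(node: Node) -> List[Node]:
    """Return the nodes that replace `node` once unnamed wrappers are dropped."""
    kept_children = [kept for child in node.children for kept in prune(child)]
    if node.role in GENERIC_ROLES and not node.name:
        # Drop the wrapper itself but hoist whatever survived beneath it.
        return kept_children
    return [Node(node.role, node.name, kept_children)]


def outline(nodes: List[Node], depth: int = 0) -> List[str]:
    """Render a tree as indented text lines, one node per line."""
    lines = []
    for n in nodes:
        label = f'{n.role} "{n.name}"' if n.name else n.role
        lines.append("  " * depth + label)
        lines.extend(outline(n.children, depth + 1))
    return lines


# A div-heavy page: three levels of wrappers around three meaningful nodes.
page = Node("generic", children=[
    Node("generic", children=[
        Node("heading", "Sign in"),
        Node("generic", children=[Node("textbox", "Email")]),
    ]),
    Node("generic", children=[Node("button", "Submit")]),
])

pruned = prune(page)
print("\n".join(outline(pruned)))
```

In this toy case the four wrapper nodes disappear and only the heading, the textbox, and the button remain, which is a much shorter description to hand to a model than the original nesting.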

faxmeyourcode 3 days ago

I've been trying (albeit not very hard) to build an accessibility library and toolset that can be exposed via an MCP server. I think it has the potential to be much more ergonomic for generalized computer-use agents than stuff like Playwright or the classic screenshot approach. Low-latency computer use is another thing that I'd like to solve.

The issue is that the Mac and Windows accessibility APIs are opaque and I have no idea what I'm doing, so I'm forced to vibe code it all, which is not turning out too well... :-)

I suffer from mild carpal tunnel, so I want to build a really low-latency computer-use agent that can do anything on my computer without me having to learn the Talon voice syntax or some other traditional accessibility software like Mac dictation.

pickpuck 3 days ago

Neat, is it on GitHub?

faxmeyourcode 3 days ago

Not yet, I've gone through a few prototypes that haven't really worked. Nothing has stuck long enough to get far enough for a repo.

I will try to publish something on GitHub this weekend.

xnx 3 days ago

> Few ideas we were thinking of: integrating a small LLM

Chrome has a built-in LLM: https://developer.chrome.com/docs/ai/built-in

shortrounddev2 4 days ago

I would take the position of "why use this when I have eyes and hands and a brain?"

nsonha 3 days ago

Why use any tool when you have bare hands, blah blah...

A good place to start is to think about a case where, for example, you need to copy and paste info from 100 websites into a spreadsheet.

b0ner_t0ner 3 days ago

Why should I use a calculator when I can use an abacus?

faxmeyourcode 3 days ago

Why use an abacus when I can just use my fingers and toes?

tolerance 4 days ago

My guess is that this is for impatient people; people who think that the prescribed use cases are somehow necessary for their "workflows"; people who subscribe to terms like "cognitive friction" within the context of these use cases; people who are...sort of lazy.

zahlman 4 days ago

...Why do these lazy people put so much effort into coming up with fancy words to justify that laziness?

tolerance 4 days ago

That's a really good question. Maybe it's because laziness is associated with a lack of intellect? And certain technologies, like AI and other software, are meant to augment our intellect.

These fancy words carry an intellectual/productive effect. When they're put to use it probably makes people feel like they're getting things done. And they never feel lazy because of this.
