-- Why bother building a new browser? For the first time since Netscape was released in 1994, it feels like we can reimagine browsers from scratch for the age of AI agents. The web browser of tomorrow might not look like what we have today.
We saw how tools like Cursor gave developers a 10x productivity boost, yet the browser—where everyone else spends their entire workday—hasn't fundamentally changed.
And honestly, we feel like we're constantly fighting the browser we use every day. It's not one big thing, but a series of small, constant frustrations. I'll have 70+ tabs open from three different projects and completely lose my train of thought. And simple stuff like reordering Tide Pods on Amazon or filling out forms shouldn't need our full attention anymore. AI can handle all of this, and that's exactly what we're building.
Here’s a demo of our early version https://dub.sh/nxtscape-demo
-- What makes us different We know others are exploring this space (Perplexity, Dia), but we want to build something open-source and community-driven. We're not a search or ads company, so we can focus on being privacy-first: Ollama integration, BYOK (Bring Your Own Keys), and a built-in ad blocker.
Btw we love what Brave started and stood for, but they've now spread themselves too thin across crypto, search, etc. We are laser-focused on one thing: making browsers work for YOU with AI. And unlike Arc (which we also loved, but which was abandoned), we're 100% open source. Fork us if you don't like our direction.
-- Our journey hacking a new browser To build this, we had to fork Chromium. Honestly, it feels like the only viable path today—we've seen others like Brave (which started with Electron) and Microsoft Edge learn this the hard way.
We also started by asking why not just build an extension, but we realized we needed more control, similar to the reason Cursor forked VSCode. For example, Chrome has this thing called the Accessibility Tree: basically a cleaner, semantic version of the DOM that screen readers use. It's perfect for AI agents to understand pages, but you can't use it through extension APIs.
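For the curious, here's a minimal sketch (not our actual code) of what reading that tree looks like over the Chrome DevTools Protocol, assuming the chrome-remote-interface npm package and Chrome launched with --remote-debugging-port=9222:

    // Sketch only: dump role/name pairs from the accessibility tree via CDP.
    import CDP from 'chrome-remote-interface';

    async function dumpAXTree(url: string) {
      const client = await CDP(); // connects to localhost:9222 by default
      const { Page, Accessibility } = client;
      await Page.enable();
      await Accessibility.enable();
      await Page.navigate({ url });
      await Page.loadEventFired();
      // Each AXNode carries a role, an accessible name, and parent/child
      // links -- a far leaner view of the page than the raw DOM.
      const { nodes } = await Accessibility.getFullAXTree();
      for (const node of nodes) {
        if (node.role?.value && node.name?.value) {
          console.log(node.role.value, '-', node.name.value);
        }
      }
      await client.close();
    }

    dumpAXTree('https://example.com').catch(console.error);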
That said, working with the 15M-line C++ Chromium codebase has been an adventure. We've both worked on infra at Google and Meta, but Chromium is a different beast. Tools like Cursor's indexing completely break at this scale, so we've had to get really good with grep and vim. And the build times are brutal—even with our maxed-out M4 Max MacBook, a full build takes about 3 hours.
Full disclosure: we are still very early, but we have a working prototype on GitHub. It includes an early version of a "local Manus" style agent that can automate simple web tasks, plus an AI sidebar for questions, and other productivity features (grouping tabs, saving/resuming sessions, etc.).
Looking forward to any and all comments!
You can download the browser from our github page: https://github.com/nxtscape/nxtscape
Bookmarks don't cut it anymore when you've got 25 years of them saved.
Falling down deep rabbit holes, because you landed on an attention-desperate website to check one single thing and immediately got distracted, could be reduced by running a bodyguard bot to filter the junk out. Those sites create deafening noise that you could squash by telling the bot to only let you know when somebody replies to your comment with something of substance that you might actually want to read.
If it truly works, I can imagine the digital equivalent of a personal assistant + tour manager + doorman + bodyguard + housekeeper + mechanic + etc, that could all be turned off and on with a switch.
Given that the browser is our main portal to the chaos that is internet in 2025, this is not a bad idea! Really depends on the execution, but yeah.. I'm very curious to see how this project (and projects like it) go.
We spend 90%+ of our time in browsers, yet they're still basically dumb windows. Having an AI assistant that remembers what you visited, clips important articles (remember Evernote web clipper?), saves highlights and makes everything semantically searchable - all running locally - would be game-changing.
Everything stays in a local Postgres database: your history, highlights, sessions. You can ask "what was that pricing comparison from last month?" or "find my highlights about browser automation" and it just works. Plus built-in self-control features to block distracting sites when you need to focus.
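As a rough illustration of the kind of query this enables (assuming a pgvector-style setup and a local embedding model, not our exact schema):

    import { Pool } from 'pg';

    // Hypothetical stand-in for whatever local embedding model is used.
    declare function embed(text: string): Promise<number[]>;

    const db = new Pool({ database: 'browser_history' });

    // "What was that pricing comparison from last month?" becomes a
    // nearest-neighbor search over locally stored page snapshots.
    async function searchHistory(question: string) {
      const queryEmbedding = await embed(question);
      const { rows } = await db.query(
        `SELECT url, title, visited_at
           FROM pages
          ORDER BY embedding <=> $1::vector
          LIMIT 10`,
        [JSON.stringify(queryEmbedding)],
      );
      return rows;
    }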
Beyond search and memory, the browser can actually help you work. AI that intelligently groups your tabs ("these 15 are all Chromium research"), automation for grunt work ("compare 2TB hard drive prices across these sites"), or even "summarize all new posts in my Discord servers" - all handled locally. The browser should help us manage internet chaos, not add to it.
Would love to hear what specific workflows are painful for you!
This would be that, but even better.
My computer should remember everything I did on it, period. It should remember every website I visited, exactly how far down I scrolled on each page, every thought I typed and subsequently deleted before posting... And it should have total recall! I should be able to rewind back to any point in time and track exactly what happened, because it's a computer. I already have a lossy memory of stuff that happened yesterday and that's inside my head. The whole point of having my computer remember stuff for me is that it's supposed to do it better than me.
And I want the search to be deterministic. I want to be able to input precise timestamps and include boolean operators. Yes, it would be helpful to have fuzzy matches, recommendations and a natural language processing layer too, but Lucene et al already did that acceptably well for local datasets 20+ years ago. It's great we have a common corpus, but I don't care about getting tokenized prose from the corpus, I care about the stuff I did on my own computer!
From my perspective LLMs don't bring much value on the personalized search front. The way I understand it, the nature of their encoding makes it impossible to get back the data you were actually looking for unless that data was also stored and indexed the traditional way, in which case you could have just skipped the layer of indirection and queried the source data in the first place.
I am also curious to see how all of this develops. I get a sense that the current trend of injecting LLMs everywhere is a temporary stop-gap measure used to give people the illusion of a computer that knows everything because researchers haven't yet figured out how to actually index "everything" in a performant way. But for the use case of personalized search, the computer doesn't actually need to know "everything", it only needs to know about text that was visible on-screen, plus a bit of metadata (time period, cursor position, clipboard, URL etc). If we currently still need an LLM to index that because snapshotting the actual text and throwing it into a traditional index requires too much disk space, okay, but then what's next? Because just being able to have a vague conversation about a thing I kindasorta maybe was doing yesterday is not it. Total recall is it.
I don't know about other browsers, but Safari does this. It's come in handy when I'm like "what was that site I visited two years ago?" and I can open my history and query to filter the list of pages, and there it is, January 17, 2023, yoda ate my balls retrospective
Sheesh.
As someone who uses web browsers that delete all session data when they're closed, and routinely wipes any "recently used" lists and temporary files in all operating systems I use, the thought of the machines I'm using remembering my usage behavior to such an extent is terrifying to me.
I mean, I get why those features would be appealing. I just have zero trust in the companies that build such software, because they've violated my trust time and time again. Yet I'm expected to suddenly trust them in this case? Not a chance. Not when the data that I would be entrusting them with for the features you mention are a literal gold mine for them. Trusting a trillion-dollar corporation with a history of privacy violations and hostility towards its users is just unthinkable for me, no matter what features I might be missing out on.
It's unfortunate that computing has come to this, but I choose the hardware and software I use very carefully. In cases where I'm forced to use a system I don't trust, I try my best to limit my activity and minimize the digital footprints I leave behind. I prefer using open source software for this reason, but even that is carefully selected, since OSS is easily corruptible by unscrupulous developers, and companies that use it as a marketing tactic.
The only way I might use software with that level of intrusion is if I've inspected every line of it, I run it inside a container with very limited permissions, or if I've written it myself. Needless to say, it's more likely I'll just miss out on using those features instead, and I'm fine with that.
How it all started:
“Bookmarks and shit don’t cut it anymore”.
Tsk tsk.
I’ll just leave this here:
https://youtu.be/kGYwdVt3rhI
Is this a common and well-defined term that people use? I've never heard it.
It would appear to me from the context that it means something like "web browser with AI stuff tacked on".
By "agentic browser" we basically mean a browser with AI agents that can do web navigation tasks for you. So instead of you manually clicking around to reorder something on Amazon or fill out forms, the AI agent can actually navigate the site and do those tasks.
Does having access to Chromium internals give you any superpowers over connecting via the Chrome DevTools Protocol?
A few ideas we were thinking of: integrating a small LLM, building an MCP store into the browser, building a more AI-friendly DOM, etc.
Even today, we use Chrome's accessibility tree (a better representation of the DOM for LLMs), which is not exposed via Chrome extension APIs.
You might consider the Accessibility Tree and its semantics. Plain divs are basically filtered out so you're left with interactive objects and some structural/layout cues.
Chrome has a built-in LLM: https://developer.chrome.com/docs/ai/built-in
A good place to start is to think about, for example, needing to copy and paste info from 100 websites into a spreadsheet.
A complicated workflow may involve other tools. For example, the LLM's output may include something that tells the browser to set the user agent to such-and-such a string.
Other tools could be clicking on things in the page, or even injecting custom JavaScript when a page loads. The tl;dr is that it's AI that makes decisions on its own.
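To make that concrete, here's a hypothetical sketch of the kind of tool definitions you might hand to an LLM with OpenAI-style function calling (names and parameters are made up, not any particular product's API):

    // Hypothetical tool definitions; the model picks one and supplies arguments.
    const tools = [
      {
        type: 'function',
        function: {
          name: 'set_user_agent',
          description: 'Override the User-Agent string for subsequent requests',
          parameters: {
            type: 'object',
            properties: { userAgent: { type: 'string' } },
            required: ['userAgent'],
          },
        },
      },
      {
        type: 'function',
        function: {
          name: 'click_element',
          description: 'Click the page element matching a CSS selector',
          parameters: {
            type: 'object',
            properties: { selector: { type: 'string' } },
            required: ['selector'],
          },
        },
      },
      {
        type: 'function',
        function: {
          name: 'inject_script',
          description: 'Run custom JavaScript in the page after it loads',
          parameters: {
            type: 'object',
            properties: { source: { type: 'string' } },
            required: ['source'],
          },
        },
      },
    ];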
On the other hand: this has the potential to be an absolute security Chernobyl. A browser is likely to be logged into all your sensitive accounts. An agent in your browser is probably going to be exposed to untrusted inputs from the internet by its very nature.
You have the potential for prompt injection to turn your life upside down in a matter of seconds. I like the concept but I wouldn't touch this thing with a ten foot pole unless everyone in the supply chain was PCI/SOC2/ISO 27001 certified, the whole supply chain has been vetted, and I have blood oaths about its security from third party analysts.
This is exactly why we're going local-first and open source. With cloud agents (like Manus.im), you're trusting a black box with your credentials. With local agents, you maintain control:
- Agents only run when you explicitly trigger them
- You see exactly what they're doing in real-time and can stop them
- You can run tasks in a separate Chrome user profile
- Most importantly: the code is open source, so you can audit exactly what's happening.
Regardless, you did not answer OP's point, which is that any potentially malicious site can prompt-inject you at any point and trigger an MCP or any other action or whatever before you see it and stop it. The whole point of an AI browser is, like a self-driving car, being able to de-focus and let it do its thing. If I have to nervously watch whether I'm getting hacked at any given second, then it's probably not a great product.
Appreciate the AGPLv3 licence, kudos on that.
I get the general sentiment. But Cursor for sure has improved productivity by a huge multiplicative factor, especially for simpler stuff (like building a Chrome extension).
It's kinda built really well without exposing WebDriver etc., and it can comfortably run JS and communicate with LLMs. Has full agentic capabilities.
Why a new browser instead of a robust extension?
https://chromedevtools.github.io/devtools-protocol/
Not vouching for this project, but just an example of the category existing: https://github.com/AgentDeskAI/browser-tools-mcp
In the meantime, the bigger opportunity with relatively little competition at the moment is the Web itself, not which application to browse it with. The Web absolutely sucks, and that's most of the reason we even feel the need for an "elevated browser experience" in the first place (i.e. lifted trucks on the information highway).
The Web sucks, because it was built naively, then optimized for profitable friction. But all of it stood on the assumption that the cost of production on the web involves highly skilled human labor. LLMs have shattered that assumption, but the effects have not yet manifested. Which is to say that the entire existing Web is probably going to become a marginal, legacy corner of a much bigger base of LLM-driven hypermedia content that is yet to come.
In fact, we built one, rtrvr.ai, which has even better web-agent performance than OpenAI's Operator with human assistance and is 7x faster than the leading competitor: https://www.rtrvr.ai/blog/web-bench-results
Your Accessibility Tree requirement is a poor excuse, rather you should build up an agent from a first principles understanding of DOM interactions.
A browser is a SERIOUS security risk; you need a dedicated team just to pull in the latest security patches that Google pushes to Chromium, or your users are sitting ducks for exploits and hacks...
some genuine feedback on a frustrating early experience:
- I ran the suggested "Group all my tabs by topic" in productivity agent mode. It worked great.
- I then asked it to remove all tab groups and reset things, but was told this:
- Tried "agent mode" and was told: - Basically was being sent back and forth. Went back to productivity mode and argued with it for a bit. The closest I could come to it removing all tabs groups was creating a new tab group encompassing all tabs, but couldn't get it to remove groups entirely. I'm guessing it might lack that API?Overall, it'd be nice if every browser level action it took had an undo button. Or at least if it was smart enough/able to remove the tab groups it just created.
Will keep playing with it more.
edit1: one more weird issue: while running the chat interface on Chrome internal pages like chrome://extensions, it would randomly navigate me to google.com for some reason.
edit2: confirmed that productivity mode lacks a tool to ungroup tabs, just a tool to create tab groups.
We have both an agent mode and a chat mode, each with separate tools. I think the "prompt" today is not good enough; will see if there is a better way to address it.
Regarding ungrouping, that API is currently missing in Chrome. I'm currently looking to add this support.
Hmm, a couple of people have asked for "undo" now. Will see how we can implement this. I imagine something like Cursor's "restore checkpoint" would be neat.
Quick question, do you think these productivity features are critical in your day to day workflow? Any specific examples you can share? :)
hah, absolutely not! That specific nit was pretty low priority... I definitely don't imagine I'll be asking it to ungroup tabs regularly, if ever. But the loop of it being stuck between Chat and Agent mode seems more generic, and might happen whenever there's a prompt neither agent can handle. Ideally it would have just said "I can't ungroup tabs".
Restoring to a checkpoint could be nice, but might not be worth it in the early stage if it's high effort for you. I've yet to do anything actually useful/productivity-focused with Nxtscape yet, but will keep exploring.
Bravo to whoever came up with the name.
Great product though.
A chat interface works for ChatGPT because most folks use it as a pseudo-search, but productivity tools are (broadly speaking) not generative, therefore shouldn't be using freeform inputs. I have many thoughts on fixing this, and it's a very hard problem, but simply slapping an LLM onto Chrome is just lazy. I don't mean to be overly negative, but it's kind of wild to see YC funding slop like this.
And that's exactly what this is: slop. There's no technical creativity here, this isn't a new product segment, it barely deserves the "hey bro, this might be a feature, not a product" startup 101 criticism. It's what ChatGPT would spit out if you asked it what a good startup idea would be in 2025. All we need to do, even if we were being as charitable as possible, is ask who's doing the heavy lifting here (hint: it's not in the Github repo).
Good luck, but in your place I would at least start with something that a certain ICP needs more. Many, many man-hours have been wasted by ambitious technical founders on taking down Chrome (many also starting from a Chrome fork themselves), but none of them succeeded. We only have limited energy.
Definitely agree there is good amount of competition here.
But we do think there is a gap in the market for open-source, community driven and privacy-first AI browser. (Something like Brave?)
Brave is a decent example but their business model is actually complicated; it includes a lot of little stuff. And they don't have the unit cost of LLMs (I'm assuming at some point you will take on the burden of the LLMs, if not local).
Island Browser and Chrome Enterprise have kind of validated the need for an enterprise version of the browser with VPN and a DLP (data-loss-prevention) engine.
https://chromium.googlesource.com/chromium/src/+/main/docs/l...
It seems like Gemini 2.5 is pre-trained on Chromium, given it has a lot of context on the current state of the codebase. That may help a bit.
Also, I've not tried this yet, but maybe using an MCP server over a clangd language server is a solution to the context problem?
https://chromium.googlesource.com/chromium/src/+/master/docs...
I asked it to find the best price for a new Seagate Exos 24TB drive on Amazon (GPT-4o):
> The best price found for a new Seagate EXOS 24TB enterprise hard drive on Amazon is $274.99 (model ST24000NM000C). This is a renewed/refurbished unit with 4.1/5 stars from 39 reviews. Note: The search results show mostly renewed units, and new units may be available at different prices on individual product pages.
Ok, fair, you couldn't find a new one on Amazon. I would have preferred a "Couldn't find one" instead of leading with the price and only mentioning in the second sentence that it's a refurb. Note, this is mostly an LLM failing, not Nxtscape; that part seemed to work well overall.
I then followed up with "I only want a new drive, please look at other popular electronic retailers". It replied with (formatted slightly):
> Best Buy - WD Blue 2TB Internal Hard Drive ($59.99, on sale from $70)
> Newegg - Desktop Internal Hard Drives (500GB up to 8TB available) - Price range: $180-$299 - 4.5 store rating - Features: 30-day returns, 5-9 day delivery
> Amazon - Seagate EXOS Enterprise Hard Drives (up to 24TB)
> Additional options include SSDs from Newegg ($60-$201) if you're interested in faster storage solutions.
As almost always with LLMs, I see where it went off the beaten path. I didn't specify "a new one with the same specs I originally asked for", I shouldn't have to. This is probably mostly on the LLM, I don't know if Nxtscape could improve that with prompting (I don't know exactly what they are sending to judge either way). Also it got lazy with the Amazon response (no price).
One way that Nxtscape might be able to improve is to parse out what the user is asking for, creating a data structure to define a "result" (in my case: url, name, price, description?), use that to prompt the LLM to conform to that shape, then take the results and pass them all through a one-off LLM instance to summarize the data. I think that would help with the inconsistencies in the results. Then again, that's very "Data extraction"/"data lookup"-focused and I haven't even played with using it for input: Fill out this form, loop this process for input (mail merge), etc.
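Something like this hypothetical shape (not Nxtscape's actual schema) is what I have in mind:

    // Hypothetical result shape for price-comparison style queries.
    interface ProductResult {
      url: string;
      name: string;
      price: number;        // numeric price in the user's currency
      description?: string; // optional extra context
    }

    // Parse the model's JSON output and keep only well-formed entries
    // before handing them to a one-off summarization pass.
    function parseResults(raw: string): ProductResult[] {
      const parsed = JSON.parse(raw);
      if (!Array.isArray(parsed)) throw new Error('expected a JSON array');
      return parsed.filter(
        (r): r is ProductResult =>
          typeof r?.url === 'string' &&
          typeof r?.name === 'string' &&
          typeof r?.price === 'number',
      );
    }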
Really cool idea and I'll try throwing some other problems at it as I think of them, but mostly for fun/research; this doesn't seem like a force multiplier for my normal workflows (yet).
Regarding the first part, our thinking right now is similar to other OSS projects: we will have an enterprise offering and charge for that.
So your thesis is that an AI agent should decide what I pay attention to, rather than me?
What could possibly go wrong?
Web browsers as applications are made completely useless by the AI wave, and I fully expect only the webview portion to survive in the long run.
Microsoft, Apple, and Google are best positioned to capitalize on this. Meta is further behind but has decent glasses.
The pendulum is on its way back to native.
Website operators should not get a say in what kinds of user agents I used to access their sites. Terminal? Fine. Regular web browser? Okay. AI powered web browser? Who cares. The strength of the web lies in the fact that I can access it with many different kinds of tools depending on my use case, and we cannot sacrifice that strength on the altar of hatred of AI tools.
Down that road lies disaster, with the Play Integrity API being just the tip of the iceberg.
https://www.robotstxt.org/faq/what.html
But I wonder if it matters if the agent is mostly using it for "human" use cases and not scraping?
> A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
This is absolutely not what you are doing, which means what you have here is not a robot. What you have here is a user agent, so you don't need to pay attention to robots.txt.
If what you are doing here counted as robotic traffic, then so would:
* Speculative loading (algorithm guesses what you're going to load next and grabs it for you in advance for faster load times).
* Reader mode (algorithm transforms the website to strip out tons of content that you don't want and present you only with the minimum set of content you wanted to read).
* Terminal-based browsers (do not render images or JavaScript, thus bypassing advertising and according to some justifications leading them to be considered a robot because they bypass monetization).
The fact is that the web is designed to be navigated by a diverse array of different user agents that behave differently. I'd seriously consider imposing rate limits on how frequently your browser acts so you don't knock over a server—that's just good citizenship—but robots.txt is not designed for you and if we act like it is then a lot of dominoes will fall.
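Rate limiting like that is cheap to implement; a minimal, illustrative sketch of a per-host politeness delay:

    // Illustrative per-host politeness delay: at most one automated
    // request to a given host every couple of seconds.
    const lastRequestAt = new Map<string, number>();
    const MIN_DELAY_MS = 2000;

    async function politeFetch(url: string): Promise<Response> {
      const host = new URL(url).host;
      const last = lastRequestAt.get(host) ?? 0;
      const wait = Math.max(0, last + MIN_DELAY_MS - Date.now());
      if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
      lastRequestAt.set(host, Date.now());
      return fetch(url);
    }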
[0] https://www.robotstxt.org/faq/what.html
Maybe some new standards and user-configurable per-site permissions may make it better?
I'm curious to see how this will turn out to be.
As a user, the browser is my agent. If I'm directing an LLM to do something on a page in my browser, it's not that much different than me clicking a button manually, or someone using a screen reader to read the text on a page. The browser is my user agent and the specific tools I choose to use in my browser shouldn't be forbidden by a webpage. (that's why to this day all browsers still claim to be Mozilla...)
(This is very different than mass scraping web pages for training purposes. Those should absolutely respect robots.txt. There's a big difference between a user operated agentic-browser interacting with a web page and mass link crawling.)
If any type of AI-based assistance is supposed to adhere to robots.txt, then would you also say that AI-based accessibility tools should refuse to work on pages blocked by robots.txt?
What coherent definition of robot excludes Chrome but includes this?
No meatsack in the loop making decisions and pushing the button? Robots.txt applies.
If your browser behaves, it's not going to be excluded in robots.txt.
If your browser doesn't behave, you should at least respect robots.txt.
If your browser doesn't behave, and you continue to ignore robots.txt, that's just... shitty.
No, it's common practice to allow Googlebot and deny all other crawlers by default [0].
This is within their rights when it comes to true scrapers, but it's part of why I'm very uncomfortable with the idea of applying robots.txt to what are clearly user agents. It sets a precedent where it's not inconceivable that we have websites curating allowlists of user agents like they already do for scrapers, which would be very bad for the web.
[0] As just one example: https://www.404media.co/google-is-the-only-search-engine-tha...
I am not sure I agree with an AI-aided browser, that will scrape sites and aggregate that information, being classified as "clearly" a user agent.
If this browser were to gain traction and ends up being abusive to the web, that's bad too.
Where do you draw the line of crawler vs. automated "user agent"? Is it a certain number of web requests per minute? How are you defining "true scraper"?
> A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
To me "recursive" is key—it transforms the traffic pattern from one that strongly resembles that of a human to one that touches every page on the site, breaks caching by visiting pages humans wouldn't typically, and produces not just a little bit more but orders of magnitude more traffic.
I was persuaded in another subthread that Nxtscape should respect robots.txt if a user issues a recursive request. I don't think it should if the request is "open these 5 subreddits and summarize the most popular links uploaded since yesterday", because the resulting traffic pattern is nearly identical to what I'd have done by hand (especially if the browser implements proper rate limiting, which I believe it should).
[0] https://www.robotstxt.org/faq/what.html
Sort of like a backwards perplexity search. (LLM context is from open tabs rather than the tool that brings you to those tabs)
I built a tab manager extension a long time ago that people used, but ran into the same problem: the concept of tab management runs deeper than just the tabs themselves.
I added a few features which I felt would be useful: an easy way to organise and group tabs, and a simple way to save and resume sessions with selective context.
What are your problems that you would like to see solved?
This would of course apply to not just open tabs but tabs I used to have open, where the LLM knows about my browsing history.
But I think I would want a non-chat interface for this. (of course at any time I could chat/ask a question as well)
Resist the call to open every link in this article in a tab; overcome the fear of losing something if all these tabs lagging behind are closed right now without further consideration.
* Buying a sofa. You want to filter for sofas of a specific size, with certain features; marketing sites want to feed you a bunch of marketing slop for each sofa before giving you the details. This generalises to many domains.
* You have a few friends who are still stuck on Facebook, you want to be notified if they post anything and avoid other rubbish
* The local neighborhood is stuck organising in a Facebook group or even worse, nextdoor. You want to see any new posts except for those couple of guys who are always posting the same thing.
* A government consultation website has been put up, but as a hurdle the consultation document has been combinatorially expanded to 763 pages by bureaucratic authoring techniques. You want to undo the combinatorial expansion so you can identify things you actually care about.
This jumped out to me as well. Even sites like Amazon lack per-item-cost sorting, which can be really helpful when buying in bulk. Historically we've seen people use scraping and data science to build sites like https://diskprices.com/; without using LLMs. If LLMs are useful for those types of tasks, perhaps we'll see a surge in similar sites instead of end users doing prompt engineering in their browser.
> You want to see any new posts except for those couple of guys who are always posting the same thing.
It looks like nextdoor supports blocking users, although other sites may not.
https://help.nextdoor.com/s/article/block-a-neighbor
While reviewing the prompt's capabilities, I had an idea: implementing a Greasemonkey/Userscript-style system, where users could inject custom JavaScript or prompts based on URLs, could be a powerful way to enhance website interactions.
For instance, consider a banking website with a cumbersome data export process that requires extra steps to make the data usable. Imagine being able to add a custom button to their UI (or define a custom MCP function) specifically for that URL, which could automatically pull and format the data into a more convenient format for plain text accounting.
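In my head the configuration could look something like this (purely hypothetical, in the spirit of a Greasemonkey metadata block):

    // Purely hypothetical config format; field names are made up.
    interface SiteAutomation {
      match: string;   // URL pattern the rule applies to
      label: string;   // text of the button added to the page
      prompt?: string; // optional LLM instruction to run on click
      script?: string; // optional custom JavaScript to inject
    }

    const automations: SiteAutomation[] = [
      {
        match: 'https://banking.example.com/statements/*',
        label: 'Export for plain-text accounting',
        prompt:
          'Extract every transaction on this page as "YYYY-MM-DD payee amount" lines.',
      },
    ];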
Was a huge fan of Tampermonkey back in the day.
edit: Just read about the accessibility thing, but that's thin. Is there any use case in the future that a browser can handle but an extension can't?
The only reason to use a browser over a Chrome extension is to bypass security features, for example, trusted events. If a user wants the browser window to go full screen or play a video, a physical mouse click or key press is required. Moreover, some websites do not want to be automated, like the ChatGPT web console and Chase.com, which check whether the event was a trusted event before accepting a button click or key press. This means that a Chrome extension cannot automate voice commands inferred with audio-to-text. However, getting a trusted event only requires the user to press a button, any button, so a message or dialog prompt that says "Press to go full screen" is all that is required. This can be done with a remote Bluetooth keyboard also.
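For concreteness, this is roughly how a site checks for trusted events on its end (standard DOM behavior, nothing product-specific):

    // A sensitive action that only accepts genuine user input: events
    // dispatched by scripts or extensions arrive with isTrusted === false.
    document.querySelector('#transfer-button')?.addEventListener('click', (event) => {
      if (!event.isTrusted) {
        console.warn('Ignoring synthetic click');
        return;
      }
      // proceed only for a real mouse click or key press
    });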
The way I see it, these limitations are in place for very, very good reasons and should not be bypassed. Moreover, there are much larger security issues with an agentic browser sending the entire contents of a bank website or health records in a hospital patient portal to a third-party server. It is possible to run OpenAI's Whisper on WebGPU on a MacBook Pro M3, but most text-generation models over 300M will cause it to heat up enough to cook a steak. There are even bigger issues with potential prompt-injection attacks from third-party websites that know agentic browsers are visiting their sites.
The first step in mitigating these security vulnerabilities is preventing the automation from doing anything a Chrome extension can't already do. The second is blacklisting, or only allowing by opt-in, the agents' ability to read and especially to write (filling in a form is a write) any webpage without explicit permission. I've started to use VS Code's Copilot for command-line actions and it works with permissions the same way, such as session-only access.
I've already solved a lot of the problems associated with using a Chrome extension for agentic browser automation. I really would like to be having this conversation with people.
EDIT: I forgot the most important part. There are 3,500,000,000 Chrome users on Earth. Getting them to install a Chrome extension is much, much easier than getting them to install a new browser.
https://developer.chrome.com/docs/extensions/ai
Don't any of these fit the bill? Are they Gemini-locked and you want something else? I am not familiar with the Chrome API, so pardon my ignorance.
- Ship a small LLM along with browser
- MCP store built in
To get the page content, we parse the accessibility tree.
What is the tech around the thing that segments out DOM elements automatically and shows the visual representation? I think something like this would be great for automated UI-testing agents.
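I'm guessing it's something like an injected script that finds interactive elements and overlays numbered boxes on them, in the spirit of "set of marks" prompting; a rough, illustrative sketch:

    // Illustrative only: tag clickable elements and draw numbered
    // overlays so a model (or a test log) can refer to "element 12".
    function labelInteractiveElements(): void {
      const selector = 'a, button, input, select, textarea, [role="button"]';
      document.querySelectorAll<HTMLElement>(selector).forEach((el, i) => {
        const rect = el.getBoundingClientRect();
        if (rect.width === 0 || rect.height === 0) return; // skip hidden elements
        const box = document.createElement('div');
        box.textContent = String(i);
        Object.assign(box.style, {
          position: 'fixed',
          left: `${rect.left}px`,
          top: `${rect.top}px`,
          width: `${rect.width}px`,
          height: `${rect.height}px`,
          outline: '2px solid red',
          color: 'red',
          font: '12px sans-serif',
          pointerEvents: 'none',
          zIndex: '999999',
        });
        document.body.appendChild(box);
      });
    }

    labelInteractiveElements();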
There's a straw man here. If you want to reorder an item on Amazon: click on 'order history', scroll, and click buy. This is a well-optimized path already and it doesn't require your full attention. I suspect the agent approach takes more effort as you need to type and then monitor what the AI is doing.
Also what's the business model?
> what's the reason for no Linux/Windows?
Sorry, just lack of time. Also we use Sparkle for distributing updates, which is macOS-only.
> Also what's the business model?
We are considering an enterprise version of the browser for teams.
The hype cycle business model never changes.
Instead of manually hunting across half a dozen different elements, then copy/paste and retype to put something into a format I want…
I can just get Dia to do it. In fact, I can create a shortcut to get it to do it the same way every single time. It’s the first time I’ve used something that actually feels like an extension of the web, instead of a new way to simply act on it at the surface level.
I think the obvious extension of that is agentic browsers. I can’t wait for this to get built to a standard where I can use it every day… But how well is it going to run on my 16GB M1 Pro?
Google, being a big one of those companies, would soon side with those companies and not with the users; it's been their modus operandi. Just recently some people got threats that if they don't stop using ad blockers on YouTube they will be banned from the platform.
Download from https://www.nxtscape.ai/ or our GitHub page.
All the same, looks like y’all are having fun working on it, and maybe some unforeseen use case will bubble up.
Try to look at it from another angle: maybe then you can see that the web was, in a way, a solution looking for a problem. That didn't stop it from being a massive success.
Oh cool, will look into basic.tech to understand more.
With a founder hat on, I can see why the codebase is a moat; it's a hard problem. I hope the effort is worth the cost.
Good luck.
Yes, in the long run, owning the underlying codebase will allow us to have more control and provide a better user experience. (Very similar to Cursor forking VSCode.)
Feel free to add new ones or upvote. We want to build what people want :)
Thank you! We have Ollama integration already; you can run models locally and use that for AI chat.
- https://tsdr.uspto.gov/#caseNumber=76017078&caseSearchType=U...
> PROVIDING MULTIPLE-USER ACCESS TO A GLOBAL COMPUTER INFORMATION NETWORK FOR THE TRANSFER AND DISSEMINATION OF A WIDE RANGE OF INFORMATION; ELECTRONIC TRANSMISSION OF DATA, IMAGES, AND DOCUMENTS VIA COMPUTER NETWORKS; [ELECTRONIC MAIL SERVICES; PROVIDING ON-LINE CHAT ROOMS FOR TRANSMISSION OF MESSAGES AMONG COMPUTER USERS CONCERNING A WIDE VARIETY OF FIELDS]
- https://tsdr.uspto.gov/#caseNumber=76017079&caseSearchType=U...
> PROVIDING INFORMATION IN THE FIELD OF COMPUTERS VIA A GLOBAL COMPUTER NETWORK; PROVIDING A WIDE RANGE OF GENERAL INTEREST INFORMATION VIA COMPUTER NETWORKS
- https://tsdr.uspto.gov/#caseNumber=74574057&caseSearchType=U...
> computer software for use in the transfer of information and the conduct of commercial transactions across local, national and world-wide information networks
Also the fact that it's AGPL means this project is very copyleft and not compatible with business models.
I'm not saying that there is no place for copyleft open source anymore, but when it's in a clearly commercial project that makes me question the utility of it being open source.
https://www.gnu.org/licenses/why-affero-gpl.html
This means that if this company is successful and sells me one license, in theory I can request the source code and spin up (in Dr. Evil's voice) one billion clones and not pay licenses for those.
With other forms of GPL you only have to release the source code if you release the software to the user.
Saying that such behavior encompasses all possible business models is like saying dictatorship is the only form of governance.
We have Linux next on our radar. What build do you want?
https://github.com/nxtscape/nxtscape/issues/5
It was cute when the internet was cute but now it's just boring.
But not gonna lie, as a tiny startup we don’t have the marketing budget of Perplexity or Dia, so we picked a name and icon that at least hinted at “browser” right away. Definitely not trying to mislead anyone -- just needed something recognizable out of the gate.