
pscanf
181 karma
My software-related posts: https://pscanf.com/s/

  1. > when dealing with a lot of unknowns it's better to allow divergence and exploration

    I completely agree, though I'm personally sitting out all of these protocols/frameworks/libraries. In 6 months' time, half of them will have been abandoned, and the other half will have morphed into something very different and incompatible.

    For the time being, I just build things from scratch, which–as others have noted¹–is actually not that difficult, gives you an understanding of what goes on under the hood, and doesn't tie you to someone else's innovation pace (whether faster or slower).

    ¹ https://fly.io/blog/everyone-write-an-agent/

  2. Thanks for the detailed answer!

    I think I do want to solve conflicts. My use case is for a personal database, which simplifies things a bit: sync is between devices of a single person, so it's unlikely for concurrent offline changes to occur.

    What I have in mind is a setup like the one from this experiment: https://tonsky.me/blog/crdt-filesync/ . I don't know if it's at all possible in my use case, though, or, if it is possible, whether it ends up being practical. As you said, the resulting user experience might be so strange that it's not something users want.

    Anyway, thanks again for the info and good luck with DocNode. :)

  3. Hey German. Congrats on shipping the project!

    Reading the "Why another CRDT / OT library?" I like that you seem to have taken a "Pareto approach": going for a simpler solution, even if not theoretically perfect. In the past few months I've been building a local-first app, and I've been a bit overwhelmed by the complexity of CRDTs.

    The goal I have with my app is to allow syncing between devices via Dropbox / Google Drive / iCloud or any other file-syncing service that the user is already using. I don't want to host a sync server for my users, and I don't want my users to need to self-host either.

    Do you think it would be possible to use Dropbox as the sync "transport" for DocNode documents? I'm thinking: since a server is needed, one device could be designated as the server, and the others as clients. (Assuming a trusted environment with no rogue clients.)
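
    To illustrate the idea (just a sketch; I don't know DocNode's actual API, so `applyUpdate` here is a hypothetical stand-in for whatever it exposes, and the "Dropbox folder" is simply a locally-synced directory):

      import { readFileSync, writeFileSync, readdirSync } from "node:fs";
      import { join } from "node:path";

      // Each device writes its pending updates to its own file inside the
      // synced folder, so the file-syncing service never has to merge
      // concurrent writes to the same file.
      const SYNC_DIR = "/Users/me/Dropbox/my-app-sync";

      function pushUpdates(deviceId: string, updates: Uint8Array): void {
        writeFileSync(join(SYNC_DIR, `${deviceId}.updates`), updates);
      }

      // The designated "server" device periodically folds every device's
      // updates into the canonical snapshot.
      function mergeAll(
        applyUpdate: (doc: Uint8Array, update: Uint8Array) => Uint8Array,
      ): void {
        let doc: Uint8Array = readFileSync(join(SYNC_DIR, "document.snapshot"));
        for (const file of readdirSync(SYNC_DIR)) {
          if (file.endsWith(".updates")) {
            doc = applyUpdate(doc, readFileSync(join(SYNC_DIR, file)));
          }
        }
        writeFileSync(join(SYNC_DIR, "document.snapshot"), doc);
      }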

  4. > If TOOL_X needs $DATA and that data is not available in context, nor from other tools, then the LLM will determine that it cannot use or invoke TOOL_X. It won't try.

    I didn't explain myself very well, sorry. What I had in mind is: MCP is about putting together workflows using tools from different, independent sources. But since the various tools are not designed to be composed, scenarios occur in which, in theory, you could string together $TOOL_Y and $TOOL_X, but $TOOL_Y only exposes $DATA_SUBSET (because it doesn't know about $TOOL_X), while $TOOL_X needs $DATA. So the capability would be there, if only the tools were designed to be composed.
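
    A concrete (entirely made-up) example of the mismatch, as two tool signatures:

      // $TOOL_Y: a hypothetical calendar tool. It returns only a subset of
      // each event's data, because it doesn't know what consumers might need.
      interface GetEventsOutput {
        events: { title: string; startTime: string }[];
      }

      // $TOOL_X: a hypothetical invoicing tool. It needs the attendee's
      // email, which $TOOL_Y never exposes.
      interface CreateInvoiceInput {
        attendeeEmail: string; // <- the $DATA no other tool provides
        amount: number;
      }

      // The LLM can see both schemas, but no sequence of calls turns a
      // GetEventsOutput into a valid CreateInvoiceInput: the gap is in the
      // tools' design, not in the model.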

    Of course, that's also the very strength of MCP: it allows you to compose independent tools that were not designed to be composed. So it's a powerful approach, but inherently limited.

    > About the TOOL_Z and TOOL_W scenario. It sounds like you're asking about the concept of a distributed unit-of-work which is not considered by MCP.

    Yes, distributed transactions / sagas / etc., which are basically impossible to do with "random" APIs not designed for them.
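
    For reference, a saga requires something like this from every participant (a sketch with hypothetical types): each action must ship with a compensating action, which arbitrary third-party APIs simply don't provide.

      interface SagaStep<T> {
        run: () => Promise<T>;
        // The part "random" APIs are missing: a way to undo the step when a
        // later one fails.
        compensate: (result: T) => Promise<void>;
      }

      async function runSaga(steps: SagaStep<unknown>[]): Promise<void> {
        const completed: { step: SagaStep<unknown>; result: unknown }[] = [];
        for (const step of steps) {
          try {
            completed.push({ step, result: await step.run() });
          } catch (error) {
            // Roll back the completed steps, in reverse order.
            for (const { step: s, result } of completed.reverse()) {
              await s.compensate(result);
            }
            throw error;
          }
        }
      }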

  5. I see MCP as fundamentally limited: even if we had an LLM that knew how to use it perfectly, at the end of the day MCP workflows are integrations between many different APIs that were not designed to be composed and to work together.

    What if $TOOL_X needs $DATA to be called, but $TOOL_Y only returns $DATA_SUBSET? What happens when $TOOL_Z fails mid-workflow, after $TOOL_W has already executed?

  6. That's why I emphasized _routing in a single page application_. If one needs SEO, a client-rendered single page application is the wrong choice, regardless of the router.

    > One big thing: what if you want to support SSR, which I think is a pretty basic requirement these days?

    I agree it's a basic requirement for a certain class of apps and websites, but there are tons of apps for which SSR is irrelevant or even detrimental (in the sense that it adds complexity that is not offset by the benefits it brings).

  7. Yeah, implementing it with data flowing one-way only from the URL to the state is cleaner.

    Conceptually, however, I prefer to think of my state being at the center of things. I mean, that's where I define (via types) what the state is. The URL is just one serialization of that state that is convenient to use in a web browser (making it work with links, back/forth buttons, etc). Maybe in another environment another serialization would be needed. Or maybe no serialization would be needed at all (making it a memory router).
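
    In code, the mental model is something like this (a sketch; the `Route` type is just an example):

      // The state is the source of truth; I define its shape, not a routing
      // library.
      type Route =
        | { name: "home" }
        | { name: "collection"; collectionId: string };

      // The URL is just one possible serialization of that state...
      function toUrl(route: Route): string {
        switch (route.name) {
          case "home":
            return "/";
          case "collection":
            return `/collections/${route.collectionId}`;
        }
      }

      // ...and parsing is the inverse. A "memory router" would skip both
      // functions and just keep the Route object in memory.
      function fromUrl(pathname: string): Route {
        const match = pathname.match(/^\/collections\/([^/]+)$/);
        return match
          ? { name: "collection", collectionId: match[1] }
          : { name: "home" };
      }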

  8. I've done something similar in a React project I'm working on to avoid dealing with the insanity that is react-router.

    Call me naïve, but routing in a single page application is just not that hard of a problem. At its core, it's about having a piece of state¹ (your active route) which determines which part of the app you want to render, something you can do with a switch statement². On top of that, you want to synchronize that state to the page URL³.

    Doing it yourself requires more boilerplate code, no question about it. But it's not that much code tbh (not very complex either), and you get back control over that important piece of state, which otherwise remains opaque and difficult to work with (i.e., its "shape" is pre-determined by the routing library you use; for example, react-router doesn't support parallel routes). A minimal sketch of the approach follows the footnotes below.

    ¹ https://github.com/superegodev/superego/blob/main/packages/a...

    ² https://github.com/superegodev/superego/blob/main/packages/a...

    ³ https://github.com/superegodev/superego/blob/main/packages/a...
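
    To make the above concrete, here's a minimal sketch (hypothetical component names; my actual implementation is in the files footnoted above):

      import { useSyncExternalStore } from "react";

      const HomeScreen = () => <h1>Home</h1>;
      const SettingsScreen = () => <h1>Settings</h1>;

      // The piece of state, synchronized with the page URL: re-render
      // whenever the back/forward buttons fire a popstate event.
      function usePathname(): string {
        return useSyncExternalStore(
          (onChange) => {
            window.addEventListener("popstate", onChange);
            return () => window.removeEventListener("popstate", onChange);
          },
          () => window.location.pathname,
        );
      }

      function navigate(pathname: string): void {
        history.pushState(null, "", pathname);
        // pushState doesn't fire popstate, so notify subscribers manually.
        window.dispatchEvent(new PopStateEvent("popstate"));
      }

      // The "router" is just a switch statement.
      function App() {
        switch (usePathname()) {
          case "/settings":
            return <SettingsScreen />;
          default:
            return <HomeScreen />;
        }
      }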

  9. Adding my own to the list: https://github.com/superegodev/superego (Warning: still in alpha.)

    Distinctive points:

    - It exposes the "database metaphor": your data is organized in collections of documents, each collection having a well-defined schema.

    - It's all local in an app (no server component to self-host).

    - It has an AI assistant on top that you can use to explore / create / update.

    - It allows you to create small personal apps (e.g., a custom dashboard).

    - It allows you to sync data from external sources (Strava, Google Calendar, Google Contacts).

    Cons:

    - The database metaphor is quite "technical". A "normal" user is not comfortable with the idea of creating their own collections, defining a schema, etc. In fact, right now I only have developers and techies as a target audience.

    - It's not optimized for any one use case. So, for example, as a note-taking app, Notion is obviously much better.

    - It's still in early stages (I'm working on it alone), so:

      - There's no mobile app yet.
    
      - It doesn't yet support syncing between devices.
    
      - There are just 3 connectors to sync from external sources.

  10. Haha, probably true. :)

  11. I agree when talking about Facebook and other ad-tech companies. But "my data" is also my runs that I track with Garmin, my notes in Notion, my meals on MyFitnessPal, my events on Google Calendar...

    I want to have _that_ data locally. And why are all these companies making it so incredibly difficult for me to get it? It's MY data after all!

    But the default approach of every app is:

    > We'll manage your data for you, in our cloud! Ah, btw, you'll only have access to it when online, and only for as long as you pay the subscription fee. Also, if we go out of business, sorry, it's gone.

    > <fineprint> You _can_ request a copy of it (damn GDPR), but only once every 30 days, it'll take us 48 hours to prepare it, and we'll send (some of) it to you as a badly-formatted CSV carefully crafted to make it as useless as possible. </fineprint>

  12. > Imagine a world where your data isn’t trapped in distant data centers. Instead, it’s close to home—in a secure data wallet or pod, under your control. Now imagine pairing that with a loyal personal AI assistant, a private, local tool that lives with you, learns from you (with your permission), and acts on your behalf. Your AI. Not theirs.

    This is almost exactly what I say on the landing page¹ of the product I'm building (an open-source personal database, with an AI assistant on top).

    I want to believe this can be a reality, and I'm trying to make it become one, but there are two significant challenges:

    1. AI = cloud. Taking my app as an example, it'll be at least 2 years before consumer hardware is able to run the smallest model that performs somewhat decently (gpt-oss-20b). And of course, in 2 years that model will be beyond obsolete. Would a regular user pay the price of a subpar experience in order to get data ownership and privacy? It's a very hard sell.

    2. Apps/services are very jealous of their users' data. As a user, I have to jump through incredible hoops just to get a point-in-time copy of my data, if I can get it at all. There is no incentive for apps to let their users own their data. On the contrary, it's better if they don't, since that keeps users locked into the app. Also, regular Joe and Jane users are not really asking for access to their data, because there's no benefit in it for them either.

    That is, I think, the key to overcoming challenge #2: giving regular Joes and Janes an immediate and obvious benefit. If they see that only by owning their data can they do $INCREDIBLY_VALUABLE_THING, then they will themselves start demanding access to it from companies, or they will jump through the hoops to get it. (That's the way I'm going about it. I'm nowhere near the end goal, of course, but I see promising results.²)

    I have no idea how to overcome challenge #1 yet, mainly because currently there aren't really any big downsides to using cloud models. Or, at least, we haven't seen them yet. Maybe if OpenAI starts injecting ads into GPT-8 responses, people will reconsider and switch to a "stupider" but local, ad-free model.

    ¹ https://superego.dev/

    ² https://pscanf.com/s/350/

  13. (Sorry for the shameless self-promotion.) I'm building a _conceptually similar_ app, but with an AI on top, so you get a chat/assistant with your personal context. https://github.com/superegodev/superego (Warning: still in alpha.)

  14. Hey! Your project looks very similar, from a conceptual point of view, to what I'm doing with https://github.com/superegodev/superego. Would you like to have a chat about it? Email in my profile, if you're interested.

  15. https://github.com/superegodev/superego

    An open-source, local database which collects all your personal data, hooks it to an LLM (BYO), and gives you an assistant that can answer any question about your life.

    It also allows you to vibe-code (or just code) small apps on top of your data (e.g., your custom dashboard for your expenses).

    I have a short demo here: https://www.youtube.com/watch?v=gqAyvENDjSA

  16. Nice experiment!

    I'm using a similar approach in an app I'm building. Seeing how well it works, I now really believe that in the coming years we'll see a lot of "just-in-time generation" for software.

    If you haven't already, you should try using qwen-coder on Cerebras (or kimi-k2 on Groq). They are _really_ fast, and they might make the whole thing actually viable in terms of speed.
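
    (If it helps: Cerebras exposes an OpenAI-compatible endpoint, so trying it is mostly a matter of swapping the base URL. A sketch with the openai SDK; double-check the exact model ID against their docs, as I'm citing it from memory:)

      import OpenAI from "openai";

      // Point the standard OpenAI SDK at Cerebras' compatible endpoint;
      // only the base URL and model name change.
      const client = new OpenAI({
        baseURL: "https://api.cerebras.ai/v1",
        apiKey: process.env.CEREBRAS_API_KEY,
      });

      const completion = await client.chat.completions.create({
        model: "qwen-3-coder-480b", // check the exact ID in Cerebras' docs
        messages: [{ role: "user", content: "Write a TS debounce function." }],
      });

      console.log(completion.choices[0].message.content);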

  17. I have a similar use case in the app I'm working on. Initially I went with JSONata, which worked, but resulted in queries that indeed felt more like incantations and were difficult even for me to understand (let alone my users).

    I then switched to JavaScript / TypeScript, which I found much better overall: it's understandable to basically every developer, and LLMs are very good at it. So now in my app, wherever a TypeScript snippet is required, I have a button that asks the LLM for its implementation, and even "weak" models one-shot it correctly 99% of the time.

    It's definitely more difficult to set up, though, as it requires a sandbox where you can run the code without fear. In my app I use QuickJS, which works very well for my use case, but might not be performant enough in other contexts.
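
    For reference, the basic pattern with the quickjs-emscripten package looks roughly like this (a minimal sketch; my real setup differs, and interrupt/memory limits are omitted):

      import { getQuickJS } from "quickjs-emscripten";

      // Evaluate an LLM-generated snippet in an isolated QuickJS VM: it has
      // no access to the host's globals, filesystem, or network.
      async function runSnippet(code: string): Promise<unknown> {
        const QuickJS = await getQuickJS();
        const vm = QuickJS.newContext();
        try {
          const result = vm.evalCode(code);
          if (result.error) {
            const error = vm.dump(result.error);
            result.error.dispose();
            throw new Error(`Snippet failed: ${JSON.stringify(error)}`);
          }
          const value = vm.dump(result.value);
          result.value.dispose();
          return value;
        } finally {
          vm.dispose();
        }
      }

      // Example: await runSnippet("[1, 2, 3].reduce((a, b) => a + b, 0)") -> 6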

    A better-argued, less ranty version of a blog post I wrote some time ago. My post generated some interesting discussions here, and the article above is a reply to / complement of it.

    I think the prime number example (though obviously artificial) illustrates well the point I also wanted to make: while it might be technically possible to perfectly represent this function's I/O with the type system, what are the benefits and what are the costs? We seem to broadly agree that, as one approaches type perfection, the benefits get smaller while the costs grow bigger. A classic engineering tradeoff.

    I also really liked how the author highlights the differences between application code and library code, which I hadn't taken much into consideration in my post. For example, I really hadn't thought about how using a library can be frustrating in many ways other than its types! But, as he notes, we're accustomed to that, so it tends to go unnoticed. And I really like the idea of having language features that allow encapsulating types. I have no clue what they could look like (I don't have strong CS foundations, so I'm out of my depth here), but the idea sounds very appealing.
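
    To give a TypeScript flavor of the tradeoff (my example, not the author's): full primality can't be encoded in the type system, but a branded type plus a runtime check buys most of the safety at a fraction of the cost.

      // Not a proof of primality, just a promise that the value went through
      // the runtime check below.
      type Prime = number & { readonly __brand: "Prime" };

      function asPrime(n: number): Prime {
        if (n < 2) throw new Error(`${n} is not prime`);
        for (let i = 2; i * i <= n; i++) {
          if (n % i === 0) throw new Error(`${n} is not prime`);
        }
        return n as Prime;
      }

      // Downstream code can demand the guarantee in its signature...
      function hashMod(value: number, bucketCount: Prime): number {
        return value % bucketCount;
      }

      hashMod(42, asPrime(31)); // OK
      // hashMod(42, 32);       // compile error: number is not assignable to Prime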
