
Article literally starts with

    One way to solve this is end-to-end encryption. You and your friend agree
    on a secret key, known only to each other. You each use that key to encrypt
    your changes before sending them, decrypt them upon receipt, and no one in
    the middle is able to listen in. Because the document is a CRDT, you can
    each still get the latest document without the sync server merging the
    updates.

    That is indeed a solution,
but then for some reason claims that this scheme requires both parties to be online simultaneously. No, it doesn't, unless the scheme is (tacitly) supposed to be directly peer-to-peer, which I find unlikely: if it were P2P, there would be no need for "the sync server" in the first place, and the description clearly states that in this scheme the server does nothing with document updates except relay them.

jakelazaroff
Hi, author here! The scenario was meant to be high level — I guess I should have gotten more into the various architectures and tradeoffs, but the article is already pretty long.

The way I see it there are a couple of ways this can shake out:

1. If you have a sync server that only relays updates between peers, then you can of course have it work asynchronously: just store the encrypted updates and send them when a peer comes back online (sketched below). The problem is that there's no way for the server to compress any of the updates; if a peer is offline for an extended period of time, it might need to download a ton of data.

2. If your sync server can merge updates, it can send compressed updates to each peer when it comes online. The downside, of course, is that the server can see everything.
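
To make option 1 concrete, here's a minimal sketch of a relay-only server in TypeScript. The names and the in-memory storage are assumptions, not any real sync server's API; the point is that the server only ever handles opaque blobs, so it has no way to compact them.

    // Sketch of a relay-only sync server (option 1): it never sees plaintext,
    // it just stores opaque encrypted updates and replays them to peers that
    // were offline. All names here are illustrative, not a real API.
    type EncryptedUpdate = { seq: number; payload: Uint8Array };

    class RelayServer {
      private log = new Map<string, EncryptedUpdate[]>();   // docId -> updates
      private cursors = new Map<string, number>();          // docId:peerId -> last seq seen

      // A peer pushes an opaque blob; the server can't merge or compress it.
      push(docId: string, payload: Uint8Array) {
        const updates = this.log.get(docId) ?? [];
        updates.push({ seq: updates.length, payload });
        this.log.set(docId, updates);
      }

      // A returning peer downloads everything it missed -- potentially a lot,
      // since the server has no way to collapse the history.
      pull(docId: string, peerId: string): EncryptedUpdate[] {
        const updates = this.log.get(docId) ?? [];
        const key = docId + ":" + peerId;
        const from = this.cursors.get(key) ?? 0;
        this.cursors.set(key, updates.length);
        return updates.slice(from);
      }
    }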

Ink & Switch's Keyhive (which I link to at the end) proposes a method for peers to independently agree on how updates should be compressed [1], which attempts to solve the problems with option 1.

[1] https://github.com/inkandswitch/keyhive/blob/main/design/sed...

josephg
> The problem is that there's no way for the server to compress any of the updates; if a peer is offline for an extended period of time, they might need to download a ton of data.

There are ways to solve this, using deterministic message chunking. Essentially clients compress and encrypt “chunks” of messages. You can use metadata tags to tell the server which chunks are being superseded. This is fast and efficient.

Alex Good gave a talk about how he’s implementing this in Automerge at the local-first conference a few weeks ago:

https://www.youtube.com/watch?v=neRuBAPAsE0
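
For what it's worth, here's a rough sketch of the supersession idea in TypeScript. The shape of the metadata is an assumption, not Automerge's actual format; the point is that the server can garbage-collect ciphertext it cannot read.

    // Sketch of chunk supersession: clients compress + encrypt ranges of ops
    // into chunks and tell the server (in plaintext metadata) which earlier
    // chunks the new one replaces. The server can drop superseded chunks
    // without ever decrypting anything. Illustrative only, not a real format.
    interface Chunk {
      id: string;           // e.g. a hash of the ciphertext
      supersedes: string[]; // ids of chunks this one replaces
      ciphertext: Uint8Array;
    }

    class ChunkStore {
      private chunks = new Map<string, Chunk>();

      put(chunk: Chunk) {
        for (const old of chunk.supersedes) this.chunks.delete(old);
        this.chunks.set(chunk.id, chunk);
      }

      // A returning peer fetches only the live chunks, not the full history.
      all(): Chunk[] {
        return [...this.chunks.values()];
      }
    }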

nixpulvis
I would really love for this to be added to the article (or a follow-up), since it was my conclusion as well, and most readers are going to be thinking the same thing.

I'd also love to know how the trade-off between FHE compute time and the bloat of storing large change sets affects latency in the online and offline cases.

Perhaps, as with many things, a hybrid approach would be best suited to balancing online responsiveness against offline network and storage use?

Admittedly, I haven't read the linked research papers at the end. Perhaps they have nice solutions. Thanks for that.

jakelazaroff
Okay, I updated it to hopefully clarify!

killerstorm
There's another option: let the clients do the compression. I.e. a client would sign & encrypt a message "I applied messages 0..1001 and got document X". Then this can be a starting point, perhaps after it's signed by multiple clients.

That introduces a communication overhead, but it is still likely to be orders of magnitude cheaper than homomorphic encryption.
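
A rough sketch of what such a checkpoint might look like; the field names and the endorsement step are hypothetical, not a real protocol.

    // Sketch of a client-produced checkpoint (names are hypothetical).
    // A client asserts "applying updates 0..N yields state X", signs it,
    // and encrypts the state so the server still sees only ciphertext.
    interface Checkpoint {
      upTo: number;                        // highest update index folded in
      stateCiphertext: Uint8Array;         // encrypted compacted document
      stateHash: string;                   // hash of the plaintext state, for cross-checking
      signatures: Map<string, Uint8Array>; // peerId -> signature over (upTo, stateHash)
    }

    // Another peer endorses a checkpoint only if it derives the same state.
    function endorse(cp: Checkpoint, myHash: string, myId: string, sig: Uint8Array) {
      if (myHash !== cp.stateHash) throw new Error("diverged: refuse to endorse");
      cp.signatures.set(myId, sig);
    }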

Now you need a consensus algorithm.

josephg
You don’t. Instead of sending the resulting state after compressing, you can compress a chunk of operations, which the receiving client merges as a block.

It works fine.

jason_oster
The protocol half of most real-world CRDTs does not want to send the raw stream of changes. It prefers to compress changes into a minimal patch set. Each patch set is specific to an individual peer, based on the state of that peer's local CRDT at merge time.

The naive raw stream of changes is far too inefficient due to the immense amount of overhead required to indicate relationships between changes. Changing a single character in a document needs to include the peer ID (e.g. a 128-bit UUID, or a public key), a change ID (like a commit hash, also about 128 bits), and the character's position in the document (usually a reference to the parent's ID plus a relative marker indicating whether the insert goes before or after the parent).

The other obvious compression target is deletions. They will be compressed to tombstones so that the original change messages for deleted content do not need to be relayed.
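
To make the overhead concrete, here's a rough sketch of what a single raw insert versus a compressed run might carry. The field names and sizes are illustrative, not any specific library's wire format.

    // Rough shape of one raw insert in a sequence CRDT (illustrative only).
    // Note how the metadata dwarfs the one-character payload.
    interface RawInsert {
      peerId: string;           // ~128-bit UUID or public key of the author
      changeId: string;         // ~128-bit unique id for this change, like a commit hash
      parentId: string;         // id of the change this insert attaches to
      side: "before" | "after"; // where it lands relative to the parent
      value: string;            // the actual content: a single character
    }

    // A patch set compresses runs of these: one id plus a length covers a whole
    // string typed by the same peer, and deletions collapse into tombstones.
    interface CompressedRun {
      peerId: string;
      firstChangeId: string;
      parentId: string;
      side: "before" | "after";
      values: string;     // "hello world" instead of 11 separate inserts
      deleted?: boolean;  // tombstone: content dropped, only the ids remain
    }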

And I know it is only implied, but peer-to-peer independent edits are the point of CRDTs. The “relay server” is there only for the worst-case scenario described: when peers are not simultaneously available to perform the merge operation.
