
Interesting! But, it'd be helpful to clarify further the strength of the following claim:

> This demo allows private access to 6 GB (~30%) of English Wikipedia. In theory, even if the server is malicious, it will be unable to learn which articles you request. All article title searches are performed locally, and no images are available.

In this demo, the number of article titles is relatively small – a few million – and enumerable.

If the server is truly malicious, and it issues itself requests for every known title, does it remain true that this "Private Information Retrieval" (PIR) scheme still gives it no hints that subsequent requests from others for individual articles retrieve particular data?

(Presumably: every request touches every byte of the same full 6GB of data, and involves every such byte in constant-run-time calculations that vary per request. The effect is to return only what each request wanted – but in no way correlatable with other requests for the exact same article, from the same or a different client?)


blintz
Indeed! The encryptions of the client queries are fully semantically secure: under relatively solid lattice-based cryptographic assumptions, a server would need to do more than 2^128 work to recover the plaintext of someone’s query. One query for one item in the database is indistinguishable (without the client’s key) from another query for the same item later. In other words, it’s similar to the guarantee of CBC or GCM modes: as long as you use it correctly, it remains secure even if the attacker can see many encryptions of its choosing.
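To make the "indistinguishable without the key" point concrete, here is a toy illustration of semantic security – emphatically not the lattice-based scheme this demo uses, just a randomized cipher built from a SHA-256 keystream. Because each encryption draws a fresh nonce, two encryptions of the same query produce completely different ciphertexts, yet both decrypt correctly under the client's key:

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Expand key+nonce into n pseudorandom bytes via counter-mode SHA-256.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)            # fresh randomness per query
    pad = keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, pad))

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    pad = keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, pad))

key = secrets.token_bytes(32)
q1 = encrypt(key, b"article:12345")
q2 = encrypt(key, b"article:12345")
assert q1 != q2                                # same query, unlinkable ciphertexts
assert decrypt(key, q1) == decrypt(key, q2) == b"article:12345"
```

The names here (`keystream`, `encrypt`, the `article:12345` label) are invented for the sketch; the real client encrypts a query vector homomorphically, but the unlinkability property being demonstrated is the same.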
throwaway2016a
> without the client’s key

Thank you, these 4 words really helped with my understanding, so I'm calling it out in case it helps others. I was wondering: what prevents you from replaying the query and getting the same page back? The answer, it seems, is that a replay would only produce a gibberish response, because you don't have the key.

benlivengood
How indistinguishable is a Not Found result from the server? It seems likely that user behavior would be to re-request the article a second time, so the client should probably protect the user against this kind of server – which could bisect on article popularity to find the requested article in ~20 tries – by throwing up a warning about an article in the index not being retrievable.
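A quick sanity check on that "~20 tries" figure, assuming a few million enumerable titles: a server that deletes half the remaining candidates each round and watches for a client reaction is doing a binary search, so the number of rounds is roughly log2 of the title count.

```python
import math

# Assumed candidate-set size: "a few million" titles, per the thread.
titles = 4_000_000
rounds = math.ceil(math.log2(titles))  # halving the set each round
print(rounds)  # 22 rounds to isolate a single title
```

So ~20 tries is about right for this corpus size.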
blintz
Indeed, client behavior when an article fails to be retrieved is very important. A malicious server could guess what article is being retrieved, remove it from the database, and see if the client 'reacts' in some observable way.

The mitigations for this are somewhat heuristic; the client should perhaps 'pace'/'debounce' subsequent requests when a retrieval fails. For example, enforce that the re-retrieval will happen at a uniformly random time between 1-5 seconds later.
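The pacing idea above amounts to something like the following sketch (a hypothetical helper – the demo's client doesn't expose such an API), where the uniformly random delay decorrelates retry timing from the failure that triggered it:

```python
import random

def retry_delay() -> float:
    # Uniformly random pause, in seconds, before re-issuing a failed
    # retrieval, per the 1-5 second suggestion above.
    return random.uniform(1.0, 5.0)

d = retry_delay()
assert 1.0 <= d <= 5.0
```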

It's worth noting that this kind of attack is somewhat unavoidable, since denial-of-service is (generally) always possible. An interesting mitigation might be for the server to produce a ZKP that the full dot product was computed correctly, and have the client always check the ZKP before displaying the article. This is mostly a theoretical fix for now, I believe, since ZKPs over homomorphic computation remain more theory than practice.

benlivengood
If the protocol is that the server returns a specific "index N deleted" result for deleted N then the client can at least trust a valid response from the server as opposed to a DDoS or unmasking attempt. Any server that returns something other than a valid document or "N deleted" should no longer be trusted or communicated with, but retries for communication failures or timeouts should still be safe.

Edit: this assumes that the client gets a trusted index from a set of trusted servers who are at least as up to date as the latest index that is made available, which timestamped signatures or similar should suffice for. Or the client can supply the timestamped index signature and the server can reply with a different message if it's too old.

I think their model doesn't allow you to express such a thing? At implementation level the client requests an id in a fixed range and gets back a 100KB chunk corresponding to that id; if there's no article for that id then it'll just get a chunk of all 0s, and to anyone without the client's key that's indistinguishable from a chunk with content in it.
danuker
Would this be vulnerable to a side-channel attack as follows?

1. Record what item was retrieved from disk for a query

2. Run a dictionary through the query system, and see which item matches the record

blintz
The server processing has to scan the entire database to answer a query – so the entire database stays in memory, and access patterns tell you nothing about the query.
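That full scan can be sketched in plaintext (hypothetical toy sizes; in the real scheme the one-hot query vector below would be homomorphically encrypted, so the server performs the same dot product without ever seeing which entry is 1):

```python
CHUNK = 8  # toy record size; the demo serves ~100KB chunks

db = [bytes([i]) * CHUNK for i in range(16)]   # 16 toy records

def answer(query):
    # query is a one-hot selection vector. Note that every record is
    # read and multiplied regardless of which index is selected, so
    # memory access patterns are identical for all queries.
    acc = [0] * CHUNK
    for q_i, record in zip(query, db):          # full scan, always
        for j, byte in enumerate(record):
            acc[j] = (acc[j] + q_i * byte) % 256
    return bytes(acc)

want = 5
query = [1 if i == want else 0 for i in range(len(db))]
assert answer(query) == db[want]
```

A query for index 0 and a query for index 15 touch exactly the same bytes in the same order; only the (encrypted) coefficients differ.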
gojomo OP
Notably, for anyone who might misinterpret the "stays in memory" aspect: even the copy in memory is fully scanned by every query. So it's not just disk access patterns that give nothing away, but RAM access patterns too.
