Thank you, these 4 words really helped my understanding, so I'm calling it out in case it helps others. I was wondering what prevents you from replaying the query and getting the same page back. It seems the answer is that a replay would only produce a gibberish response, because you don't have the key.
The mitigations for this are somewhat heuristic: the client should perhaps 'pace' or 'debounce' subsequent requests when a retrieval fails. For example, enforce that the re-retrieval happens at a uniformly random time between 1 and 5 seconds later.
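To make the pacing idea concrete, here is a minimal Python sketch. The names `retry_delay` and `fetch_with_pacing` are made up for illustration, and `fetch` stands in for whatever the client actually does to retrieve an article (assumed here to return `None` on failure); this is not how the demo's client is implemented.

```python
import random
import time

def retry_delay(min_s=1.0, max_s=5.0, rng=random):
    """Pick a uniformly random delay (in seconds) before the next attempt."""
    return rng.uniform(min_s, max_s)

def fetch_with_pacing(fetch, max_attempts=3, sleep=time.sleep, rng=random):
    """Call the hypothetical fetch() until it returns something other than None.

    After each failed attempt (except the last), wait a random 1-5 seconds so
    that the timing of the re-request is less tightly tied to the failure.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(retry_delay(rng=rng))
    return None
```

The `sleep` and `rng` parameters are only there so the delay can be swapped out in tests; a real client would likely schedule the retry asynchronously rather than block.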
It's good to note that this kind of attack is somewhat unavoidable, since denial-of-service is generally always possible. An interesting mitigation might be for the server to produce a ZKP that the full dot product was computed correctly, and have the client always check the ZKP before displaying the article. I believe this is mostly a theoretical fix for now, since ZKPs over homomorphic computation are more in the realm of theory than practice.
Edit: this assumes that the client gets a trusted index from a set of trusted servers that are at least as up to date as the latest available index, for which timestamped signatures or similar should suffice. Alternatively, the client can supply the timestamped index signature, and the server can reply with a different message if it's too old.
1. Record what item was retrieved from disk for a query
2. Run a dictionary through the query system, and see which item matches the record
> This demo allows private access to 6 GB (~30%) of English Wikipedia. In theory, even if the server is malicious, it will be unable to learn which articles you request. All article title searches are performed locally, and no images are available.
In this demo, the number of article titles is relatively small (a few million) and enumerable.
If the server is truly malicious, and it issues itself requests for every known title, does it remain true that this "Private Information Retrieval" (PIR) scheme gives it no hints about which data subsequent requests from others for individual articles retrieve?
(Presumably: every request touches every byte of the same full 6 GB of data, and involves every such byte in constant-run-time calculations that vary per request, and thus have the effect of returning only what each request wanted, but are in no way correlatable with other requests for the exact same article, from the same or different clients?)
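A toy Python sketch of the "touches every byte" intuition, for anyone who finds code easier to follow. This is deliberately NOT private: the selection vector here is in plaintext, whereas in a homomorphic PIR scheme the server would only ever see an encrypted version of it. The names `one_hot` and `server_answer` are invented for illustration and are not taken from the demo.

```python
def one_hot(index, n):
    """Selection vector: 1 at the wanted position, 0 everywhere else."""
    return [1 if i == index else 0 for i in range(n)]

def server_answer(database, query):
    """Dot product of the query vector with the whole database.

    Every row is read and multiplied, whichever index was requested, so the
    work done does not depend on the item being fetched.
    Assumes all rows are bytes objects of the same length.
    """
    width = len(database[0])
    acc = [0] * width
    for weight, row in zip(query, database):
        for j in range(width):
            acc[j] += weight * row[j]
    return bytes(acc)

# Example: three equal-length "articles", fetch the one at index 1.
db = [b"alpha", b"bravo", b"delta"]
answer = server_answer(db, one_hot(1, len(db)))
```

In the real setting the client would encrypt each entry of the vector, and because semantically secure encryption is randomized, two queries for the same index should look unrelated to the server. That, as I understand it, is why the server replaying every known title to itself would not help it match other clients' requests.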