Or does the server go through a large chunk of its memory (say, at least a quarter of all of Wikipedia) and perform some oblivious computation on all of that data (applying the result modulo this 100KB return buffer)? That sounds very resource-intensive, at least for something large like Wikipedia (a doctor's office with some information pages of a few KB each could more easily do such a thing).
In the latter case, is each request unique (does it involve some sort of IV that the client can xor out of the data again) or could an index be built similar to a list of hashed PIN codes mapped back to plain text numbers?
Edit: I had already read some comments but just two comments further would have been my answer... :) https://www.hackerneue.com/item?id=31669630
> > With a proper implementation of PIR, the server still needs to scan through the entire encrypted dataset (this is unavoidable, otherwise its I/O patterns would leak information)
Now let's see if I can also find the answer regarding the ability to build a lookup table...
Edit #2: found that also! https://www.hackerneue.com/item?id=31669924
> One query for one item in the database is indistinguishable (without the client’s key) from another query for the same item later; in other words, it’s similar to something like the guarantee of CBC or GCM modes, where as long as you use it correctly, it is secure even if the attacker can see many encryptions of its choosing.
That is some cool stuff indeed. I'm going to have to up my game when building or reviewing privacy-aware applications. Sure, a file sharing service is not going to practically allow this, but I'm sure that with this knowledge, I will come across places where it makes sense from both a usefulness (e.g. medical info) and practicality (data set size) perspective.
Well, as the author points out here [0], it doesn't actually translate to exorbitant cost increases when almost all the cost is in bandwidth rather than compute. A file sharing service rather seems like an ideal example (assuming you can derive additional revenue from the privacy promises).
For me, compute has always been by far my expense, and that's without homomorphic encryption! Buying a server or the monthly rental (e.g. VPS) costs is where most of my budget goes. Next on the expense list are a few domain names, and probably about as much in electricity if we're considering the at-home scenario. Bandwidth is usually included, be it with the VPS or with the server I run at home (since I want that internet uplink for normal use anyway).
The memory overhead is significant but not prohibitive… we have to keep something like a 5-8x larger encoded database in memory, but this overhead is from encoding the plaintext in a special format to allow a fast dot product, not from inefficiency in the packing.