Comment by lifthrasiir

lifthrasiir Aug 18, 2021

After reading more of the whitepaper, I think you are right. As I understand it, given the image hash H0 and CSAM hashes H[1]...H[n] (some might be duplicates in disguise), the algorithm proceeds like this:

- The device generates a secret X and splits it into shares X[1]...X[m] with a secret sharing algorithm. m is some large number, and any k (but no fewer) of the shares X[i] are enough to reconstruct X.

- The device stores blinded hashes f(H[1])...f(H[n]). The function f itself is not known to the client.

- The image hash H0 is compressed with another function g to the range between 1 and n.

- The downscaled image data (for the human check) is encrypted with X, and a (probably randomly chosen) share X[i] is appended to it.

- The result is then encrypted again with a key derived from f(H0) and sent to the server with g(H0) as associated data.

- The server tries to decrypt it with a key derived from f(H[g(H0)]). This is only possible when H[g(H0)] = H0, i.e. H0 represents some known CSAM.

- Even then, the second layer can only be decrypted once the server holds at least k shares X[i].
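To make the "any k out of m" part concrete, here is a minimal sketch of threshold secret sharing using Shamir's scheme over a prime field. This is an assumption on my part: I have not checked which secret sharing construction Apple actually uses, and the names `split_secret`, `reconstruct` and `PRIME` are made up for illustration.

```python
import secrets

# Toy field modulus (the Mersenne prime 2^127 - 1); an illustrative choice,
# not a parameter taken from Apple's documents.
PRIME = 2**127 - 1

def split_secret(secret, k, m):
    """Split `secret` into m shares; any k of them reconstruct it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, m + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split_secret(123456789, k=3, m=10)
print(reconstruct(shares[:3]) == 123456789)
```

With fewer than k shares the interpolation still returns a number, but it is (with overwhelming probability) unrelated to X, which is why the server learns nothing about the inner key below the threshold.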

At this stage Apple can still learn the number of matching CSAM images even when it is below k. The fix is described in an overly technical document that I can't fully follow, but supposedly the client can inject an appropriate amount of synthetic data for which the first layer can always be decrypted while the second layer is bogus (including the presumed X[i]).
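A toy counting model of why the synthetic data helps, under my reading above. It contains no real cryptography: each voucher is just a pair of flags, and `make_vouchers` and `server_observation` are hypothetical names, not anything from Apple's documents.

```python
import random

def make_vouchers(n_real, n_synthetic, n_other, rng):
    """Build a shuffled list of toy vouchers.

    Each voucher is (outer_opens, has_valid_share):
      real match   -> (True,  True)
      synthetic    -> (True,  False)  # first layer opens, share is bogus
      non-matching -> (False, False)  # server cannot open anything
    """
    vouchers = ([(True, True)] * n_real
                + [(True, False)] * n_synthetic
                + [(False, False)] * n_other)
    rng.shuffle(vouchers)
    return vouchers

def server_observation(vouchers, k):
    """What the toy server can see for a given threshold k."""
    opened = sum(1 for outer, _ in vouchers if outer)
    valid_shares = sum(1 for outer, valid in vouchers if outer and valid)
    # Assumption: below the threshold a bogus share is indistinguishable from
    # a valid one, so the server only sees how many first layers opened.
    return {"opened": opened, "threshold_reached": valid_shares >= k}

a = make_vouchers(n_real=2, n_synthetic=5, n_other=100, rng=random.Random(1))
b = make_vouchers(n_real=5, n_synthetic=2, n_other=100, rng=random.Random(2))
print(server_observation(a, k=10) == server_observation(b, k=10))
```

In this model, two accounts with different numbers of real matches look identical to the server as long as both stay below k, because only the sum of real and synthetic vouchers is visible.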

---

Assuming this scheme is correctly implemented, the only attack I can imagine is a timing attack. As I understand it, a malicious client can choose not to send the synthetic data. This affects the number of items that pass the first layer of encryption, so the client could possibly learn the number of actual matches by adjusting the amount of synthetic data, since the server can only proceed to the next step with at least k such items.

This attack seems technically possible, but is probably infeasible to perform (remember that we already need 2^95 oracle operations, which is only vaguely possible even on the local device). Maybe the technical report actually has a solution for this, but for now I can only guess.

falcolas Aug 18, 2021

That synopsis disagrees with Apple's own descriptions - or rather it goes into the secondary checks, which confuses the issue that the initial hash checks are indeed performed on-device:

> Apple’s method of detecting known CSAM is designed with user privacy in mind. Instead of scanning images in the cloud, the system performs on-device matching using a database of known CSAM image hashes provided by NCMEC and other child-safety organizations. Apple further transforms this database into an unreadable set of hashes, which is securely stored on users’ devices.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...
