Comment by robertoandred

robertoandred Aug 18, 2021 parent

You cannot extract or reverse the CSAM hashes. They've been encrypted and blinded using server-side-only keys. If TFA said that, it's lying.

lifthrasiir Aug 18, 2021

After more reading of the whitepaper I think you are right. As I understand it, given the image hash H0 and CSAM hashes H[1]...H[n] (some might be duplicates in disguise) the algorithm proceeds like this:

- The device generates a secret X and divides it into shares X[1]...X[m] with the secret sharing algorithm. m is some large number, and any k of the X[i] (but no fewer) are enough to reconstruct X.

- The device stores blinded hashes f(H[1])...f(H[n]). The function f itself is not known to the client.

- The image hash H0 is compressed with another function g to the range between 1 and n.

- The downscaled image data (for the human check) is encrypted with X, and a (probably random) share X[i] is appended to it.

- The result is then encrypted again with a key derived from f(H0) and sent to the server with an associated data g(H0).

- The server tries to decrypt it with a key derived from f(H[g(H0)]). This is only possible when H[g(H0)] = H0, i.e. H0 represents some known CSAM.

- You can only decrypt the second layer with at least k copies of X[i] then.

At this stage Apple can still learn the number of matching CSAM images even when it is below k. The fix is described in an overly technical document that I can't exactly follow, but supposedly the client can inject an appropriate amount of synthetic data for which the first layer can always be decrypted but the second layer is bogus (including the presumed X[i]).
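
To make the "any k of the X[i] reconstruct X" property concrete, here is a minimal sketch of threshold secret sharing, assuming a Shamir-style scheme over a prime field. The names `split_secret` and `reconstruct`, the field size, and the use of `random` are all assumptions for illustration, not taken from the whitepaper.

```python
import random

# Toy field modulus (assumption): the Mersenne prime 2^127 - 1.
PRIME = 2**127 - 1

def split_secret(secret, k, m, rng):
    """Split `secret` into m shares such that any k of them reconstruct it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, m + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# Illustration only: a real implementation would use a CSPRNG such as `secrets`.
rng = random.Random(0)
shares = split_secret(123456789, k=3, m=10, rng=rng)
```

With k = 3, `reconstruct(shares[:3])` or any other three shares recovers the secret, while two shares interpolate to an unrelated value. This is why the server learns nothing about X until it holds k decryptable vouchers.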

---

Assuming this scheme is correctly implemented, the only attack I can imagine is a timing attack. As I understand it, a malicious client can choose not to send false data. This will affect the number of items that pass the first layer of encryption, so the client can possibly learn the number of actual matches by adjusting the amount of synthetic data, since the server can only proceed to the next step with at least k such items.

This attack seems technically possible, but is probably infeasible to perform (remember that we already need 2^95 oracle operations, which is only vaguely possible even in the local device). Maybe the technical report actually has a solution for this, but for now I can only guess.

falcolas Aug 18, 2021

That synopsis disagrees with Apple's own descriptions - or rather it goes into the secondary checks, which confuses the issue that the initial hash checks are indeed performed on-device:

> Apple’s method of detecting known CSAM is designed with user privacy in mind. Instead of scanning images in the cloud, the system performs on-device matching using a database of known CSAM image hashes provided by NCMEC and other child-safety organizations. Apple further transforms this database into an unreadable set of hashes, which is securely stored on users’ devices.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

falcolas Aug 18, 2021

One does not need to reverse the CSAM hashes to find a collision with a hash. If the evaluation is being done on the phone, including identifying a hash match, the hashes must also be on the phone.

robertoandred OP Aug 18, 2021

No, matches are not verified on the phone. On the phone, your image hash is used to look up an encrypted/blinded (via the server's secret key) CSAM hash. Then your image data (the hash and visual derivative) is encrypted with that encrypted/blinded hash. This encrypted payload, along with a part of your image's hash, is sent to Apple. Then on the server, Apple uses that part of your image's hash and their secret key to create a decryption key for the payload. If your image hash matches the CSAM hash, the decryption key would unlock the payload.

In addition, the payload is protected at another layer by your user key. Only with enough hash matches can Apple put together the user decryption key and open the very innards of your image's payload containing the full hash and visual derivative.
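
The outer layer described above can be sketched as follows. This is a toy model under loud assumptions: it uses Diffie-Hellman-style blinding in a small multiplicative group, a made-up `seal`/`unseal` pair built from SHA-256 and HMAC, and hypothetical function names throughout. It is not Apple's construction, and it leaves out the second (threshold) layer entirely.

```python
import hashlib, hmac, secrets

P = 2**127 - 1   # toy group modulus (assumption; far too small for real use)
N_SLOTS = 8      # toy table size (assumption)

def to_group(h: bytes) -> int:
    # Map a hash to a group element in [2, P-1].
    return int.from_bytes(hashlib.sha256(b"grp" + h).digest(), "big") % (P - 2) + 2

def slot(h: bytes) -> int:
    # Plays the role of g(H0): compress a hash to a table position.
    return int.from_bytes(hashlib.sha256(b"slot" + h).digest(), "big") % N_SLOTS

def kdf(x: int) -> bytes:
    return hashlib.sha256(x.to_bytes(16, "big")).digest()

def seal(key: bytes, msg: bytes) -> bytes:
    # Toy authenticated encryption; handles messages up to 32 bytes only.
    stream = hashlib.sha256(key + b"stream").digest()
    body = bytes(a ^ b for a, b in zip(msg, stream))
    return hmac.new(key, body, hashlib.sha256).digest() + body

def unseal(key: bytes, blob: bytes):
    tag, body = blob[:32], blob[32:]
    if not hmac.compare_digest(tag, hmac.new(key, body, hashlib.sha256).digest()):
        return None  # wrong key: the server learns nothing about the payload
    stream = hashlib.sha256(key + b"stream").digest()
    return bytes(a ^ b for a, b in zip(body, stream))

def build_table(server_key: int, known_hashes):
    # Server side: blinded entries, with random filler in unused slots.
    table = [secrets.randbelow(P - 2) + 2 for _ in range(N_SLOTS)]
    for h in known_hashes:
        table[slot(h)] = pow(to_group(h), server_key, P)
    return table

def device_voucher(table, image_hash: bytes, payload: bytes):
    # Device side: it cannot tell whether the looked-up entry matches its image.
    r = secrets.randbelow(P - 3) + 2
    key = kdf(pow(table[slot(image_hash)], r, P))
    q = pow(to_group(image_hash), r, P)  # blinded hint sent with the payload
    return q, seal(key, payload)

def server_open(server_key: int, voucher):
    q, blob = voucher
    return unseal(kdf(pow(q, server_key, P)), blob)

server_key = secrets.randbelow(P - 3) + 2
table = build_table(server_key, [b"known-hash-1"])
match = device_voucher(table, b"known-hash-1", b"visual derivative")
miss = device_voucher(table, b"holiday-photo", b"visual derivative")
```

In this sketch `server_open(server_key, match)` returns the payload and `server_open(server_key, miss)` returns `None`, because the device key and the server key agree only when the image hash equals the hash behind the table entry. The device runs the same steps in both cases, so it never observes a match.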

falcolas Aug 18, 2021

To quote a sibling comment, which went straight to the horse's mouth:

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

mannerheim Aug 18, 2021

I believe the hash comparisons are made on Apple's end. Then the only way to get the hashes would be a data breach on Apple's end (unlikely but not impossible) or generating them from known CSAM.

falcolas Aug 18, 2021

That's not what Apple's plans state. The comparisons are done on phone, and are only escalated to Apple if there are more than N hash matches, at which point they are supposedly reviewed by Apple employees/contractors.

Otherwise, they'd just keep doing it on the material that's actually uploaded.

mannerheim Aug 18, 2021

Ah, never mind, you're right:

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

cyanite Aug 18, 2021

He is not right, though. The system used will not reveal matches to the device, only to the server and only if the threshold is reached.

cyanite Aug 18, 2021

> That's not what Apple's plans state. The comparisons are done on phone

Yes but as stated in the technical description, this match is against a blinded table, so the device doesn’t learn if it’s a match or not.
