Meanwhile, Android allows you to personalize voice commands based on its ability to recognize that a specific person is the one saying "OK Google". Voice authentication has already reached high accuracy with a few seconds of unconstrained text, or a few words of fixed text. Voice identification on open sets takes more data, but sub-minute clips are still reasonably effective.
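To give a sense of how little machinery this takes, here's a minimal sketch of speaker verification using the open-source resemblyzer library (nothing Google-specific; the file names and the 0.75 threshold are illustrative assumptions, and real systems calibrate thresholds on held-out data). You enroll a reference voiceprint from a couple of known clips, then score any new clip by cosine similarity:

    import numpy as np
    from resemblyzer import VoiceEncoder, preprocess_wav  # pip install resemblyzer

    encoder = VoiceEncoder()

    # Enrollment: average the embeddings of known clips into a reference
    # voiceprint. (File names are hypothetical placeholders.)
    enroll = [preprocess_wav(p) for p in ["user_clip1.wav", "user_clip2.wav"]]
    reference = np.mean([encoder.embed_utterance(w) for w in enroll], axis=0)

    # Verification: embed an unknown clip and score it by cosine similarity.
    candidate = encoder.embed_utterance(preprocess_wav("unknown_clip.wav"))
    score = np.dot(reference, candidate) / (
        np.linalg.norm(reference) * np.linalg.norm(candidate))

    print("same speaker" if score > 0.75 else "different speaker")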
At the very least, Google itself could make a credible attempt to identify whether the speaker in any voice clip heard by Google Home is a regular user, and plausibly de-anonymize users of OK Google. More alarmingly, we're told that about 1 in 500 Google Home clips is heard by a human, and this employee apparently shared "thousands" of clips with a news organization. It seems plausible that anyone with access to any large voiceprint database could attempt to obtain clips from a random contractor and de-anonymize the most interesting or salacious content.
Google says "the excerpts are not linked to personally identifiable information." To me that means the metadata is stripped, not that they strip anything out of the audio.
That said, it still sounds like Google is trying to convince us that the data they capture (not just the metadata) is never linkable to personally identifiable information, which if true would genuinely ease many privacy concerns here.
As far as I know, the mere fact that data is not explicitly annotated with PII doesn't erase the legal (and ethical) responsibilities that come with handling data that contains PII.
So even if they worded their response so its truthfulness is legally/technically defensible, it's still a bit of a 'red herring' at least (I don't think anyone is accusing Google of explicitly associating these audio recordings with user IDs).
Even more fun: if you call a bank, you often have to key in your account number (which can be easily decoded if your phone sounds the tones back, which most do), then say your name, your address, and sometimes other PII like your Social Security number or part of it. Record that call and you have a complete identity-theft package, nicely wrapped: just replay it to the bank (whose name you've also recorded, if the user called on speakerphone, which they probably did, because who wants to keep the phone pressed to their head the whole time while waiting and listening to the muzak?) and you get full access to the user's bank account.
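To make the first step concrete: DTMF keypad tones are just pairs of sine waves, and decoding them from a recording is a textbook exercise. A rough sketch using the standard Goertzel detector (the frame length and 8 kHz sample rate are illustrative choices, not anything specific to these devices):

    import numpy as np

    LOW = [697, 770, 852, 941]        # DTMF low-group frequencies (Hz)
    HIGH = [1209, 1336, 1477, 1633]   # DTMF high-group frequencies (Hz)
    KEYS = ["123A", "456B", "789C", "*0#D"]

    def goertzel_power(frame, freq, rate):
        # Power of one target frequency in a frame (Goertzel algorithm).
        coeff = 2 * np.cos(2 * np.pi * freq / rate)
        s1 = s2 = 0.0
        for x in frame:
            s1, s2 = x + coeff * s1 - s2, s1
        return s1 * s1 + s2 * s2 - coeff * s1 * s2

    def decode_digit(frame, rate=8000):
        # A keypress is the strongest low-group plus strongest high-group tone.
        row = max(range(4), key=lambda i: goertzel_power(frame, LOW[i], rate))
        col = max(range(4), key=lambda i: goertzel_power(frame, HIGH[i], rate))
        return KEYS[row][col]

    # Synthesize a '5' keypress (770 Hz + 1336 Hz) and decode it.
    t = np.arange(0, 0.05, 1 / 8000)
    print(decode_digit(np.sin(2*np.pi*770*t) + np.sin(2*np.pi*1336*t)))  # -> 5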
Authentication on fixed phrases is reasonably accurate with just a few words, so at minimum it should be possible to associate "Hey Google" clips with regular users of Google Assistant voice control (i.e. "OK Google"). Identifying whether someone is present in a large dataset from open phrases is much harder, but a ~30s clip could do the job fairly consistently for anyone with access to a significant amount of voice data. And if this employee (who isn't directly working for Google) shared 'thousands' of clips with a news org, the cautious bet is that some other contractor might share them with anyone willing to pay for the records.
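The open-set case is essentially a nearest-neighbor search over a database of voiceprint embeddings. A toy sketch of the mechanics (the "database" here is random noise standing in for real enrolled voiceprints, and the 256-dim embeddings and 0.8 threshold are made-up illustrative values):

    import numpy as np

    # Hypothetical database: one voiceprint per enrolled speaker, L2-normalized.
    rng = np.random.default_rng(0)
    db = rng.normal(size=(100_000, 256))
    db /= np.linalg.norm(db, axis=1, keepdims=True)
    speaker_ids = [f"user_{i}" for i in range(len(db))]

    def identify(clip_embedding, threshold=0.8):
        # Open-set identification: return the best match, or 'unknown'
        # if nobody in the database is similar enough.
        clip = clip_embedding / np.linalg.norm(clip_embedding)
        scores = db @ clip  # cosine similarity against every voiceprint
        best = int(np.argmax(scores))
        return speaker_ids[best] if scores[best] >= threshold else "unknown"

    print(identify(rng.normal(size=256)))  # random query -> almost surely 'unknown'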
https://gdpr-info.eu/art-4-gdpr/
> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
This might be a grey area for now, as both GDPR and listening devices are quite new. But Google, Amazon & co aren't super popular with EU regulators and governments, so they might side with users' rights on this one.
It’s hard not to feel like this outrage is trumped-up anti-Google FUD. So many more worthy fronts to assail Google et al. on!
After all, they let you upload photos and video that are, per various policies and with some non-zero frequency, reviewed by humans — and users are begging them to do it more often.
’Yes, we hire people to listen in on and transcribe some conversations from the private homes of our customers (so as to improve our speech recognition engines); but the recordings aren’t linked to personally identifiable information.’
Even assuming they have only the purest intentions here, I still don’t understand how they can possibly guarantee that these recorded conversations are not linked to personally identifiable information!
For example, what’s to stop me from saying “Hey Google, I am <full legal name / ID> and my most embarrassing and private secret is <...>”?
One might argue that they could detect this in the recognized text and omit those samples, but presumably the whole purpose of hiring people to create transcripts is that the existing speech-to-text engine isn't perfect, and they need more training data.
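For illustration, a transcript-level PII filter of the kind one might argue for could look something like this (the patterns are entirely made up by me, not anything Google has described), and the second example shows exactly where it breaks down when the ASR output doesn't match the pattern:

    import re

    # Naive transcript-level PII scrub: redact digit runs that look like SSNs
    # or account numbers, plus a trigger phrase. (Illustrative patterns only.)
    PII_PATTERNS = [
        re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b"),        # SSN-shaped numbers
        re.compile(r"\b\d{8,}\b"),                           # long account numbers
        re.compile(r"my name is \w+( \w+)?", re.IGNORECASE), # stated names
    ]

    def scrub(transcript):
        for pattern in PII_PATTERNS:
            transcript = pattern.sub("[REDACTED]", transcript)
        return transcript

    print(scrub("my name is Jane Doe and my SSN is 123-45-6789"))
    # -> '[REDACTED] and my SSN is [REDACTED]'

    # The catch: the filter only sees ASR output. If the engine transcribes
    # the digits as words, or mishears the trigger phrase, nothing matches,
    # and the raw *audio* still contains the PII either way.
    print(scrub("my name's Jane Doe, S S N one two three four five"))
    # -> unchanged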