“My voice is my password” single-phrase voice biometrics has been almost universally adopted by banks across the UK, US and Australia as a fast and efficient way to authenticate customers calling telephone customer services and telephone banking. But with the advances in voice cloning from OpenAI and others, this simplistic single-factor, single-phrase solution is all but obsolete!
In any event, “My voice is my password” was never a very good idea in the first place!
Firstly, it is a one-shot identity verification process: it assumes that once you have passed the test, you remain the same caller. That is a big assumption. Passive verification, which listens to the call in the background, overcomes this limitation. But passive verification also misses the point that voice is a multi-factor credential, carrying both the biometric of the speaker and the intent (or message) of the speaker within a single credential.
Fusing speaker and speech recognition extracts both the biometric and spoken information. This creates a continuous multi-factor authentication process. Ask a question and check to see if the spoken answer and the voice biometric align.
Every spoken interaction becomes yet another biometric match. Then ask the same question twice, and make sure the answer you get back is worded differently from the previous one. If there is one thing about being human, it is that we never answer the same question the same way twice.
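To make the idea concrete, here is a minimal sketch of one such conversational turn. The function names, the embedding vectors and the similarity threshold are all illustrative assumptions; a real deployment would take the voiceprint from a trained speaker-verification model and the transcript from an ASR engine.

```python
# Illustrative sketch: fuse a speaker check (biometric), an answer check
# (spoken content) and a replay check (verbatim repeats are suspicious)
# into a single pass/fail decision per conversational turn.
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def check_turn(voiceprint, utterance_embedding, transcript,
               expected_answer, previous_answers, threshold=0.8):
    """One conversational turn = one multi-factor check.

    Passes only if (1) the live voice matches the enrolled voiceprint,
    (2) the spoken answer contains the expected content, and
    (3) the exact wording differs from every previous answer --
    a verbatim repeat suggests a recording or a cloned voice.
    """
    biometric_ok = cosine_similarity(voiceprint, utterance_embedding) >= threshold
    answer_ok = expected_answer.lower() in transcript.lower()
    not_a_replay = transcript not in previous_answers
    return biometric_ok and answer_ok and not_a_replay
```

The design point is that no single factor decides the outcome: each turn re-tests the biometric, the content and the natural variability of a human speaker together.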
And all the time, the algorithm is checking for synthetic artifacts. We have all looked at an AI-generated image and thought it looked a bit too perfect. So it is with voice: it is a bit too consistent, a bit too perfect. Where are the ums and ahs, the hesitations, errors and mistakes?
And then we get on to detecting those acoustic artifacts, imperceptible to the human ear, that the underlying speech synthesis algorithms generate. There are some sounds synthesisers make that are impossible to produce with a human vocal tract.
“My voice is my password” was never a good idea, and it has now become a very bad one!