See: Microsoft’s new ‘deepfake’ video generator in action
Microsoft’s new AI generation system has highlighted the advancements in deepfake technology – generating convincing video from a single image and audio clip.
The tool takes an image and turns it into a realistic video, complete with convincing emotions and movements such as raising eyebrows.
One demo shows the Mona Lisa coming to life and singing Lady Gaga’s ‘Paparazzi’ – Microsoft says the system wasn’t specifically trained to handle singing, but it copes with it anyway. The ability to generate video from a single image and audio file has alarmed some experts, however.
Microsoft has not yet revealed when the AI system will be released to the general public. Yahoo spoke to two AI and privacy experts about the risks associated with this type of technology.
What is remarkable about this new technology?
The VASA system (short for ‘visual affective skills’) lets users control where the fake person is looking and what emotions they show on screen. Microsoft says the technology paves the way for ‘real-time’ engagement with lifelike speaking avatars.
Microsoft says, ‘Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness.’
Why are some people worried?
Not everyone is thrilled with the new system, with one blog describing it as a ‘deepfake nightmare machine’. Microsoft has stressed that the system is a research demonstration and says there are no current plans to release it as a product.
But while VASA-1 represents a step forward in animating people, the technology is not unique: audio start-up Eleven Labs allows users to create uncannily realistic audio doppelgangers of a human voice from just 10 minutes of audio.
Eleven Labs’ technology was used to create a ‘deepfake’ audio clip of Joe Biden by ‘training’ a fake voice on publicly available audio clips of the President, then distributing a fake robocall of Biden urging people not to vote. The incident, which saw the user behind it banned from Eleven Labs, showed how easily the technology can be used to manipulate real events.
In another incident, a worker at a multinational firm paid out $25 million to fraudsters after a video call in which every other participant – seemingly dozens of colleagues – was a deepfake. Deepfakes are becoming more and more common online, and one survey by Prolific found that 51% of adults said they had come across deepfakes on social media.
Simon Bain, CEO of OmniIndex, says, ‘Deepfake technology is on a mission to produce content that has no telltale signs or “recognizable artefacts” to show that it is fake. The recent VASA-1 demonstration is the latest development offering a significant step in this direction, and Microsoft’s own statement on “Risk considerations and responsible AI” highlights this drive perfectly, saying:
“Currently, there are still recognizable artifacts in the videos generated by this method, and the numerical analysis shows that there is still a gap to achieve the authenticity of real video.”
‘Personally, this is a matter of great concern to me, as we need these identifiable artefacts to prevent deepfakes from doing irreparable damage.
What are the telltale signs that you are looking at a deepfake?
Small signs like inconsistencies in skin texture and flickering in facial movements can tip you off that you’re looking at a deepfake, says Bain. But soon, even those could disappear, he explains.
Bain says, ‘Only these potential inconsistencies in skin texture and slight flickering in facial movements can visually tell us whether a video is authentic. That way, we know that when we watch politicians ruin their upcoming election chances, it’s really them and not an AI deepfake.
‘This begs the question: why does deepfake technology seem determined to eliminate these and other visual cues rather than ensure they stay in? After all, what good can a truly convincing and “real” fake video do other than trick people? In my opinion, a deepfake that is almost real but still recognizable as fake can offer just as much social benefit as one that is impossible to recognize as a fake.’
What are tech companies doing about it?
Twenty of the world’s largest technology companies, including Meta, Google, Amazon, Microsoft and TikTok, signed a voluntary agreement earlier this year to work together to stop the spread of deepfakes around elections.
Nick Clegg, president of global affairs at Meta, said, “With so many general elections taking place this year, it’s vital that we do everything we can to prevent people from being misled by AI-generated content.
“This work is bigger than any one company and will require a massive effort across industry, government and civil society.”
But the wider effect of deepfakes is that soon no one will be able to trust anything online, and companies should use other methods to ‘validate’ videos, says Jamie Boote, associate principal consultant at Synopsys Software Integrity Group:
Boote said, “The threat of deepfakes is that they are a way to trick people into believing what they see and hear transmitted through digital channels. Previously, it was difficult for attackers to fake a person’s voice or likeness, and even more difficult to do so with live video and audio. Now AI makes that possible in real time, and we can no longer believe what is on the screen.
“Deepfakes open another avenue of attack against human users of IT systems, or of non-digital systems such as the stock market. This means that video calls from the CEO or announcements from PR people can be faked to manipulate stock prices in external attacks, or used by spear-phishers to manipulate employees into revealing information, changing network settings or permissions, or downloading and opening files.
“To protect against this threat, we must learn to validate that the face on the screen is the face in front of the sender’s camera. This can be done through additional means such as a phone call to the sender’s mobile phone, a message from a trusted account, or, for public announcements, a press release on a public site controlled by the company.”