Whenever Madonna sings the 1980s hit “La Isla Bonita” on her concert tour, moving images of sunset clouds play on giant screens behind her.
To achieve that ethereal look, the pop legend embraced a still little-explored branch of generative artificial intelligence: the text-to-video tool. Type a few words, say, “surreal cloud at sunset” or “jungle waterfall at dawn,” and a video is instantly made.
Following in the footsteps of AI chatbots and still image generators, some AI video enthusiasts say the emerging technology could one day upend entertainment, enabling you to choose your own movie with tailored stories and endings. But there is a long way to go before the tools can do that, and plenty of ethical pitfalls along the way.
For early adopters like Madonna, who has long pushed the boundaries of art, it was more of an experiment. She rejected an earlier version of the “La Isla Bonita” concert visuals that used more conventional computer graphics to evoke a tropical mood.
“We tried CGI. It looked really bland and cheesy and she didn’t like it,” said Sasha Kasiuha, director of content for Madonna’s Celebration Tour which continues until the end of April. “And then we decided to try AI.”
OpenAI, the maker of ChatGPT, gave a glimpse of what sophisticated text-to-video technology could look like when the company recently showed off Sora, a new tool that is not yet publicly available. Madonna’s team tried a different product from New York-based startup Runway, which helped pioneer the technology by releasing its first public text-to-video model last March. The company released a more advanced “Gen-2” version in June.
Runway CEO Cristóbal Valenzuela said that while some see these tools as “a magic device where you type in a word and somehow it spits out exactly what was in your head,” the most effective approaches come from creative professionals who are looking for an upgrade to the decades-old digital editing software they already use.
He said Runway can’t make a full documentary yet. But it could help with some background video, or b-roll – the supporting shots and scenes that help tell the story.
“That saves you maybe a week of work,” Valenzuela said. “The common thread of many use cases is that people use it as a way to augment or speed up something they might have done before.”
Runway’s target customers are “major streaming companies, production companies, post-production companies, visual effects companies, marketing teams, advertising companies. A lot of people who make content for a living,” Valenzuela said.
Dangers await. Without effective safeguards, AI video generators could threaten democracies with convincing “deepfake” videos of things that never happened, or, as is already the case with AI image generators, flood the internet with fake pornographic scenes depicting what appear to be real people with recognizable faces. Under pressure from regulators, major tech companies have promised to watermark AI-generated output to help identify what’s real.
There are also copyright disputes over the collections of videos and images the AI systems are trained on (neither Runway nor OpenAI discloses its data sources) and over the extent to which they are unfairly replicating trademarked works. And there are fears that, at some point, video-making machines could replace human jobs and artistry.
To date, the longest AI-generated video clips are still measured in seconds, and can include jerky movements and telltale glitches like distorted hands and fingers. Fixing that is “just a matter of more data and more training,” and the computing power that training depends on, said Alexander Waibel, a computer science professor at Carnegie Mellon University who has been researching AI since the 1970s.
“I can now say, ‘Make me a video of a rabbit dressed as Napoleon walking through New York City,’” Waibel said. “It knows what New York City looks like, what a rabbit looks like, what Napoleon looks like.”
Which is great, he said, but still far from crafting a compelling storyline.
Before releasing its first-generation model last year, Runway’s claim to AI fame was as a co-developer of the Stable Diffusion image generator. Another company, London-based Stability AI, has since taken over Stable Diffusion’s development.
The basic “diffusion model” technology behind most of the leading AI image and video generators works by mapping noise, or random data, onto images, effectively destroying an original image and then predicting what a new one should look like. It borrows an idea from physics that can be used to describe, for example, how a gas spreads out.
“What diffusion models do is reverse that process,” said Phillip Isola, an associate professor of computer science at the Massachusetts Institute of Technology. “They kind of take the randomness and they congeal it back into the volume. That’s the way to go from randomness to content. And that’s how you can make random videos.”
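For readers who want to see the idea in miniature, the noising and denoising steps described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the method used by Runway, OpenAI or any other company: the function names and the `keep` parameter are hypothetical, the “image” is just five numbers, and the step that a trained neural network would perform (guessing the noise) is replaced here by handing back the true noise.

```python
import math
import random


def add_noise(pixels, keep, rng):
    """Forward step: blend each pixel with random Gaussian noise.

    keep=1.0 leaves the image untouched; values near 0.0 leave mostly noise.
    Returns the noisy pixels and the noise that was mixed in.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(keep) * p + math.sqrt(1.0 - keep) * n
        for p, n in zip(pixels, noise)
    ]
    return noisy, noise


def remove_noise(noisy, predicted_noise, keep):
    """Reverse step: subtract a guess at the noise to estimate the image.

    In a real generator the guess comes from a trained network (assumption:
    here the caller supplies it directly).
    """
    return [
        (x - math.sqrt(1.0 - keep) * n) / math.sqrt(keep)
        for x, n in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)                    # fixed seed so the run is repeatable
image = [0.0, 0.25, 0.5, 0.75, 1.0]       # a stand-in for real pixel values
noisy, true_noise = add_noise(image, keep=0.3, rng=rng)
restored = remove_noise(noisy, true_noise, keep=0.3)  # perfect guess, for illustration
```

With a perfect guess at the noise, `restored` matches `image` up to rounding error. Real systems only approximate that guess, and they repeat the reverse step many times, starting from pure randomness rather than from a noised copy of an existing picture.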
Creating video is even more complicated than generating still images because it has to take into account temporal dynamics, or how elements within the video change over time and across frames, said Daniela Rus, another MIT professor who directs its Computer Science and Artificial Intelligence Laboratory.
Rus said the computing resources required are “much higher than generating still images” because “it involves processing and generating multiple frames for every second of video.”
That’s not stopping some well-heeled tech companies from trying to outdo one another in showing off higher-quality AI video generation at longer durations. Requiring a written description to make an image was just the start. Google recently showed off a new project called Genie that can be prompted to turn a photo or even a sketch into an “infinite variety” of explorable video game worlds.
In the short term, AI-generated videos are likely to show up in marketing and educational content, providing a cheaper alternative to producing original footage or acquiring stock video, said Aditi Singh, a researcher at Cleveland State University who has surveyed the text-to-video market.
When Madonna first talked to her team about AI, “it wasn’t the primary thought, ‘Oh, look, it’s an AI video,’” said Kasiuha, the creative director.
“She asked me, ‘Can you just use one of those AI tools to make the picture crisper, to make sure it looks current and looks high-resolution?’” Kasiuha said. “She loves when you introduce new technology and new types of visual elements.”
Longer AI-generated films are already being made. Runway hosts an annual AI film festival to showcase such works. But whether that is what human audiences will choose to watch remains to be seen.
“I still believe in people,” said Waibel, the CMU professor. “I still believe that in the end it will be a symbiosis where you get an AI suggesting something and a human improves or guides it. Or the people will do it and the AI will fix it.”
————
Associated Press journalist Joseph B. Frederick contributed to this report.