
Meta is using AI to generate videos from just a few words

Meta is using AI to generate videos from just a few words. (Courtesy Meta/CNN)

Artificial intelligence is getting better and better at generating an image in response to a handful of words, with publicly available AI image generators such as DALL-E 2 and Stable Diffusion. Now, Meta researchers are taking AI a step further: they're using it to concoct videos from a text prompt.

Meta CEO Mark Zuckerberg posted on Facebook on Thursday about the research, called Make-A-Video, with a 20-second clip that compiled several text prompts that Meta researchers used and the resulting (very short) videos. The prompts include "A teddy bear painting a self portrait," "A spaceship landing on Mars," "A baby sloth with a knitted hat trying to figure out a laptop," and "A robot surfing a wave in the ocean."

The videos for each prompt are just a few seconds long, and they generally show what the prompt suggests (with the exception of the baby sloth, which doesn't look much like the actual creature), in a fairly low-resolution and somewhat jerky style. Even so, it demonstrates a fresh direction AI research is taking as systems become increasingly good at generating images from words. If the technology is eventually released widely, though, it will raise many of the same concerns sparked by text-to-image systems, such as that it could be used to spread misinformation via video.

A web page for Make-A-Video includes these short clips and others, some of which look fairly realistic, such as a video created in response to the prompt "Clown fish swimming through the coral reef" or one meant to show "A young couple walking in a heavy rain."

In his Facebook post, Zuckerberg pointed out how tricky it is to generate a moving image from a handful of words.

"It's much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they'll change over time," he wrote.

A research paper describing the work explains that the project uses a text-to-image AI model to figure out how words correspond with pictures, and an AI technique known as unsupervised learning — in which algorithms pore over data that isn't labeled to discern patterns within it — to look at videos and determine what realistic motion looks like.
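The paper's actual system is far more sophisticated, but the core unsupervised idea — inferring how pixels change over time from raw, unlabeled video — can be illustrated with a toy sketch. Everything below (the frame representation, the function names, the statistic computed) is a hypothetical simplification for illustration, not Meta's method:

```python
# Toy illustration of unsupervised motion learning: frames are flat lists
# of pixel intensities, and no labels are used anywhere. The "model" just
# measures how pixels tend to change between consecutive frames.

def frame_delta(prev, curr):
    """Per-pixel change between two consecutive frames."""
    return [c - p for p, c in zip(prev, curr)]

def learn_motion_stats(videos):
    """Average per-pixel motion magnitude across unlabeled clips.

    Nothing here needs human annotation: the statistic is derived
    purely from the raw frame sequences themselves.
    """
    total, count = 0.0, 0
    for frames in videos:
        for prev, curr in zip(frames, frames[1:]):
            for d in frame_delta(prev, curr):
                total += abs(d)
                count += 1
    return total / count if count else 0.0

# Two tiny "clips": a static scene and a bright spot moving rightward.
static_clip = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
moving_clip = [[9, 0, 0], [0, 9, 0], [0, 0, 9]]

print(learn_motion_stats([static_clip]))  # 0.0 -- nothing moves
print(learn_motion_stats([moving_clip]))  # 6.0 -- the spot shifts each frame
```

A real system would learn a predictive model of motion rather than a single summary statistic, but the principle is the same: the patterns come from the videos alone, with no labels attached.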

As with the massive, popular AI systems that generate images from text, the researchers pointed out that their text-to-image AI model was trained on internet data, which means it learned "and likely exaggerated social biases, including harmful ones," the researchers wrote. They did note that they filtered the data for "NSFW content and toxic words," but as datasets can include many millions of images and text, it may not be possible to remove all such content.

Zuckerberg wrote that Meta plans to share the Make-A-Video project as a demo in the future.
