OpenAI's innovative Sora model enables the creation of one-minute videos from text.
Just yesterday, OpenAI unveiled its latest model, Sora, which can generate high-resolution videos of up to one minute in length from text instructions. Sora, which means "sky" in Japanese, will not be available to the general public anytime soon. The tool is currently available only to a select group of scientists and researchers, who are evaluating the risk of misuse and harm.
The OpenAI website shares details of this development, which has attracted widespread interest: "Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world."
OpenAI has published several sample videos generated by Sora on its website and social networks. The samples caused quite a stir, and Sora's ability to create 60-second videos amazed many viewers. One of the videos shows a couple walking through Tokyo, surrounded by blowing cherry blossom petals and snowflakes.
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W
Prompt: "Beautiful, snowy... pic.twitter.com/ruTEWn87vf
- OpenAI (@OpenAI) February 15, 2024
Another video shows lifelike mammoths wandering through a snow-covered landscape against the backdrop of impressive, snow-covered mountains.
Prompt: "Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance... pic.twitter.com/Um5CWI18nS
- OpenAI (@OpenAI) February 15, 2024
Other notable videos include:
Prompt: "Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. the art style is 3d and realistic, with a focus on lighting and texture. the mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with... pic.twitter.com/aLMgJPI0y6
- OpenAI (@OpenAI) February 15, 2024
Yes, impressive, but still room for improvement
OpenAI emphasizes that Sora is based on "deep language understanding", which enables it to interpret text input precisely. Like other current AI image and video generators, however, Sora is not flawless. The company admits that the model still has difficulty recognizing causal relationships. For example, it could generate a video of someone eating a cookie in which the cookie shows no bite marks afterward. Critics on social networks have also pointed out that Sora overlooks details in the prompts and that the characters' movements contain occasional inaccuracies that only experts notice. One particularly conspicuous flaw is the white, glowing border that appears around a woman's head in one of the videos, making her stand out clearly from the background in some scenes.
People from the creative industry in particular are expressing concern on social media: "I will lose my job" and "This is damaging our profession".
Although Sora is not the first model to generate videos from text (Meta, Google and Runway offer similar tools), it stands out for its ability to generate up to 60 seconds of video at a time, rather than stitching videos together frame by frame as is the case with other models.
"It's alarming that such technologies could influence elections."
The development of text-to-video tools has raised concerns that artificial intelligence could be used to create misinformation. Oren Etzioni, professor of artificial intelligence at the University of Washington and founder of True Media, an organization dedicated to combating disinformation in political media, expressed concern: "It's alarming that such technologies could influence elections." These advances are also meeting resistance from artists and other creatives, who fear for their jobs and their copyrights.
OpenAI is working with experts to review the tool for risks of misinformation, hate speech and bias before it is released to the public. The company is also developing tools that identify videos created with Sora and add metadata to them so that they are easier to recognize. OpenAI also emphasized that the model uses both public domain videos and videos licensed from copyright holders, although it has not yet disclosed any details about Sora's training.