
From words to images: OpenAI's 'Sora' creates videos from text descriptions


OpenAI's innovative Sora model enables the creation of one-minute videos from text.

Just yesterday, OpenAI unveiled its latest model, Sora, which can generate high-resolution videos of up to one minute in length from text instructions. Sora, which means "sky" in Japanese, will not be available to the general public anytime soon: for now, the tool is limited to a select group of scientists and researchers who are evaluating the risk of misuse and harm.

The OpenAI website describes this widely noticed development as follows: "Sora is able to create complex scenes with multiple characters, specific gestures and detailed descriptions of objects and backgrounds. The model understands not only what the user asks for in the prompt, but also how these elements exist in the real world."

OpenAI has published several impressive sample videos on its website and social networks, and Sora's ability to produce 60-second clips amazed many viewers. One video shows a couple walking through Tokyo, surrounded by blowing cherry-blossom petals and snowflakes.

Another video shows lifelike mammoths wandering through a snow-covered landscape against the backdrop of impressive, snow-covered mountains.


Impressive, but still room for improvement

OpenAI emphasizes that Sora is based on "deep language understanding", which enables precise interpretation of text input. Like other current AI-powered image and video generators, however, Sora is not flawless. The company admits that the model still has difficulty with cause and effect: it might, for example, generate a video of someone eating a cookie without the cookie showing any bite marks. Critics on social networks have also noted that Sora overlooks details in prompts, and that the characters' movements contain occasional inaccuracies that only experts notice. One particularly conspicuous flaw is the glowing white border around a woman's head in one of the videos, which makes her stand out sharply from the background in some scenes.

On social media, people from the creative industry in particular are expressing concern: "I will lose my job" and "This is damaging our profession".

Sora is not the first model to generate videos from text - Meta, Google and Runway offer similar tools - but it stands out for its ability to produce clips of up to 60 seconds in a single pass, rather than stitching them together frame by frame as other models do.

"I fear that such technologies could influence elections."

The development of text-to-video tools has raised concerns that artificial intelligence could be used to create misinformation. Oren Etzioni, professor of artificial intelligence at the University of Washington and founder of True Media, an organization dedicated to combating disinformation in political media, warned: "It's alarming that such technologies could influence elections." These advances are also meeting resistance from artists and creatives who fear for their jobs and their copyrights.

OpenAI is working with experts to assess the tool's risks around misinformation, hate speech and bias before releasing it to the public. The company is also developing tools that identify videos created with Sora and embed metadata in them to make such videos easier to recognize. OpenAI has also stated that training draws on both public-domain videos and videos licensed from copyright holders, although it has not yet disclosed further details about Sora's training data.

