- OpenAI has teased Sora, a new generative video model built on its earlier DALL-E and GPT research;
- Sora can generate up to 60 seconds of video from text instructions, producing scenes with multiple characters, specific types of motion, and detailed backgrounds;
- Sora can also create multiple shots within a single generated video.
OpenAI, a global leader in artificial intelligence models, has launched Sora, a model that can generate short videos from text instructions. In 2023, amid intense competition over multimodal AI, companies such as Google and Meta, along with startups like Runway and Pika Labs, released similar models. Nevertheless, the videos OpenAI has demonstrated continue to draw attention for their high quality.
Sora can simulate the physical world
Currently, the OpenAI website offers limited information about Sora. OpenAI has not disclosed the material used to train the model, stating only: ‘We are teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems requiring real-world interaction.’ OpenAI claims that Sora can generate videos up to 60 seconds long from textual descriptions, producing scenes with multiple characters, specific types of motion, and detailed backgrounds. Sora can also create multiple shots within a single generated video while maintaining consistent characters and visual style.
Furthermore, Sora can generate an entire video at once or extend a generated video to make it longer. OpenAI states: ‘By giving the model foresight of many frames at a time, we’ve solved a challenging problem of making sure a subject stays the same even when it goes out of view temporarily.’ OpenAI also acknowledges that the current Sora model has weaknesses. It may struggle to accurately simulate the physics of complex scenes and may fail to understand specific instances of cause and effect. For example, a person might take a bite of a cookie, yet the cookie shows no bite mark afterward. The model may also confuse spatial details in a prompt, such as left and right, and may struggle with precise descriptions of events unfolding over time, such as following a specific camera trajectory.
Addressing the safety issue
On AI safety, an issue OpenAI CEO Sam Altman has repeatedly addressed, OpenAI states: ‘Currently, Sora has been made available to “red teamers” (those who conduct adversarial testing of AI models for potentially harmful outputs) to assess harm or risks in critical areas. We are also granting access to a number of visual artists, designers, and filmmakers to gather feedback on how to advance the model to be most helpful for creative professionals.’
OpenAI indicates that Sora builds on its past research on DALL-E and GPT models. It adopts techniques from DALL·E 3, enabling it to follow users’ textual instructions in generated videos more faithfully. Beyond generating videos from scratch, the model can animate an existing static image, bringing its contents to life with accuracy and detail. It can also take an existing video and extend it or fill in missing frames.
Currently, the OpenAI website has been updated with 48 demo videos generated by Sora, featuring vibrant colors and realistic effects.