- Veo can create high-quality 1080p videos longer than 60 seconds in various styles, from photorealistic to surreal and animated.
- Veo offers a high degree of creative control: it interprets cinematic terms in text prompts to make precise video edits, and it can edit both AI-generated videos and user-uploaded clips.
- Google improved performance by adding detailed captions to the training dataset and using high-quality compressed video representations.
On Wednesday, at its annual I/O developer conference, Google introduced Veo, an advanced generative AI video model developed by its DeepMind AI division. Veo aims to rival OpenAI’s Sora in the realism and quality of AI-generated video.
High-quality video generation
Veo is capable of creating high-quality 1080p video clips exceeding 60 seconds. According to a post from DeepMind on the social network X, Veo can handle various cinematic styles, from photorealism to surrealism and animation. This model supports text-to-video, video-to-video, and image-to-video transformations, making video production accessible to everyone, whether they are seasoned filmmakers, aspiring creators, or educators.
In a notable collaboration, artist Donald Glover, also known as Childish Gambino, tested Veo’s capabilities through his creative studio, Gilga. The partnership highlights the model’s potential to generate videos from text prompts that are nearly indistinguishable from real footage. Examples include realistic jellyfish swimming and neon cityscapes.
Unprecedented creative control
Google’s VP of Product Management, Eli Collins, and Senior Research Director, Douglas Eck, highlighted Veo’s unprecedented level of creative control. The model understands cinematic terms like “timelapse” and “aerial shots,” enabling precise, high-quality video edits from text prompts. Veo can edit AI-generated videos or user-uploaded clips, and it uses advanced latent diffusion transformers to maintain consistency between frames. This reduces inconsistencies and keeps characters, objects, and styles stable throughout a clip.
To enhance performance, Google added detailed captions to the training data and used high-quality compressed video representations. These improvements boost overall video quality and reduce generation time. Additionally, all Veo videos are embedded with SynthID, Google’s watermark for AI-generated content, so that they can be identified as AI-generated.






