Google Introduces Gemini Omni: Features, Capabilities, and Everything to Know

At Google I/O 2026, Google introduced Gemini Omni, a new world model it claims is a key step toward artificial general intelligence (AGI).

Liputan6.com, Jakarta - Google DeepMind officially introduced Gemini Omni on May 19, 2026, at Google I/O 2026, a keynote event focused on artificial intelligence innovation.

Google DeepMind CEO Demis Hassabis described the new world model as a crucial step toward artificial general intelligence (AGI).

The introduction of Gemini Omni was a key highlight among a series of other Gemini announcements, marking a new era in generative AI capabilities.

The first model in this family, Gemini Omni Flash, launched the same day as the announcement.

Gemini Omni Flash is now accessible to Google AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow.

The Multimodal Revolution and Unique Capabilities

Unlike conventional text-to-video AI models, Gemini Omni stands out for its multimodal capabilities in both input and output.

Users can supply various types of input, such as text, audio, images, and video, to prompt the model's output.

Omni then generates a unique interactive world, drawing on Gemini's "real-world knowledge," an approach designed to produce videos with more accurate physics and more realistic-looking content.

Gemini Omni can also interpret the context of a prompt, such as historical facts, to generate more accurate and relevant video content.

"We’re introducing Gemini Omni, where Gemini’s ability to reason meets the ability to create. Omni is our new model that can create anything from any input — starting with video. With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge," Google said on its official website.

Features and Workflow Integration

Gemini Omni's initial focus is video generation, although Google plans to add image and text capabilities over time.

Conversational editing features allow users to modify videos through direct conversation with the Omni model.

For example, users can change the background, visual style, point of view, or other specific details in a video clip using only conversational commands.

Omni serves as a "world model" capable of maintaining a cohesive, reality-based world.

This model is designed to unify the entire video creation workflow, from scriptwriting to animation and editing, into a single, intuitive, conversational interface.

To ensure security and accountability, all videos generated by Gemini Omni will be watermarked with SynthID, identifying them as AI-generated.

Google Gemini Omni Comparison and Future Plans

Gemini Omni represents a significant leap forward in world understanding, multimodality, and editing capabilities, enabling users to generate any output from any input, starting with video.

This model intelligently combines Gemini's reasoning capabilities with the power of creation.

While the visual quality of the leaked clips doesn't significantly surpass that of Veo 3.1, the editing workflow offered by Omni is radically different.

Google has also announced that it is developing a more powerful Omni model, "Omni Pro."

Additionally, Google plans to expand Omni's capabilities to edit audio and speech in videos.

However, this feature will be rolled out only once the company can ensure it is implemented responsibly.