latentbrief
Launch · 3d ago

Google's Gemini AI Breaks New Ground in Multimodal Creation

DeepMind Safety · 1 min brief

In brief

  • Gemini Omni, a new AI model from Google, can generate and edit videos from any combination of inputs.
    • It combines images, audio, video, and text to create high-quality content, grounded in real-world knowledge.
    • This innovation allows users to edit videos through simple conversations, transforming scenes with consistent physics and character behavior.
  • The initial release, Gemini Omni Flash, is now available in the Gemini app, Google Flow, and YouTube Shorts.
  • Users can change specific elements or entire environments in a video, such as making a sculpture out of bubbles or adding ripples to a mirror when it is touched.
  • The system allows for iterative refinement without losing the original scene's thread, offering creative possibilities beyond traditional filming.
  • Looking ahead, Google plans to expand Gemini Omni to support image and audio outputs.
    • The development extends AI's ability to create and manipulate content across multiple modalities, which could give creators and researchers new tools.

Terms in this brief

Omni
Google's name for its Gemini Omni model, which generates and edits videos from inputs such as images, audio, video, and text. It is designed to create high-quality content grounded in real-world knowledge.

