
Google Launches New AI Model for Multimodal Computing

DeepMind Safety · June 9, 2026 · 1 min brief

In brief

Google has introduced Gemma 4 12B, a multimodal AI model designed to run efficiently on laptops.
- This model eliminates the need for separate encoders for vision and audio, instead processing both inputs directly within its architecture.
- This design reduces latency and memory usage while delivering performance comparable to that of larger models.
The model is also notable as the first mid-sized model with native audio support. Combined with its vision input, this makes it suitable for tasks such as speech recognition and image analysis.
- It requires only 16GB of VRAM, so it can run on laptops with that much GPU memory.
Developers have already used earlier Gemma models to build applications ranging from robotic-arm control to AI-powered security systems.
- The release brings advanced AI capabilities to everyday devices without sacrificing speed or functionality.
As Google continues to refine its multimodal approach, developers and users can expect more capable and accessible tools.

Terms in this brief

Gemma 4 12B: A multimodal AI model developed by Google that processes both visual and audio inputs directly within its architecture, without separate encoders. It is efficient enough to run on laptops with only 16GB of VRAM and suits tasks such as speech recognition and image analysis.
