Google Launches Efficient Multimodal AI Model for Laptops

Hugging Face Blog, DeepMind Safety | June 9, 2026 | 1 min brief

In brief

Google has unveiled Gemma 4 12B, a new multimodal AI model designed to run efficiently on laptops.
- This model eliminates the need for separate encoders for vision and audio, allowing these inputs to directly interact with the language processing core.
- This streamlined approach reduces memory usage while maintaining performance comparable to larger models.
The model's key features include advanced reasoning capabilities, a laptop-friendly memory requirement (16GB of VRAM), and native support for audio inputs.
- It also introduces Multi-Token Prediction drafters, which reduce latency for a smoother user experience.
Gemma 4 12B is open-source under Apache 2.0, making it accessible to developers worldwide.
Citing more than 150 million downloads across its models, Google expects Gemma 4 12B to unlock new possibilities in AI development, from enterprise security tools to assistive wearables.
The model's efficiency and versatility are intended to bring cutting-edge AI capabilities to everyday devices, setting the stage for broader AI adoption in personal computing.
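
As a rough, back-of-the-envelope sketch of how the 16GB VRAM figure relates to model size, the snippet below estimates the memory needed for the weights alone at a few common precisions. It assumes that "12B" means roughly 12 billion parameters, and it ignores activations, caches, and runtime overhead. The function name and the chosen precisions are illustrative and do not come from Google's announcement.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: "12B" in the model name means about 12 billion parameters.
PARAMS = 12e9

# Illustrative precisions; the brief does not say which one the 16GB figure uses.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(PARAMS, bits):.0f} GB")
```

Under these assumptions, 16-bit weights (about 24 GB) would not fit in 16GB of VRAM, while 8-bit (about 12 GB) or 4-bit (about 6 GB) weights would. Actual requirements depend on the runtime and on how the model is packaged.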

Terms in this brief

Gemma 4 12B: A new multimodal AI model from Google designed to run efficiently on laptops. It feeds vision and audio inputs directly into its language core, using less memory while performing comparably to larger models. This makes it well suited to devices with limited resources and enables smarter applications in everyday tech.
