Google Launches New AI Model for Multimodal Computing
In brief
- Google has introduced Gemma 4 12B, a multimodal AI model designed to run efficiently on laptops.
- The model does not rely on separate encoders for vision and audio; it processes both inputs directly within its architecture.
- This design reduces latency and memory usage while maintaining performance comparable to larger models.
- It is notable as the first mid-sized model with native audio support, which suits it to tasks such as speech recognition and image analysis.
- It requires only 16GB of VRAM, so it can run on laptops with that much GPU memory.
- Developers have already used earlier Gemma models to create applications ranging from robotic arms to AI security systems.
- The release is a step toward bringing advanced AI capabilities to everyday devices without compromising speed or functionality.
- As Google continues to refine its multimodal approach, more capable and accessible tools for developers and users may follow.
Terms in this brief
- Gemma 4 12B
- A multimodal AI model developed by Google that processes both visual and audio inputs directly within its architecture. It's efficient enough to run on laptops with only 16GB of VRAM, making it accessible for various applications like speech recognition and image analysis.
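To put the 16GB figure in context, here is a rough, weights-only estimate of how much memory a model of this size needs at different numeric precisions. It is a sketch under stated assumptions, not a description of how Google sized the model: it assumes "12B" means about 12 billion parameters, and it ignores activations, caches, and runtime overhead. The function name `weight_memory_gb` is hypothetical.

```python
# Weights-only memory estimate for a model at several precisions.
# ASSUMPTION: "12B" means roughly 12 billion parameters.
# Ignores activations, attention caches, and runtime overhead.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

PARAMS = 12e9   # assumed parameter count
VRAM_GB = 16    # figure quoted in the brief

for bits in (16, 8, 4):
    gb = weight_memory_gb(PARAMS, bits)
    verdict = "fits" if gb <= VRAM_GB else "does not fit"
    print(f"{bits}-bit weights: {gb:.1f} GB ({verdict} in {VRAM_GB} GB)")
```

Under these assumptions, 16-bit weights alone would exceed 16GB, while 8-bit or 4-bit weights would fit with room to spare. The brief does not say which precision the 16GB requirement refers to.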
More briefs
AI Agent Causes $6,531 AWS Bill
An AI agent tried to join a hobbyist network to perform a network scan, and its operator was charged $6,531.30 by AWS. This matters because the cost was high and the scan was never completed. The incident will likely change how operators control their AI agents' access to cloud services.
New AI Models to Make Tokens Cheaper
New AI models due later this year are expected to be better and more efficient, which should make AI tokens more abundant and cheaper. Nvidia's Blackwell GPUs are being installed in large numbers; these systems can generate 50 times more tokens and are 35 times cheaper. As new models are trained on these systems, token prices will likely drop sharply.
Google Sues AI-Powered Cybercrime Network
Google is filing a lawsuit to dismantle an AI-powered cybercrime network that has stolen passwords and credit cards from hundreds of thousands of victims. The operation is massive, with 9,000 fake websites and over 1 million fraudulent URLs, and Android users flagged 55,000 spam texts in just two weeks. Google is also advocating for federal legislation to make protections permanent, and it will continue to work with phone companies to block fake texts as part of its effort to build a safer internet.
Visa Embeds Payment Network in ChatGPT
Visa has embedded its payment network in ChatGPT, allowing the chatbot to shop and complete transactions on behalf of users. This means AI agents can now not only recommend products but also complete purchases at any merchant that accepts Visa. Over one billion people have used ChatGPT, with many businesses also adopting the technology. Visa's collaboration with OpenAI will make it easier for merchants to accept transactions initiated by agents, with Visa providing payment authorization and fraud monitoring. The future of shopping may soon involve AI agents making purchases on behalf of consumers.
AI-Generated Local News Site Gains Subscribers
South Shore News put up a paywall in April and gained 350 paid subscribers. The site uses artificial intelligence to generate articles about town government and school committee meetings. This matters because it shows people will pay for local news, even when it is generated by machines. The site expects to make $25,000 in revenue this year. Its success may lead to expansion, which could bring more local news to communities underserved by traditional media and change how people get their local news.