AI Hallucinations Pose Growing Risks in Critical Infrastructure
In brief
- AI systems are generating confident but incorrect information that's harming decision-making in cybersecurity and critical infrastructure.
- A 2025 study found that most AI models give inaccurate answers to hard questions while still sounding authoritative.
- These "hallucinations" can mislead employees into trusting false information, leading to system failures, financial losses, or new security vulnerabilities.
- As AI becomes more integrated into operations, organizations must treat all AI-generated outputs as potential risks until verified by humans.
- Addressing this challenge requires understanding the root causes, like flawed training data and lack of validation mechanisms, to build safer AI systems.
Terms in this brief
- Hallucinations: In AI, confident but incorrect information generated by models. Hallucinations can mislead users in critical areas like cybersecurity and infrastructure, with consequences as serious as system failures or security breaches.
Read full story at The Hacker News →
More briefs
AI-Generated Child Exploitation on the Rise
North Dakota received a record 2,700 online tips about child sexual abuse material in 2025, many involving AI-assisted exploitation, and the number of AI-related reports is rising. Nationally, there were over 1.5 million such reports in 2025, a 1,300% increase from the previous year. AI-generated material is hard to detect and investigate, so lawmakers are being urged to give investigators better technology, training, and legal tools; new laws would help them keep pace with the problem.
Anthropic Embraces Alignment Pretraining for Safer AI Development
Anthropic is now using a technique called "alignment pretraining" to improve the ethical behavior of its AI systems. The approach trains models on large datasets of examples in which an AI makes morally sound decisions in challenging scenarios; by learning from these examples, a model can better understand and follow ethical guidelines, reducing the risk of harmful outputs. The method has proven effective and scalable, building on research such as "Pretraining Language Models with Human Preferences" (Korbak et al., ’23) and "Safety Pretraining" (Maini, Goyal, Sam et al., ’25), which show that pretraining on aligned data significantly reduces misalignment in AI behavior, even after further training. Looking ahead, this advance could lead to more trustworthy AI systems across industries. Developers and researchers should watch how alignment pretraining is applied to other models and whether it helps address broader ethical challenges in AI development.
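The brief does not detail Anthropic's pipeline, but one published variant of the idea, the conditional-training scheme from Korbak et al. (’23), is easy to sketch: each pretraining document is tagged with a control token reflecting an alignment score, so the model learns to separate aligned from misaligned text and can be steered toward the aligned distribution at inference time. Everything below (token names, the toy scorer, the threshold) is an illustrative assumption, not Anthropic's implementation.

```python
# Sketch of conditional alignment pretraining (after Korbak et al., '23):
# tag each document with a control token based on an alignment score, so
# the model learns what behavior the <|aligned|> token stands for.
# Token names, threshold, and the toy scorer are illustrative assumptions.

ALIGNED, MISALIGNED = "<|aligned|>", "<|misaligned|>"

def score_alignment(text: str) -> float:
    """Stand-in for a learned classifier or reward model that rates how
    well a document demonstrates desirable behavior (0.0 to 1.0)."""
    red_flags = ("steal credentials", "bypass the safety")  # toy heuristic
    return 0.0 if any(flag in text.lower() for flag in red_flags) else 0.9

def tag_document(text: str, threshold: float = 0.5) -> str:
    """Prefix a document with its control token before pretraining."""
    token = ALIGNED if score_alignment(text) >= threshold else MISALIGNED
    return f"{token} {text}"

corpus = [
    "The assistant declines the request and offers a safe alternative.",
    "Step one: steal credentials from the victim's password manager.",
]
print("\n".join(tag_document(doc) for doc in corpus))
```

At inference time, prompting with the aligned token biases generation toward the behavior the model associated with the aligned slice of the corpus, which is roughly the "persists even after further training" property the cited papers report.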
Malware Hits TanStack and Several AI Packages
Hackers have compromised TanStack and several AI packages, including ones from Mistral AI and Guardrails AI. The malware can steal credentials for cloud providers, cryptocurrency wallets, and messaging apps. In total, 42 packages and 84 versions are affected, and the incident carries a critical severity score of 9.6 out of 10. The attackers will likely launch further attacks using the stolen credentials.
Americans Concerned About AI Impact on Mental Health
Most Americans are concerned that AI could make mental health problems worse: 43% say they are very concerned, up from 35% in June 2025. Many also do not want AI as a therapist; 66% are uncomfortable with the idea of working with an AI therapist, and only 23% say they would be very or somewhat comfortable doing so. Younger Americans are more open to the idea: adults under 30 are about twice as likely as older Americans to say they would be comfortable working with an AI therapist. Whether AI will ever rival human therapists remains an open question.
AI Breakthrough Reduces Reward Hacking Vulnerabilities
A new AI framework called Auto-Rubric as Reward (ARR) addresses a critical issue in AI alignment. Current methods compress human preferences into a single scalar score, which AI systems can learn to manipulate (reward hacking). ARR instead decomposes preferences into clear, explicit criteria, producing rubrics that are easy to understand and verify. This approach not only reduces bias but also allows immediate deployment with minimal oversight. The framework turns a model's internal knowledge into structured evaluation guidelines, improving reliability and efficiency in tasks like text-to-image generation. By replacing vague scores with concrete evaluation dimensions, ARR improves both the transparency of AI decisions and their alignment with human judgment. Early tests show ARR outperforming existing methods across benchmarks, offering a more robust alternative for training generative models. Its success opens new possibilities for AI development, particularly where nuanced, human-like evaluation is required, and future refinements could make AI systems more trustworthy and less prone to manipulation.
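The brief stays high-level, but the core idea, replacing one opaque scalar with explicit, checkable criteria, can be sketched. The rubric entries, weights, and check functions below are illustrative assumptions, not the actual ARR framework.

```python
# Minimal sketch of a rubric-as-reward signal: the reward is an aggregate
# over explicit, human-readable criteria, each of which can be inspected
# and verified, rather than a single opaque scalar. Criteria, weights,
# and check functions are toy assumptions for illustration.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    name: str                       # human-readable evaluation dimension
    weight: float                   # relative importance in the aggregate
    check: Callable[[str], float]   # returns a score in [0, 1]

def rubric_reward(output: str, rubric: list[Criterion]) -> tuple[float, dict]:
    """Score an output against every criterion; return the aggregate
    reward plus the per-criterion breakdown for auditing."""
    breakdown = {c.name: c.check(output) for c in rubric}
    total_weight = sum(c.weight for c in rubric)
    reward = sum(c.weight * breakdown[c.name] for c in rubric) / total_weight
    return reward, breakdown

# Toy rubric for a summarization task.
rubric = [
    Criterion("concise", 1.0, lambda s: 1.0 if len(s.split()) <= 50 else 0.0),
    Criterion("no_hedging", 0.5, lambda s: 0.0 if "maybe" in s.lower() else 1.0),
]

reward, why = rubric_reward("The report covers three incidents in detail.", rubric)
print(reward, why)
```

Because the per-criterion breakdown is returned alongside the scalar, a reviewer can see exactly which criterion an output gamed, which is what makes rubric-style rewards harder to hack than a single score.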