Hangzhou, China
Alibaba
China's open-weights powerhouse. The Qwen family spans 0.5B to 72B parameters across text, vision, coding and math, with standout multilingual capability (especially in Chinese) that closed Western APIs can't match.
Recent news
Articles mentioning Alibaba models
AI Revolution Accelerates with Breakthroughs in Safety, Research, and Product Launches
1. Google Unveils AI Breakthrough with 34% Query Accuracy Boost: Google has introduced Agentic RAG, a framework that improves query accuracy by 34% on complex business queries. This is significant because it improves the efficiency of AI systems.
2. Alibaba's AI Model Achieves Autonomous App Development: Alibaba's Qwen3.7-Plus model has demonstrated the ability to develop apps independently, generating over 10,000 lines of code in just 11 hours. This marks a significant leap forward for AI autonomy.
3. Microsoft CEO Rejects Plan for Addictive AI Agent: Microsoft CEO Satya Nadella has condemned a proposal to design the company's AI agent to be addictive, emphasizing the importance of ethical AI development. The move reflects Microsoft's commitment to user well-being.
4. Programmers Document Code for AI Tools: Programmers are willing to write detailed documentation for AI tools like Claude, making it easier for the AI to understand their code. This is significant because it improves collaboration between humans and AI systems.
5. Shell Adopts AI for Smarter Equipment Maintenance: Shell is rolling out C3 AI agents to predict and prevent equipment failures, reducing downtime and saving costs. This is a significant development for the energy sector.
6. Professor Studies AI Behavior on Social Media: Professor Yuxiao Luo has studied AI behavior on social media, analyzing over 200,000 posts and 2.7 million comments from 34,000 AI agents. This research helps humans understand AI behavioral patterns.
7. AI Memory Revealed in Transformer Models: A new study has uncovered how transformer models manage context over long sequences, revealing a surprising geometry in which sequential data concentrates in low-dimensional subspaces. This discovery has significant implications for AI development.
8. Apple Approves First AI Agent for Messages for Business: Apple has approved an AI agent called Poke to run on its Messages for Business platform, enabling businesses to interact with customers through iMessage. This approval opens up new revenue streams for Apple.
9. AI Research Reveals Metastable Token Clusters in Trained Transformers: Researchers have discovered metastable token clusters in trained transformers, challenging existing assumptions about how these models work. This discovery has significant implications for AI research and development.
10. Pennsylvania Sues Chatbot Company over Fake Medical License: Pennsylvania Gov. Josh Shapiro is suing Character.AI to stop its chatbot from posing as a doctor and providing fabricated medical licenses and advice. The lawsuit highlights the importance of AI safety and regulation.
NeuralPulse Daily · 2w ago
Alibaba's AI Model Showcases Autonomous App Development
Alibaba has unveiled Qwen3.7-Plus, a multimodal AI agent that integrates visual perception, GUI operation, and coding into a single system. In a demonstration, the model independently developed a vocabulary-learning app, generating over 10,000 lines of code across 1,000 agent interactions in just 11 hours. This marks a significant leap forward for AI autonomy, since the model handled complex tasks like coding and user-interface design without human intervention. Although the model excels at on-screen understanding according to Alibaba's benchmarks, its overall performance remains inconsistent across scenarios. Qwen3.7-Plus is available exclusively through Alibaba, priced below Western competitors' offerings. This positions the company as a strong contender in the AI race, particularly for businesses seeking cost-effective solutions. As AI continues to evolve, expect more models like Qwen3.7-Plus to push the boundaries of what machines can do on their own.
The Decoder · 2w ago
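To put the Qwen3.7-Plus demo's scale in perspective, the reported figures can be turned into average rates. This is a back-of-envelope sketch under two assumptions the report does not confirm: it treats "over 10,000 lines" as exactly 10,000, and it assumes the 1,000 interactions were spread evenly across the 11 hours.

```python
# Back-of-envelope rates for the Qwen3.7-Plus demo, using only the reported figures.
# Assumption: "over 10,000 lines" is treated as exactly 10,000, so line-based
# rates are lower bounds; interactions are assumed evenly spread over the run.
LINES_OF_CODE = 10_000
AGENT_INTERACTIONS = 1_000
WALL_CLOCK_HOURS = 11


def demo_rates(lines: int, interactions: int, hours: float) -> dict[str, float]:
    """Return average throughput figures for a single agent run."""
    seconds = hours * 3600
    return {
        "lines_per_hour": lines / hours,
        "lines_per_interaction": lines / interactions,
        "seconds_per_interaction": seconds / interactions,
    }


rates = demo_rates(LINES_OF_CODE, AGENT_INTERACTIONS, WALL_CLOCK_HOURS)
for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```

On those assumptions the run averages roughly 909 lines per hour, about 10 lines per interaction, and just under 40 seconds per interaction. How the work was actually distributed across interactions was not reported.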
New AI Benchmark Tests Collaboration Under Deception
Researchers have introduced SMAC-Talk, a new test environment that evaluates how large language models (LLMs) work together in complex multi-agent settings. The benchmark uses natural-language communication to assess coordination among AI agents, including scenarios where one agent tries to deceive the others with misleading messages. It simulates real-world challenges such as partial information and long-term decision-making, which are crucial for AI systems operating together in uncertain environments. This matters because there is a growing need to test how AI agents interact with and trust each other. By introducing deception, SMAC-Talk measures an agent's ability to detect and handle misleading information, which is essential for building reliable multi-agent systems. The benchmark uses models from the Qwen3.5 family to evaluate coordination under various conditions, highlighting how different reasoning structures and memory capacities affect teamwork. The researchers plan to make SMAC-Talk freely available to advance AI collaboration research and to help developers build more effective and trustworthy AI agents for complex scenarios. As AI systems increasingly work alongside humans and each other, such benchmarks will play a key role in ensuring their reliability and ethical operation.
Digg AI, arXiv CS.AI · 2w ago
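SMAC-Talk's internals are not described in the item above, so the following is a hypothetical toy, not the benchmark's code, illustrating why deception through messages is tricky. A receiver that naively takes a majority over every incoming message can be swung by a single agent that simply repeats its false claim; counting at most one vote per sender restores the honest majority. The `Message` type, sender names, and region labels are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical natural-language report reduced to (sender, claim)."""
    sender: str
    claim: str  # e.g. which map region an enemy unit is in


def majority_vote(messages: list[Message]) -> str:
    """Naive fusion: count every message; ties go to the claim seen first."""
    return Counter(m.claim for m in messages).most_common(1)[0][0]


def one_vote_per_sender(messages: list[Message]) -> str:
    """Keep only each sender's latest claim, then take the majority."""
    latest: dict[str, str] = {}
    for m in messages:
        latest[m.sender] = m.claim  # later messages overwrite earlier ones
    return Counter(latest.values()).most_common(1)[0][0]


# Two honest scouts saw the enemy in the north; one deceiver repeats "south".
inbox = [
    Message("scout_a", "north"),
    Message("deceiver", "south"),
    Message("deceiver", "south"),
    Message("deceiver", "south"),
    Message("scout_b", "north"),
]

print(majority_vote(inbox))        # the repeated lie outvotes the scouts
print(one_vote_per_sender(inbox))  # the honest majority wins again
```

Per-sender deduplication only defends against repetition. It does nothing against a deceiver whose single message is plausible, which is the harder case a deception benchmark is meant to probe.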
AI Struggles to Pick a Random City: Language Models Show Surprising Biases
AI language models, while advanced, have a surprising flaw: they struggle to generate truly random outputs. For instance, when asked to name a random weekday, Qwen3 chooses Wednesday 80% of the time. Similarly, Gemma-3 names just four distinct cities when asked for a random city, and in multiple-choice questions the correct answer is often placed at option C. This bias isn't just an amusing quirk; it has serious implications for tasks like synthetic data generation and creative problem-solving, where diversity is crucial. Recent studies suggest that these biases stem from how models are trained: they lack incentives to spread probability across diverse options, so they "collapse" onto narrow modes even when broader diversity is needed. For example, Zhao et al. found that models' samples from known distributions are heavily skewed, while Gu et al. highlighted the consistent positioning of correct answers in multiple-choice questions. Efforts to address the issue have had mixed results. One method has the model first generate a random string and then transform it into an answer, which works in simple cases but struggles as complexity grows. Another approach trains models explicitly against known distributions to improve their stochastic behavior. Early evaluations show promising distributional fidelity and transfer, suggesting that better randomness may be on the horizon.
LessWrong, arXiv CS.LG · 3w ago
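The skew described in the randomness item can be quantified, and the "random string first" workaround sketched, in a few lines of Python. The sample below is hypothetical, constructed only to match the reported 80% Wednesday rate, and the hashing step is a stand-in for "transform it": the item does not say which transformation the method actually uses. Total variation distance from uniform and normalized entropy are two standard ways to score how far a set of samples is from uniform.

```python
import hashlib
import math
from collections import Counter

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def tv_from_uniform(samples: list[str], options: list[str]) -> float:
    """Total variation distance between empirical frequencies and uniform (0 = uniform)."""
    counts = Counter(samples)
    n = len(samples)
    u = 1 / len(options)
    return 0.5 * sum(abs(counts[o] / n - u) for o in options)


def normalized_entropy(samples: list[str], options: list[str]) -> float:
    """Shannon entropy of the samples divided by log(k); 1.0 means uniform."""
    n = len(samples)
    h = -sum((c / n) * math.log(c / n) for c in Counter(samples).values())
    return h / math.log(len(options))


def pick_from_random_string(random_string: str, options: list[str]) -> str:
    """Map a (model-generated) string to an option by hashing it.

    Hypothetical stand-in for "generate a random string, then transform it":
    the resulting choice is only as diverse as the strings fed in.
    """
    digest = hashlib.sha256(random_string.encode("utf-8")).digest()
    return options[int.from_bytes(digest[:8], "big") % len(options)]


# Hypothetical 100 answers to "name a random weekday" with an 80% Wednesday mode.
skewed = (["Wednesday"] * 80
          + ["Monday", "Tuesday", "Thursday", "Friday"] * 4
          + ["Saturday", "Sunday"] * 2)

print(f"TV distance from uniform: {tv_from_uniform(skewed, WEEKDAYS):.3f}")
print(f"normalized entropy:       {normalized_entropy(skewed, WEEKDAYS):.3f}")
print(pick_from_random_string("q8Zx1kP", WEEKDAYS))
```

For this hypothetical sample the TV distance works out to 0.8 - 1/7 ≈ 0.657, because Wednesday is the only over-represented day. The hash spreads distinct strings roughly evenly across the seven options, but it only redistributes whatever diversity the model's strings already have: a model that keeps emitting the same "random" string keeps landing on the same answer.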
China Restricts AI Researchers' Overseas Travel
China has started requiring top AI researchers at companies like Alibaba and DeepSeek to get official approval before traveling abroad. The government is concerned about data leaks, technology theft, and losing talent to other countries. This move aims to tighten control over the domestic AI industry and protect sensitive information. This decision highlights Beijing's growing focus on safeguarding its AI sector, which is seen as crucial for the country's technological advancement. By restricting travel, China hopes to prevent its top experts from being poached by international companies or sharing their knowledge abroad. This could create challenges for global collaboration in AI research and innovation. Looking ahead, this policy may intensify competition for AI talent both within China and internationally. Researchers will need to navigate stricter regulations to continue participating in global conferences and collaborations, potentially slowing the flow of ideas and expertise across borders.
The Decoder · 3w ago
AI's Personality Test Fails When Put to Work
A new study reveals that AI models fine-tuned to adopt specific personalities in chat conversations struggle to maintain those personas when given real-world tasks. Researchers tested three major model families (Llama, Qwen, and Gemma) after persona training with supervised fine-tuning (SFT). A classifier designed to identify each model's persona recognized it with high accuracy (86-95%) in controlled chat settings. However, when the same models were asked to act autonomously, for example composing emails or making decisions, the classifier's accuracy dropped sharply to 29-55%, showing that the trained personalities don't carry over well beyond structured chat interactions. This suggests that SFT, a common training method for character-driven AI, may not prepare models for practical, agent-like tasks. The findings highlight the limitations of current personality-training techniques and the need for more general alignment methods. As AI becomes more integrated into daily life, understanding how these systems behave outside controlled chats will be crucial for developers aiming to build reliable and versatile AI assistants.
LessWrong · 4w ago
Alibaba's AI Model Breaks Record, Runs Autonomously for 35 Hours
Alibaba's Qwen team has unveiled Qwen3.7-Max, an AI model designed for long-running autonomous tasks. In benchmark tests it matched the performance of Claude Opus 4.6 and outperformed Chinese competitors such as DeepSeek V4 Pro and Kimi K2. What makes the model stand out is its ability to operate independently for extended periods: it worked on its own for 35 hours, optimizing code for Alibaba's custom chips. This is a significant milestone for autonomous AI systems, which could transform industries by handling complex, long-running tasks without human intervention. Qwen3.7-Max also demonstrated its versatility by steering a four-legged robot, pointing to potential applications in robotics and automation, and could lead to more efficient and reliable AI-driven solutions across sectors. As the field advances, Qwen3.7-Max sets a new bar for autonomous capabilities. Future work may focus on expanding its applications and improving its efficiency.
The Decoder · 4w ago
AI Models Exposed: They Copy Numbers Instead of Solving Problems
Recent research reveals that small language models, even when using chain-of-thought prompting, often copy numbers from earlier steps rather than performing genuine arithmetic. This shortcut significantly affects their accuracy: incorrect answers occur 54-92% less often when the correct number is already available earlier in the context. Conversely, if a wrong number precedes the answer delimiter, accuracy plummets to near zero even when the intermediate reasoning is correct. The study finds that this copying behavior varies by model architecture: Qwen and Llama copy distractors up to 95% of the time, while Gemma is more selective. Larger models (7-8B parameters) show better content-selective gating, relying less on positional shortcuts. These findings challenge assumptions about AI reasoning abilities and underscore limitations in current oversight methods. Going forward, researchers will likely focus on improving model architectures to reduce copying and encourage genuine computation. Developers should also consider refining evaluation metrics so they assess AI reasoning without conflating shortcuts with actual problem-solving.
Hugging Face Blog, arXiv CS.LG · 4w ago