Alibaba · Qwen

Qwen 3.6 Plus

Alibaba's broad open-weights family, strong on multilingual.

Qwen 3.6 Plus is a multimodal AI model developed by Alibaba, enabling advanced text, image, and video processing. It features a massive context window of 1,000,000 tokens, allowing it to handle extensive inputs and long-form tasks seamlessly.

Its standout multimodal capabilities and scalable cost-efficiency make it suitable for a wide array of applications. Qwen 3.6 Plus integrates advanced media understanding and synthesis, making it a versatile tool for developers and product teams aiming for complex, cross-modal workflows.

Qwen 3.6 Plus represents the flagship of the Qwen family, designed with expanded multimodal capabilities and a significantly larger 1,000,000 token context window. It reflects a major leap in scope and efficiency compared to its predecessors.

Background

Alibaba Cloud, also known as Aliyun, is a cloud computing company, a subsidiary of Alibaba Group. Alibaba Cloud provides cloud computing services to online businesses and Alibaba's own e-commerce ecosystem. Its international operations are registered and headquartered in Singapore.

Wikipedia

Specs

Context window: 1M tokens
Max output: 66K tokens
Input ($/1M tokens): $0.179
Output ($/1M tokens): $1.07
Modalities: Text · Image · Video
Released: Sep 19, 2024
Weights: Open

Pricing last synced May 22, 2026 via OpenRouter. Confirm against official docs before committing.

Capabilities

Tool use
Vision
Extended thinking
Prompt caching
Open weights

What it excels at

Massive context handling
Processes up to 1,000,000 tokens for long-form content and extended context retention.
Multimodal capabilities
Seamlessly integrates text, image, and video inputs and outputs, enabling diverse applications.
Competitive pricing
Offers cost-effective input and output processing optimized for large-scale tasks.
Sophisticated media processing
Handles and synthesizes complex multimodal content with nuanced reasoning across formats.

When to use this model

→Long document parsing - Its expansive context window supports detailed analysis without losing input coherence.
→Multimodal content generation - Combines text, image, and video creation for rich, diversified outputs.
→Customer interaction systems - Maintains extensive conversational history for context-aware, high-quality responses.
→Cross-modal search and analysis - Enables precise querying and comparisons between text, image, and video inputs.

Analysis synthesized from gpt-4o, llama-4-maverick, phi-4, etc.

API model id

qwen/qwen3.6-plus

Vendor docs: qwenlm.github.io

Compare Qwen 3.6 Plus with