Alibaba's Qwen 3.6 Plus Arrives With 1M Context, Chain-of-Thought Always On — and It's Free on OpenRouter
Alibaba released Qwen 3.6 Plus with a 1 million token context window, always-on chain-of-thought reasoning, native tool use, and up to 65,536 output tokens — beating Claude 4.5 Opus on Terminal-Bench 2.0 and leading all models on OmniDocBench v1.5. It's available free on OpenRouter as a preview.
Alibaba's Qwen team released Qwen 3.6 Plus on April 2, 2026 — a flagship model with a 1 million token context window that enables repo-level code analysis, full-book comprehension, and long-horizon agentic tasks without chunking.
The model runs chain-of-thought reasoning at all times, unlike many frontier models that offer reasoning as a separate mode. This always-on approach means every response benefits from intermediate reasoning steps, at the cost of higher latency and token usage — a trade-off Alibaba argues is worth it for complex tasks.
On benchmark performance, Qwen 3.6 Plus posted 61.6 on Terminal-Bench 2.0 (compared to Claude 4.5 Opus at 59.3) and led all evaluated models on OmniDocBench v1.5, a benchmark for understanding complex documents including tables, charts, and mixed-layout PDFs, with a score of 91.2. It supports up to 65,536 output tokens per request — essential for generating long-form code, reports, or structured data.
The model is available on OpenRouter as a free preview, making it immediately accessible to developers who want to test 1M-context capabilities without committing API credits. The free tier has rate limits but covers the vast majority of experimental use cases.
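Trying the preview takes a few lines against OpenRouter's OpenAI-compatible chat-completions endpoint. A minimal sketch follows; note the model slug `qwen/qwen-3.6-plus:free` and the `OPENROUTER_API_KEY` environment variable are assumptions for illustration — check OpenRouter's model list for the actual ID before use.

```python
import json
import os
import urllib.request

# Hypothetical slug for the free preview -- verify against OpenRouter's model list.
MODEL = "qwen/qwen-3.6-plus:free"


def build_request(prompt: str, max_tokens: int = 65_536) -> dict:
    """Build an OpenAI-compatible chat-completions payload for OpenRouter."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # Qwen 3.6 Plus supports up to 65,536 output tokens per request.
        "max_tokens": max_tokens,
    }


def send(payload: dict) -> dict:
    """POST the payload to OpenRouter; requires OPENROUTER_API_KEY to be set."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


payload = build_request("Summarize the architecture of this repository.")
```

Because the endpoint is OpenAI-compatible, the same payload also works through any OpenAI-style client library by pointing its base URL at `https://openrouter.ai/api/v1`.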
For developers already using Chinese frontier models via OpenRouter, Qwen 3.6 Plus is a meaningful step up from Qwen 2.5 72B, particularly for document-heavy workflows and code generation at repository scale.
Panel Takes
The Builder
Developer Perspective
“Free on OpenRouter with 1M context is the headline. Beating Claude 4.5 Opus on Terminal-Bench means it's genuinely competitive for agentic coding tasks, not just a benchmark-optimized model. The 65K output token limit is particularly useful for generating complete codebases in a single request.”
The Skeptic
Reality Check
“Terminal-Bench 2.0 and OmniDocBench are relatively niche benchmarks — beating Claude on them doesn't mean Qwen 3.6 Plus is better in practice for most users. Always-on chain-of-thought also means slower responses and higher costs at scale. The free OpenRouter preview will tell us more than any benchmark.”
The Futurist
Big Picture
“The race to 1M-plus context is about making entire codebases, legal documents, and research corpora first-class inputs. Alibaba shipping this for free is a strategic move to capture developer mindshare globally. Within 18 months, 1M context will be the baseline expectation for any frontier model.”