MiniMax M2.5: A Coding Agent That's Actually Affordable

By Prahlad Menon · 2 min read

MiniMax just dropped M2.5, and the numbers are hard to ignore: 80.2% on SWE-Bench Verified, making it state-of-the-art for AI coding agents. But what caught my attention was the pricing: $1/hour at 100 tokens/second.

That’s “intelligence too cheap to meter” territory.

The Headlines

  • 80.2% SWE-Bench Verified — Best publicly reported score
  • 51.3% Multi-SWE-Bench — Multi-repo problem solving
  • 76.3% BrowseComp — Web browsing and research tasks
  • 37% faster than M2.1, matching Claude Opus 4.6's speed

What Makes It Different

M2.5 was trained with reinforcement learning across 200,000+ real-world coding environments. Not toy problems—actual software development scenarios.

The result is a model that thinks like an architect:

Before writing any code, M2.5 actively decomposes and plans the features, structure, and UI design of the project from the perspective of an experienced software architect.

This “spec-writing tendency” emerged during training. It doesn’t just fix bugs—it designs systems.

Full-Stack Coverage

Trained on 13 languages (Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, Ruby), covering every stage of development:

  • 0-to-1 — System design, environment setup
  • 1-to-10 — Core development
  • 10-to-90 — Feature iteration
  • 90-to-100 — Code review, testing

Across Web, Android, iOS, Windows—backend APIs, databases, business logic. Not just frontend demos.

The Cost Story

At $1/hour for 100 tokens/second, you could run a coding agent continuously through an eight-hour workday for about $8. Compare that to per-token pricing elsewhere:

  • Claude Opus: ~$75/million tokens
  • GPT-4: ~$30/million tokens

At 100 tokens/second, an hour of output is roughly 360,000 tokens, so $1/hour works out to about $2.80 per million tokens. For agentic tasks that require sustained reasoning, this pricing changes what's economically viable; the back-of-envelope calculation below makes the comparison concrete.
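Here is a minimal sketch of that arithmetic, using the throughput and hourly rate quoted above and the competitor prices from the list (the quoted figures, not official rate cards):

```python
# Back-of-envelope comparison: convert M2.5's $1/hour at 100 tokens/second
# into an effective per-million-token price and compare against the
# per-token prices quoted earlier in this post.

TOKENS_PER_SECOND = 100
HOURLY_RATE_USD = 1.00

tokens_per_hour = TOKENS_PER_SECOND * 3600                  # 360,000 tokens
cost_per_million = HOURLY_RATE_USD / (tokens_per_hour / 1_000_000)

print(f"M2.5 effective price: ${cost_per_million:.2f} per million tokens")

for name, per_million in [("Claude Opus", 75.0), ("GPT-4", 30.0)]:
    ratio = per_million / cost_per_million
    print(f"{name}: ${per_million:.2f}/M tokens -> ~{ratio:.0f}x more expensive")

# An eight-hour workday of continuous generation:
print(f"8-hour workday: ${8 * HOURLY_RATE_USD:.2f} for ~{8 * tokens_per_hour:,} tokens")
```

Run it and the gap is stark: roughly $2.78 per million tokens versus $30 to $75, with a full workday of continuous generation costing about $8.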

My Take

We’re past the point where “can it code?” is the question. The question is “how fast and how cheap?”

M2.5’s combination of SOTA performance + aggressive pricing suggests MiniMax is betting on volume. If coding agents become commodity infrastructure—like cloud compute—the cheapest capable option wins.

For developers building AI-powered coding tools, this is worth benchmarking against your current stack. The performance/price ratio is compelling.
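If you want a quick head-to-head, a sketch like the one below is usually enough to get a first feel for latency and output quality. It assumes an OpenAI-compatible chat-completions endpoint; the base URL, model id, environment variable, and prompt are placeholders I've made up for illustration, so check MiniMax's API docs for the real values before running it.

```python
# Rough benchmarking sketch against an OpenAI-compatible endpoint.
# BASE_URL and MODEL are placeholders -- substitute the values from
# MiniMax's API documentation.
import os
import time

from openai import OpenAI

BASE_URL = "https://api.example.com/v1"   # placeholder, not the real endpoint
MODEL = "minimax-m2.5"                    # placeholder model id

client = OpenAI(base_url=BASE_URL, api_key=os.environ["MINIMAX_API_KEY"])

prompt = "Write a Python function that parses ISO 8601 durations into seconds."

start = time.time()
response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": prompt}],
)
elapsed = time.time() - start

completion = response.choices[0].message.content
tokens = response.usage.completion_tokens if response.usage else 0

print(f"{tokens} tokens in {elapsed:.1f}s (~{tokens / elapsed:.0f} tok/s)")
print(completion[:500])
```

Swap in the prompts your own tooling actually sends, and compare throughput and cost-per-task against whatever model you use today.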

Links: