    MiniMax-M2.5: The $1/hour Frontier Model

Tags: MiniMax M2.5, frontier AI model, reinforcement learning, agentic AI, cost-efficient AI, coding agent
    February 13, 2026
    Summary

    MiniMax has released M2.5, a new frontier AI model notable for combining high performance with cost-efficiency. Launched on February 12th, 2026, M2.5 posts competitive benchmark results, including 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench (first place), and 76.3% on BrowseComp, often matching or exceeding models like Claude Opus 4.6 and GPT-5.2. Its most striking feature is its low operational cost: approximately $1 per hour at 100 tokens per second.

    M2.5 is a 230 billion parameter Mixture-of-Experts (MoE) model with only 10 billion active parameters per forward pass, keeping its inference footprint manageable. It comes in two API variants: M2.5-Lightning (100 tokens/second, $0.30/M input, $2.40/M output) and M2.5 Standard (50 tokens/second, $0.15/M input, $1.20/M output). This pricing significantly undercuts competitors such as Claude Opus 4.6 ($5/M input, $25/M output) and GLM-5 ($1/M input, $3.20/M output). The model was trained primarily through large-scale reinforcement learning using MiniMax's in-house Forge framework across more than 200,000 real-world environments. It handles both code (10+ languages, including Python, C++, and Rust) and full office productivity workflows (Word, Excel, PowerPoint).
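The "$1 per hour" headline figure follows directly from the published per-token prices. The sketch below (function name is my own, not an official API) multiplies generation speed by the output price to check the claim for both variants:

```python
def cost_per_hour(tokens_per_sec: float, output_price_per_m: float) -> float:
    """Approximate hourly cost of continuous generation, ignoring input tokens."""
    m_tokens_per_hour = tokens_per_sec * 3600 / 1e6  # millions of tokens/hour
    return m_tokens_per_hour * output_price_per_m

# M2.5-Lightning: 100 tok/s at $2.40 per million output tokens
lightning = cost_per_hour(100, 2.40)  # 0.36M tokens/hr -> ~$0.86/hr
# M2.5 Standard: 50 tok/s at $1.20 per million output tokens
standard = cost_per_hour(50, 1.20)   # 0.18M tokens/hr -> ~$0.22/hr
```

At 100 tokens/second the Lightning variant lands just under $0.90/hour, consistent with the article's "approximately $1 per hour" claim (input-token costs would add a little on top).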

    Key innovations in Forge RL include:

    • CISPO (Clipped Importance Sampling Policy Optimization): A custom RL algorithm that clips importance sampling weights, ensuring all tokens contribute to gradient computations for scalable RL.
    • Asynchronous scheduling + tree-structured sample merging: Achieves approximately a 40x training speedup over naive approaches by balancing throughput against how off-policy the samples become.
    • Process rewards for credit assignment: Monitors generation quality throughout long agent trajectories and estimates real-world task completion time as a reward signal.
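The key distinction in CISPO can be sketched in a few lines. In standard PPO-style objectives, tokens whose importance ratio falls outside the clip range receive zero gradient; CISPO instead clips the importance-sampling weight itself, so every token retains a nonzero contribution. The sketch below is a simplified illustration under that reading of the article (function and parameter names are mine, and the symmetric clip bounds are an assumption, not MiniMax's published hyperparameters):

```python
import numpy as np

def cispo_weights(logp_new: np.ndarray, logp_old: np.ndarray,
                  eps_low: float = 0.2, eps_high: float = 0.2) -> np.ndarray:
    """Clip the importance-sampling ratio directly (CISPO-style sketch).

    Unlike PPO's clipped surrogate, which zeroes the gradient for tokens
    outside the trust region, every token here keeps a positive weight,
    so all tokens contribute to the gradient update.
    """
    ratio = np.exp(logp_new - logp_old)  # per-token importance ratio
    return np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)

# Tokens far off-policy get clamped to the bounds rather than dropped:
w = cispo_weights(np.array([2.0, 0.0, -3.0]), np.zeros(3))
```

Here the ratios 7.39, 1.0, and 0.05 become 1.2, 1.0, and 0.8: the extreme tokens are bounded but still weighted, which is what makes the gradient estimate usable at the scale of long agent trajectories.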

    The model exhibits emergent spec-writing behavior, proactively planning before coding, which improves token efficiency (3.52M tokens/task on SWE-Bench for M2.5 vs. 3.72M for M2.1). MiniMax also developed the GDPval-MM benchmark for office productivity, where M2.5 achieved a 59.0% average win rate against mainstream models. Early reports note occasional sloppiness, but M2.5 charts a clear path toward cost-efficient frontier models. Its training run took about two months.
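The token-efficiency gain quoted above works out to roughly a 5% reduction per task; a quick arithmetic check (not an official figure from the article):

```python
# Tokens per SWE-Bench task, as reported in the summary
m25_tokens, m21_tokens = 3.52, 3.72  # millions

# Relative reduction from M2.1 to M2.5
reduction_pct = (m21_tokens - m25_tokens) / m21_tokens * 100  # ~5.4% fewer tokens
```

A ~5.4% per-task reduction is modest on its own, but combined with the per-token pricing it compounds into the model's overall cost advantage.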
