    MiniMax-M2.5: The $1/hour Frontier Model

Tags: MiniMax M2.5, frontier AI model, reinforcement learning, agentic AI, cost-efficient AI, coding agent
    February 13, 2026
    Summary

    MiniMax has released M2.5, a new frontier AI model notable for combining high performance with cost-efficiency. Launched on February 12th, 2026, M2.5 posts competitive benchmark results, including 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench (first place), and 76.3% on BrowseComp, often matching or exceeding models like Claude Opus 4.6 and GPT-5.2. Its most striking feature is its low operational cost: approximately $1 per hour at 100 tokens per second.

    M2.5 is a 230 billion parameter Mixture-of-Experts (MoE) model with only 10 billion active parameters per forward pass, keeping its inference footprint manageable. It comes in two API variants: M2.5-Lightning (100 tokens/second, $0.30/M input, $2.40/M output) and M2.5 Standard (50 tokens/second, $0.15/M input, $1.20/M output). This pricing significantly undercuts competitors such as Claude Opus 4.6 ($5/M input, $25/M output) and GLM-5 ($1/M input, $3.20/M output). The model was trained primarily through large-scale reinforcement learning using MiniMax's in-house Forge framework across more than 200,000 real-world environments. It handles both code (10+ languages, including Python, C++, and Rust) and full office productivity workflows (Word, Excel, PowerPoint).
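The "$1 per hour" headline figure follows directly from the published per-token prices. The sketch below (function name is my own, not an official API) multiplies generation speed by the output price to check the claim for both variants:

```python
def cost_per_hour(tokens_per_sec: float, output_price_per_m: float) -> float:
    """Approximate hourly cost of continuous generation, ignoring input tokens."""
    m_tokens_per_hour = tokens_per_sec * 3600 / 1e6  # millions of tokens/hour
    return m_tokens_per_hour * output_price_per_m

# M2.5-Lightning: 100 tok/s at $2.40 per million output tokens
lightning = cost_per_hour(100, 2.40)  # 0.36M tokens/hr -> ~$0.86/hr
# M2.5 Standard: 50 tok/s at $1.20 per million output tokens
standard = cost_per_hour(50, 1.20)   # 0.18M tokens/hr -> ~$0.22/hr
```

At 100 tokens/second the Lightning variant lands just under $0.90/hour, consistent with the article's "approximately $1 per hour" claim (input-token costs would add a little on top).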

    Key innovations in Forge RL include:

    • CISPO (Clipped Importance Sampling Policy Optimization): A custom RL algorithm that clips importance sampling weights, ensuring all tokens contribute to gradient computations for scalable RL.
    • Asynchronous scheduling + tree-structured sample merging: Achieves approximately a 40x training speedup over naive approaches by balancing throughput against how off-policy the samples become.
    • Process rewards for credit assignment: Monitors generation quality throughout long agent trajectories and estimates real-world task completion time as a reward signal.
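The key distinction in CISPO can be sketched in a few lines. In standard PPO-style objectives, tokens whose importance ratio falls outside the clip range receive zero gradient; CISPO instead clips the importance-sampling weight itself, so every token retains a nonzero contribution. The sketch below is a simplified illustration under that reading of the article (function and parameter names are mine, and the symmetric clip bounds are an assumption, not MiniMax's published hyperparameters):

```python
import numpy as np

def cispo_weights(logp_new: np.ndarray, logp_old: np.ndarray,
                  eps_low: float = 0.2, eps_high: float = 0.2) -> np.ndarray:
    """Clip the importance-sampling ratio directly (CISPO-style sketch).

    Unlike PPO's clipped surrogate, which zeroes the gradient for tokens
    outside the trust region, every token here keeps a positive weight,
    so all tokens contribute to the gradient update.
    """
    ratio = np.exp(logp_new - logp_old)  # per-token importance ratio
    return np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)

# Tokens far off-policy get clamped to the bounds rather than dropped:
w = cispo_weights(np.array([2.0, 0.0, -3.0]), np.zeros(3))
```

Here the ratios 7.39, 1.0, and 0.05 become 1.2, 1.0, and 0.8: the extreme tokens are bounded but still weighted, which is what makes the gradient estimate usable at the scale of long agent trajectories.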

    The model exhibits emergent spec-writing behavior, proactively planning before coding, which improves token efficiency (3.52M tokens/task on SWE-Bench for M2.5 vs. 3.72M for M2.1). MiniMax also developed the GDPval-MM benchmark for office productivity, where M2.5 achieved a 59.0% average win rate against mainstream models. Early reports note occasional sloppiness, but M2.5 charts a clear path toward cost-efficient frontier models. Its training run took about two months.
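The token-efficiency gain quoted above works out to roughly a 5% reduction per task; a quick arithmetic check (not an official figure from the article):

```python
# Tokens per SWE-Bench task, as reported in the summary
m25_tokens, m21_tokens = 3.52, 3.72  # millions

# Relative reduction from M2.1 to M2.5
reduction_pct = (m21_tokens - m25_tokens) / m21_tokens * 100  # ~5.4% fewer tokens
```

A ~5.4% per-task reduction is modest on its own, but combined with the per-token pricing it compounds into the model's overall cost advantage.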
