Inception Labs launches Mercury 2, first diffusion-based reasoning LLM hitting 1,009 tokens/sec
AI startup Inception Labs has released Mercury 2, a reasoning model that replaces standard autoregressive decoding with a diffusion-based approach, generating multiple tokens in parallel rather than one at a time. Running on Nvidia Blackwell GPUs, it reaches 1,009 tokens/sec with end-to-end latency of 1.7 seconds, versus 14.4 seconds for Gemini 3 Flash.
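To see why parallel decoding changes the latency picture, here is a toy sketch (not Inception's actual implementation; the block size and refinement counts are hypothetical) comparing the number of sequential model calls each strategy needs: autoregressive decoding pays one call per token, while a diffusion-style decoder denoises whole blocks of tokens over a fixed number of refinement passes.

```python
# Toy model of sequential-step counts. All numbers below are
# illustrative assumptions, not Mercury 2's real parameters.

def autoregressive_steps(num_tokens: int) -> int:
    """One sequential model call per generated token."""
    return num_tokens

def diffusion_steps(num_tokens: int, block_size: int, refinements: int) -> int:
    """Denoise a whole block of tokens at once; sequential cost
    scales with (blocks x refinement passes), not with tokens."""
    blocks = -(-num_tokens // block_size)  # ceiling division
    return blocks * refinements

# Hypothetical run: 1,700 output tokens, 64-token blocks, 4 passes.
print(autoregressive_steps(1700))       # 1700 sequential calls
print(diffusion_steps(1700, 64, 4))     # 27 blocks * 4 passes = 108 calls
```

Under these made-up settings the diffusion decoder does ~16x fewer sequential steps, which is the kind of gap that could plausibly separate a 1.7s response from a 14.4s one.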
Pricing undercuts competitors at $0.25/1M input and $0.75/1M output tokens. The model offers 128K context, native tool use, and tunable reasoning depth. Inception positions it for high-throughput production workloads where autoregressive bottlenecks compound across agent loops.
View full digest for February 25, 2026