Seven Models in Three Weeks: China's AI Labs Aren't Waiting
Between late January and mid-February, Chinese labs released seven major models in rapid succession. Open weights, aggressive pricing, agentic features, and domestic chip deployment define the wave.
Between January 27 and February 17, 2026, Chinese AI labs released seven major models. The timing wasn't random. Last year, DeepSeek dropped R1 right before Lunar New Year and briefly topped the US App Store, triggering roughly $600 billion in Nvidia market cap losses in a single day. This year, every major Chinese lab tried to replicate that playbook. The result: a three-week sprint of frontier-class releases.
Three patterns define this wave. Mixture-of-Experts (MoE) architectures dominate: most text models here use sparse activation, running only 3-10% of total parameters per token. Open-weight licensing under MIT or Apache 2.0 is common. And every lab is building for agents, not chatbots. The direction is the same everywhere: cheap, open, agentic models that companies can deploy on their own infrastructure.
The releases
Kimi K2.5 (Moonshot AI, January 27)
Moonshot AI's Kimi K2.5 is a ~1-trillion-parameter MoE model that activates only 32 billion parameters per token, routing through 8 of 384 experts per inference step. It's natively multimodal, pretrained on 15 trillion mixed visual and text tokens. Images and text go through a single architecture, not separate models stitched together.
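To make "activates only 32 billion of 1 trillion parameters" concrete, here is a minimal numpy sketch of top-k expert routing as sparse MoE models generally implement it. This is illustrative only, not Moonshot's code; the hidden size, the linear gating function, and the softmax normalization are assumptions.

```python
import numpy as np

def moe_route(token_hidden, gate_weights, top_k=8):
    """Pick the top-k experts for one token; return their indices and
    softmax mixing weights. Only those k expert networks run for this token."""
    scores = token_hidden @ gate_weights            # one gate score per expert
    top = np.argsort(scores)[-top_k:]               # indices of the k best experts
    w = np.exp(scores[top] - scores[top].max())     # softmax over the winners only
    return top, w / w.sum()

rng = np.random.default_rng(0)
d_model, n_experts = 64, 384                        # 384 experts, as K2.5 reportedly uses
experts, weights = moe_route(rng.normal(size=d_model),
                             rng.normal(size=(d_model, n_experts)))
print(len(experts), round(weights.sum(), 6))        # 8 1.0
```

Because only 8 of 384 experts fire per token, the per-token compute scales with the ~32B active parameters, not the full trillion, which is what makes the aggressive pricing below possible.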
The selling point is Agent Swarm: K2.5 can spin up and coordinate up to 100 sub-agents working in parallel across as many as 1,500 steps. In wide-search scenarios, Agent Swarm cuts end-to-end execution time by up to 4.5x compared to single-agent runs. On BrowseComp, scores jump from 60.6% without Agent Swarm to 78.4% with it. On AIME 2025, K2.5 scores 96.1%, approaching GPT-5.2 while outperforming Claude Opus 4.5.
The model supports a 256K context window, is open-weight under a modified MIT license, and costs $0.60 per million input tokens and $3.00 per million output tokens via the Kimi API.
Kling 3.0 (Kuaishou, February 5)
Kuaishou's Kling 3.0 is a video generation model, not a text LLM, but it belongs in this wave. The pitch: native 4K at 60 frames per second, generated at the pixel level during diffusion rather than upscaled afterward. Kuaishou claims it is the first AI video model to generate natively at this resolution. Clips run up to 15 seconds with a multi-shot storyboard system supporting up to 6 distinct shots in a single generation pass.
Audio and video are generated in a single pass from a unified multimodal architecture, with native lip-sync in Chinese (including Cantonese and Sichuan dialects), English (American, British, Indian accents), Japanese, Korean, and Spanish. Different characters can speak different languages in the same scene. The Elements system lets creators upload reference images to "lock" a character's visual identity across shots and lighting changes.
Per Kuaishou's press release, Kling AI serves over 60 million creators and has produced more than 600 million videos since its June 2024 launch. Pricing is credit-based, with API access estimated at $0.07-$0.14 per second of video.
GLM-5 (z.ai, February 11)
z.ai's GLM-5 is a 744B/40B MoE model released under the MIT license. As of February 2026, it has the lowest hallucination rate of any model on the Artificial Analysis AA-Omniscience Index, though there's a tradeoff: it says "I don't know" more aggressively, giving fewer wrong answers but also fewer answers overall. Pricing is $1.00/$3.20 per million input/output tokens.
The hardware story is worth separating from the model. GLM-5 runs on domestically manufactured chips for inference, like Huawei Ascend. z.ai says training also used Huawei Ascend chips without NVIDIA hardware, but the model's GitHub repo only confirms Ascend support for deployment, not training. Inference on domestic chips is confirmed; training on Ascend-only remains a company claim, not independently verified.
MiniMax M2.5 (MiniMax, February 12)
MiniMax M2.5 has 230 billion total parameters with only 10 billion active per forward pass, using 256 experts with 8 active per token. It ships in two variants: Standard ($0.30/$1.20 per million tokens) and Lightning ($0.30/$2.40), where Lightning delivers 100 tokens per second, roughly 2x the throughput of other frontier models.
M2.5 scores 80.2% on SWE-bench Verified, matching Claude Opus 4.6, and MiniMax says the total cost per SWE-bench task is about 10% that of Opus. MiniMax developed the model using Forge, a custom agent-native reinforcement learning framework trained across 200,000+ real-world environments. Its algorithmic contribution, CISPO (Clipped Importance Sampling Policy Optimization), clips importance sampling weights rather than dropping token updates, so every token contributes to the gradient computation.
The modified MIT license requires commercial users to prominently display "MiniMax M2.5" on their product's user interface. MiniMax's tagline: "the $1/hour frontier model," referring to Lightning running continuously at 100 tokens per second.
Seedance 2.0 (ByteDance, February 12)
ByteDance released Seedance 2.0 on February 12, a video generation model that produces 2K clips up to 20 seconds with native audio-visual sync and multi-shot storytelling in 8+ languages. Physics-aware training means gravity, fabric draping, and fluid dynamics look more believable than in previous models, and hand anatomy, long the tell for AI video, is largely solved. Two days earlier, ByteDance launched Seedream 5.0, an image generation model with 2K/4K output and reasoning-based editing, integrated into CapCut.
The release rattled Hollywood. "Deadpool" screenwriter Rhett Reese said "it's likely over for us." The Motion Picture Association filed formal complaints the same week, with chairman Charles Rivkin claiming Seedance 2.0 engaged in "unauthorized use of US copyrighted works on a massive scale" without "meaningful safeguards against infringement." In response, ByteDance disabled the ability to generate clips of recognizable public figures.
Doubao 2.0 / Seed 2.0 (ByteDance, February 14)
ByteDance also released Seed 2.0 on February 14, a four-model family (Pro, Lite, Mini, Code) that powers Doubao, China's most-used AI chatbot with 155 million weekly active users. ByteDance calls this their entry into the "agent era": models built to execute multi-step tasks, not just answer questions.
The Pro variant targets deep reasoning: 98.3 on AIME 2025, 88.9 on GPQA Diamond, gold-medal performance on the ICPC, IMO, and CMO competitions, and 77.3 on BrowseComp for autonomous agent workflows. Lite is the production default (93 on AIME 2025, 2233 on Codeforces), Mini handles high-throughput batch work, and Code specializes in software development. Seed 2.0 Pro costs $0.47/$2.37 per million input/output tokens, roughly a tenth of Claude Opus's prices on both input and output.
Qwen 3.5 (Alibaba, February 16)
Alibaba's Qwen 3.5 dropped hours before the Lunar New Year celebration. The flagship model, Qwen3.5-397B-A17B, has 397 billion total parameters with only 17 billion active per token, routing through 11 of 512 experts. It handles text, images, and video natively across 201 languages and dialects, with a 262K native context window extendable to 1 million tokens.
Alibaba claims Qwen 3.5 outperforms GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro on 80% of evaluated categories. It beats GPT-5.2 on document recognition (OmniDocBench: 90.8 vs 85.7) but trails on coding (LiveCodeBench v6: 83.6 vs 87.7) and math (AIME 2026: 91.3 vs 96.7). Strong across the board, but still behind the best closed models on the hardest reasoning and coding tasks.
Qwen 3.5's agentic angle is the most concrete. It can control mobile and desktop applications by interpreting screenshots, identifying UI elements, and executing multi-step workflows, scoring 62.2 on OSWorld-Verified and 66.8 on AndroidWorld. Open-weight under Apache 2.0. The hosted Qwen 3.5-Plus is priced at $0.40/$2.40 per million input/output tokens on Alibaba Cloud (international); the open-weight 397B model API costs $0.60/$3.60.
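Screenshot-driven control follows a general GUI-agent pattern: capture the screen, ask the model for one action, execute it, repeat until done. The sketch below shows only that loop shape; the `Action` type and every callable are hypothetical stand-ins, not Alibaba's actual API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # "click", "type", or "done" (hypothetical action set)
    target: str      # description of the UI element to act on
    text: str = ""   # text to enter, for "type" actions

def run_gui_task(goal, screenshot_fn, model_fn, execute_fn, max_steps=20):
    """Generic screenshot->action loop used by GUI agents: the model sees the
    goal, the current screen, and the history, and proposes one action per turn.
    All callables are caller-supplied stubs in this sketch."""
    history = []
    for _ in range(max_steps):
        action = model_fn(goal, screenshot_fn(), history)
        if action.kind == "done":
            break
        execute_fn(action)           # click/type via an OS automation layer
        history.append(action)
    return history
```

Benchmarks like OSWorld and AndroidWorld essentially score how often a loop like this reaches a verifiable end state, which is why the step budget and screenshot fidelity matter as much as raw model quality.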
DeepSeek V4 (unconfirmed)
As of publication, DeepSeek has not made an official announcement. But on February 11, users noticed its chatbot silently upgrading its context window from 128K to 1 million tokens. Industry reports suggest V4 was planned for mid-February with approximately 1 trillion parameters, Engram conditional memory architecture (a system that separates static knowledge retrieval from dynamic reasoning, offloading facts to a scalable lookup layer), and coding-first focus. Reuters reported that multiple Chinese labs accelerated their releases specifically to avoid being overshadowed by a potential DeepSeek drop during the holiday.
How they compare

| Model | Lab | Release | Architecture | Price (per M tokens, in/out) | License |
|---|---|---|---|---|---|
| Kimi K2.5 | Moonshot AI | Jan 27 | 1T MoE, 32B active | $0.60 / $3.00 | Modified MIT |
| Kling 3.0 | Kuaishou | Feb 5 | Video (native 4K/60fps) | ~$0.07-0.14 per second of video | Proprietary |
| GLM-5 | z.ai | Feb 11 | 744B MoE, 40B active | $1.00 / $3.20 | MIT |
| MiniMax M2.5 | MiniMax | Feb 12 | 230B MoE, 10B active | $0.30 / $1.20 (Standard) | Modified MIT |
| Seedance 2.0 | ByteDance | Feb 12 | Video (2K, up to 20s) | n/a | Proprietary |
| Seed 2.0 | ByteDance | Feb 14 | Four-model family | $0.47 / $2.37 (Pro) | Proprietary |
| Qwen 3.5 | Alibaba | Feb 16 | 397B MoE, 17B active | $0.60 / $3.60 | Apache 2.0 |
What this wave actually tells us
Domestic hardware, at least for one lab
GLM-5 runs on domestic chips for inference. z.ai claims training also uses Huawei's Ascend hardware. After launch, demand exceeded capacity. z.ai expanded its domestic chip cluster multiple times and asked outside companies to help run GLM-5 on their hardware. U.S. export controls were designed to keep Chinese labs dependent on NVIDIA, but Huawei plans to double Ascend output to 1.6 million chips in 2026, and Chinese government officials are now overseeing allocation of remaining high-end parts, prioritizing domestic options. If frontier-scale training on domestic hardware becomes routine, the export control strategy loses its leverage.
Most are open-weight
Four of the five text models in this wave (Kimi K2.5, GLM-5, MiniMax M2.5, Qwen 3.5) ship with open weights under MIT or Apache 2.0 licenses. Seed 2.0 is the only proprietary one.
This isn't altruism. Open weights let developers fine-tune, self-host, and integrate without lock-in, and that adoption compounds fast. Stanford HAI/DigiChina calls Chinese open-weight families "unavoidable" in global developer ecosystems. Give away the weights, sell the platform around them. For Alibaba, that means cloud. For Zhipu (z.ai) and MiniMax, both of which IPO'd in Hong Kong last month, it's the API adoption that justifies their valuations.
Qwen has over 700 million downloads on Hugging Face and 180,000+ derivative models. Over 90,000 enterprises run Qwen through Alibaba Cloud's Model Studio, and AI-related revenue has grown triple digits for nine consecutive quarters, pushing cloud revenue up 34% year-over-year. Airbnb CEO Brian Chesky said the company "relies a lot" on Qwen, choosing it over ChatGPT because it's "fast and cheap."
Open releases also double as free R&D. The community finds failure modes, builds tooling, publishes evals. And when your models are the default in other countries, you set the standards. The open-weight push is part of a broader AI self-reliance strategy.
The agent era
Every lab in this wave is selling agents, not chatbots. Kimi K2.5 has Agent Swarm (100 sub-agents in parallel). ByteDance explicitly positions Doubao 2.0 for the "agent era." Qwen 3.5 controls desktop and mobile apps from screenshots. MiniMax trained in 200,000+ real-world environments. Alibaba is already putting money behind it. On February 6 it spent 3 billion yuan ($400 million) on coupons for "agentic commerce," where AI handles the shopping. 120 million orders in six days. Western labs are headed the same way. Anthropic shipped agent teams in Opus 4.6, and OpenAI hired OpenClaw founder Peter Steinberger to lead its agent push. The shared bet: the money in AI shifts from answering questions to executing workflows.
Pricing pressure
Every Chinese text model in this wave costs under $1 per million input tokens. Claude Opus 4.6 costs $5. GPT-5.2 costs $1.75. MiniMax claims M2.5 matches Opus on swe-bench at about 1/10th the cost per task. When models at these price points score within a few points of the best closed models, paying 5-25x more gets hard to justify.
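The gap is easy to quantify from the input prices quoted in this article alone (output prices widen it further). A quick back-of-envelope, with ratios rounded:

```python
# Input price per million tokens, as cited in this article.
prices_in = {
    "Claude Opus 4.6": 5.00, "GPT-5.2": 1.75,                    # Western, closed
    "Kimi K2.5": 0.60, "GLM-5": 1.00, "MiniMax M2.5": 0.30,      # Chinese wave
    "Seed 2.0 Pro": 0.47, "Qwen 3.5-Plus": 0.40,
}
western = ("Claude Opus 4.6", "GPT-5.2")
cheapest = min(p for m, p in prices_in.items() if m not in western)

for m in western:
    print(f"{m}: {prices_in[m] / cheapest:.1f}x the cheapest Chinese input price")
```

On these numbers the premium runs from roughly 6x (GPT-5.2 vs MiniMax M2.5) to nearly 17x (Claude Opus 4.6 vs MiniMax M2.5) on input tokens, before output pricing stretches the range further.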
Seven models in three weeks, with an eighth still in the oven. The playbook is the same across labs: sparse MoE architectures that cut inference costs, open weights that drive adoption, agent-first design for enterprise workflows, and pricing 5-25x cheaper than Western equivalents.
The benchmark gap with Western models is small and shrinking. The price gap is large and growing. But most of these scores are vendor-reported, and benchmarks don't measure what matters in production: reliability over long sessions, edge-case handling, content restrictions from Chinese regulations. Two questions define the rest of this story — whether Western labs can justify 5-25x pricing premiums, and whether Chinese models actually hold up outside the leaderboards.