Hélain Zimmermann

Why Chinese AI Models Are Breaking the Economics of AI

Something remarkable has happened in the AI industry over the past six months, and it has less to do with model capabilities than with model economics. Chinese AI labs -- DeepSeek, Moonshot (Kimi), Zhipu (GLM), and others -- have released a wave of large language models that match or closely approach the performance of GPT-4 and Claude Opus 4 on major benchmarks, at a fraction of the inference cost. For startups, developers, and anyone building AI-powered products, this shift changes the calculus of model selection.

As the Latent Space newsletter reported in its coverage of GLM-5's release, we are witnessing a new wave of open-weights models that achieve state-of-the-art results while being dramatically more affordable to run. The implications ripple through every layer of the AI stack, from infrastructure to applications to business models. For a broader look at how these labs reached this point, see the trajectory of Chinese AI models catching up to Western counterparts.

The Benchmark Shock

The moment that crystallized this trend was DeepSeek-V3's performance on SWE-bench Verified, the widely used benchmark for evaluating AI models on real-world software engineering tasks. DeepSeek-V3 scored 67.3%, placing it in the same tier as GPT-4o and ahead of many Western models that cost significantly more to operate. Kimi K2, released shortly after, posted similarly competitive numbers across a range of coding and reasoning benchmarks.

But the headline performance is only half the story. The other half is cost. Running DeepSeek-V3 through its API costs approximately $0.27 per million input tokens and $1.10 per million output tokens. Compare this with GPT-4o at $2.50/$10.00 or Claude Opus 4 at $15.00/$75.00 for the same token volumes. The gap is not incremental -- it is an order of magnitude.

For a concrete example: processing 1,000 customer support tickets, each averaging 500 tokens of input and 300 tokens of output, means 500,000 input tokens and 300,000 output tokens in total. At the prices above, that costs roughly $0.47 with DeepSeek-V3, $4.25 with GPT-4o, and $30.00 with Claude Opus 4. When you scale to millions of interactions per month, these differences translate to tens or hundreds of thousands of dollars in annual savings.
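The ticket arithmetic can be checked with a few lines of Python. This is a minimal sketch: the prices are the per-million-token figures quoted above, and the names PRICES and batch_cost are illustrative, not part of any provider's SDK.

```python
# USD per million tokens as (input, output), taken from the prices quoted in this article.
PRICES = {
    "DeepSeek-V3": (0.27, 1.10),
    "GPT-4o": (2.50, 10.00),
    "Claude Opus 4": (15.00, 75.00),
}


def batch_cost(model: str, requests: int, in_tokens: int, out_tokens: int) -> float:
    """Return the USD cost of `requests` calls with the given average token counts."""
    in_price, out_price = PRICES[model]
    total_in = requests * in_tokens    # total input tokens across the batch
    total_out = requests * out_tokens  # total output tokens across the batch
    return (total_in * in_price + total_out * out_price) / 1_000_000


if __name__ == "__main__":
    # 1,000 tickets, 500 input tokens and 300 output tokens each.
    for model in PRICES:
        print(f"{model}: ${batch_cost(model, 1_000, 500, 300):.3f}")
```

Swapping in your own request volume and token averages is usually more informative than any headline price comparison, because the input/output mix changes the ratio between providers.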

How They Got There

The cost advantage of Chinese models stems from several factors, not all of them purely technical.

Training efficiency. DeepSeek and others have pushed mixture-of-experts (MoE) architectures, which activate only a fraction of the model's parameters for any given token. DeepSeek-V3 has 671 billion total parameters but activates only 37 billion per token (about 5.5%), so you get the knowledge of a massive model at roughly the compute cost of a much smaller one. This is not a new idea (Google's Switch Transformer explored it years ago), but Chinese labs have been aggressive in pushing MoE to production scale.
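To make the idea of sparse activation concrete, here is a toy top-k router in plain Python. It is a sketch of the general MoE routing pattern under simplifying assumptions (scalar inputs, hand-written gate scores, four tiny "experts"), not a description of DeepSeek-V3's actual router; every name in it is hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_route(gate_scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    # Softmax over all gate scores (subtract the max for numerical stability).
    peak = max(gate_scores)
    exps = [math.exp(s - peak) for s in gate_scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k most probable experts; the others are never evaluated.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x: float, experts: Sequence[Callable[[float], float]],
                gate_scores: Sequence[float], k: int) -> float:
    """Weighted sum of the outputs of the k selected experts only."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_scores, k))


# Four toy experts; with k=2 only half of them run for this input.
EXPERTS = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]

# Share of parameters active per token, using the figures quoted in the text.
ACTIVE_FRACTION = 37 / 671

if __name__ == "__main__":
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
    print(moe_forward(3.0, EXPERTS, [0.1, 2.0, -1.0, 1.5], k=2))
    print(f"{ACTIVE_FRACTION:.1%} of parameters active per token")
```

The compute saving comes from the fact that unselected experts are skipped entirely, while all of them still have to be held in memory, which is why MoE models are cheap per token but not small to host.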

Infrastructure cost advantages. Training and serving models in China benefits from lower electricity costs in certain regions, government subsidies for AI research, and a deep talent pool of ML engineers whose compensation, while rising, remains below Silicon Valley levels. These structural advantages compound across the entire model development lifecycle.

Competitive pressure. The Chinese AI market is ferociously competitive. Over a dozen well-funded labs are competing for developers, enterprise clients, and government contracts. This competition drives aggressive pricing, sometimes below cost, as labs race to build market share and ecosystem lock-in. DeepSeek, in particular, has been transparent about subsidizing API pricing to accelerate adoption.

Open-weights strategy. By releasing model weights, Chinese labs enable self-hosting, which further reduces costs for users with their own infrastructure. A startup running Kimi K2 on a leased GPU cluster can achieve per-token costs even lower than the already-cheap API pricing.
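Whether self-hosting actually beats API pricing depends on throughput and utilization. The sketch below shows the basic calculation; the GPU price, cluster size, and throughput in the example are hypothetical placeholders, not measurements of Kimi K2 or any other model.

```python
def self_hosted_cost_per_million(gpu_hourly_usd: float, num_gpus: int,
                                 tokens_per_second: float,
                                 utilization: float = 1.0) -> float:
    """USD per million tokens for a leased GPU cluster.

    tokens_per_second is the aggregate throughput of the whole cluster;
    utilization is the fraction of leased time spent serving real traffic.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    hourly_cost = gpu_hourly_usd * num_gpus
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 8 GPUs at $2.00/hour each, 5,000 tokens/s aggregate.
    for utilization in (1.0, 0.5, 0.1):
        cost = self_hosted_cost_per_million(2.00, 8, 5_000, utilization)
        print(f"utilization {utilization:.0%}: ${cost:.2f} per million tokens")
```

The same cluster gets more expensive per token as utilization falls, so self-hosting tends to pay off only when traffic keeps the hardware busy.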

Why This Matters for Startups

For AI-native startups, and increasingly for any software company incorporating AI features, model cost is becoming a strategic variable, not just an operational one. The choice between open-source and closed AI providers now carries direct financial weight.

Consider a startup building an AI coding assistant. At GPT-4o pricing, each user generating 50,000 tokens per day costs roughly $1.25/day or $37.50/month in model inference alone. At DeepSeek-V3 pricing, the same usage costs approximately $0.19/day or $5.70/month. If the product charges $30/month, the choice of model backend is the difference between a gross margin of roughly 81% and negative unit economics.
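The unit economics reduce to a one-line formula, gross margin = (price - cost) / price. A minimal sketch, taking the $30 subscription price and the two monthly inference costs from the paragraph above as given and ignoring every other cost of serving a user:

```python
def gross_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of revenue; negative means a loss on every user."""
    if price <= 0:
        raise ValueError("price must be positive")
    return (price - cost) / price


if __name__ == "__main__":
    monthly_price = 30.00
    # Monthly inference cost per user, as estimated in the text.
    for backend, monthly_cost in (("GPT-4o", 37.50), ("DeepSeek-V3", 5.70)):
        print(f"{backend}: {gross_margin(monthly_price, monthly_cost):.0%}")
```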

This does not mean everyone should immediately switch to the cheapest model. Quality differences exist, and they matter. For tasks requiring nuanced reasoning, complex instruction following, or safety-critical outputs, frontier models from OpenAI and Anthropic often outperform. But for many production workloads (summarization, classification, extraction, code generation, customer support), the gap has narrowed enough that cheaper models are not just viable but rational.

The result is a segmentation of model usage by task. Sophisticated AI teams now route queries to different models based on complexity: simple questions go to small, cheap models; complex reasoning goes to expensive frontier models. This routing strategy, sometimes called a "model cascade," can reduce total inference costs by 60 to 80 percent without meaningful degradation in output quality. Storing and retrieving context efficiently across these tiers often relies on vector databases under the hood.
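A cascade can be sketched in a few lines. The complexity check below is a deliberately crude, hypothetical heuristic (prompt length plus a keyword list); production routers typically use a trained classifier or the cheap model's own confidence. The two model functions are stubs standing in for real API calls.

```python
from typing import Callable

Model = Callable[[str], str]


def looks_complex(prompt: str) -> bool:
    """Hypothetical heuristic: long prompts or reasoning keywords go to the frontier model."""
    keywords = ("prove", "step by step", "architecture", "trade-off")
    return len(prompt.split()) > 200 or any(k in prompt.lower() for k in keywords)


def make_cascade(cheap: Model, frontier: Model,
                 is_complex: Callable[[str], bool] = looks_complex) -> Model:
    """Build a router that sends each prompt to exactly one of the two models."""
    def route(prompt: str) -> str:
        return frontier(prompt) if is_complex(prompt) else cheap(prompt)
    return route


def blended_cost_ratio(cheap_share: float, cheap_cost_ratio: float) -> float:
    """Cascade cost relative to sending all traffic to the frontier model."""
    return cheap_share * cheap_cost_ratio + (1 - cheap_share)


if __name__ == "__main__":
    router = make_cascade(lambda p: "cheap: " + p, lambda p: "frontier: " + p)
    print(router("Classify this ticket as billing or technical."))
    print(router("Walk through the trade-off between these two designs."))
    # If 80% of traffic goes to a model costing one tenth as much:
    print(f"savings: {1 - blended_cost_ratio(0.8, 0.1):.0%}")
```

With 80% of traffic on a model at one tenth of the price, the blended bill is 28% of the all-frontier bill, a 72% saving, which is consistent with the range quoted above.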

The European Developer Perspective

For developers working in Europe, as I do, the rise of Chinese models introduces both opportunities and complications. Tasks that combine vision and text, for instance, benefit from multimodal AI pipelines where cheaper inference unlocks use cases that were previously too expensive.

Latency is a practical concern. DeepSeek's API servers are primarily located in Asia, and while they have expanded to some US endpoints, European developers often face 200 to 400 milliseconds of additional round-trip latency compared to OpenAI or Anthropic endpoints in Europe. For real-time applications such as chatbots and IDE copilots, this latency is noticeable. For batch processing or async workloads, it is irrelevant.

Data residency is a regulatory question that has not been fully resolved. GDPR requires that personal data processed by AI models be handled in compliance with EU data protection standards. When your API requests transit through servers in China, the data protection picture becomes complex. Self-hosting open-weights models on infrastructure you control avoids the cross-border transfer question, although your other GDPR obligations still apply, and it requires infrastructure investment.

Geopolitical risk is the elephant in the room. US export controls on AI chips have already affected Chinese labs' ability to train next-generation models, though they have proven resourceful in working around these constraints. European companies relying on Chinese model APIs face the possibility of service disruptions due to sanctions, trade disputes, or policy changes. For production workloads, this argues for maintaining a multi-model strategy with fallback options.
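One way to implement a fallback strategy is an ordered list of providers that is tried until one succeeds. This is a minimal sketch: the provider callables are stubs, and a real version would add timeouts, retries, and narrower exception handling for whichever SDKs you use.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an exception."""


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, completion) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose in this sketch
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def unavailable(prompt: str) -> str:
    """Stub simulating a provider outage."""
    raise ConnectionError("service unavailable")


if __name__ == "__main__":
    chain = [("primary", unavailable), ("backup", lambda p: "ok: " + p)]
    print(complete_with_fallback("ping", chain))
```

Keeping the chain in configuration rather than in code means a sanctions or outage event becomes a config change instead of a rewrite.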

Despite these concerns, the pragmatic reality is that many European startups are quietly integrating Chinese models into their stacks, particularly for internal tools and non-customer-facing workloads where latency and data sensitivity are less critical.

Impact on the AI SaaS Market

The commoditization of model performance has cascading effects on the AI software market. If the cost of the underlying intelligence drops by 10x, several things happen.

AI wrapper companies face margin compression. Products that primarily provide a user interface over an LLM (AI writing tools, summarizers, chatbots) find their core advantage evaporating. When anyone can run a near-GPT-4-quality model for pennies, the value must come from data, workflow integration, or domain expertise, not from access to the model itself.

Incumbents are forced to respond. OpenAI has already reduced GPT-4o pricing twice in 2025 and introduced GPT-4o-mini for cost-sensitive workloads. Anthropic has expanded its Haiku model's capabilities. Google has aggressively priced Gemini Flash. The pricing pressure from Chinese models accelerates this race to the bottom.

Open-source wins. The availability of high-quality open-weights models strengthens the position of companies that build tooling around open-source AI: fine-tuning platforms, inference engines, and deployment tools. If the model itself is a commodity, the value accrues to the ecosystem layer.

New applications become viable. Tasks that were too expensive to automate at $10 per million output tokens become practical at $1. This is not a hypothetical: several companies have told me they greenlit AI features only after Chinese model pricing made the unit economics work.

The "Good Enough" Threshold

There is a concept in technology adoption that is particularly relevant here: the "good enough" threshold. For any given task, there is a level of quality below which the output is not useful and above which additional quality improvements have diminishing returns. Once a cheaper option crosses that threshold, it captures the market regardless of whether a more expensive option is marginally better.

For many practical AI tasks (formatting data, generating boilerplate code, summarizing documents, translating text, answering factual questions), Chinese models have crossed the "good enough" threshold. A model that is 95% as good at one-tenth the cost wins in any rational economic framework. The remaining 5% matters for some use cases, but not for most.
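The threshold logic is simple enough to write down: among the models that clear the quality bar for a task, pick the cheapest. In this sketch the quality scores and prices are hypothetical, chosen to mirror the "95% as good at one-tenth the cost" example; in practice the score would come from your own evaluation set.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    quality: float           # task-specific eval score, higher is better
    cost_per_million: float  # USD per million output tokens


def cheapest_good_enough(candidates: Sequence[Candidate],
                         threshold: float) -> Optional[Candidate]:
    """Return the cheapest candidate at or above the threshold, or None if none qualify."""
    eligible = [c for c in candidates if c.quality >= threshold]
    return min(eligible, key=lambda c: c.cost_per_million, default=None)


# Hypothetical numbers for illustration only.
CANDIDATES = [
    Candidate("frontier", quality=1.00, cost_per_million=10.00),
    Candidate("budget", quality=0.95, cost_per_million=1.00),
]

if __name__ == "__main__":
    print(cheapest_good_enough(CANDIDATES, threshold=0.90))  # routine task
    print(cheapest_good_enough(CANDIDATES, threshold=0.98))  # demanding task
```

Raising the threshold is what keeps frontier models in the picture: the expensive option only wins on tasks where the cheaper one fails to clear the bar.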

This is analogous to what happened with cloud computing. AWS was not always the highest-performing option, but it was good enough and dramatically cheaper than on-premises infrastructure. The economic argument overwhelmed the performance argument for the vast majority of workloads. We are seeing the same dynamic play out with LLMs.

Implications for AI Infrastructure Investment

The rapid improvement of Chinese models raises uncomfortable questions for investors who have poured billions into AI infrastructure based on assumptions about sustained high margins.

If model inference is a commodity, the value of owning inference infrastructure diminishes. If open-weights models are competitive with proprietary ones, the moat of data and training compute narrows. If a dozen well-funded labs are all producing frontier-class models, the market structure looks more like cloud computing (a few big players, fierce price competition, thin margins) than like enterprise software (high margins, strong lock-in).

This does not mean AI is a bad investment. It means the returns will flow to different parts of the stack than many assumed. Applications that solve specific problems, not the underlying models, are likely where the real value accrues. Picks and shovels plays (inference infrastructure, fine-tuning platforms, evaluation tools) also look attractive. Pure model providers, unless they achieve genuine and sustainable differentiation, face a tougher road.

What Comes Next

The current moment is a transition. Chinese models have disrupted pricing assumptions, but the market has not yet reached equilibrium. Several dynamics will shape the next phase.

OpenAI and Anthropic will continue investing in capabilities that justify premium pricing: better reasoning, superior safety, enterprise features, and compliance certifications. Their argument will be that for mission-critical applications, the premium is worth paying. They are probably right for a segment of the market.

Chinese labs will continue pushing the frontier of cost-efficiency, potentially developing new architectures and training techniques that further reduce the compute required for a given capability level. The MoE approach has room to grow, and there are hints of even more efficient architectures in development.

The open-source ecosystem will mature, making it easier to self-host, fine-tune, and deploy models regardless of origin. This democratization of capability is, in the long run, the most consequential trend.

For developers and startups making model choices today, the practical advice is straightforward: evaluate models on the intersection of capability and cost for your specific use case. Do not default to the most expensive option out of inertia. Build your architecture to be model-agnostic so you can switch as the landscape evolves. And keep a close eye on the next generation of releases, because the pace of improvement shows no sign of slowing down.
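In practice, model-agnostic usually means that application code depends on a small interface rather than on a specific vendor SDK. A minimal sketch using Python's typing.Protocol follows; ChatModel, EchoModel, and the registry keys are hypothetical names, and a real adapter would wrap the provider's client inside complete.

```python
from typing import Dict, Protocol


class ChatModel(Protocol):
    """The only surface the application is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in backend; a real adapter would call a provider API here."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    def complete(self, prompt: str) -> str:
        return f"[{self.tag}] {prompt}"


# Backends are selected by a config key, so switching is a one-line change.
REGISTRY: Dict[str, ChatModel] = {
    "budget": EchoModel("budget"),
    "frontier": EchoModel("frontier"),
}


def summarize(text: str, model: ChatModel) -> str:
    """Application logic that never names a vendor."""
    return model.complete(f"Summarize: {text}")


if __name__ == "__main__":
    print(summarize("quarterly report", REGISTRY["budget"]))
```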
