The Rise of China's Four Domestic GPU Dragons: Can They Copy NVIDIA's Success?
About us: Ginlix AI is the AI Investment Copilot powered by real data, bridging advanced AI with professional financial databases to provide verifiable, truth-based answers. Please use the chat box below to ask any financial question.
In 1993, Jensen Huang and two co-founders established NVIDIA at a Denny’s restaurant in San Jose, California. In its early days, the company faced serious difficulties:
- Unclear tech direction: There was no unified standard for 3D graphics processing at the time, and the market had a competitive landscape of multiple graphics APIs (e.g., DirectX and OpenGL had not yet become mainstream)
- Severe cash flow strain: Before the launch of the RIVA 128 product, the company’s total funds could only cover one month’s salary expenses
- Product failure lessons: Early NV1 and NV2 products did not achieve market success, almost pushing the company to the brink of bankruptcy
- From “black magic” to inclusive computing: Before CUDA, scientists needed to “disguise” scientific problems as graphics computing problems to use GPU parallel computing capabilities, a painful and inefficient process [2]
- A developer “kingdom” built on three pillars: CUDA is not just a programming model, but also includes:
- Killer libraries (e.g., cuDNN)
- A proprietary compiler
- A complete profiling and analysis toolchain
| Gravity Level | Lock-in Mechanism | Competitive Barrier |
|---|---|---|
| Level 1: Ecosystem | CUDA has accumulated over 53 million downloads, with a global developer base of approximately 5 million | Network effects and economies of scale |
| Level 2: Code Assets | Massive AI code optimized for CUDA, with extremely high migration costs | Sunk costs and path dependency |
| Level 3: Talent Supply | Top universities worldwide teach CUDA in parallel computing courses, and engineers’ career development revolves around CUDA | Lock-in of talent training systems |
| Level 4: Performance and Trust | The “it just works” stability standard formed through decades of iteration | Trust barriers and engineering accumulation [3] |
- Technological foresight: Strategic transformation from graphics GPU to general-purpose computing GPGPU
- Ecosystem monopoly: The software ecosystem built by CUDA becomes the core moat (70% gross margin “software tax” [3])
- Industrial collaboration: Deep cooperative relationships with manufacturing partners such as TSMC
- Strategic patience: Continuous investment for over 15 years before reaping explosive growth in the AI era
| Company | Tech Route | Core Strategy | Commercial Progress |
|---|---|---|---|
| Moore Threads | GPU/GPGPU | Full-function GPU, building the MUSA ecosystem to benchmark against CUDA | Listed on the STAR Market in December 2025, peak market capitalization exceeding 359.5 billion yuan; commercialization accelerating, direct-sales ratio up to 90% [4,5,7] |
| Muxi Semiconductor | GPU/GPGPU | Focus on high-performance general-purpose computing | Listed on the STAR Market in December 2025, network API call volume exceeding 13 million; break-even expected around 2026 [5,7] |
| Biren Technology | GPU/GPGPU | Pursue extreme computing power, aggressive chip design | Hong Kong IPO filed; first flagship GPU BR100 adopts dataflow parallelism + Chiplet + TSMC N7 process [5,7] |
| Suiyuan Technology | ASIC/NPU | DSA path, focused on AI training/inference | Entered STAR Market IPO counseling; participated in launching the “Chip-Model Ecological Innovation Alliance” for ecosystem co-construction [5,8] |
- Advantages: High versatility, supports parallel computing, tech path similar to NVIDIA
- Challenges: Must face CUDA ecosystem compatibility issues
- Moore Threads translates CUDA code through the MUSA toolchain, but performance losses remain
- As of October 2025, Muxi’s MXMACA software stack has approximately 150,000 registered users and actual customers, with network API calls exceeding 13 million [7]
- Advantages: Specialized design, higher energy efficiency ratio, lower software development difficulty
- Challenges: Limited versatility, needs optimization for specific scenarios [6]
- Production-sales ratio breakthrough: Both Moore Threads and Muxi have production-sales ratios exceeding 100%, marking the leap from testing and validation to large-scale commercial use [7]
- Customer structure upgrade: Shift from relying on distributors to directly serving key accounts, with end customers spanning internet companies, AI companies, computing power service providers, telecom operators, etc. [7]
- Accelerated domestic substitution: In 2025, leading tech enterprises’ AI computing power investment is expected to reach 450 billion yuan, of which 30% is used for domestic chip verification and adaptation [8]
| Indicator | NVIDIA CUDA | Moore Threads MUSA | Muxi MXMACA |
|---|---|---|---|
| Global developer scale | Approximately 5 million (as of mid-2024) | Approximately 200,000 (as of December 2025) | Approximately 150,000 (as of October 2025) |
| Cumulative downloads/calls | Over 53 million | - | Over 13 million |
| Ecological maturity | Extremely high, accumulated over nearly 20 years | Under construction, version 5.0 iteration | Initial stage [3,7,10] |
- Toolchain translation: Moore Threads’ MUSIFY tool translates CUDA code, but compatibility gaps and performance losses remain
- Computing library replacement: Achieve one-to-one replacement of CUDA APIs through the MUSA-X computing library [6]
- “Chip-Model Ecological Innovation Alliance”: Launched in July 2025 by Jiyue Xingchen together with nearly 10 domestic GPU manufacturers, connecting the full chain of chip, model and platform technologies
- DeepSeek adaptation wave: The appearance of domestic large model DeepSeek in early 2025 triggered active adaptation by domestic GPU manufacturers (e.g., Muxi’s Xiyun C550 and Moore Threads’ MTT S4000 both passed relevant verification by CAICT), forming a closed-loop ecosystem of “domestic computing power + domestic large model” [7,8,11]
- 10,000-card cluster breakthrough: System-level breakthroughs such as Huawei Ascend 384 super node and Sugon scaleX 10,000-card supercomputing cluster provide large-scale application scenarios for domestic GPUs [8]
- Moore Academy: Covers over 200 universities, attracting more than 100,000 students to participate
- Open source initiative: Moore Threads announced the gradual open source of components such as computing acceleration libraries, communication libraries and system management frameworks [10]
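The toolchain-translation idea mentioned above can be illustrated with a toy sketch. This is not the real MUSIFY tool, which operates far deeper than textual renaming; the sketch only assumes, for illustration, that the target runtime mirrors CUDA’s C API names under a `musa` prefix:

```python
import re

# Toy sketch of source-to-source API translation (hypothetical mapping).
# The real MUSIFY toolchain is far more sophisticated; here we only assume
# that the target runtime mirrors CUDA's API under a "musa" prefix.
CUDA_TO_MUSA = {
    "cudaMemcpyHostToDevice": "musaMemcpyHostToDevice",
    "cudaDeviceSynchronize": "musaDeviceSynchronize",
    "cudaMalloc": "musaMalloc",
    "cudaMemcpy": "musaMemcpy",
    "cudaFree": "musaFree",
}

# Longest names first so e.g. cudaMemcpyHostToDevice wins over cudaMemcpy.
_PATTERN = re.compile("|".join(sorted(CUDA_TO_MUSA, key=len, reverse=True)))

def translate(source: str) -> str:
    """Rename CUDA runtime calls to their assumed MUSA counterparts."""
    return _PATTERN.sub(lambda m: CUDA_TO_MUSA[m.group(0)], source)

print(translate("cudaMalloc(&d_a, n); cudaMemcpy(d_a, h_a, n, cudaMemcpyHostToDevice);"))
```

Even at this trivial level, the sketch shows why pure translation leaks performance: renamed calls still execute CUDA-shaped code paths rather than ones tuned for the native hardware, which is why vendors pair translation with native libraries.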
| Dimension | GPU/GPGPU Path | ASIC/NPU/DSA Path |
|---|---|---|
| Versatility | High, supports broad parallel computing | Low, optimized for specific scenarios |
| Energy efficiency ratio | Medium | High (optimized for AI workloads) |
| Ecological threshold | Extremely high (must break CUDA barriers) | Lower (focus on AI framework adaptation) |
| Development difficulty | High (requires a complete software stack) | Medium (focused on specific fields) |
| Applicable scenarios | General computing, AI training/inference | Specific AI inference, edge computing [6] |
| Dimension | NVIDIA 30 Years Ago | Today’s Domestic Four GPU Dragons |
|---|---|---|
| Competitive landscape | Multiple GPU manufacturers coexist, no absolute monopoly | NVIDIA dominates, but U.S. export controls create a localization window |
| Market demand | Explosion of PC graphics-processing demand | Explosion of AI large-model training/inference computing power demand |
| Technical foundation | Exploring 3D graphics from scratch | Can learn from mature GPU architectures, but advanced processes are constrained |
| Capital environment | Extremely difficult early startup phase | National strategic support, high capital-market enthusiasm (e.g., PS ratios exceeding 300x [7]) |
| Ecological starting point | DirectX and other APIs not yet unified | CUDA ecosystem is strong, but entry is possible through migration and co-construction |
- Tech architecture innovation: Break through single-chip process limitations through Chiplet, super node and other technologies
- Biren’s BR100 adopts Chiplet technology, achieving nominal computing power on TSMC’s N7 process close to that of NVIDIA products on the more advanced N4 process
- Huawei Ascend 384 super node has a total computing power of 300 PFLOPS, forming a system-level breakthrough [7,8]
- Ecosystem construction path:
- Education system penetration: Moore Academy covers over 200 universities
- Developer community: MUSA developer community has nearly 200,000 members
- Industrial alliance: Chip-Model Ecological Innovation Alliance and other organizations [10]
- Industrial collaboration:
- Deep adaptation with domestic CPUs (Phytium, Hygon, Loongson), OS (Kylin, Tongxin), and PC manufacturers (Lenovo, Inspur) [7]
- “Mutual collaboration” between large model manufacturers and GPU manufacturers (e.g., DeepSeek adaptation wave) [8,11]
- Historical window: NVIDIA set the standards in the PC graphics era, whereas today’s entrants must contend with an already-entrenched CUDA monopoly ecosystem
- Process constraints: U.S. advanced-process controls cap single-chip performance, requiring breakthroughs through architectural innovation (e.g., Chiplet, super nodes) [6]
- Time accumulation: The CUDA ecosystem reflects nearly 20 years of accumulation and is difficult to match in maturity in the short term
- Government, education, finance and other markets: Strict localization requirements, and relatively high tolerance for less mature ecosystems
- Inference scenarios: Compared with training scenarios, inference has lower dependence on the ecosystem, and domestic GPUs are rapidly penetrating into inference scenarios such as text-to-image and speech recognition [7]
- Cost advantage: Biren’s inference card Bili 110E provides 1.3x the computing power density of mainstream solutions with 70% energy savings [7]
- Localized services: Closer to domestic customer needs, quick response to customized needs
- Domestic large models like DeepSeek: Deeply adapt to domestic GPUs, forming a closed loop of “domestic chips + domestic models + domestic applications”
- Increasing proportion of inference loads: In the future, the demand for inference computing power in the enterprise sector will be 100x or even 1000x that of training computing power [8]
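As a back-of-the-envelope check on the Bili 110E figures above, the two headline numbers can be combined into an implied perf-per-watt ratio. Note the interpretation that “70% energy savings” means 30% of the baseline solution’s power draw is an assumption, not spelled out in the source:

```python
# Back-of-the-envelope only; the sole inputs are the two headline figures
# quoted in the article (1.3x computing power density, 70% energy savings).
compute_density_ratio = 1.3    # vs. the mainstream solution
power_ratio = 1.0 - 0.70       # assumption: "70% energy savings" = 30% of baseline power

perf_per_watt_ratio = compute_density_ratio / power_ratio
print(f"Implied perf-per-watt advantage: {perf_per_watt_ratio:.1f}x")  # -> about 4.3x
```

If the interpretation holds, the efficiency story is considerably stronger than either headline number alone suggests, which is why inference cards compete on perf-per-watt rather than raw peak compute.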
| Tech Route | Commercial Advantages | Commercial Challenges | Typical Cases |
|---|---|---|---|
| GPU/GPGPU | High versatility, large market space | High CUDA ecosystem barriers, requires long-term investment | Moore Threads’ MUSA ecosystem construction; Muxi’s MXMACA toolchain [7,10] |
| ASIC/NPU | High specialized efficiency, low development difficulty | Limited market space, insufficient versatility | Huawei Ascend’s breakthrough in AI inference [6,8] |
| DSA (Domain-Specific Architecture) | Balances versatility and efficiency | Requires clear scenario definition | Suiyuan Technology’s participation in the “Chip-Model Ecological Innovation Alliance” [8] |
- Through the AIBOOK computing-power notebook, Moore Threads packages hardware, drivers, software stack, toolchain and system into a single device, providing partners with “full-link verification”
- This “prove it works” strategy reduces customers’ adaptation risks and accelerates the commercialization process [10]
- Moore Threads’ muDNN has GEMM and FlashAttention efficiency exceeding 98%, communication efficiency reaching 97%, and compiler performance improved by 3x [10]
- These engineering capabilities are as important as chip peak parameters, which is why manufacturers put software stacks and cluster engineering on the same roadmap [10]
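A “GEMM efficiency” figure like the one quoted above is conventionally the ratio of achieved to theoretical peak throughput. A minimal sketch of how such a number is derived follows; the matrix sizes, timing, and peak figure are made up for illustration and are not measurements of any real chip:

```python
def gemm_efficiency(m: int, n: int, k: int, seconds: float, peak_tflops: float) -> float:
    """Achieved fraction of peak for a matrix multiply C = A @ B.

    A GEMM of shape (m, k) x (k, n) performs 2*m*n*k floating-point
    operations (one multiply and one add per inner-product term).
    """
    achieved_tflops = 2 * m * n * k / seconds / 1e12
    return achieved_tflops / peak_tflops

# Illustrative numbers only (hypothetical timing and peak throughput):
eff = gemm_efficiency(m=8192, n=8192, k=8192, seconds=0.0056, peak_tflops=200.0)
print(f"GEMM efficiency: {eff:.0%}")  # -> 98%
```

Efficiencies near 98% mean the software stack keeps the compute units almost fully fed, which is the engineering achievement the article is pointing at.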
According to prospectus disclosures:
- Muxi Semiconductor: Expected break-even time is around 2026
- Moore Threads: Expected to achieve consolidated statement profitability as early as 2027 [7]
- Low domestic chip penetration rate, need to break through ecological construction
- Market expansion is proceeding gradually rather than explosively
- Key AI chip products need to go through strict technical verification and ecological adaptation cycles to enter key industry customers [7]
Based on the above analysis, this is a complex topic that warrants deeper research. Key risks include:
- Valuation bubble risk: Some companies have PS ratios exceeding 300x, with unreasonable valuation risks [7]
- Revenue stability risk: Revenue is supported by new orders, with poor stability [7]
- Ecological risk: Insufficient ecology may affect platform compatibility [7]
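To make the valuation risk concrete: a price-to-sales (PS) multiple is simply market capitalization divided by revenue, so a 300x multiple pins down the implied revenue. The arithmetic below pairs the peak market cap cited earlier for Moore Threads with the 300x figure; the resulting revenue is derived, not a disclosed company number:

```python
# Back-of-the-envelope PS arithmetic; the implied revenue is derived,
# not a disclosed company figure.
market_cap_yuan = 359.5e9   # peak market capitalization cited in the article
ps_ratio = 300              # multiple flagged as a valuation risk

implied_revenue = market_cap_yuan / ps_ratio
print(f"Implied annual revenue at 300x PS: {implied_revenue / 1e9:.2f} billion yuan")
```

Roughly 1.2 billion yuan of revenue against a 359.5 billion yuan valuation illustrates how much future growth is already priced in.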
- Tech route comparison: Long-term competitiveness analysis of GPU vs ASIC vs NPU
- Ecosystem construction progress: Developer community scale, toolchain maturity, adaptation coverage of each company
- Commercialization quality: Customer structure, order sustainability, clarity of profit path
- Policy and industrial chain impact: U.S. export controls, localization substitution policies, industrial chain synergy effects
- International comparison: Comparison with AMD and Intel’s catch-up experience in the GPU field
In in-depth research mode, further analysis can be conducted on:
- Financial data comparison: Revenue structure, R&D investment, cash flow status of the four dragons
- Technical indicator analysis: FP16/FP32/INT8 computing power comparison of each product line
- Patent layout: Patent accumulation of each company in GPU architecture, interconnection technology, compiler and other fields
- Supply chain analysis: Cooperation relations and capacity allocation with foundries like TSMC and SMIC
- Competitor dynamics: Market strategies and product roadmaps of NVIDIA, AMD and Intel
Compared with NVIDIA’s original path, China’s four domestic GPU dragons are taking a differentiated route:
| Dimension | NVIDIA Path | Domestic GPU Differentiated Path |
|---|---|---|
| Technical starting point | From scratch, no reference architecture | Learn from mature architectures, but process-constrained |
| Ecological strategy | Build CUDA ecosystem from scratch | Dual-track approach: CUDA compatibility plus native ecosystem co-construction |
| Market strategy | Global market | Prioritize domestic substitution and cost-performance advantage |
| Time window | PC graphics era | AI large-model era, but facing the CUDA monopoly |
- Tech route differentiation:
- GPU manufacturers (Moore Threads, Muxi, Biren): Focus on breaking CUDA compatibility and native ecosystem
- ASIC/NPU manufacturers (Suiyuan, Huawei): Focus on AI inference and specific scenario optimization
- Systematic ecosystem construction:
- Education system penetration (Moore Academy)
- Developer community cultivation (200,000 MUSA developers)
- Industrial alliance collaboration (Chip-Model Ecological Innovation Alliance)
- Large model manufacturer adaptation (DeepSeek adaptation wave)
- Commercial pragmatism:
- Gradual path from “usable” to “good to use”
- Domestic substitution priority markets like government, education and finance
- Rapid penetration of inference scenarios
- System-level solutions (super nodes, 10,000-card clusters)