The Rise of China's Four Domestic GPU Dragons: Can They Copy NVIDIA's Success?

December 30, 2025

I. Historical Mirror: NVIDIA’s Rise Path 30 Years Ago
1.1 The Darkest Hour in the Early Startup Phase (1993-1996)

In 1993, Jensen Huang and two co-founders founded NVIDIA at a Denny's restaurant in San Jose, California. In its early years the company faced serious difficulties:

  • Unclear technology direction: there was no unified standard for 3D graphics processing at the time, and the market was split across competing graphics APIs (DirectX and OpenGL had not yet become mainstream)
  • Severe cash-flow strain: before the launch of the RIVA 128, the company's remaining funds could cover only about one month of payroll
  • Lessons from product failures: the early NV1 and NV2 products did not succeed in the market, nearly pushing the company to bankruptcy

Key turning point: in 1997 NVIDIA launched the RIVA 128, a high-performance 128-bit graphics chip supporting the Direct3D interface; it sold one million units in four months and pulled the company back from the brink [1].

1.2 CUDA Ecosystem: Building the Moat (2006-Present)

Background of CUDA's birth: in 2006 NVIDIA launched the Compute Unified Device Architecture (CUDA), a revolutionary milestone in GPU history:

  • From "black magic" to inclusive computing: before CUDA, scientists had to "disguise" scientific problems as graphics problems to tap the GPU's parallel computing power, a painful and inefficient process [2] (see the minimal sketch after this list)
  • A trinity for the developer kingdom: CUDA is not just a programming model; it also includes:
    • Killer libraries (e.g., cuDNN)
    • A proprietary compiler
    • A complete profiling and analysis toolchain
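
To make "inclusive computing" concrete, here is a minimal, generic sketch of the CUDA programming model: a SAXPY kernel written directly as data-parallel C++ and launched from the host, with no graphics API in sight. It illustrates the programming model only; it is not code from any of the companies discussed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// SAXPY: y = a*x + y. Each thread owns one element, so the scientific
// problem is expressed directly instead of being "disguised" as graphics.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Unified (managed) memory keeps the host-side bookkeeping short.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %.1f\n", y[0]);  // expected 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Libraries such as cuDNN, the nvcc compiler and NVIDIA's profiling tools sit on top of this model, which is what the "trinity" above refers to.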

Four-layer gravitational lock-in of the moat:

  • Level 1: Ecosystem. CUDA has accumulated over 53 million downloads and a global developer base of approximately 5 million. Barrier: network effects and economies of scale
  • Level 2: Code assets. Massive amounts of AI code are optimized for CUDA, making migration costs extremely high. Barrier: sunk costs and path dependency
  • Level 3: Talent supply. Top universities worldwide teach CUDA in parallel-computing courses, and engineers' career development revolves around CUDA. Barrier: lock-in of the talent-training pipeline
  • Level 4: Performance and trust. The "it just works" stability standard formed through decades of iteration. Barrier: trust and engineering accumulation [3]
1.3 Summary of NVIDIA’s Core Success Factors
  1. Technological foresight: the strategic shift from graphics GPUs to general-purpose GPGPU computing
  2. Ecosystem monopoly: the software ecosystem built around CUDA became the core moat (the 70%-gross-margin "software tax" [3])
  3. Industrial collaboration: deep cooperative relationships with manufacturing partners such as TSMC
  4. Strategic patience: more than 15 years of sustained investment before reaping explosive growth in the AI era

II. China’s Four Domestic GPU Dragons: Comparison of Tech and Business Status
2.1 Differentiated Positioning of the Four Dragons
  • Moore Threads (GPU/GPGPU): full-function GPU, building the MUSA ecosystem to benchmark against CUDA. Commercial progress: listed on the STAR Market in December 2025 with a peak market capitalization above 359.5 billion yuan; commercialization accelerating, with the direct-sales ratio rising to 90% [4,5,7]
  • Muxi Semiconductor (GPU/GPGPU): focused on high-performance general-purpose computing. Commercial progress: listed on the STAR Market in December 2025; network-wide API call volume exceeding 13 million; break-even expected around 2026 [5,7]
  • Biren Technology (GPU/GPGPU): pursuit of extreme compute through aggressive chip design. Commercial progress: has filed for a Hong Kong IPO; its first flagship GPU, the BR100, uses dataflow parallelism, Chiplet packaging and TSMC's N7 process [5,7]
  • Suiyuan Technology (ASIC/NPU): DSA path, focused on AI training and inference. Commercial progress: has entered STAR Market IPO counseling; participated in launching the "Chip-Model Ecological Innovation Alliance" for ecosystem co-construction [5,8]
2.2 Deep Differences in Tech Routes
GPU/GPGPU Camp (Moore Threads, Muxi, Biren)
  • Advantages: high versatility, support for general parallel computing, and a technology path similar to NVIDIA's
  • Challenges: must confront CUDA ecosystem compatibility
    • Moore Threads translates CUDA code through the MUSA toolchain, but performance losses remain
    • As of October 2025, Muxi's MXMACA software stack had approximately 150,000 registered users and actual customers, with network-wide API calls exceeding 13 million [7]
ASIC/NPU Camp (Suiyuan, Huawei Ascend)
  • Advantages: specialized design, higher energy efficiency, and lower software-development difficulty
  • Challenges: limited versatility; requires optimization for specific scenarios [6]

Expert opinion: under U.S. export controls, the DSA design approach (TPU, NPU) may be a breakthrough point for China, and Huawei's Ascend (NPU) has already taken the lead [6].

2.3 Phased Achievements in Commercialization

2025 becomes the inflection-point year:

  • Production-sales ratio breakthrough: both Moore Threads and Muxi posted production-sales ratios above 100%, marking the leap from testing and validation to large-scale commercial deployment [7]
  • Customer-structure upgrade: a shift from reliance on distributors to directly serving key accounts, with end customers spanning internet companies, AI companies, computing-power service providers, telecom operators and others [7]
  • Accelerating domestic substitution: in 2025, leading technology companies' AI computing-power investment is expected to reach 450 billion yuan, of which 30% goes to verification and adaptation of domestic chips [8]

III. Core Challenges: Tech Routes and Ecosystem Construction
3.1 Reality of Ecological Gaps

Scale comparison with the CUDA moat:

  • Global developer scale: NVIDIA CUDA approximately 5 million (as of mid-2024); Moore Threads MUSA approximately 200,000 (as of December 2025); Muxi MXMACA approximately 150,000 (as of October 2025)
  • Cumulative downloads / API calls: NVIDIA CUDA over 53 million downloads; Moore Threads MUSA no figure given; Muxi MXMACA over 13 million calls
  • Ecosystem maturity: NVIDIA CUDA extremely high, accumulated over nearly 20 years; Moore Threads MUSA under construction, on its version 5.0 iteration; Muxi MXMACA at an initial stage [3,7,10]

Partners' perceptions: some partners admit that, compared with international mainstream products, domestic products are still roughly "1 to 2 generations behind" in performance and ecosystem [10]. Another partner pointed out that the CUDA ecosystem has been "used by hundreds of thousands of people for many years", and that migrating to MUSA still requires sustained investment in toolchain smoothness, breadth of adaptation coverage and more [10].

3.2 Practical Paths for Ecosystem Breakthrough
Path 1: Compatibility and Migration
  • Toolchain translation: Moore Threads' MUSIFY tool translates CUDA code, though with compatibility gaps and performance losses (a hedged sketch of what such a tool must rewrite follows this list)
  • Computing-library replacement: one-to-one replacement of CUDA APIs through the MUSA-X computing library [6]
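
Below is a hedged sketch of what a source-to-source migration tool such as MUSIFY has to rewrite when porting CUDA host and device code. The CUDA side is standard; the target-side remarks in the comments are illustrative assumptions (the text only states that the tool translates CUDA code), not a documented MUSA API mapping.

```cuda
#include <cuda_runtime.h>

// Every element below is something a CUDA-to-MUSA translator must handle.
__global__ void scale(float *v, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // thread-indexing model
    if (i < n) v[i] *= s;
}

void run(float *host, int n) {
    float *dev;
    cudaMalloc(&dev, n * sizeof(float));              // runtime allocation call
    cudaMemcpy(dev, host, n * sizeof(float),
               cudaMemcpyHostToDevice);               // host-to-device copy
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);    // launch syntax must be re-emitted
    cudaMemcpy(host, dev, n * sizeof(float),
               cudaMemcpyDeviceToHost);               // device-to-host copy
    cudaFree(dev);                                    // runtime free
}
```

Even with a clean one-to-one API mapping, performance does not carry over automatically: block sizes, memory-access patterns and library calls (cuBLAS, cuDNN) were tuned for NVIDIA hardware, which is where the "performance losses" mentioned above come from.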
Path 2: Native Ecosystem Co-construction
  • "Chip-Model Ecological Innovation Alliance": launched in July 2025 by Jiyue Xingchen together with nearly 10 domestic GPU manufacturers, linking the full chain of chip, model and platform technologies
  • DeepSeek adaptation wave: the emergence of the domestic large model DeepSeek in early 2025 triggered active adaptation by domestic GPU manufacturers (e.g., Muxi's Xiyun C550 and Moore Threads' MTT S4000 both passed relevant CAICT verification), forming a closed-loop ecosystem of "domestic computing power + domestic large models" [7,8,11]
  • 10,000-card cluster breakthroughs: system-level advances such as Huawei's Ascend 384 super node and Sugon's scaleX 10,000-card supercomputing cluster provide large-scale application scenarios for domestic GPUs [8]
Path 3: Open Source and Developer Cultivation
  • Moore Academy: covers over 200 universities and has attracted more than 100,000 students
  • Open-source initiative: Moore Threads has announced the gradual open-sourcing of components such as its compute-acceleration libraries, communication libraries and system-management framework [10]
3.3 Key Trade-offs of Tech Routes
  • Versatility: GPU/GPGPU path high (supports broad parallel computing); ASIC/NPU/DSA path low (optimized for specific scenarios)
  • Energy efficiency: GPU/GPGPU path medium; ASIC/NPU/DSA path high (optimized for AI workloads)
  • Ecosystem threshold: GPU/GPGPU path extremely high (must break the CUDA barrier); ASIC/NPU/DSA path lower (focus on AI-framework adaptation)
  • Development difficulty: GPU/GPGPU path high (requires a complete software stack); ASIC/NPU/DSA path medium (focused on specific domains)
  • Applicable scenarios: GPU/GPGPU path general computing and AI training/inference; ASIC/NPU/DSA path specific AI inference and edge computing [6]

IV. In-depth Analysis of Whether NVIDIA’s Path Can Be Copied
4.1 Historical Environment Comparison
  • Competitive landscape: then, multiple GPU manufacturers coexisted with no absolute monopoly; now, NVIDIA dominates but U.S. export controls open a localization window
  • Market demand: then, an explosion of PC graphics-processing demand; now, an explosion of compute demand for AI large-model training and inference
  • Technical foundation: then, 3D graphics explored from scratch; now, mature GPU architectures to learn from, but constrained access to advanced process nodes
  • Capital environment: then, extreme hardship in the early startup phase; now, national strategic support and high capital-market enthusiasm (e.g., PS ratios exceeding 300x [7])
  • Ecosystem starting point: then, DirectX and other APIs not yet unified; now, a strong CUDA ecosystem, but entry possible through migration and co-construction
4.2 Replicable Success Factors
✅ Can be replicated:
  1. Technical architecture innovation: breaking through single-chip process limits via Chiplet, super-node and similar technologies
    • Biren's BR100 uses Chiplet technology, achieving nominal compute close to NVIDIA's N4-process parts on TSMC's N7 process
    • Huawei's Ascend 384 super node delivers 300 PFLOPS of total compute, a system-level breakthrough (a quick back-of-envelope check follows this list) [7,8]
  2. Ecosystem-construction path:
    • Education-system penetration: Moore Academy covers over 200 universities
    • Developer community: the MUSA developer community has nearly 200,000 members
    • Industrial alliances: the Chip-Model Ecological Innovation Alliance and similar organizations [10]
  3. Industrial collaboration:
    • Deep adaptation with domestic CPUs (Phytium, Hygon, Loongson), operating systems (Kylin, Tongxin) and PC manufacturers (Lenovo, Inspur) [7]
    • "Mutual collaboration" between large-model vendors and GPU manufacturers (e.g., the DeepSeek adaptation wave) [8,11]
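
As a rough consistency check on the Ascend figure above (and assuming the 300 PFLOPS and the 384 accelerators in the super node refer to the same numeric precision, which the text does not specify), the implied per-accelerator compute is:

$$\frac{300\ \text{PFLOPS}}{384\ \text{accelerators}} \approx 0.78\ \text{PFLOPS} \approx 780\ \text{TFLOPS per accelerator}$$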
❌ Difficult to replicate:
  1. Historical window: NVIDIA set standards in the PC-graphics era, whereas today's entrants face an entrenched CUDA monopoly ecosystem
  2. Process constraints: U.S. controls on advanced process nodes cap single-chip performance, forcing breakthroughs through architectural innovation (e.g., Chiplet, super nodes) [6]
  3. Time accumulation: the CUDA ecosystem was built over nearly 20 years and cannot reach comparable maturity in the short term

4.3 More Promising Differentiated Paths
Path A: Prioritize Domestic Substitution
  • Government, education, finance and similar markets: high localization requirements and relatively high tolerance for ecosystem immaturity
  • Inference scenarios: compared with training, inference depends less on the software ecosystem, and domestic GPUs are rapidly penetrating inference workloads such as text-to-image and speech recognition [7]
Path B: Cost-Performance Advantage
  • Cost advantage: Biren's inference card Bili 110E delivers 1.3x the compute density of mainstream solutions with 70% energy savings [7]
  • Localized services: closer to domestic customers' needs, with fast response to customization requests
Path C: Large-Model-Native Ecosystem
  • Domestic large models such as DeepSeek: deep adaptation to domestic GPUs, forming a closed loop of "domestic chips + domestic models + domestic applications"
  • Rising share of inference workloads: enterprise demand for inference compute is projected to be 100x or even 1,000x that of training compute in the future [8]

V. Key Impact Analysis of Commercial Implementation
5.1 Impact of Tech Routes on Commercialization
  • GPU/GPGPU: advantages are high versatility and a large market; challenges are high CUDA ecosystem barriers requiring long-term investment. Typical cases: Moore Threads' MUSA ecosystem construction; Muxi's MXMACA toolchain [7,10]
  • ASIC/NPU: advantages are high specialized efficiency and lower development difficulty; challenges are a limited market and insufficient versatility. Typical case: Huawei Ascend's breakthroughs in AI inference [6,8]
  • DSA (Domain-Specific Architecture): advantage is balancing versatility and efficiency; challenge is the need for clearly defined scenarios. Typical case: Suiyuan Technology's participation in the "Chip-Model Ecological Innovation Alliance" [8]
5.2 Commercial Value of Ecosystem Construction

Moore Threads case:

  • Through the AIBOOK compute notebook, it packages hardware, drivers, software stack, toolchain and system into one device, giving partners "full-link verification"
  • This "prove it works" strategy lowers customers' adaptation risk and accelerates commercialization [10]

Key quantitative indicators:

  • Moore Threads' muDNN achieves GEMM and FlashAttention efficiency above 98% and communication efficiency of 97%, with compiler performance improved 3x [10] (see the sketch below for what such an efficiency figure measures)
  • These engineering capabilities matter as much as peak chip parameters, which is why vendors put software stacks and cluster engineering on the same roadmap [10]
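
To ground what a "GEMM efficiency above 98%" figure typically means (achieved throughput divided by the device's theoretical peak), here is a minimal CUDA/cuBLAS sketch that times a large SGEMM and reports utilization. The 19.5 TFLOPS peak is a placeholder assumption to be replaced with the spec of whichever card is under test; this is not Moore Threads' own benchmark code.

```cuda
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int n = 8192;                 // matrix dimension for C = A * B
    const double peak_tflops = 19.5;    // PLACEHOLDER: FP32 peak of the card under test

    float *A, *B, *C;
    size_t bytes = (size_t)n * n * sizeof(float);
    cudaMalloc(&A, bytes); cudaMalloc(&B, bytes); cudaMalloc(&C, bytes);
    cudaMemset(A, 0, bytes); cudaMemset(B, 0, bytes);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    // Warm-up call so the timed run excludes one-time initialization cost.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start); cudaEventCreate(&stop);
    cudaEventRecord(start);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // A dense n x n x n GEMM performs 2*n^3 floating-point operations.
    double tflops = 2.0 * n * (double)n * n / (ms * 1e-3) / 1e12;
    printf("achieved %.2f TFLOPS, efficiency %.1f%% of peak\n",
           tflops, 100.0 * tflops / peak_tflops);

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Vendors report this kind of per-kernel utilization (GEMM, FlashAttention, collectives) because it shows how much of the silicon's headline compute the software stack actually delivers.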
5.3 Profitability Timeline

According to prospectus disclosures:

  • Muxi Semiconductor: break-even expected around 2026
  • Moore Threads: consolidated-statement profitability expected as early as 2027 [7]

Profitability challenges:

  • Domestic-chip penetration remains low, and ecosystem construction still needs breakthroughs
  • Market expansion is proceeding only gradually
  • Key AI chip products must pass strict technical verification and ecosystem-adaptation cycles before entering key industry customers [7]

VI. In-depth Research Recommendations

Based on the above analysis, I believe this is a complex topic worth taking into in-depth research mode, for the following reasons:

6.1 Valuation and Risk Tips
  • Valuation-bubble risk: some companies trade at PS ratios above 300x, posing the risk of unreasonable valuations [7]
  • Revenue-stability risk: revenue is supported by new orders and is not yet stable [7]
  • Ecosystem risk: an insufficient ecosystem may limit platform compatibility [7]
6.2 Worthwhile In-depth Research Directions
  1. Tech route comparison: long-term competitiveness analysis of GPU vs ASIC vs NPU
  2. Ecosystem construction progress: each company's developer-community scale, toolchain maturity and adaptation coverage
  3. Commercialization quality: customer structure, order sustainability, clarity of the profit path
  4. Policy and industrial-chain impact: U.S. export controls, localization-substitution policies, industrial-chain synergy effects
  5. International comparison: comparison with AMD's and Intel's catch-up experience in the GPU field
6.3 Recommended Research Tools

In in-depth research mode, further analysis can be conducted on:

  1. Financial data comparison: revenue structure, R&D investment and cash-flow position of the four dragons
  2. Technical indicator analysis: FP16/FP32/INT8 compute comparison across each product line
  3. Patent layout: each company's patent accumulation in GPU architecture, interconnect technology, compilers and related fields
  4. Supply-chain analysis: cooperation and capacity allocation with foundries such as TSMC and SMIC
  5. Competitor dynamics: market strategies and product roadmaps of NVIDIA, AMD and Intel

VII. Conclusion: Not Replication, but Differentiated Surpassing
7.1 Core Conclusion

China's four domestic GPU dragons cannot fully replicate NVIDIA's rise of 30 years ago, but they have the opportunity to achieve breakthroughs through differentiated paths:

  • Technical starting point: NVIDIA started from scratch with no reference architecture; domestic players can learn from mature architectures but are process-constrained
  • Ecosystem strategy: NVIDIA built the CUDA ecosystem from zero; domestic players run CUDA compatibility and native-ecosystem co-construction in parallel
  • Market strategy: NVIDIA targeted the global market; domestic players prioritize domestic substitution and cost-performance advantages
  • Time window: NVIDIA rode the PC-graphics era; domestic players ride the AI large-model era but face the CUDA monopoly
7.2 Key Success Factors
  1. Tech-route differentiation:
    • GPU manufacturers (Moore Threads, Muxi, Biren): focus on breaking CUDA compatibility barriers and building a native ecosystem
    • ASIC/NPU manufacturers (Suiyuan, Huawei): focus on AI inference and optimization for specific scenarios
  2. Systematic ecosystem construction:
    • Education-system penetration (Moore Academy)
    • Developer-community cultivation (200,000 MUSA developers)
    • Industrial-alliance collaboration (Chip-Model Ecological Innovation Alliance)
    • Large-model vendor adaptation (the DeepSeek adaptation wave)
  3. Commercial pragmatism:
    • A gradual path from "usable" to "good to use"
    • Domestic-substitution priority markets such as government, education and finance
    • Rapid penetration of inference scenarios
    • System-level solutions (super nodes, 10,000-card clusters)
7.3 Final Judgment

Can China's four domestic GPU dragons copy NVIDIA? The answer: they cannot fully replicate its path, but they do have the opportunity to achieve differentiated surpassing.