Chinese GPU Manufacturers: Technical Routes, Commercial Paths, and CUDA Ecosystem Barriers

About us: Ginlix AI is the AI Investment Copilot powered by real data, bridging advanced AI with professional financial databases to provide verifiable, truth-based answers.
Thanks to differences in technical routes, product positioning, and ecosystem partnerships, the domestic GPU “Four Dragons” enjoy a degree of division of labor and complementarity. However, against the training/inference ecosystem barrier NVIDIA has built around CUDA, it remains difficult to shake NVIDIA’s dominance in the short term. The practical path is to focus on domestic application scenarios and system-level integration (e.g., inference clusters, domestic cloud services) and, with policy and capital support, achieve phased commercial breakthroughs while gradually approaching the NVIDIA ecosystem or building compatibility solutions.
Moore Threads (Full-Functionality, NVIDIA-Aligned)
Moore Threads is currently the only domestic enterprise to have mass-produced full-functionality GPUs, with product lines spanning AI computing, graphics rendering, and smart cockpits. It aims to replicate NVIDIA’s coverage of training + inference + graphics, but is still in an expansion-driven loss phase and is expected to break even around 2027. Against the CUDA ecosystem, Moore Threads draws on its founding team’s NVIDIA experience to pursue a degree of compatibility at the system and protocol layer, but a significant gap remains in the software stack (e.g., equivalents to cuDNN/TensorRT). Its short-term strategy is therefore to earn commercial returns from “domestic substitution + government/industry orders” while pushing optimization of local deep learning frameworks and compiler support to narrow the gap with CUDA [1].
Muxi (Replicating AMD’s B2B Route)
With an AMD pedigree, Muxi focuses on general-purpose GPUs spanning graphics and AI, currently emphasizing general-purpose chips for cloud inference and training. Its “Xiyun” series has achieved a fully domestic end-to-end supply chain, and by exploiting the high concentration of the B2B market it has quickly entered domestic cloud services and industry integrators. The company plans to establish market credibility through tightly bound strategic investors (e.g., the National AI Industry Fund). Since its positioning is closest to AMD’s, its commercialization could accelerate if it establishes stable compatibility with mainstream server hardware (CPU + memory), but it still faces CUDA’s deep accumulation in AI training scripts and toolchains [2].
Biren (Targeting High-End Training)
Biren Technology focuses on high-performance training GPUs and is planning dual listings on the Hong Kong Stock Exchange and the Science and Technology Innovation Board (STAR Market), with a valuation that once exceeded 100 billion yuan. Its designs for high-performance computing and large-model training benchmark against NVIDIA’s A100/H100, but it has not yet formed a mature software ecosystem. To break through, Biren can build jointly optimized solutions with high-end server makers and domestic supercomputing and cloud providers (e.g., Huawei, Alibaba Cloud); in parallel, developing a CUDA compatibility layer or automated code-conversion tools at the toolchain level may help it establish an ecosystem foothold among closed or customized deployments [3].
Suoyuan (Tied to Tencent Cloud Services)
Suoyuan’s core advantage is its binding to Tencent Cloud: through close integration with cloud services, it achieves rapid deployment in specific scenarios (e.g., enterprise cloud + AI applications). In chip design it favors cost-effective inference parts and uses Tencent’s industry resources to land real scenarios. Although single-card performance still falls well short of the H100, it has natural advantages in latency-sensitive inference and with government/financial customers that have stringent domesticization requirements. Going forward, it can form a differentiated combination via a “GPU + DSA” hybrid approach (e.g., train in the cloud on NVIDIA first, then run inference on Suoyuan chips).
Cambricon (DSA and AI-Specific Architecture)
Although Cambricon is not one of the “Four Dragons”, it complements the above manufacturers in the ASIC/DSA field. Cambricon reached profitability in 2024/2025 and its market value remains high, highlighting the advantages of DSA chips in the inference market. Its strategy does not target CUDA head-on, instead focusing on scenario-specific inference and system-level collaboration. This route largely sidesteps direct competition with CUDA, selling on efficiency, cost, and domesticization.
CUDA’s Ecosystem Barriers: Through CUDA, NVIDIA has built an end-to-end training and inference toolchain (cuBLAS, cuDNN, TensorRT, CUDA Graphs, etc.), supported by rich open-source frameworks (PyTorch/TensorFlow/JAX, etc.) and optimizers, forming a highly sticky developer ecosystem. Any domestic GPU that wants to compete in training scenarios must invest heavily in compatibility, performance tuning, and compiler optimization.
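To make the migration cost concrete, a vendor’s first step is usually an operator-coverage audit: which of a model’s kernels the domestic software stack can already serve, and which still fall back or fail. A minimal sketch in Python, where all operator names and the backend support table are invented for illustration:

```python
# Hypothetical operator-coverage audit. The operator names and the
# backend support table below are invented for illustration only.

# Operators a sample model is assumed to use.
MODEL_OPS = {"conv2d", "batch_norm", "relu", "matmul",
             "flash_attention", "layer_norm", "softmax"}

# Operators a hypothetical domestic backend is assumed to support.
DOMESTIC_BACKEND_OPS = {"conv2d", "batch_norm", "relu",
                        "matmul", "softmax"}

def coverage_report(model_ops, backend_ops):
    """Return the coverage ratio and the sorted list of missing ops."""
    missing = sorted(model_ops - backend_ops)
    covered = len(model_ops) - len(missing)
    return covered / len(model_ops), missing

ratio, missing = coverage_report(MODEL_OPS, DOMESTIC_BACKEND_OPS)
print(f"coverage: {ratio:.0%}, missing ops: {missing}")
```

Every missing operator implies a new kernel, a compiler lowering, or a CPU fallback with a performance penalty, which is why catching up with CUDA’s library surface (cuBLAS/cuDNN/TensorRT) is measured in engineer-years.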
Differentiated Breakthrough Points:
- System-Level Integration and Scenario Optimization: Domestic GPUs can bypass single-card performance bottlenecks and form “computing clusters” through efficient interconnection (e.g., domestic high-speed interconnection, heterogeneous scheduling) to compete at the “system level” (similar to Huawei Atlas/Alibaba Panjiu). With the customized needs of domestic large cloud vendors and government customers, they can meet the “domestic computing power” security requirements [4].
- DSA/ASIC Combination: On the inference side, DSA chips can seize the high-demand long-tail scenarios with lower power consumption and higher throughput, forming a “GPU training + DSA inference” combination to avoid NVIDIA’s dominance in the training field.
- Software Compatibility and Conversion Layers: Developing CUDA compatibility layers or automated conversion tools (e.g., compilers that translate CUDA calls into a domestic GPU’s instructions) lowers migration costs and lets existing models run on domestic GPUs.
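As a sketch of what such a conversion layer does at its simplest, the snippet below renames CUDA runtime identifiers into a hypothetical vendor API, in the spirit of AMD’s HIPify tool. The `mt*` names are invented, not any vendor’s real API, and a real converter must also handle types, enums, kernel-launch syntax, and semantic differences:

```python
import re

# Mapping from CUDA runtime identifiers to a hypothetical vendor API.
# The "mt*" names are invented for illustration.
CUDA_TO_VENDOR = {
    "cudaMalloc": "mtMalloc",
    "cudaMemcpy": "mtMemcpy",
    "cudaMemcpyHostToDevice": "mtMemcpyHostToDevice",
    "cudaFree": "mtFree",
}

# Longest-first ordering so "cudaMemcpyHostToDevice" is not clipped
# by the shorter "cudaMemcpy" alternative.
_names = sorted(CUDA_TO_VENDOR, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(_names) + r")\b")

def convert(source: str) -> str:
    """Rewrite whole-word CUDA identifiers into vendor identifiers."""
    return _PATTERN.sub(lambda m: CUDA_TO_VENDOR[m.group(1)], source)

cuda_src = ("cudaMalloc(&d_a, n);\n"
            "cudaMemcpy(d_a, h_a, n, cudaMemcpyHostToDevice);\n"
            "cudaFree(d_a);")
print(convert(cuda_src))
```

Real conversion layers (e.g., HIPify, or a vendor compiler front end) go far beyond textual renaming: they must map streams, events, and memory semantics, and anything without a native equivalent still has to be reimplemented.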
| Company | Commercial Advantages | Main Risks | Areas to Strengthen |
|---|---|---|---|
| Moore Threads | Full-function product line, ample capital after IPO | Gross margin and profitability not yet achieved; equipment export restrictions | Improve software ecosystem, build partner sales network |
| Muxi | AMD background, mature B2B model | Small scale, still in investment phase | Accelerate large-scale delivery, broaden customer base |
| Biren | High-end training positioning, planned international listing | Thin software ecosystem, high benchmark pressure vs market leaders | Deliver training toolchains, deepen key-customer co-optimization |
| Suoyuan | Tied to Tencent Cloud, rich inference scenarios | Clear single-card performance gap vs NVIDIA | Consolidate cloud-integration and low-power advantages |
| Cambricon | DSA advantages, profitability achieved | Intensifying competition in the inference market | Expand into training scenarios and cross-platform compatibility |
Key points for commercialization breakthroughs:
- First-Mover Advantage + Ecosystem Closed Loop: Use government procurement, cloud service platforms, and cooperation with domestic complete-system (OEM) manufacturers to form a “hardware + software + service” closed loop;
- Replacing NVIDIA in Sensitive Fields Despite Higher Cost: e.g., government affairs, national defense, finance, and other industries with high domesticization requirements;
- Layered Ecosystem: On the training side, Biren/Moore Threads/Muxi can gradually build training environments (even without fully replacing CUDA), while the inference side can already deliver commercial returns through DSA.
- Short-Term (1-2 Years): Prioritize scenarios with strong domesticization demand (government enterprises, cloud-edge collaboration, high-security industries). Minimize head-on competition with CUDA and focus on vertical applications and system integration.
- Mid-Term (3-5 Years): Gradually migrate some training workloads to domestic GPUs by maturing compatibility layers, compilers, and performance-tuning tools. Expand collaboration with local AI frameworks and sharpen differentiated advantages (e.g., partnerships with large-model training platforms).
- Risk Management: Continuously monitor supply-chain policy, power delivery, cooling, and reliability to de-risk large-scale deployment.
[1] Web Search - Yahoo Hong Kong Finance: “Chinese GPU Newcomer Moore Threads’ Retail Subscription Multiple Hits Three-Year High” (https://hk.finance.yahoo.com/news/中國gpu新貴摩爾線程將在科創板亮相-散戶認購倍數創三年來之最-233148786.html)
[2] Web Search - Yahoo Hong Kong Finance: “Muxi Shares Debut on STAR Market; AMD Background and GPU Four Dragons Label Expected to Open Valuation Space for the Company” (https://hk.finance.yahoo.com/news/沐曦股份亮相科创板-amd光环及gpu四小龙标签料为公司打开估值空间-220000870.html)
[3] Web Search - Yahoo Hong Kong Finance: “China’s Computing Power Enters Capital Harvest Period! GPU Four Dragons Sprint for IPO” (https://hk.finance.yahoo.com/news/中國算力進入資本收割期-gpu四小龍衝刺上市-燧原科技17日登科創版-010003148.html)
[4] Web Search - Yahoo Hong Kong Finance: “7 Trillion Dollar Business Opportunity! ASIC Chips Surpass 10 Million Units; China’s AI Breaks Through with ‘System-Level’ Counterattack” (https://hk.finance.yahoo.com/news/7兆美元商機!asic晶片衝破千萬顆中國ai靠「系統級」逆襲突圍-025409147.html)
