Huawei Launches CloudMatrix 384 AI System

Huawei has unveiled the CloudMatrix 384, its most advanced AI computing system to date. It combines 384 Ascend 910C chips into a single cluster that rivals, and in some ways exceeds, Nvidia’s latest AI platform. The launch took place at the 2025 World AI Conference in Shanghai and positions Huawei as a key player in large-scale AI infrastructure—especially as the US tightens restrictions on Nvidia’s exports to China.
This article breaks down what CloudMatrix 384 is, how it performs, what makes it different from competitors, and why it matters in the global AI race.

What Is CloudMatrix 384?
CloudMatrix 384 is a high-performance AI cluster built by Huawei. It connects 384 of Huawei’s own Ascend 910C chips using an all-optical interconnect. The system is designed to support the most demanding AI workloads, including large language models and multi-modal inference.
Unlike traditional GPU clusters that focus on raw chip speed, Huawei’s approach relies on system-level performance. By tightly integrating chips, memory, and compute logic, the platform delivers faster output and better bandwidth for massive AI tasks.
Key Features and System Design
The system uses a unified bus architecture, which allows direct communication between chips at high speed. This is paired with 192 Kunpeng CPUs and 48 terabytes of HBM memory, making it suitable for training and deploying large-scale foundation models.
CloudMatrix 384 supports advanced parallelism strategies like expert parallel (EP320) and uses token optimization for inference acceleration. The onboard software, called CloudMatrix-Infer, handles peer-to-peer token dispatch and memory-efficient model serving.
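Expert parallelism works by routing each token to a small subset of "expert" sub-networks spread across devices, so no single chip has to hold the whole model. As a rough illustration of the routing idea (this is not Huawei's CloudMatrix-Infer code; the function and variable names here are hypothetical), a minimal top-k dispatch sketch in Python:

```python
import numpy as np

def route_tokens(token_logits: np.ndarray, num_experts: int, top_k: int = 2):
    """Toy top-k expert routing: each token is dispatched to the
    top_k experts with the highest gating scores.

    token_logits: array of shape (num_tokens, num_experts) with gating scores.
    Returns a dict mapping expert id -> list of token indices sent to it.
    """
    # Indices of the top_k highest-scoring experts per token.
    top_experts = np.argpartition(-token_logits, top_k, axis=1)[:, :top_k]
    dispatch = {e: [] for e in range(num_experts)}
    for token_idx, experts in enumerate(top_experts):
        for e in experts:
            dispatch[int(e)].append(token_idx)
    return dispatch

# Example: 8 tokens routed across 320 experts (an EP320-style fan-out).
rng = np.random.default_rng(0)
logits = rng.standard_normal((8, 320))
dispatch = route_tokens(logits, num_experts=320, top_k=2)
# Each token lands in exactly top_k expert buckets.
assert sum(len(v) for v in dispatch.values()) == 8 * 2
```

In a real system the expert buckets live on different chips, so this dispatch step becomes an all-to-all communication pattern, which is exactly where a low-latency interconnect like the optical supernode pays off.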
Core Specs and Capabilities of CloudMatrix 384
| Component | Specification | Impact |
|---|---|---|
| Accelerators | 384 Ascend 910C NPUs | High compute power for training/inference |
| Memory | 48 TB HBM | Supports large context windows in LLMs |
| CPUs | 192 Kunpeng CPUs | Coordinate scheduling and task management |
| Interconnect | All-optical supernode | Low latency, high bandwidth between chips |
| Peak Compute (BF16) | Up to 300 PFLOPS | Exceeds Nvidia GB200 NVL72 (180 PFLOPS) |
How It Compares to Nvidia
Huawei openly admits that a single Ascend 910C is not as powerful as Nvidia’s best chips. But CloudMatrix makes up for that with scale. By integrating more chips with better system design, Huawei claims to deliver superior performance at the cluster level.
CloudMatrix 384 offers 3.6 times the memory and more than twice the memory bandwidth of Nvidia's GB200 NVL72, and its peak compute is about 67 percent higher. That advantage comes at a cost: power draw is far greater, with the system consuming around 559 kilowatts.
CloudMatrix 384 vs Nvidia GB200 NVL72
| Feature | Huawei CloudMatrix 384 | Nvidia GB200 NVL72 | Winner |
|---|---|---|---|
| Number of AI Chips | 384 Ascend 910C | 72 Nvidia GB200 | Huawei (more chips) |
| Memory | 48 TB HBM | ~13 TB HBM | Huawei |
| Peak Compute (BF16) | 300 PFLOPS | 180 PFLOPS | Huawei |
| Power Consumption | ~559 kW | ~240 kW | Nvidia (more efficient) |
| Interconnect | Optical Supernode | NVLink Switch | Huawei (lower latency) |
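The headline ratios follow directly from the spec figures above. A quick back-of-the-envelope check (using 13.4 TB as a stand-in for Nvidia's "~13 TB" figure) also makes the efficiency trade-off concrete:

```python
# Spec figures from the comparison above.
huawei_memory_tb, nvidia_memory_tb = 48, 13.4   # HBM capacity (TB)
huawei_pflops, nvidia_pflops = 300, 180          # peak BF16 compute
huawei_power_kw, nvidia_power_kw = 559, 240      # system power draw

print(f"Memory advantage: {huawei_memory_tb / nvidia_memory_tb:.1f}x")            # ~3.6x
print(f"Compute advantage: {(huawei_pflops / nvidia_pflops - 1) * 100:.0f}% higher")  # ~67%

# Efficiency cuts the other way: compute per kilowatt.
print(f"Huawei: {huawei_pflops / huawei_power_kw:.2f} PFLOPS/kW")
print(f"Nvidia: {nvidia_pflops / nvidia_power_kw:.2f} PFLOPS/kW")
```

Nvidia delivers roughly 0.75 PFLOPS per kilowatt against Huawei's roughly 0.54, which is why the comparison table awards power efficiency to Nvidia despite Huawei's higher absolute numbers.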
Strategic Value for China
CloudMatrix 384 is more than a technical achievement. It represents China’s push to build domestic AI hardware and reduce reliance on US-based companies like Nvidia. With export controls limiting access to advanced chips, Huawei’s system gives China a homegrown alternative.
Huawei invests roughly ¥180 billion per year in R&D. CloudMatrix shows that the focus has shifted from single-chip performance to full-stack ecosystem control: hardware, compilers, interconnects, and training software, all built in-house.
Use Cases and Workload Targets
Huawei built CloudMatrix 384 for:
- Training foundation models
- Real-time inference at scale
- Multi-modal systems using vision and language
- Expert parallel models for higher throughput
Huawei reports prefill throughput above 6,600 tokens per second per chip and decode throughput close to 2,000 tokens per second per chip. The system keeps time per output token under 50 milliseconds, and with INT8 quantization it sustains 538 tokens per second per chip under a 15-millisecond latency cap.
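To put the per-chip figures in cluster-scale terms, a naive linear extrapolation across all 384 chips gives an upper bound (real deployments lose some throughput to communication and load imbalance, so treat these as ceilings rather than measured numbers):

```python
chips = 384
prefill_tps_per_chip = 6_600  # reported prefill throughput per chip (tokens/s)
decode_tps_per_chip = 2_000   # reported decode throughput per chip, approx.

# Naive linear scaling -- an optimistic ceiling, not a benchmark result.
aggregate_prefill = chips * prefill_tps_per_chip
aggregate_decode = chips * decode_tps_per_chip

print(f"Aggregate prefill ceiling: {aggregate_prefill / 1e6:.2f}M tokens/s")  # ~2.53M
print(f"Aggregate decode ceiling:  {aggregate_decode / 1e6:.2f}M tokens/s")   # ~0.77M
```

Even with generous discounts for real-world overhead, throughput on this order is what makes the system plausible for serving large language models to many concurrent users.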
Who Should Care
This launch is significant for researchers, enterprises, and governments focused on sovereign AI infrastructure. If you work in AI deployment, edge computing, or national computing policy, CloudMatrix 384 is a case study in scaling up with constraints.
Final Takeaway
Huawei’s CloudMatrix 384 is a bold response to AI chip restrictions and growing demand for domestic compute infrastructure. By focusing on system architecture and tight integration, Huawei has built an AI cluster that rivals Nvidia’s best—at least at the data center level.
The real test will be adoption. If Chinese tech firms, universities, and government agencies switch to CloudMatrix for large-scale AI tasks, it could change the global balance in AI hardware.
Huawei isn’t just building chips—it’s building control over the AI future, one cluster at a time.