Ascend 950 Orders Exceed 400,000 Units: Domestic AI Computing Reaches Critical Inflection Point

In April 2026, China’s AI computing sector saw a significant development: tech giants including ByteDance and Alibaba placed orders for Huawei’s Ascend 950 chips totaling nearly 450,000 units, with a procurement value of around 475 billion RMB. That volume amounts to more than half of Huawei’s planned annual shipments. The rush has pushed prices up 20%, yet demand still outstrips supply.

This is more than a commercial transaction. It is a critical signal that China’s domestic AI computing is moving from isolated breakthroughs to a complete, closed-loop industry chain.

Behind the Mass Procurement Rush

Where does demand for more than 400,000 chips come from? Two core factors are driving it.

LLMs have entered peak deployment. In 2026, mainstream large models including DeepSeek V4, ERNIE Bot, and Tongyi Qianwen are all iterating rapidly, and computing demand for both inference and training is growing exponentially. The Ascend 950PR focuses on inference while the 950DT targets training, so together they cover the full LLM workflow.

Supply chain security has become essential. China’s computing market previously relied heavily on NVIDIA, but sustained export controls have made supplies of high-end chips unreliable and prices volatile. With domestic control of the full stack, from chip manufacturing to HBM to software frameworks, the Ascend 950 has become the tech giants’ main way to mitigate that risk.

The orders arrived almost all at once, and supply cannot keep up. The standard Ascend 950PR currently costs approximately 50,000 RMB, and premium versions around 70,000 RMB. Even after the 20% price increase, orders are booked through the second half of the year.

Performance Verification: Multiple Metrics Surpass Benchmarks

Endorsement from major clients is not enough on its own; what matters most is the chip’s actual hardware capability.

The Ascend 950PR is a mass-produced inference chip with native FP4 low-precision support. According to test data, a single card delivers 1.56 PFLOPS of FP4 compute, versus 0.54 PFLOPS for NVIDIA’s H20, a gap of roughly 2.9x. In practical terms, a workload that one Ascend 950PR card can run for a given LLM would need about three H20 cards working together.
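FP4 stores each value in just 4 bits, so a block of weights typically shares one scale factor and each element is rounded to one of a handful of representable values. The minimal sketch below assumes the E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) defined in the OCP Microscaling (MX) specification; whether the Ascend 950PR uses exactly this encoding and scaling scheme is an assumption, and the function names are hypothetical. For simplicity it uses an arbitrary floating-point scale, whereas MX formats restrict the shared scale to a power of two.

```python
# Magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = E2M1_MAGNITUDES[-1]


def quantize_block_fp4(values: list[float]) -> tuple[float, list[float]]:
    """Hypothetical helper: scale a block so its largest magnitude maps to 6.0,
    then round each element to the nearest E2M1 magnitude (ties go to the
    smaller one). Codes are kept as Python floats for readability, not packed
    into 4-bit integers. Returns (scale, codes)."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for v in values:
        magnitude = abs(v) / scale
        nearest = min(E2M1_MAGNITUDES, key=lambda q: abs(q - magnitude))
        codes.append(-nearest if v < 0 else nearest)
    return scale, codes


def dequantize_block(scale: float, codes: list[float]) -> list[float]:
    """Approximate the original values: each FP4 code times the shared scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.12, -0.6, 0.33, 1.2, -0.08, 0.9]
    scale, codes = quantize_block_fp4(weights)
    print("scale:", scale)
    print("FP4 codes:", codes)
    print("dequantized:", dequantize_block(scale, codes))
```

Each weight ends up as one of only 16 possible codes plus a shared scale, which is why FP4 halves memory and bandwidth relative to FP8 at the cost of precision.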

Ascend 950 chip close-up showing precision manufacturing

Huawei’s self-developed HiBL 1.0 HBM is another breakthrough. Its 112GB capacity exceeds the H20’s 96GB, allowing a 70B-parameter model at 8-bit or lower precision to be deployed on a single card without splitting it across several. Just as important, in-house HBM breaks the overseas suppliers’ monopoly and cuts costs by approximately 30%.
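Whether a 70B model fits on one card depends on numeric precision. A back-of-the-envelope check multiplies parameter count by bytes per parameter; the sketch below counts weights only and ignores KV cache, activations, and runtime overhead, all of which a real deployment must also fit. It uses decimal gigabytes and the 112GB and 96GB capacities quoted above; the helper names are hypothetical.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes).
    Ignores KV cache, activations, and framework overhead."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_on_one_card(num_params: float, precision: str, hbm_gb: float) -> bool:
    """True if the weights alone fit within one card's HBM capacity."""
    return weight_memory_gb(num_params, precision) <= hbm_gb


if __name__ == "__main__":
    params_70b = 70e9
    for precision in BYTES_PER_PARAM:
        need = weight_memory_gb(params_70b, precision)
        print(f"{precision}: {need:.0f} GB | "
              f"fits 112 GB: {fits_on_one_card(params_70b, precision, 112)} | "
              f"fits 96 GB: {fits_on_one_card(params_70b, precision, 96)}")
```

At FP16 the weights alone (140 GB) exceed both cards, so single-card deployment implies 8-bit or lower precision. At FP8 (70 GB), the larger card leaves about 42 GB of headroom for KV cache and activations, versus about 26 GB on a 96GB card.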

At the cluster level, the Ascend 950 is paired with the Atlas 950 SuperPoD supernode architecture, which fully interconnects up to 8,192 chips. This architecture raises computing utilization from the 30%-40% typical of traditional clusters to 70%-80% and increases inference throughput 8-10x. DeepSeek V4 running on the Ascend 950PR shows 35x faster inference than on NVIDIA chips while cutting energy consumption by 40%.
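A simple way to see what utilization means for a cluster is the idealized model: delivered compute equals chip count times per-chip peak times utilization. The sketch below applies it to an 8,192-chip SuperPoD, assuming about 1 PFLOPS FP8 per chip (the single-card figure cited in this article) and the midpoints of the quoted utilization ranges. It is a hypothetical helper that ignores communication and memory-bandwidth limits.

```python
def effective_pflops(num_chips: int, peak_pflops_per_chip: float,
                     utilization: float) -> float:
    """Idealized delivered compute: chips x per-chip peak x utilization (0 to 1)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return num_chips * peak_pflops_per_chip * utilization


if __name__ == "__main__":
    chips = 8192  # Atlas 950 SuperPoD chip count quoted in the article
    peak = 1.0    # assumed ~1 PFLOPS FP8 per chip
    traditional = effective_pflops(chips, peak, 0.35)  # midpoint of 30%-40%
    superpod = effective_pflops(chips, peak, 0.75)     # midpoint of 70%-80%
    print(f"traditional: {traditional:.0f} PFLOPS, "
          f"SuperPoD: {superpod:.0f} PFLOPS, "
          f"gain: {superpod / traditional:.2f}x")
```

Under this model the utilization improvement by itself roughly doubles delivered compute (about 2,867 to 6,144 PFLOPS); any gain beyond that in the quoted 8-10x throughput figure would come from factors outside this simple model, such as interconnect and scheduling improvements.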

Manufacturing has also achieved breakthroughs. The Ascend 950 is built on SMIC’s N+3 process (roughly equivalent to 5nm) and combines four dies using MCM packaging, an approach that sidesteps the restrictions on EUV lithography. Yield on the computing dies now exceeds 80%, and production capacity is climbing steadily.
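An 80% die yield means something different for a four-die package than for a single chip. As a simplified illustration, not based on Huawei data, the sketch below assumes all four dies are computing dies and that defects strike dies independently, then compares assembling dies untested with testing them before packaging (the “known-good-die” approach used in multi-chip packaging). The helper names are hypothetical.

```python
def package_yield_untested(die_yield: float, dies_per_package: int) -> float:
    """Probability that every die in a package works when dies are assembled
    without prior testing, assuming defects strike dies independently."""
    return die_yield ** dies_per_package


def dies_per_good_package(die_yield: float, dies_per_package: int,
                          test_before_packaging: bool) -> float:
    """Expected computing dies consumed per working package.
    Packaging and bonding losses are ignored."""
    if test_before_packaging:
        # Only dies that pass testing are assembled: each slot costs 1/yield dies.
        return dies_per_package / die_yield
    # Untested assembly: a single bad die scraps the whole package.
    return dies_per_package / package_yield_untested(die_yield, dies_per_package)


if __name__ == "__main__":
    y, n = 0.80, 4  # 80% die yield from the article; four computing dies assumed
    print(f"package yield if untested: {package_yield_untested(y, n):.1%}")
    print(f"dies per good package, tested first: {dies_per_good_package(y, n, True):.2f}")
    print(f"dies per good package, untested: {dies_per_good_package(y, n, False):.2f}")
```

Under these assumptions, random assembly would produce working packages only about 41% of the time, while testing dies first brings the cost down to about 5 dies per working chip instead of nearly 10.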

Ecosystem Closure: From “Usable” to “Usable and Efficient”

Hardware performance is only the foundation; the software ecosystem determines success.

At the end of 2025, Huawei announced that it was making the CANN heterogeneous computing architecture fully open source across the stack, positioning it directly against NVIDIA’s closed-source CUDA ecosystem. CANN 9.0, released in April 2026, provides comprehensive support for low-precision formats, cutting model migration time from “months” to “hours” with compatibility rates above 95%.

DeepSeek V4 marks a landmark: this top-tier global large model has been 100% adapted to the Ascend 950PR, with its full stack migrated to Huawei’s CANN framework and no remaining dependence on NVIDIA’s CUDA ecosystem. More than 40 mainstream large models and over 200 open-source models have now been fully adapted to the Ascend ecosystem.

Modern AI data center interior with server corridors

In terms of applications, the Ascend 950 series has reached more than 20 sectors, including internet, finance, government services, and the industrial internet. ByteDance and Alibaba use the 950PR in recommendation systems, cutting latency by 50% and costs by 30%; major banks use it for intelligent risk control, improving efficiency by 40%.

Gaps Remain: Closure Is Only the Starting Point

We must acknowledge that domestic computing still lags behind the international state of the art. A single Ascend 950 card delivers approximately 1 PFLOPS of FP8 compute, while NVIDIA’s B200 delivers 4.5 PFLOPS. In manufacturing, SMIC’s N+3 process (roughly 5nm) remains a generation behind TSMC’s 3nm.

Yet the catch-up path is clear. Huawei’s published roadmap schedules the Ascend 950DT for training in Q4 2026; the Ascend 960, with double the computing power, in Q4 2027; and the Ascend 970, benchmarked against NVIDIA’s Blackwell, in Q4 2028.

From orders for more than 400,000 chips to a closed-loop, full-stack ecosystem, domestic AI computing is undergoing a critical transition from “substitute product” to “preferred choice.” The process won’t happen overnight, but the trend has become irreversible.
