UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture
Abstract
As Large Language Models (LLMs) continue to scale, the computational power and bandwidth they require escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture designed to enhance scalability, performance, cost-efficiency and availability. Unlike traditional datacenters that provide symmetrical node-to-node bandwidth, UB-Mesh employs a hierarchically localized nD-FullMesh network topology. This design fully leverages the data locality of LLM training, prioritizing short-range, direct interconnects to minimize data movement distance and reduce switch usage. Although UB-Mesh's nD-FullMesh topology offers several theoretical advantages, its concrete architecture design, physical implementation and networking system optimization present new challenges. For the actual construction of UB-Mesh, we first design the UB-Mesh-Pod architecture, which is based on a 4D-FullMesh topology. UB-Mesh-Pod is implemented via a suite of hardware components that serve as the foundational building blocks, including specifically designed NPU, CPU, Low-Radix-Switch (LRS), High-Radix-Switch (HRS), NICs and others. These components are interconnected via a novel Unified Bus (UB) technique, which enables flexible IO bandwidth allocation and hardware resource pooling. For networking system optimization, we propose an advanced routing mechanism named All-Path-Routing (APR) to efficiently manage data traffic. These optimizations, combined with topology-aware performance enhancements and robust reliability measures such as a 64+1 backup design, result in 2.04x higher cost-efficiency and 7.2% higher network availability compared to a traditional Clos architecture, as well as 95%+ linearity in various LLM training tasks.
Authors (34)
Heng Liao
Bingyang Liu
Xianping Chen
Zhigang Guo
Chuanning Cheng
Jianbing Wang
Xiangyu Chen
Peng Dong
Rui Meng
Wenjie Liu
Zhe Zhou
Ziyang Zhang
Yuhang Gai
Cunle Qian
Yi Xiong
Zhongwu Cheng
Jing Xia
Yuli Ma
Xi Chen
Wenhua Du
Shizhong Xiao
Chungang Li
Yong Qin
Liudong Xiong
Zhou Yu
Lv Chen
Lei Chen
Buyun Wang
Pei Wu
Junen Gao
Xiaochu Li
Jian He
Shizhuan Yan
Bill McColl
Quick Access
- Publication Year: 2025
- Language: en
- Source Database: arXiv
- Access: Open Access ✓