arXiv Open Access 2025

UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture

Heng Liao, Bingyang Liu, Xianping Chen, Zhigang Guo, Chuanning Cheng, +29 others

Abstract

As Large Language Models (LLMs) continue to scale, the computational power and bandwidth they require escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture designed to enhance scalability, performance, cost-efficiency and availability. Unlike traditional datacenters that provide symmetrical node-to-node bandwidth, UB-Mesh employs a hierarchically localized nD-FullMesh network topology. This design fully leverages the data locality of LLM training, prioritizing short-range, direct interconnects to minimize data movement distance and reduce switch usage. Although UB-Mesh's nD-FullMesh topology offers several theoretical advantages, its concrete architecture design, physical implementation and networking system optimization present new challenges. For the actual construction of UB-Mesh, we first design the UB-Mesh-Pod architecture, which is based on a 4D-FullMesh topology. UB-Mesh-Pod is implemented via a suite of hardware components that serve as the foundational building blocks, including specifically designed NPUs, CPUs, Low-Radix-Switches (LRS), High-Radix-Switches (HRS), NICs and others. These components are interconnected via a novel Unified Bus (UB) technique, which enables flexible IO bandwidth allocation and hardware resource pooling. For networking system optimization, we propose an advanced routing mechanism named All-Path-Routing (APR) to efficiently manage data traffic. These optimizations, combined with topology-aware performance enhancements and robust reliability measures such as a 64+1 backup design, result in 2.04x higher cost-efficiency and 7.2% higher network availability compared with a traditional Clos architecture, as well as 95%+ linearity in various LLM training tasks.
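To make the nD-FullMesh idea more concrete, here is a minimal sketch, assuming that an nD-FullMesh directly links every pair of nodes whose coordinates differ in exactly one dimension (the HyperX / generalized-hypercube construction). The abstract does not spell out the exact wiring, so treat this as an approximation rather than the paper's design. The helper names `nd_fullmesh`, `min_hops` and `all_paths` are hypothetical, and `all_paths` is a plain depth-first enumeration of loop-free paths under a hop limit, not the paper's All-Path-Routing (APR) mechanism.

```python
from itertools import product


def nd_fullmesh(dims):
    """Return an adjacency dict for an nD-FullMesh (HyperX-style assumption).

    dims gives the number of nodes along each dimension, e.g. (4, 4) for a
    16-node 2D-FullMesh. Two nodes share a direct link iff their coordinate
    tuples differ in exactly one position.
    """
    nodes = product(*(range(size) for size in dims))
    adj = {}
    for node in nodes:
        neighbors = []
        for axis, size in enumerate(dims):
            for value in range(size):
                if value != node[axis]:
                    # Change one coordinate only: a direct link along this axis.
                    neighbors.append(node[:axis] + (value,) + node[axis + 1:])
        adj[node] = neighbors
    return adj


def min_hops(a, b):
    """Shortest hop count: each hop can fix one differing coordinate."""
    return sum(x != y for x, y in zip(a, b))


def all_paths(adj, src, dst, max_hops):
    """Enumerate every loop-free path from src to dst using at most max_hops links."""
    paths = []
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted without reaching dst
        for nxt in adj[node]:
            if nxt not in path:  # never revisit a node on the current path
                stack.append((nxt, path + [nxt]))
    return paths


if __name__ == "__main__":
    mesh = nd_fullmesh((4, 4))
    print(len(mesh))                 # 16 nodes
    print(len(mesh[(0, 0)]))         # 6 direct links: 3 per dimension
    print(min_hops((0, 0), (2, 3)))  # 2
    for p in all_paths(mesh, (0, 0), (2, 3), 2):
        print(p)                     # the two 2-hop shortest paths
```

In this 4x4 example, a node pair that differs in both coordinates has exactly two shortest paths, one per dimension order, and raising the hop limit to 3 adds longer detours. Under the stated assumption, the hop count equals the number of differing coordinates, which is one way to read the abstract's emphasis on short-range, direct interconnects.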


Authors (34)

Heng Liao
Bingyang Liu
Xianping Chen
Zhigang Guo
Chuanning Cheng
Jianbing Wang
Xiangyu Chen
Peng Dong
Rui Meng
Wenjie Liu
Zhe Zhou
Ziyang Zhang
Yuhang Gai
Cunle Qian
Yi Xiong
Zhongwu Cheng
Jing Xia
Yuli Ma
Xi Chen
Wenhua Du
Shizhong Xiao
Chungang Li
Yong Qin
Liudong Xiong
Zhou Yu
Lv Chen
Lei Chen
Buyun Wang
Pei Wu
Junen Gao
Xiaochu Li
Jian He
Shizhuan Yan
Bill McColl

Citation Format

Liao, H., Liu, B., Chen, X., Guo, Z., Cheng, C., Wang, J. et al. (2025). UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture. https://arxiv.org/abs/2503.20377

Journal Information
Publication Year: 2025
Language: en (English)
Source Database: arXiv
Access: Open Access ✓