arXiv Open Access 2024

HiCCL: A Hierarchical Collective Communication Library

Mert Hidayetoglu, Simon Garcia de Gonzalo, Elliott Slaughter, Pinku Surana, Wen-mei Hwu, +2 others

Abstract

HiCCL (Hierarchical Collective Communication Library) addresses the growing complexity and diversity of high-performance network architectures. As GPU systems have evolved into networks of GPUs with different multilevel communication hierarchies, optimizing each collective function for a specific system has become a challenging task. Consequently, many collective libraries struggle to adapt to different hardware and software, especially across systems from different vendors. HiCCL's library design decouples the collective communication logic from network-specific optimizations through a compositional API. The communication logic is composed using multicast, reduction, and fence primitives, which are then factorized for a specified network hierarchy using only point-to-point operations within a level. Finally, striping and pipelining optimizations are applied as specified to streamline execution. Performance evaluation of HiCCL across four different machines (two with Nvidia GPUs, one with AMD GPUs, and one with Intel GPUs) demonstrates an average 17× higher throughput than the collectives of highly specialized GPU-aware MPI implementations, and competitive throughput with those of vendor-specific libraries (NCCL, RCCL, and OneCCL), while providing portability across all four machines.
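The compositional idea in the abstract can be illustrated with a toy model. The sketch below is not HiCCL's API (HiCCL is a separate library whose interface is not shown here); the class `Composer`, the function `factorize_multicast`, and the parameter `gpus_per_node` are hypothetical names chosen for illustration. It assumes scalar buffers, a sum reduction, and a two-level hierarchy (nodes containing GPUs). It shows how an all-reduce might be composed from a reduction, a fence, and a multicast, and how a flat multicast might be split into point-to-point sends per hierarchy level.

```python
from dataclasses import dataclass, field


@dataclass
class Composer:
    """Toy recorder of multicast/reduction/fence primitives (hypothetical, not HiCCL's API)."""
    num_ranks: int
    steps: list = field(default_factory=lambda: [[]])

    def multicast(self, src_buf, dst_buf, root, targets):
        # Copy root's src_buf into dst_buf on every target rank.
        self.steps[-1].append(("multicast", src_buf, dst_buf, root, tuple(targets)))

    def reduction(self, src_buf, dst_buf, sources, root):
        # Sum src_buf over the source ranks into dst_buf on root.
        self.steps[-1].append(("reduction", src_buf, dst_buf, tuple(sources), root))

    def fence(self):
        # Primitives registered after a fence depend on those before it.
        self.steps.append([])

    def run(self, memory):
        """Execute on memory[rank][buffer_name] -> number and return it."""
        for step in self.steps:
            writes = []  # read everything first: primitives in a step are independent
            for op in step:
                if op[0] == "multicast":
                    _, src, dst, root, targets = op
                    writes += [(t, dst, memory[root][src]) for t in targets]
                else:
                    _, src, dst, sources, root = op
                    writes.append((root, dst, sum(memory[s][src] for s in sources)))
            for rank, buf, value in writes:
                memory[rank][buf] = value
        return memory


def factorize_multicast(root, targets, gpus_per_node):
    """Split a flat multicast into inter-node and intra-node point-to-point sends.

    Assumes ranks are numbered node by node, so rank // gpus_per_node is the node.
    """
    root_node = root // gpus_per_node
    by_node = {}
    for t in targets:
        if t != root:
            by_node.setdefault(t // gpus_per_node, []).append(t)
    inter, intra = [], []
    for node, members in sorted(by_node.items()):
        if node == root_node:
            leader = root
        else:
            leader = members[0]           # first GPU on the node receives across nodes
            inter.append((root, leader))
        intra += [(leader, m) for m in members if m != leader]
    return inter, intra


def demo_allreduce(num_ranks=4):
    """All-reduce composed as: reduce to rank 0, fence, multicast from rank 0."""
    memory = {r: {"send": r + 1, "recv": 0} for r in range(num_ranks)}
    comm = Composer(num_ranks)
    comm.reduction("send", "recv", range(num_ranks), root=0)
    comm.fence()
    comm.multicast("recv", "recv", root=0, targets=range(num_ranks))
    return comm.run(memory)


if __name__ == "__main__":
    print(demo_allreduce())
    print(factorize_multicast(root=0, targets=range(8), gpus_per_node=4))
```

In this toy model the collective's logic (what data moves where) is written once, while the hierarchy-specific part is confined to `factorize_multicast`, which mirrors the separation the abstract describes. Striping and pipelining are left out of the sketch.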

Authors (7)

Mert Hidayetoglu
Simon Garcia de Gonzalo
Elliott Slaughter
Pinku Surana
Wen-mei Hwu
William Gropp
Alex Aiken

Citation Format

Hidayetoglu, M., Gonzalo, S.G.d., Slaughter, E., Surana, P., Hwu, W., Gropp, W. et al. (2024). HiCCL: A Hierarchical Collective Communication Library. https://arxiv.org/abs/2408.05962

Journal Information
Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓