arXiv Open Access 2025

PAT: a new algorithm for all-gather and reduce-scatter operations at scale

Sylvain Jeaugey

Abstract

This paper describes a new algorithm called PAT (Parallel Aggregated Trees), which can be used to implement all-gather and reduce-scatter operations. The algorithm works on any number of ranks, needs only a logarithmic number of network transfers for small operations, minimizes long-distance communication, and requires a logarithmic amount of internal buffers, independent of the total operation size. It aims to improve the performance of the NCCL library in cases where the ring algorithm is inefficient, since the ring's linear latency leads to poor performance for small sizes and/or at scale.
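
To make the linear versus logarithmic contrast concrete, here is a minimal Python sketch. It does not implement PAT, whose details are not given in this abstract. Instead it assumes a generic distance-doubling all-gather (each rank merges the data held by the rank `distance` positions away, then the distance doubles) and compares its step count with the `n - 1` steps of a ring. The function names `ring_steps` and `doubling_allgather` are hypothetical and chosen only for illustration.

```python
def ring_steps(n_ranks: int) -> int:
    """Steps for a ring all-gather: one chunk is forwarded per step."""
    return max(n_ranks - 1, 0)


def doubling_allgather(n_ranks: int):
    """Simulate a generic distance-doubling all-gather (NOT the PAT algorithm).

    Returns (steps, held), where held[r] is the set of rank ids whose
    data rank r owns once the exchange has finished.
    """
    held = [{r} for r in range(n_ranks)]  # each rank starts with its own chunk
    steps = 0
    distance = 1
    while distance < n_ranks:
        # Every rank merges in what the rank `distance` ahead currently holds.
        held = [held[r] | held[(r + distance) % n_ranks] for r in range(n_ranks)]
        distance *= 2
        steps += 1
    return steps, held


if __name__ == "__main__":
    for n in (2, 5, 8, 1000):
        steps, held = doubling_allgather(n)
        complete = all(len(h) == n for h in held)
        print(f"ranks={n}: ring={ring_steps(n)} steps, "
              f"doubling={steps} steps, complete={complete}")
```

The simulation also works when the number of ranks is not a power of two, which is one of the properties the abstract claims for PAT. It says nothing about buffer usage or communication distance, both of which the abstract treats as separate design goals.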

Citation

Jeaugey, S. (2025). PAT: a new algorithm for all-gather and reduce-scatter operations at scale. https://arxiv.org/abs/2506.20252

Publication Details

Year: 2025
Language: English
Source database: arXiv
Access: Open Access