Lattice QCD Calculation and Optimization on ARM Processors
Abstract
Lattice quantum chromodynamics (lattice QCD) is one of the most important applications of large-scale parallel computing in high-energy physics. Research in this field typically consumes large amounts of computing resources, and its computational core is the solution of large-scale sparse linear systems. Using the domestically developed Kunpeng 920 ARM processor, this paper studies the hotspot of lattice QCD calculations, the Dslash operator, running it on up to 64 nodes (6,144 cores) and demonstrating linear scalability. A roofline performance analysis shows that lattice QCD is a typical memory-bound application, and by compressing the 3×3 complex unitary matrices in Dslash based on their symmetry, we improve Dslash performance by 22%. For solving the large-scale sparse linear systems, we also evaluate a conventional Krylov subspace iterative algorithm, BiCGStab, and the recently developed state-of-the-art multigrid algorithm on the same ARM processor. In practical physics calculations, the multigrid algorithm is from several times to an order of magnitude faster than BiCGStab, even when the multigrid setup time is included. Moreover, using the NEON vector instructions on the Kunpeng 920 improves the performance of the multigrid algorithm by up to 20%. Therefore, using the multigrid algorithm on ARM processors can greatly accelerate physics research.
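The roofline argument for why Dslash is memory bound, and why compressing the gauge links helps, can be sketched as a small arithmetic-intensity calculation. This is a minimal illustration under stated assumptions, not the paper's analysis: it assumes a plain Wilson Dslash with the commonly quoted count of 1320 flops per lattice site, double precision, and a worst case with no cache reuse (8 neighbor spinors and 8 links read, 1 spinor written per site). The peak and bandwidth values are hypothetical placeholders, not measured Kunpeng 920 figures, and the extra flops needed to rebuild a compressed link are not counted.

```python
# Assumed worst-case (no cache reuse) per-site counts for a Wilson Dslash.
FLOPS_PER_SITE = 1320       # commonly quoted Wilson Dslash flop count
SPINOR_REALS = 24           # 4 spins x 3 colors x (re, im)
LINK_REALS_FULL = 18        # full 3x3 complex link matrix
LINK_REALS_COMPRESSED = 12  # only the first two rows stored


def bytes_per_site(link_reals, bytes_per_real=8):
    """Bytes moved per site: read 8 neighbor spinors and 8 links, write 1 spinor."""
    reals = 8 * SPINOR_REALS + 8 * link_reals + SPINOR_REALS
    return reals * bytes_per_real


def arithmetic_intensity(link_reals, bytes_per_real=8):
    """Flops performed per byte of memory traffic."""
    return FLOPS_PER_SITE / bytes_per_site(link_reals, bytes_per_real)


def attainable_gflops(ai, peak_gflops, bandwidth_gbs):
    """Roofline model: performance is capped by compute peak or by AI x bandwidth."""
    return min(peak_gflops, ai * bandwidth_gbs)


ai_full = arithmetic_intensity(LINK_REALS_FULL)
ai_comp = arithmetic_intensity(LINK_REALS_COMPRESSED)

# Hypothetical machine parameters, for illustration only.
PEAK_GFLOPS, BW_GBS = 1000.0, 200.0
print(f"AI, 18 reals/link: {ai_full:.3f} flop/byte")
print(f"AI, 12 reals/link: {ai_comp:.3f} flop/byte")
print(f"ridge point:       {PEAK_GFLOPS / BW_GBS:.1f} flop/byte")
print(f"roofline, full:       {attainable_gflops(ai_full, PEAK_GFLOPS, BW_GBS):.1f} GFLOP/s")
print(f"roofline, compressed: {attainable_gflops(ai_comp, PEAK_GFLOPS, BW_GBS):.1f} GFLOP/s")
```

With these counts the intensity rises from about 0.46 to about 0.53 flop/byte, both far below any realistic ridge point, so the kernel stays on the bandwidth slope and the speedup from compression is bounded by the traffic ratio (about 1.15x here). The measured gain depends on effects this worst-case count ignores, such as cache reuse of neighbor spinors and the arithmetic cost of reconstruction.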
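One common way to exploit the symmetry of the 3×3 link matrices is the "12-real" scheme: for a matrix U in SU(3) (unitary with determinant 1), the third row equals the complex conjugate of the cross product of the first two, so only two rows need to be loaded from memory. The abstract does not specify which compression the authors used; the sketch below assumes this two-row scheme and uses only the Python standard library. The helper `random_su3` is hypothetical and exists only to produce a valid test matrix independently of the reconstruction formula.

```python
import random


def det3(m):
    """Determinant of a 3x3 complex matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def conj_cross(a, b):
    """Complex conjugate of the cross product a x b of two complex 3-vectors."""
    return [(a[1] * b[2] - a[2] * b[1]).conjugate(),
            (a[2] * b[0] - a[0] * b[2]).conjugate(),
            (a[0] * b[1] - a[1] * b[0]).conjugate()]


def compress(u):
    """Keep rows 0 and 1 only: 6 complex numbers = 12 reals instead of 18."""
    return [list(u[0]), list(u[1])]


def reconstruct(rows):
    """Rebuild an SU(3) matrix: row 2 = conj(row 0 x row 1)."""
    r0, r1 = rows
    return [list(r0), list(r1), conj_cross(r0, r1)]


def random_su3(rng):
    """Hypothetical test helper: Gram-Schmidt on random complex rows, then force det = 1."""
    rows = []
    while len(rows) < 3:
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3)]
        for r in rows:
            # Remove the component of v along the unit row r (Hermitian inner product).
            proj = sum(rk.conjugate() * vk for rk, vk in zip(r, v))
            v = [vk - proj * rk for vk, rk in zip(v, r)]
        norm = sum(abs(x) ** 2 for x in v) ** 0.5
        rows.append([x / norm for x in v])
    d = det3(rows)  # a pure phase, |d| == 1, for a unitary matrix
    rows[2] = [x / d for x in rows[2]]  # rescale one row so det becomes exactly 1
    return rows


rng = random.Random(0)
u = random_su3(rng)
v = reconstruct(compress(u))
err = max(abs(u[i][j] - v[i][j]) for i in range(3) for j in range(3))
print(f"det(U) = {det3(u):.6f}, max reconstruction error = {err:.1e}")
```

The trade-off is fewer bytes loaded per link at the cost of a few extra multiply-adds per link, which is favorable precisely when the kernel is memory bound. The identity relies on det U = 1; a general U(3) matrix would also need its determinant phase stored.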
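For reference, the BiCGStab baseline mentioned in the abstract follows the standard unpreconditioned recurrence of van der Vorst. The sketch below is a minimal, real-valued illustration on a hypothetical non-symmetric tridiagonal test matrix (`matvec`); actual Wilson-Dirac systems are complex-valued and far larger, and this is not the authors' implementation. The multigrid solver is not sketched here.

```python
import math


def matvec(x, diag=4.0, lower=-1.5, upper=-0.5):
    """y = A x for a hypothetical non-symmetric, diagonally dominant tridiagonal A."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = diag * x[i]
        if i > 0:
            y[i] += lower * x[i - 1]
        if i < n - 1:
            y[i] += upper * x[i + 1]
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def bicgstab(apply_a, b, tol=1e-10, max_iter=500):
    """Unpreconditioned BiCGStab with zero initial guess. Returns (x, iterations)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # r0 = b - A x0 with x0 = 0
    r_hat = list(r)      # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        rho = dot(r_hat, r)
        beta = (rho / rho_old) * (alpha / omega)
        p = [rj + beta * (pj - omega * vj) for rj, pj, vj in zip(r, p, v)]
        v = apply_a(p)
        alpha = rho / dot(r_hat, v)
        s = axpy(-alpha, v, r)           # intermediate residual
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            return axpy(alpha, p, x), k
        t = apply_a(s)
        omega = dot(t, s) / dot(t, t)    # local residual-minimizing step
        x = [xj + alpha * pj + omega * sj for xj, pj, sj in zip(x, p, s)]
        r = axpy(-omega, t, s)
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        rho_old = rho
    return x, max_iter


b = [1.0] * 50
x, iters = bicgstab(matvec, b)
true_res = [bi - yi for bi, yi in zip(b, matvec(x))]
print(f"iterations: {iters}, true residual norm: {math.sqrt(dot(true_res, true_res)):.2e}")
```

Each iteration costs two matrix-vector products, which in lattice QCD are Dslash applications, so solver cost is dominated by the memory-bound kernel. This is why reducing the iteration count, as multigrid does for ill-conditioned systems, can outweigh its setup cost.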
Authors
SUN Wei, BI Yujiang, CHENG Yaodong
Quick Access
- Year of Publication: 2023
- Database Source: DOAJ
- DOI: 10.11896/jsjkx.230200159
- Access: Open Access