arXiv Open Access 2024

Aspen Open Jets: Unlocking LHC Data for Foundation Models in Particle Physics

Oz Amram, Luca Anzalone, Joschka Birk, Darius A. Faroughy, Anna Hallin, +5 more

Abstract

Foundation models are deep learning models that are pre-trained on large amounts of data and are capable of generalizing to multiple datasets and/or downstream tasks. This work demonstrates how data collected by the CMS experiment at the Large Hadron Collider can be useful for pre-training foundation models for high-energy physics (HEP). Specifically, we introduce the AspenOpenJets dataset, consisting of approximately 178M high-$p_T$ jets derived from CMS 2016 Open Data. We show how pre-training the OmniJet-$\alpha$ foundation model on AspenOpenJets improves performance on generative tasks with significant domain shift: generating boosted top and QCD jets from the simulated JetClass dataset. In addition to demonstrating the power of pre-training a jet-based foundation model on actual proton-proton collision data, we provide the derived, ML-ready AspenOpenJets dataset for further public use.

Authors (10)

Oz Amram

Luca Anzalone

Joschka Birk

Darius A. Faroughy

Anna Hallin

Gregor Kasieczka

Michael Krämer

Ian Pang

Humberto Reyes-Gonzalez

David Shih

Citation Format

Amram, O., Anzalone, L., Birk, J., Faroughy, D.A., Hallin, A., Kasieczka, G. et al. (2024). Aspen Open Jets: Unlocking LHC Data for Foundation Models in Particle Physics. https://arxiv.org/abs/2412.10504

Journal Information
Year of Publication: 2024
Language: English
Source Database: arXiv
Access: Open Access ✓