During Run 2 of the Large Hadron Collider (LHC), each recorded event contained on average 30-50 simultaneous vertices yielding charged and neutral showers, otherwise known as pileup. This number is expected to increase at the High-Luminosity LHC, with predicted values as high as 200. Left unchecked, pileup hinders both the search for new physics and Standard Model precision measurements that depend on quantities such as jet energy, jet substructure, missing momentum, and lepton isolation. State-of-the-art pileup mitigation strategies seek to label pileup at the level of individual constituent particles. One such method, Training Optimal Transport using Attention Learning (TOTAL, arXiv:2211.02029), is the foundation for this work. TOTAL uses a transformer architecture with a loss function inspired by optimal transport problems to learn full event topologies. By comparing matched events with and without pileup added, the TOTAL network robustly learns pileup as a transport function, which can then be used to reject pileup constituents. In this work, we improve upon TOTAL by reducing the supervision it requires. Because the events with and without pileup no longer need to be directly matched, we can work in a weakly supervised setting that compares real data events with high and low pileup. Despite the reduced supervision, our approach still outperforms conventional pileup mitigation approaches. Such an extension of the TOTAL methodology would allow for more robust pileup mitigation that is less reliant on simulation, and it opens the possibility of online pileup mitigation.
Host: Youqi Song