The complicated astrophysics of galaxy formation affects the universe on the same large scales from which we try to infer its fundamental properties, the 'cosmological parameters'. To make that inference accurately, we must account for the effects of galaxy formation by marginalizing over the various ways they can be modeled and implemented. That variety stems from our still-limited understanding of the underlying processes and our inability to resolve all the relevant scales at once.
One way to do this is to run a large suite of simulations, each making different assumptions about that physics, and to train machine learning models to perform the marginalization. This is exactly what we do in the CAMELS project. We have generated a massive data set (over one petabyte) of thousands of cosmological simulations, and we have made it largely publicly available for anyone to use in their research.
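
To make the idea of marginalizing with a trained model concrete, here is a minimal toy sketch. It is not the CAMELS pipeline: the "simulations" are a few lines of synthetic data, the model is a plain linear least-squares fit rather than a neural network, and every name and number (`make_toy_suite`, `omega`, `feedback`, the response coefficients, the noise level) is a hypothetical choice made only for illustration. The point it tries to show is that a model trained to predict only the cosmological parameter, from a suite in which the astrophysical parameter varies freely, approximates an estimate that is averaged over that nuisance parameter.

```python
import numpy as np


def make_toy_suite(n_sims, seed):
    """Draw a toy 'simulation suite' (hypothetical stand-in for real simulations).

    Each row is one simulation with a cosmological parameter `omega` and an
    astrophysical nuisance parameter `feedback`, both drawn from flat priors.
    """
    rng = np.random.default_rng(seed)
    omega = rng.uniform(0.1, 0.5, n_sims)
    feedback = rng.uniform(0.25, 4.0, n_sims)
    # Two made-up summary statistics that respond to both parameters,
    # but with different sensitivity to the nuisance parameter.
    s1 = omega + 0.05 * np.log(feedback)
    s2 = omega - 0.10 * np.log(feedback)
    stats = np.column_stack([s1, s2])
    stats += rng.normal(0.0, 0.005, stats.shape)  # toy measurement noise
    return stats, omega


def fit_linear_regressor(stats, omega):
    """Least-squares fit of omega on the summary statistics plus an intercept."""
    design = np.column_stack([stats, np.ones(len(stats))])
    coef, *_ = np.linalg.lstsq(design, omega, rcond=None)
    return coef


def predict(coef, stats):
    """Apply the fitted linear model to new summary statistics."""
    design = np.column_stack([stats, np.ones(len(stats))])
    return design @ coef


# Train on one suite, evaluate on an independent one.
train_stats, train_omega = make_toy_suite(1000, seed=0)
test_stats, test_omega = make_toy_suite(200, seed=1)

# The regressor is never told the value of `feedback`; it only sees the
# statistics, so its prediction is implicitly averaged over the nuisance.
coef = fit_linear_regressor(train_stats, train_omega)
pred = predict(coef, test_stats)
rmse = float(np.sqrt(np.mean((pred - test_omega) ** 2)))
print(f"RMSE on omega: {rmse:.4f}")
```

In this toy setup the two statistics can be combined so that the dependence on `feedback` cancels, which is why even a linear model recovers `omega` well. With real simulations the dependence is far less simple, which is one motivation for using more flexible models and a much larger suite.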