Madgraph sometimes has widely-varying memory usage

Asked by Martin Habedank on 2020-08-27

Hi all,

we're trying to generate 2HDM+a model processes @NLO with Madgraph inclusively, scanning over a large range of parameter values (which is probably a bit unusual). During this, we noticed that Madgraph has widely-varying memory usage. This is slightly inconvenient for us, though no major obstacle, as it requires us to request the maximum possibly needed memory for each job. We therefore did a more dedicated study of this. I'm writing to report our findings in case you're interested or have something to add that we overlooked.

(1) The test setup was to use only a single set of parameter values and a fixed random seed, but to repeat the event generation 600 times with Madgraph 2.7.2, the basic script being
set group_subprocesses Auto
set ignore_six_quark_processes False
set gauge unitary
set loop_optimized_output True
set complex_mass_scheme False
set automatic_html_opening False
import model ./Pseudoscalar_2HDM
define p = g u c d s u~ c~ d~ s~
define j = g u c d s u~ c~ d~ s~
define l+ = e+ mu+
define l- = e- mu-
define vl = ve vm vt
define vl~ = ve~ vm~ vt~
define p = p b b~
define j = j b b~
define allsm = t t~ z w+ w- b b~ h1
define allbsm = h2 h3 h+ h- h4
generate p p > allsm allbsm allbsm DMS=2 QCD=2
add process p p > allsm allsm allbsm DMS=2 QCD=2
add process p p > allbsm allbsm allbsm DMS=2 QCD=2
output mgevents
launch
shower=Pythia8
set maxjetflavor 5
set gPXd 1.0
set tanbeta 0.1
set sinbma 1.0
set lam3 3.0
set lap1 3.0
set lap2 3.0
set sinp 0.35
set Mxd 10.0
set mh1 1.250000e+02
set mh2 600
set mh3 600
set mhc 600
set mh4 196.42857142857144
set Wh1 Auto
set Wh2 Auto
set Wh3 Auto
set Whc Auto
set Wh4 Auto
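
For completeness, the repetition itself was driven by a plain resubmission loop of roughly this shape (a sketch only, assuming a wrapper script like the runpoint.sh described further down, which also sets up a few environment variables):

# submit the same generation 600 times; each job runs the MadGraph script above
for i in $(seq 1 600); do
  qsub runpoint.sh   # runpoint.sh essentially calls "mg5_aMC mg2hdm.sh"
done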

We found that the maximum memory used indeed varies widely, from 3 GB up to 11 GB, and does not follow a clear (say, Gaussian) distribution. This was quite a surprise for us since, given the same random seed, we would have expected the procedure to be roughly the same for every run. Additionally, we found that the maximum memory usage and the runtime of a job are roughly anticorrelated. I uploaded a few plots of our findings to https://cernbox.cern.ch/index.php/s/rr5gB18MiPd44iv , in case you're interested.

(2) To double-check our expectations, we conducted a similar test with a slimmer, non-inclusive script ([...] is exactly the same as before):
[...]
import model ./Pseudoscalar_2HDM
define p = g u c d s u~ c~ d~ s~ b b~
define allbsm = h2 h3 h+ h- h4
define tops = t t~
generate p p > t t~ allbsm DMS=4 QCD=4
add process p p > tops h1 allbsm DMS=4 QCD=4
[...]
Here, everything indeed matches our expectations better: the memory usage is approximately Gaussian-distributed, with a spread of less than 100% (1.8 GB to 3.2 GB) instead of more than 300% as before. There is also no longer a clear correlation between runtime and memory consumption (see https://cernbox.cern.ch/index.php/s/UCGS2VUEyHYCm0J for details).

Lastly, since we considered that the observations might be related to the answer in https://answers.launchpad.net/mg5amcnlo/+question/670762 , we reran (1) including the option "set low_mem_multicore_nlo_generation" (https://cernbox.cern.ch/index.php/s/ElDiGZetsyvAMJZ). This does lead to a more pronounced peak of the memory usage at low values (~5.3 GB), but it does not significantly reduce the spread (3 GB to 10 GB) or the anticorrelation with the runtime.

Please let me know if there's anything else you would like to know about this. We'd of course also be very happy about any other input you can give on this.

Thanks and cheers,
Martin

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Olivier Mattelaer
Solved: 2020-09-14
Last query: 2020-09-14
Last reply: 2020-09-01

Olivier Mattelaer said : #1

Hi,

Which type of memory are you talking about?
From what you describe, you are looking at the /tmp disk space (or wherever you configured your $TMP_DIR), where madgraph reads/writes large files.
Those files can indeed be large, and their size will be correlated with the speed of the run: if the code needs only a small number of iterations to converge, it is of course faster and needs to write fewer events to disk (each iteration nearly doubles the size of the file that it writes).
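
If you want to double check whether it is really that scratch space, you can watch the usage of the temporary area while one of the jobs is running, for example (a sketch, assuming the default /tmp location):

df -h /tmp      # free space on the filesystem holding the temporary files
du -sh /tmp/*   # size of whatever is currently stored there, including madgraph's temporary files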

Cheers,

Olivier


Martin Habedank (habedama) said : #2

Hi,

thanks for your quick reply! I'm talking about the actual RAM the script uses (as far as I'm aware, that is what is labelled "maxrss" for qsub jobs). This should not be the disk space used, since we only became aware of the issue because qsub kept killing the jobs when the memory limit of 2 GB was exceeded. As far as I know, there is no disk-space limit for jobs in our system as low as 2 GB.
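For reference, after a job has finished the peak memory can also be read back from the Grid Engine accounting, if the standard tools are available, with something along the lines of
qacct -j <job_id> | grep -E 'maxvmem|ru_maxrss'   # peak memory recorded by GE for this job; <job_id> is a placeholder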
Regarding the correlation of memory/disk space and runtime: indeed, the behaviour you describe is what we expected as well. But what we see for case (1) is an anticorrelation - the less memory the jobs use, the more time they need. And another thing that I don't understand here (sorry if I'm being slow): shouldn't the number of iterations needed to converge be the same for all runs if the same random number seed is used?

Thanks a lot for your help!
Cheers,
Martin

Olivier Mattelaer said : #3

Are you sure that /tmp is not a RAM-disk filesystem?
In that case those files would actually use RAM, and you could easily hit the limit.

I have recently studied the amount of RAM needed per job for MG5aMC, for each job submitted to the cluster (not counting the python controller and not the part of the code that creates the matrix elements/...). The result is that I never exceeded 100 MB per job, and the memory was quite stable while running (which makes sense, since we use static memory allocation in most of the code).

I'm now running it on my laptop to check how much RAM it uses.

Cheers,

Olivier


Martin Habedank (habedama) said : #4

Hi,

I wasn't sure whether /tmp might be part of a RAM-disk filesystem, so I double-checked.

The output of "df -T /tmp" is
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/sda6 xfs 4184064 33592 4150472 1% /tmp

and the output of "mount | grep tmp" is
devtmpfs on /dev type devtmpfs (rw,nosuid,size=32842700k,nr_inodes=8210675,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
/dev/sda6 on /tmp type xfs (rw,nosuid,nodev,relatime,attr2,inode64,usrquota)
/dev/sda6 on /var/tmp type xfs (rw,nosuid,nodev,relatime,attr2,inode64,usrquota)

None of this points towards tmpfs for /tmp as far as I see, so I'm inclined to believe /tmp is on disk for us.

Cheers,
Martin

Olivier Mattelaer said : #5

That indeed sounds like a real filesystem.

Just to be sure: do you let madgraph submit the jobs to the pbs cluster itself, or do you submit a single-core job on your cluster yourself?

Cheers,

Olivier


Martin Habedank (habedama) said : #6

Hi,

I'm submitting the jobs myself as "qsub runpoint.sh" (we're using a GE cluster) where runpoint.sh basically contains

#! /bin/bash
mg5_aMC mg2hdm.sh

apart from setting up a couple of environment variables. Madgraph itself is not used for the submission.

Cheers,
Martin

Olivier Mattelaer said : #7

Hi,

So have you configured your code to run in single-core or in multi-core mode? (The default is to use all the cores of the submitting machine.)
I'm not a GE expert (I ran on that type of machine 10 years ago), but your command likely assigns only a single core to the job.
So if you have kept MG5aMC in multi-core mode and you have a 64-core machine, then you will need ~4 GB of RAM to run all 64 jobs at the same time (and GE will kill your job).
Obviously, since you ask for only one core, it is likely that all the other cores are used by other people (maybe even by you), and therefore your over-subscription will only slow you down.

So if you want to use your cluster like that, you either have to switch to single-core mode,
or ask GE for more cores (and tell MG5aMC the number of cores you want to use in multi-core mode).

For your process (which involves a lot of small/fast computations, since this is mainly a search for the dominant cross-section), using the cluster mode of MG5aMC (we have a sge and a ge mode) is likely not the most efficient option, since it would submit more than 500 jobs to the cluster (while everything runs in less than 20 min on a 4-core laptop with hyperthreading).
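
For reference, switching to that cluster mode would roughly amount to the following settings (a sketch only; the scheduler flavour and the queue name are site-specific, and <queue name> is just a placeholder):

set run_mode 1                   # cluster mode: MG5aMC submits the jobs itself
set cluster_type sge             # or ge, matching your scheduler
set cluster_queue <queue name>   # placeholder, site-specific

But as said, for this kind of scan the single-core or multi-core mode is probably the better choice.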

Cheers,

Olivier


Martin Habedank (habedama) said : #8

Hi Olivier,

thanks for making me aware of the different run modes of Madgraph! I hadn't changed the default settings, so I spent the last couple of days trying different settings with a larger number of jobs and monitoring their memory consumption. Namely, I tested including
set run_mode 2
set nb_core 8
in the Madgraph script and running qsub with "-pe multicore 8". Indeed, that improves the resource usage quite a lot: memory consumption is now down to 1.1 to 1.7 GB and the run time is between 130 and 330 minutes (for each of ~300 jobs).

On the other hand, running on a single core, i.e. with
set run_mode 0
set nb_core 1
in the Madgraph script uses 0.66 to 0.80 GB and takes 370 to 530 minutes (for each of 60 jobs).

So you were right that the large memory consumption was caused by Madgraph trying to run on multiple cores while being forced by the batch system to run on a single one.
Thanks a lot for your help!

Cheers,
Martin

Martin Habedank (habedama) said : #9

Thanks Olivier Mattelaer, that solved my question.