gridpack readonly parallel: ask for 10K get 40K

Asked by Fady Bishara on 2021-05-28

Hello,

I set up a gridpack and wanted to run it in parallel, so I followed the instructions in: https://answers.launchpad.net/mg5amcnlo/+faq/3276

However, when I counted the number of events produced, I got 40K rather than the 10K I asked for. When I asked for 100 or 1000, I got the correct number and when I asked for 9999 I got 3 times that.

To double check, I ran the gridpack in read/write mode and got 10K when I asked for 10K.
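For readers who want to repeat this kind of check, here is a minimal sketch of one way to count the events in an LHE file. It is not the method used in this thread (which is not stated); the function name `count_lhe_events` and the file name in the usage note are hypothetical. It assumes each event starts on its own line with an `<event>` tag, as in standard LHE output.

```python
import gzip
from pathlib import Path


def count_lhe_events(path):
    """Count <event> blocks in a plain or gzip-compressed LHE file."""
    path = Path(path)
    # Event files are often gzipped; otherwise read them as plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    n_events = 0
    with opener(path, "rt") as handle:
        for line in handle:
            stripped = line.lstrip()
            # An event opens with "<event>" or "<event attr=...>".
            if stripped.startswith("<event>") or stripped.startswith("<event "):
                n_events += 1
    return n_events
```

Calling `count_lhe_events("events.lhe.gz")` on each output file and comparing the result with the requested number of events would expose a discrepancy like the one described above.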

This bug occurred on v2.9.3 and the latest v3.1.0. Also, I didn't set the granularity explicitly.

Do you have any idea why this would be happening?

Best,
Fady

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Olivier Mattelaer

This question was reopened

Revision history for this message
Olivier Mattelaer (olivier-mattelaer) said :
#1

Are you 100% sure?
I have never heard of anything like that and have no idea what could cause it.
I tested with p p > t t~ j and got 10k events, as expected.

Olivier

Fady Bishara (fbishara) said :
#2

Hi Olivier,

Thanks for the quick reply; yes, I'm sure. I generated p p > mu+ mu-. Here is a link to the gridpack in case that helps: https://desycloud.desy.de/index.php/s/MbLkkgq9MK35xDH

Cheers,
Fady

Olivier Mattelaer (olivier-mattelaer) said :
#3

Which seed did you use, so that I can reproduce the exact same run?

Olivier Mattelaer (olivier-mattelaer) said :
#4

Sorry for the double message, but I have an issue with your gridpack because of lhapdf. Would it be possible to do a run without lhapdf?
I doubt that lhapdf has anything to do with the problem, but it would make it simpler for me to check your gridpack.

Thanks,

Olivier

Fady Bishara (fbishara) said :
#5

I used seed 1234567

Here is a gridpack without lhapdf: https://desycloud.desy.de/index.php/s/EzqyXCtBHeyHeLA

I verified that it has the same problem on the original machine (Ubuntu 20.04 with Python 3.8.8).

However, when I downloaded it on my laptop also running Ubuntu 20.04 but with Python 3.7.6, I got the correct number of events. Strange.

Then, I made an environment on the original machine with the same python version (3.7.6) and, lo and behold, I got the correct number of events!

So, the problem is due to the version of python 3! Furthermore, the number of generated events was even larger (a factor of 8) when I asked for 20K events with Python 3.8.8.

Olivier Mattelaer (olivier-mattelaer) said :
#6

I cannot reproduce it with Python 2.7, 3.8.5, or 3.9.

So I'm not sure what I can do here if it is that specific to a given version of Python (down to the last digit).
In that case, the best option is probably to avoid that version of Python.

Cheers,

Olivier

Fady Bishara (fbishara) said :
#7

Hi Olivier, thanks for checking; there is no need to do anything. I have already marked the question as solved. And, indeed, I'm just running in a 3.7 env and avoiding the version that caused this strange behavior. Good to know that other 3.8 subversions and 3.9 don't have this problem.

Fady Bishara (fbishara) said :
#8

Hello again, unfortunately the problem persists. I made a rookie mistake and changed two things at once: I changed the python version *and* set `use_syst = False`.

The python version is a red herring; it has nothing to do with the problem, as your check showed.

The problem seems to be with the `systematics.py` module. If I set `use_syst = True` and ask for 10k events, I get 40k.

However, if I set `use_syst = False` and ask for 10k events, I get 10k.

But I do need the scale variations, so I tried running with `use_syst = False` and then running SysCalc (the C++ module) afterwards, because `systematics.py` apparently won't run a posteriori if the run was generated with `use_syst = False` (probably I'm doing something wrong?).

On the other hand, the problem with SysCalc is that its output has a mismatch (delta = +1) between the number of weights it thinks are there and the number of weights it actually writes, which causes HepMC to throw an error. (I am using HepMC2, by the way.)
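A mismatch of this kind can be checked directly on the LHE file. The sketch below is an illustration only: it assumes the weights follow the LHE v3 layout (`<weight>` declarations inside `<initrwgt>` in the header, `<wgt>` entries inside each `<event>`), and whether the SysCalc output in this thread uses exactly that layout is an assumption. The function name `weight_count_mismatches` is hypothetical, and the whole file is read into memory, so it is only suitable for small test files.

```python
import re

# "<weight\b" does not match "<weightgroup", and "<event\b" does not match "<events".
WEIGHT_DECL = re.compile(r"<weight\b")
WGT_ENTRY = re.compile(r"<wgt\b")
INITRWGT = re.compile(r"<initrwgt>(.*?)</initrwgt>", re.DOTALL)
EVENT = re.compile(r"<event\b[^>]*>(.*?)</event>", re.DOTALL)


def weight_count_mismatches(lhe_text):
    """Compare declared weights with the weights written per event.

    Returns (declared, mismatches), where mismatches is a list of
    (event_index, written) for events whose <wgt> count differs from
    the number of <weight> declarations in <initrwgt>.
    """
    header = INITRWGT.search(lhe_text)
    declared = len(WEIGHT_DECL.findall(header.group(1))) if header else 0
    mismatches = []
    for index, match in enumerate(EVENT.finditer(lhe_text)):
        written = len(WGT_ENTRY.findall(match.group(1)))
        if written != declared:
            mismatches.append((index, written))
    return declared, mismatches
```

An empty `mismatches` list means every event carries as many weights as the header declares; a list where each entry differs from `declared` by one would correspond to the delta = +1 described above.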

Do you have any suggestions as to what would be the best path forward? Perhaps there is a simple way to run `systematics.py` after the fact, in the hope that the spurious events do not appear again? Is there an easy way to make it write a different LHE file with the systematics?

P.S. the gridpack-in-parallel option works really well and is very efficient for generating a large sample on a modern multicore machine; I just wanted to say thanks!

Olivier Mattelaer (olivier-mattelaer) said :
#9

Hi,

Let me make a couple of comments:

1) SysCalc is not developed by us and, as far as I know, is not maintained anymore. You can try to contact the author (Alexis) if you want to give it a try.

2) Gridpack readonly mode is not designed for multi-processing, so I have never checked whether systematics.py works there; the likely answer is no. But you should be able to run those programs outside of the readonly gridpack.

So I will test with
use_syst = T
systematics_program = none
to see if I reproduce your issue now.

If you want to run SysCalc and/or systematics.py, it is important that you run with use_syst=T; otherwise you will not have enough information in the LHE file to compute the uncertainties.
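As an illustration of switching the two settings above in a scripted test, here is a minimal sketch of a helper that edits run-card text. It assumes the card stores each parameter as `value = name ! comment`, with `#` starting a comment line; that layout and the sample lines in the usage note are assumptions rather than something shown in this thread, and `set_run_card_value` is a hypothetical name, not part of MadGraph5_aMC@NLO.

```python
import re


def set_run_card_value(card_text, name, value):
    """Return card_text with the 'value = name' entry for `name` set to `value`.

    Raises KeyError if no such entry exists. Lines starting with '#' are
    left untouched because the old value may not begin with '#'.
    """
    pattern = re.compile(
        r"^(?P<indent>[ \t]*)(?P<old>[^\s#=]+)(?P<sep>[ \t]*=[ \t]*)"
        + re.escape(name)
        + r"\b",
        re.MULTILINE,
    )
    new_text, n_subs = pattern.subn(
        # Keep the original indentation and spacing; swap only the value.
        lambda m: f"{m.group('indent')}{value}{m.group('sep')}{name}",
        card_text,
    )
    if n_subs == 0:
        raise KeyError(name)
    return new_text


if __name__ == "__main__":
    card = (
        "  10000 = nevents ! number of events\n"
        "  False = use_syst ! enable systematics\n"
        "  systematics = systematics_program ! program to run\n"
    )
    card = set_run_card_value(card, "use_syst", "T")
    card = set_run_card_value(card, "systematics_program", "none")
    print(card)
```

Changing one parameter per call keeps each test run tied to a single, explicit difference in the card, which avoids changing two things at once.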

Cheers,

Olivier

Olivier Mattelaer (olivier-mattelaer) said (best answer) :
#10

Ok,

Actually, it looks like systematics is run automatically (I forgot about that), even if systematics_program is set to none or syscalc.
It is indeed the systematics program that duplicates events.

The patch for those two issues is here:
https://bazaar.launchpad.net/~maddevelopers/mg5amcnlo/LTS_dev/revision/315

Cheers,

Olivier

Fady Bishara (fbishara) said :
#11

Thanks Olivier Mattelaer, that solved my question.