large runtime dependence on $gran in MG5 v261

Asked by Junho Choi on 2018-11-29

Dear authors,

Hello :) I am Junho, from CMS.

While comparing the v260 and v261 versions, I found that MG5 v261 has a very large dependence on the # of cores when generating events.

Let me explain how I tested it:

(1) I made gridpacks of p p > l+ l- + 0/1/2/3/4 j (Drell-Yan with up to 4 additional partons) - you can find the run/proc cards here: http://147.47.242.72/USER/jhchoi/generator_group/slow_mg261/dy4j_card/

Please note that the gridpacks are completely independent of CMSSW; I used MG5 only.

(2) I tested the runtime using run.sh with

nevent = 100 / 500
ncpu (= $gran) = 1 / 10 / 30

The results are here: http://147.47.242.72/USER/jhchoi/generator_group/slow_mg261/slow_mg261_dy4j_launchpad.pdf

(3) You can see that, to generate 500 events,

v260 needs 111/106/417 sec using 30/10/1 cores
v261 needs 95/672/14124 sec using 30/10/1 cores

The runtime of v261 increases drastically as the # of cores decreases.

(The tests were done on the same server and under the same conditions.)

I wonder whether this issue is expected and already known to the MG authors or not.

If it is expected, what is the recommended value for "$gran"?

We want to use an efficient setup and # of CPUs for our sample generation.

Thanks for reading!

Best regards,

Junho Choi

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee
Last query: 2018-12-12

This question was reopened

Hi,

A gridpack runs ONLY on a single core (that is the idea of a gridpack).
You can now set the gridpack to read-only if you want to run the same gridpack multiple times
(and, in that sense, run in a way similar to a multi-core gridpack).

Therefore, we do not have any parameter ncpu.
We do have a parameter granularity, but this has nothing to do with the number of cores.
This parameter does have some impact on the speed of the computation, since it plays with the selection
of the channels.

In gridpack mode, we first determine the number of events that should be generated by each channel of integration. If that number is below the granularity, then we sometimes include the channel and sometimes not
(in a way such that combining samples does not lead to any bias).

In the past, the default value of that granularity was 10, but the CMS experiment complained about this: even if the sample is NOT biased, its statistical uncertainty is slightly off due to this procedure. To get back to the typical statistical uncertainty, we moved the default value to 1.
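
[Editor's note: a minimal sketch of the selection scheme described above. The function name, the dict layout and the rescaling to 'granularity' events are assumptions based on the "does not lead to any bias" statement, not MG5's actual gen_ximprove code.]

import random

def select_channels(expected_events, granularity=1):
    selected = {}
    for channel, n_expected in expected_events.items():
        if n_expected >= granularity:
            # Channel large enough: always run it with its expected yield.
            selected[channel] = n_expected
        elif random.random() < n_expected / float(granularity):
            # Small channel: keep it only with probability n_expected/granularity,
            # but then ask it for 'granularity' events, so that on average over
            # many gridpack runs the combination stays unbiased.
            selected[channel] = granularity
    return selected

# e.g. with granularity=10, a channel expected to give 2 events is run only
# ~20% of the time, but then produces 10 events when it is selected.
print(select_channels({"P2_qq_epemgg": 2.0, "P0_qq_epem": 150.0}, granularity=10))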

Now, concerning the difference in speed between 2.6.0 and 2.6.1, this is something that I would need to investigate.

Cheers,

Olivier

Junho Choi (junhochoi) said : #2

I misunderstood $gran as the # of cores.

My question should refer to $gran instead of # of cores, so I have changed it.

Thanks for your answer!

Best,

Junho

Junho Choi (junhochoi) said : #3

Dear Olivier,

I also have one more thing to tell you.

When I ran "run.sh 1000 10 1" in v261, some of the constructors triggered DeprecationWarnings:

======part of my log====
P1_qq_mupmumg
P1_qq_taptamg
P0_qq_epem
P0_qq_mupmum
P0_qq_taptam
/data7/Users/jhchoi/gridpack_val/slow_dy4j/standalone_gridpacks/v261/madevent/bin/internal/gen_ximprove.py:800: DeprecationWarning: object.__new__() takes no parameters
  return super(gen_ximprove, cls).__new__(gen_ximprove_gridpack, cmd, opt)
/data7/Users/jhchoi/gridpack_val/slow_dy4j/standalone_gridpacks/v261/madevent/bin/internal/gen_ximprove.py:810: DeprecationWarning: object.__init__() takes no parameters
  super(gen_ximprove, self).__init__(cmd, opt)
WRITE GRIDCARD /data7/Users/jhchoi/gridpack_val/slow_dy4j/standalone_gridpacks/v261/madevent
DONE
write ./events.lhe.gz
1543485645
runtime : 73757 sec

=========================
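
[Editor's note: these look like Python 2 DeprecationWarnings rather than missing definitions, and they did not stop the run here (the log continues to DONE). They appear when a class that overrides both __new__ and __init__ forwards its extra arguments up to object. A minimal sketch of the pattern with a stand-in class, not MG5's actual gen_ximprove; run with "python2 -W default" if DeprecationWarnings are hidden by default.]

class Example(object):
    def __new__(cls, cmd, opt):
        # object.__new__ ignores extra arguments; passing them along is what
        # produces "DeprecationWarning: object.__new__() takes no parameters".
        return super(Example, cls).__new__(cls, cmd, opt)

    def __init__(self, cmd, opt):
        # Same story for object.__init__.
        super(Example, self).__init__(cmd, opt)
        self.cmd, self.opt = cmd, opt

Example("cmd", "opt")  # warns twice, then constructs the object normally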

Junho Choi (junhochoi) said : #4

Dear Olivier,

Hello :)

Have you found the reason for this issue, or do you have any hints?

Best regards,

Junho Choi

Hi Junho,

I did not have the time to look at it yet.
It was actually not correctly tagged in my to-do list (it is now).

The only hint I can provide for now is that there is a lot of randomness in such a generation.
The first step is to randomly select the channels of integration that are going to run. This is based on the number of events expected for each channel: if that number is lower than ngran, then the channel is chosen with probability nexpect/ngran.

Therefore I would expect a huge variance in the running time between gridpack runs
(depending on how much the largest-multiplicity channels are used, ...).
On top of that, the variance for a given channel is already a factor of 4 in MadGraph
(adding an iteration in MadGraph doubles the computation time, and the spread in the number of iterations required is typically 2).

All this to say that I'm not worried by your timing results so far (and this is therefore low priority for me).
It would be better to compare 20 runs and thereby get an indicative variance between the runs.
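
[Editor's note: a sketch of how such a comparison could be collected. It assumes run.sh takes the number of events and a random seed as its first two arguments, as in the "run.sh 1000 10 1" call quoted earlier; adapt the command line to your actual gridpack.]

import subprocess
import time

def time_gridpack(run_script="./run.sh", nevents=500, nruns=20, first_seed=1):
    """Run the gridpack nruns times with different seeds and report the
    mean wall-clock time and its spread (sample standard deviation)."""
    times = []
    for seed in range(first_seed, first_seed + nruns):
        start = time.time()
        subprocess.check_call([run_script, str(nevents), str(seed)])
        times.append(time.time() - start)
    mean = sum(times) / len(times)
    spread = (sum((t - mean) ** 2 for t in times) / (len(times) - 1)) ** 0.5
    return mean, spread

if __name__ == "__main__":
    mean, spread = time_gridpack()
    print("runtime: %.0f +- %.0f sec over 20 runs" % (mean, spread))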

Cheers,

Olivier

Junho Choi (junhochoi) said : #6

Dear Olivier,

Okay, I'll show you a new result with much higher precision.

Junho Choi (junhochoi) said : #7

Dear Olivier,

Please find the link below :

http://147.47.242.72/USER/jhchoi/generator_group/slow_mg261/jhchoi_181212_slow_dy4j_mg261_launchpad.pdf

I checked that the runtime of v261 is much larger than that of v260.

Could you check this issue again, or cross-check it on your side?

Best regards,

Junho Choi

Hi,

Thanks for spotting this.
I have just pushed a patch:
https://bazaar.launchpad.net/~mg5core1/mg5amcnlo/2.6.5/revision/289

In addition to fixing some real bugs -- no bias in the results can be related to those -- I have also played with some internal parameters which do have an impact on the speed (and some of them were modified between 2.6.0 and 2.6.1).

I still have to double-check that the speed in "read-only" mode is not impacted and that it runs nicely.

Cheers,

Olivier
