generation error

Asked by Li on 2018-05-29

Hi,

    I am trying to generate p p > z z jb jb, where jb = j b b~, with Z pT > 400, jb pT > 400 and maxjetflavor = 5, but I ran into several errors.

    First, on one cluster, with both MadGraph 2.6.1 and 2.6.2, the number of events is wrong (in the ROOT file; the parton-level result seems fine).

    Then, on another cluster with MadGraph 2.6.1, the error is:

At line 279 of file rw_events.f (unit = 25, file = 'events.lhe')
Fortran runtime error: Connection timed out
At line 279 of file rw_events.f (unit = 25, file = 'events.lhe')
Fortran runtime error: Connection timed out
At line 279 of file rw_events.f (unit = 25, file = 'events.lhe')
Fortran runtime error: Connection timed out
At line 279 of file rw_events.f (unit = 25, file = 'events.lhe')
Fortran runtime error: Connection timed out
At line 279 of file rw_events.f (unit = 25, file = 'events.lhe')
Fortran runtime error: Connection timed out
At line 279 of file rw_events.f (unit = 25, file = 'events.lhe')
Fortran runtime error: Connection timed out
At line 279 of file rw_events.f (unit = 25, file = 'events.lhe')
Fortran runtime error: Connection timed out
At line 279 of file rw_events.f (unit = 25, file = 'events.lhe')
Fortran runtime error: Connection timed out
At line 279 of file rw_events.f (unit = 25, file = 'events.lhe')
Fortran runtime error: Connection timed out
At line 279 of file rw_events.f (unit = 25, file = 'events.lhe')
Fortran runtime error: Connection timed out
rm: cant remove "results.dat": no such file or directory
ERROR DETECTED

     And sometimes (when the number of events is 100k) the error says something like "not enough space" (I think it happens during the Fortran compilation).

     Finally, on my PC, generation is extremely slow, for example:

INFO: Idle: 152, Running: 8, Completed: 0 [ current time: 22h48 ]

INFO: Idle: 151, Running: 8, Completed: 1 [ 1h 58m ]

    So do you have any idea about this? How can I solve this?

Best,
Li

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Olivier Mattelaer
Solved: 2018-05-30
Last query: 2018-05-30
Last reply: 2018-05-30

Hi,

> the event number is wrong ( in the root file, the parton level result seems
> good ).

This sounds like a problem with the LHE-to-ROOT converter that you are using.
If the LHE file contains the correct number of events, please contact the author(s) of the tool you use to create the ROOT file (we do not produce ROOT files ourselves).

> Then I use another cluster with MadGraph version 2.6.1, the error
> is:
>
> At line 279 of file rw_events.f (unit = 25, file = 'events.lhe')
> Fortran runtime error: Connection timed out

That seems to be an automount error. My guess is that your /tmp (or equivalent) disk is on automount and that it fails to mount. You should contact your IT staff to have them fix that (or change your environment to set up another directory for temporary files).

> And sometimes ( when the event number = 100k ), the error says
> something like "not enough space" ( I think it's during the fortran
> compilation)

This sounds like a quota issue (and/or not enough space left on disk).
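
To see whether the disk is simply full before relaunching, one can look at how much space is left where files get written. The following is only a minimal Python sketch: the helper name `free_gib` and the 5 GiB threshold are hypothetical choices, the process directory is assumed to be the current working directory, and only filesystem free space is reported, so a per-user quota can still be exhausted even when the numbers look fine.

```python
import os
import shutil
import tempfile


def free_gib(path):
    """Return the free space, in GiB, of the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    min_free_gib = 5.0  # hypothetical threshold, adjust to the size of the run
    # tempfile.gettempdir() honours the TMPDIR environment variable, so it shows
    # where Python itself would place temporary files on this machine.
    for label, path in [("temporary dir", tempfile.gettempdir()),
                        ("working dir", os.getcwd())]:
        gib = free_gib(path)
        status = "ok" if gib >= min_free_gib else "LOW"
        print(f"{label:14s} {path}: {gib:.1f} GiB free [{status}]")
```

If the temporary directory hangs or errors out here, that would also be consistent with the automount problem mentioned above.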

> Last, in my pc, it generates extremely slow. like,
>
> INFO: Idle: 152, Running: 8, Completed: 0 [ current time: 22h48 ]
>
> INFO: Idle: 151, Running: 8, Completed: 1 [ 1h 58m ]

Such slow running suggests an issue with your cuts.
How did you set them?
You did not mention an angular separation cut between your jets.
Did you use the default one from the run_card?

Also I've run the following (on my laptop)
> define j = j b b~
> generate p p > j j z z
> output
> launch
> set maxjetflavor 5
> set ptj 400
> set pt_min_pdg {23:400}
#so drjj is kept to default

and get the following:
> INFO: P1_qq_bbxzz
> INFO: Idle: 30, Running: 8, Completed: 0 [ current time: 22h53 ]
> INFO: Idle: 29, Running: 8, Completed: 1 [ 0.75s ]
> INFO: Idle: 27, Running: 8, Completed: 3 [ 3.8s ]
> INFO: Idle: 24, Running: 8, Completed: 6 [ 6.9s ]

So far fewer jobs, and much faster.
This might indicate a model issue. Do you have all the quark masses set to 0 (as you should)?
If that's the case, you might be interested in
https://answers.launchpad.net/mg5amcnlo/+faq/2312

Cheers,

Olivier

PS: It looks like you are using some weird setup for 4 and 5 flavors. Is that on purpose?

Li (huangli-itp) said : #2

Hi Olivier,

    Thanks very much!

 > This sounds like a problem with the LHE-to-ROOT converter that you are using.

     I used the Delphes installed through MG5, and I have now checked the number of events in the LHE file (the log said it refined to 100k, so I had not checked it before). I requested 100k events, but when I run:
cat unweighted_events.lhe | grep event | wc -l
    I only get 12756, so the events are already missing at the LHE level.
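
Note that `grep event` matches every line containing the word, which in an LHE file includes both the opening `<event>` and the closing `</event>` tags and possibly some header lines, so the figure above may overestimate the true count. A minimal Python sketch that counts only opening tags is given here; the function name `count_lhe_events` is an illustrative choice, and it assumes the usual layout in which each `<event>` tag starts its own line.

```python
import gzip


def count_lhe_events(path):
    """Count events in an LHE file by counting lines that open an <event> block."""
    # Assumption: files ending in .gz are gzip-compressed, anything else is plain text.
    opener = gzip.open if path.endswith(".gz") else open
    n_events = 0
    with opener(path, "rt") as lhe:
        for line in lhe:
            stripped = line.lstrip()
            # Opening tags may carry attributes, so match on the prefix;
            # closing tags and header lines mentioning "event" are ignored.
            if stripped.startswith("<event>") or stripped.startswith("<event "):
                n_events += 1
    return n_events
```

Calling `count_lhe_events("unweighted_events.lhe")` in the Events directory of the run then gives the number of events actually written.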

> This sounds like a quota issue (and/or not enough space left on disk)

    Thanks for that

> and get the following:
> INFO: P1_qq_bbxzz
> INFO: Idle: 30, Running: 8, Completed: 0 [ current time: 22h53 ]
> INFO: Idle: 29, Running: 8, Completed: 1 [ 0.75s ]
> INFO: Idle: 27, Running: 8, Completed: 3 [ 3.8s ]
> INFO: Idle: 24, Running: 8, Completed: 6 [ 6.9s ]

    I used the same commands as you, and this step runs fast, but:

INFO: Idle: 28, Running: 8, Completed: 2 [ current time: 23h30 ]
INFO: Idle: 27, Running: 8, Completed: 3 [ 0.92s ]
INFO: Idle: 24, Running: 8, Completed: 6 [ 5s ]
INFO: Idle: 23, Running: 8, Completed: 7 [ 23.2s ]
INFO: Idle: 21, Running: 8, Completed: 9 [ 26.4s ]
INFO: Idle: 17, Running: 8, Completed: 13 [ 31.4s ]
INFO: Idle: 16, Running: 8, Completed: 14 [ 37.8s ]
INFO: Idle: 13, Running: 8, Completed: 17 [ 45.6s ]
INFO: Idle: 9, Running: 8, Completed: 21 [ 49.9s ]
INFO: Idle: 8, Running: 8, Completed: 22 [ 53.1s ]
INFO: Idle: 4, Running: 8, Completed: 26 [ 1m 0s ]
INFO: Idle: 2, Running: 8, Completed: 28 [ 1m 4s ]
INFO: Idle: 0, Running: 8, Completed: 30 [ 1m 7s ]
INFO: Idle: 0, Running: 3, Completed: 35 [ 1m 14s ]
INFO: Idle: 0, Running: 2, Completed: 36 [ 1m 17s ]
INFO: Idle: 0, Running: 0, Completed: 38 [ 1m 20s ]
INFO: End survey
refine 100000
Creating Jobs
INFO: Refine results to 100000
INFO: Generating 100000.0 unweigthed events.
INFO: Effective Luminosity 3893175.24954 pb^-1
INFO: need to improve 64 channels
Current estimate of cross-section: 0.0308231693434 +- 0.000496561906993
    P1_qq_zzqq
    P1_gg_zzbbx
    P1_gg_zzqq
    P1_gq_zzgq
    P1_qq_zzgg
    P1_qq_zzbbx
INFO: Idle: 153, Running: 8, Completed: 0 [ current time: 23h31 ]

    Then it runs extremely slowly.

    I am afraid this is the same problem as https://answers.launchpad.net/mg5amcnlo/+question/257133.

    My guess is that on the first cluster the code is not compiled correctly, so a lot of events are lost, while on the second cluster it tries to compile correctly but there is not enough space.

Best,
Li

Li (huangli-itp) said : #3

Hi Olivier,

    And the model file is the default sm in MG5.

Best,
Li

Hi,

OK, so everything is fine on the first cluster: you just hit the maximum number of events that we can generate in one go for such kind of process. I would suggest using the multi_run method to reach your 100k sample.
(I would ask for 10 runs of 10k events.)
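
As a rough illustration of this splitting, the Python sketch below works out how many equal-sized runs are needed to reach a target sample. The helper `plan_multi_run` is hypothetical, and the `multi_run N` command syntax printed at the end is an assumption that should be checked against the help of the MadEvent interface in your installed version.

```python
import math


def plan_multi_run(target_events, events_per_run):
    """Return (number of runs, total events produced) for equal-sized runs."""
    if target_events <= 0 or events_per_run <= 0:
        raise ValueError("both arguments must be positive")
    n_runs = math.ceil(target_events / events_per_run)
    return n_runs, n_runs * events_per_run


if __name__ == "__main__":
    per_run = 10_000  # would be the nevents value in the run_card
    n_runs, total = plan_multi_run(100_000, per_run)
    # Assumed command syntax, not taken from this thread.
    print(f"multi_run {n_runs}  (nevents = {per_run} per run, {total} events in total)")
```

When the target is not a multiple of the run size, the number of runs is rounded up, so the total slightly overshoots the request.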

> I am afraid this is the same problem:
> https://answers.launchpad.net/mg5amcnlo/+question/257133.

Actually, are you using 2.6.2? We changed one part of our phase-space integrator there, which further improves that situation. If you are not using 2.6.2, this is something you should look at.
In my tests, the number of generated events for such type of process was multiplied by three compared to 2.6.1.

> Then it runs extremely slow.

On that part, the more events you ask for, the slower the code is (and the higher the number of jobs).
On my laptop that stage (for 10k events) was done in 40 min. Using the multi_run method should solve that issue as well, since you linearize the problem and avoid wasting CPU time: many channels generate their events correctly for 100k, but all of those have to be thrown away because of a single channel that does not.

Cheers,

Olivier

Li (huangli-itp) said : #5

Hi Olivier,

   Thanks very much, I will try. But can you please explain "such kind of process"? In what situations is the number of events limited?

Best,
Li

All processes have a maximal number of events that can be reached in one go.
Most of the time this is quite high and nobody notices (since at that stage you want to move to a gridpack anyway), but VBF processes (and similar) are the ones where this limit is quite low.

Cheers,

Olivier

Li (huangli-itp) said : #7

Thanks Olivier Mattelaer, that solved my question.