MadGraph5_aMC@NLO

Madgraph jobs run for an unexpectedly long time

Asked by Hesham El Faham on 2021-01-10

Hello,

I am running the tZW process at NLO in fixed-order mode on the Ingrid remote host. The jobs run through a Slurm cluster. At an FO precision of 0.01, the whole run completes in around 4 hours and the HwU file is generated. When I tighten the precision to 0.001, the run proceeds fairly quickly through the cross-section computation and through 'refining results step 1' and 'refining results step 2'. In 'refining results step 3', however, all the sub-jobs stay in the running state and none of them completes, and things remain that way for a very long time. I expected that at 0.001 the run would take roughly 10 times as long as at 0.01, that is, around 40 hours, but the sub-jobs in 'refining results step 3' have been running for more than 2 days. I am wondering whether I should wait longer or whether something is wrong. I changed nothing in any of the cards between the 0.01 and the 0.001 runs except the FO precision parameter in the run card. Could you please help with that?

Best,
Hesham

Question information

Language: English

Status: Expired

For: MadGraph5_aMC@NLO

Assignee: No assignee

Last query: 2021-01-10

Last reply: 2021-01-25

Olivier Mattelaer (olivier-mattelaer) said on 2021-01-10:

If the previous run took 4 hours, the same process at 10 times higher precision should take about 400 hours, not 40. This is Monte Carlo integration: the statistical error goes like 1/\sqrt{N}, so a 10 times smaller error needs 100 times more points.

Cheers,

Olivier
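
The scaling argument above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the error falls off as 1/sqrt(N) and that the run time is proportional to the number of points N, ignoring fixed overheads and cluster queueing. The function name `scaled_runtime_hours` is made up for this illustration and is not part of MadGraph5_aMC@NLO.

```python
def scaled_runtime_hours(base_hours, base_precision, target_precision):
    """Estimate the run time after tightening a Monte Carlo precision target.

    Assumptions (illustrative only):
      * the statistical error scales like 1/sqrt(N);
      * the run time is proportional to the number of points N.
    """
    if base_hours < 0 or base_precision <= 0 or target_precision <= 0:
        raise ValueError("hours must be non-negative and precisions positive")
    # Shrinking the error by a factor r needs r**2 times more points.
    error_ratio = base_precision / target_precision
    return base_hours * error_ratio ** 2


if __name__ == "__main__":
    hours = scaled_runtime_hours(4.0, 0.01, 0.001)
    print(f"Estimated run time: {hours:.0f} hours")
```

With the numbers from this thread (4 hours at 0.01, target 0.001) the estimate comes out at roughly 400 hours, which is why two days of running is not yet a sign that the jobs are stuck.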

> On 10 Jan 2021, at 01:50, Hesham El Faham <email address hidden> wrote:
>
> New question #694872 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/694872

Hesham El Faham (helfaham) said on 2021-01-10:

Thanks. I think 400 hours exceeds the maximum allowed job running time. If so, is there a way to handle that at the level of amcatnlo_configuration.txt? There I use:
-> run_mode = 1
-> cluster_type = slurm
-> cluster_queue = None
-> cluster_size = 150
Or should I perhaps split the event generation from the run_card to speed up the process?

Best,
Hesham

Launchpad Janitor (janitor) said on 2021-01-25:

This question was expired because it remained in the 'Open' state without activity for the last 15 days.
