Error when running MadGraph on a SLURM cluster

Asked by xan1234

I have tried to run the latest version of MadGraph on a SLURM cluster (gluino pair production at 13 TeV, parton level). I have kept the default MSSM param_card and run_card and have changed the following options in Cards/me5_configuration.txt:

run_mode = 1
cluster_type = slurm
cluster_queue = long
cluster_size = 20

cluster_temp_path = /localscratch/${SLURM_JOB_ID}/ (on my cluster the job-ID variable is SLURM_JOB_ID rather than SLURM_JOBID, so I changed cluster.py accordingly; see the check below these options)
cluster_local_path = <path to my MadGraph output directory>

cluster_status_update = 60 30
cluster_nb_retry = 1
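
For reference, the job-ID variable that SLURM actually exports can be checked directly on a compute node, e.g. with the following command (illustrative only; the partition name is the cluster_queue set above):

srun -p long printenv | grep SLURM_JOB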

When I run ./bin/generate_events -f, everything seems to run correctly at first, but at the very end I get the following error messages:

CRITICAL: Fail to run correctly job 3025542.
            with option: {'log': None, 'stdout': None, 'argument': ['0', '1', '2.2'], 'nb_submit': 1, 'stderr': None, 'prog': '/gpfs/fs1/home/elhed001/MG5_aMC_v2_5_2/test3/SubProcesses/survey.sh', 'output_files': ['G1', 'G2.2'], 'time_check': 1508238204.298165, 'cwd': '/gpfs/fs1/home/elhed001/MG5_aMC_v2_5_2/test3/SubProcesses/P2_gq_gogoq', 'required_output': ['G1/results.dat', 'G2.2/results.dat'], 'input_files': ['madevent', 'input_app.txt', 'symfact.dat', 'iproc.dat', 'dname.mg', '/gpfs/fs1/home/elhed001/MG5_aMC_v2_5_2/test3/SubProcesses/randinit', '']}
            file missing: /gpfs/fs1/home/elhed001/MG5_aMC_v2_5_2/test3/SubProcesses/P2_gq_gogoq/G1/results.dat
            Fails 1 times
            No resubmition.
INFO: Idle: 35, Running: 19, Completed: 56 [ 23m 23s ]
INFO: All jobs finished
INFO: Idle: 0, Running: 0, Completed: 110 [ 23m 54s ]
INFO: End survey
refine 10000
Creating Jobs
INFO: Refine results to 10000
INFO: Generating 10000.0 unweigthed events.
INFO: Effective Luminosity 1.2e+103 pb^-1
INFO: need to improve 0 channels
Survey return zero cross section.
   Typical reasons are the following:
   1) A massive s-channel particle has a width set to zero.
   2) The pdf are zero for at least one of the initial state particles
      or you are using maxjetflavor=4 for initial state b:s.
   3) The cuts are too strong.
   Please check/correct your param_card and/or your run_card.
Zero result detected: See https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/FAQ-General-14

(The MG folder path shows an old version number because I updated MadGraph through the built-in interface and did not rename the folder.)

The log files for the subprocesses contain, in particular, the following lines:

 Process in group number 1
 A PDF is used, so alpha_s(MZ) is going to be modified
 Warning: file ../../../../../.././Cards/param_card.dat is not correct

So I am concerned that, instead of looking for the cards in the local folder, the program is looking for them in a folder on the cluster node. Do you know how I could fix this?

Thank you in advance for your help!

Best regards,
Sonia

Best Olivier Mattelaer (olivier-mattelaer) said:
#1

Hi,

> cluster_temp_path = /localscratch/${SLURM_JOB_ID}/ (on my cluster it is not SLURM_JOBID so I changed cluster.py)

Is this part working? You changed cluster.py, so I cannot really tell, but this is likely the problem.
Does it work if you do not specify such a path?
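
A quick test is to comment the option back out in Cards/me5_configuration.txt, e.g. (shown as an illustration; the exact default line in your file may differ):

# cluster_temp_path = None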

> cluster_local_path = <path to my MadGraph output directory>

The idea of this option is to be used with a cvmfs path, pointing to a location where the LHAPDF files are available locally on the node, so that you do not have to read them from the main disk or copy them to the running node. So this path is also likely to be incorrect.
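
For illustration only (the exact path is site dependent and this one is just an example), on a machine with cvmfs this would look like:

cluster_local_path = /cvmfs/sft.cern.ch/lcg/external/lhapdfsets/current

If you do not have such a shared local area, you can probably leave this option unset as well.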

Cheers,

Olivier

xan1234 (xan1234) said:
#2

Thanks Olivier Mattelaer, that solved my question.