Error when running MadGraph on a SLURM cluster

Asked by xan1234

I have tried to run the latest version of MadGraph on a SLURM cluster (gluino pair production at 13 TeV, parton level). I have kept the default MSSM param_card and run_card and have changed the following options in Cards/me5_configuration.txt:

run_mode = 1
cluster_type = slurm
cluster_queue = long
cluster_size = 20

cluster_temp_path = /localscratch/${SLURM_JOB_ID}/ (on my cluster the job-ID variable is SLURM_JOB_ID rather than SLURM_JOBID, so I changed cluster.py accordingly; see the check below these options)
cluster_local_path = <path to my MadGraph output directory>

cluster_status_update = 60 30
cluster_nb_retry = 1
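
For reference, the job-ID variable that SLURM actually exports can be checked directly on a compute node, e.g. with the following command (illustrative only; the partition name is the cluster_queue set above):

srun -p long printenv | grep SLURM_JOB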

When I run ./bin/generate_events -f, everything seems to run correctly at first, but at the very end I get the following error messages:

CRITICAL: Fail to run correctly job 3025542.
            with option: {'log': None, 'stdout': None, 'argument': ['0', '1', '2.2'], 'nb_submit': 1, 'stderr': None, 'prog': '/gpfs/fs1/home/elhed001/MG5_aMC_v2_5_2/test3/SubProcesses/survey.sh', 'output_files': ['G1', 'G2.2'], 'time_check': 1508238204.298165, 'cwd': '/gpfs/fs1/home/elhed001/MG5_aMC_v2_5_2/test3/SubProcesses/P2_gq_gogoq', 'required_output': ['G1/results.dat', 'G2.2/results.dat'], 'input_files': ['madevent', 'input_app.txt', 'symfact.dat', 'iproc.dat', 'dname.mg', '/gpfs/fs1/home/elhed001/MG5_aMC_v2_5_2/test3/SubProcesses/randinit', '']}
            file missing: /gpfs/fs1/home/elhed001/MG5_aMC_v2_5_2/test3/SubProcesses/P2_gq_gogoq/G1/results.dat
            Fails 1 times
            No resubmition.
INFO: Idle: 35, Running: 19, Completed: 56 [ 23m 23s ]
INFO: All jobs finished
INFO: Idle: 0, Running: 0, Completed: 110 [ 23m 54s ]
INFO: End survey
refine 10000
Creating Jobs
INFO: Refine results to 10000
INFO: Generating 10000.0 unweigthed events.
INFO: Effective Luminosity 1.2e+103 pb^-1
INFO: need to improve 0 channels
Survey return zero cross section.
   Typical reasons are the following:
   1) A massive s-channel particle has a width set to zero.
   2) The pdf are zero for at least one of the initial state particles
      or you are using maxjetflavor=4 for initial state b:s.
   3) The cuts are too strong.
   Please check/correct your param_card and/or your run_card.
Zero result detected: See https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/FAQ-General-14

(The MG folder path shows an old version number because I updated MadGraph through the built-in interface and did not rename the folder.)

The log files for the subprocesses contain, in particular, the following lines:

 Process in group number 1
 A PDF is used, so alpha_s(MZ) is going to be modified
 Warning: file ../../../../../.././Cards/param_card.dat is not correct

So I am concerned that, instead of looking for the cards in the local folder, the program is looking for them in a folder on the cluster node. Do you know how I could fix this?

Thank you in advance for your help!

Best regards,
Sonia

Best Olivier Mattelaer (olivier-mattelaer) said:
#1

Hi,

> cluster_temp_path = /localscratch/${SLURM_JOB_ID}/ (on my cluster it is not SLURM_JOBID so I changed cluster.py)

Is this part working? You changed cluster.py, so I cannot really tell, but this is likely the problem.
Does it work if you do not specify such a path?
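
A quick test is to comment the option back out in Cards/me5_configuration.txt, e.g. (shown as an illustration; the exact default line in your file may differ):

# cluster_temp_path = None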

> cluster_local_path = <path to my MadGraph output directory>

The idea of this option is to be used with a cvmfs path, pointing to a location where the LHAPDF files are available locally on the node, so that you do not have to read them from the main disk or copy them to the running node. So this path is also likely to be incorrect.
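
For illustration only (the exact path is site dependent and this one is just an example), on a machine with cvmfs this would look like:

cluster_local_path = /cvmfs/sft.cern.ch/lcg/external/lhapdfsets/current

If you do not have such a shared local area, you can probably leave this option unset as well.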

Cheers,

Olivier

xan1234 (xan1234) said:
#2

Thanks Olivier Mattelaer, that solved my question.