Pythia in cluster run mode on SLURM

Asked by Matthew Low

Hi,

I'm currently running events in cluster run mode on a SLURM batch system. I understand you probably don't have access yourself to a system running SLURM, but perhaps you could help me understand what might be going wrong. While generating events everything works fine up to the parton-level events, but when pythia is supposed to run I get several copies of the following error:

WARNING: /var/spool/slurmd/job339341/slurm_script: line 22: /pythia: No such file or directory

Perhaps somewhere a relative path is being used but SLURM needs a full path? Do you have any suggestions of where I could start looking to solve the problem?

Thanks!
- Matthew

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee
Olivier Mattelaer (olivier-mattelaer) said:
#1

Hi Matthew,

pythia is launched via a script which configures pythia and which takes as an argument the path to the pythia directory.
The error that you report indicates that this argument is not passed to the script, so the path is missing and is interpreted as '' (which is why the script ends up looking for '/pythia').

Indeed, in the slurm part of cluster.py I don't see any code handling those arguments.

Maybe the following change should be ok:
replace (around line 1047 in madgraph/various/cluster.py):
        command = ['sbatch','-o', stdout,
                   '-J', me_dir,
                   '-e', stderr, prog]
by:
        command = ['sbatch','-o', stdout,
                   '-J', me_dir,
                   '-e', stderr, prog + ' ' + ' '.join(argument)]

Tell me if this works (and/or tell me the correct way to pass arguments to a slurm cluster) and I will include the fix in 1.5.13.
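
For reference, since sbatch accepts the batch script followed by its arguments as separate tokens ("sbatch [options] script [args...]"), another option would be to append the argument list to the command instead of joining it into one string. The snippet below is only a self-contained sketch that I cannot test without a SLURM machine; the names prog, argument, stdout, stderr and me_dir mirror the variables used in cluster.py, and the real submit method there does more than this:

    import subprocess

    def submit_slurm(prog, argument, stdout, stderr, me_dir):
        """Sketch: submit `prog` to SLURM, forwarding `argument`
        (a list of strings) as positional arguments to the batch script."""
        command = ['sbatch', '-o', stdout,
                   '-J', me_dir,
                   '-e', stderr, prog] + list(argument)
        # sbatch prints something like "Submitted batch job 339341";
        # return that line so the caller can extract the job id.
        output = subprocess.check_output(command).decode()
        return output.strip()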

Cheers,

Olivier

Matthew Low (mattlow) said:
#2

Hi Olivier,

Thanks for the help. That should be the correct fix. I still have one small problem. After submitting the pythia job it fails with the message:

Fail to produce pythia output. More info in
     /home/MadGraph5_v1_5_12/Test/Events/run_01/tag_1_pythia.log

When I look at the log file and compare it with a log file from a successful pythia run in multicore mode, the only differences I notice are:

- In the failed log there is an additional printing of "Set LHAPATH to /home/MadGraph5_v1_5_12/pythia-pgs/src/PDFsets" before "Opened file unweighted_events.lhe" is printed.
- In the successful log, "STDHEP version 5.04.01 - Aug. 29, 2005" is printed in between the loading of the mass values, but there is no such line in the failed log.
- There are a few errors in the successful log: "Error type 9 has occured after 1066 PYEXEC calls:
     (PYPTIS:) Weight 1.3711E+00 above unity", but no such errors in the failed log.

In any case it is probably good to include your fix in 1.5.13, since this might be a problem specific to the cluster I am using.

Thanks,
- Matthew
