slurm cluster run_mode of MadGraph

Asked by Keping Xie

Hi everyone,

I have a problem with the cluster run_mode.
Our university uses the ManeFrame cluster, which is a slurm cluster. It has three partitions (or queues): interactive, parallel, and highmem (high memory).
However, when I set the following in mg5_configuration.txt

run_mode = 1
cluster_type = slurm
cluster_queue = interactive ( or parallel, or highmem)

none of them works.
My MG5_aMC shell displays:

Working on SubProcesses
    P0_qq_ll
INFO: Idle: 0, Running: 1, Completed: 0 [ 0.12s ]
INFO: Idle: 0, Running: 1, Completed: 0 [ 30.3s ]
INFO: Idle: 0, Running: 1, Completed: 0 [ 1m 0s ]
INFO: Idle: 0, Running: 1, Completed: 0 [ 1m 30s ]
INFO: Idle: 0, Running: 1, Completed: 0 [ 2m 0s ]
INFO: Idle: 0, Running: 1, Completed: 0 [ 2m 30s ]
INFO: Idle: 0, Running: 1, Completed: 0 [ 3m 0s ]
INFO: Idle: 0, Running: 1, Completed: 0 [ 3m 31s ]
INFO: Idle: 0, Running: 1, Completed: 0 [ 4m 1s ]
INFO: Idle: 0, Running: 1, Completed: 0 [ 4m 31s ]
WARNING: resubmit job (for the 1 times)
INFO: Idle: 1, Running: 0, Completed: 1 [ 5m 1s ]
INFO: Start to wait 600s between checking status.
Note that you can change this time in the configuration file.
Press ctrl-C to force the update.
INFO: Idle: 0, Running: 1, Completed: 1 [ 15m 2s ]

I also tried "srun -p parallel ./bin/mg5_aMC" (and the same with interactive and highmem),
but it does not give me an interactive shell in which to perform the computation.

Does anyone have a good solution to this problem?

Keping

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: Paolo Torrielli
Paolo Torrielli (paolo-torrielli) said :
#1

Hi Keping,

I sometimes run on a slurm cluster in Zurich, where I set cluster_queue = serial, which,
on that particular cluster, still runs the jobs in parallel. You may try this option,
although I don't know whether it will help on your cluster.
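
Since queue names differ from one cluster to another, it can help to check that the value given to cluster_queue is a partition that slurm actually reports. The following is only a minimal sketch, assuming the standard `sinfo` command is on the PATH; the helper names `parse_partitions` and `list_partitions` are hypothetical and are not part of MadGraph.

```python
import subprocess


def parse_partitions(sinfo_output):
    """Extract partition names from the output of `sinfo -h -o %P`.

    slurm appends '*' to the default partition; it is stripped here.
    Duplicates are dropped while keeping the original order.
    """
    names = []
    for line in sinfo_output.splitlines():
        name = line.strip().rstrip("*")
        if name and name not in names:
            names.append(name)
    return names


def list_partitions():
    """Query slurm for its partitions (hypothetical helper, needs sinfo)."""
    result = subprocess.run(
        ["sinfo", "-h", "-o", "%P"],  # -h: no header, %P: partition name
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_partitions(result.stdout)


# Example with a hand-written sample string (not real cluster output):
sample = "interactive*\nparallel\nhighmem\n"
partitions = parse_partitions(sample)
print("parallel" in partitions)
```

On a login node, `list_partitions()` would return the names that are valid candidates for cluster_queue.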

I also recently had similar problems with that architecture, which should have been
fixed in the latest release, so I recommend trying 2.3.0 to see whether the issue persists.

In general, the script that takes care of cluster submission is
2.x.y/madgraph/various/cluster.py, where you can find a section dedicated to the slurm
family and the corresponding submission commands around line 1620.
If it helps, you may try inserting the queue options you need there by hand.
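
To illustrate what inserting queue options by hand could look like, here is a minimal sketch of assembling an `sbatch` command line with a partition option. It does not reproduce the code in cluster.py: the function `build_sbatch_command` and its parameters are hypothetical, and only the standard `sbatch` flags `-p`, `-J`, `-o` and `-e` are assumed.

```python
def build_sbatch_command(script, queue=None, job_name=None,
                         stdout="/dev/null", stderr="/dev/null",
                         extra_options=()):
    """Return the argument list for submitting `script` with sbatch.

    Hypothetical helper for illustration only, not MadGraph's own code.
    """
    command = ["sbatch", "-o", stdout, "-e", stderr]
    if job_name:
        command += ["-J", job_name]      # job name shown by squeue
    if queue:
        command += ["-p", queue]         # slurm partition (the "queue")
    command += list(extra_options)       # e.g. site-specific limits
    command.append(script)
    return command


# Example: submit to the "parallel" partition with a time limit.
# "ajob1" is a placeholder script name.
example = build_sbatch_command("ajob1", queue="parallel",
                               extra_options=["--time=01:00:00"])
print(" ".join(example))
```

Any extra options a site requires (a time limit, a memory request) would go in the same place as the partition flag.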

Let me know.
Cheers.
Paolo
