Stuck at the last job of Pythia8 on cluster

Asked by teddym

Dear MG5 authors:
     I'm using a slurm cluster to generate events for many parameter points. However, some runs randomly get stuck at the last Pythia8 job. Below is the last output I got from the program in one particular case; the others are similar:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Submitting Pythia8 jobs...
Pythia8 shower jobs: 0 Idle, 32 Running, 0 Done [1m18s]
Pythia8 shower jobs: 0 Idle, 30 Running, 2 Done [2m34s]
Pythia8 shower jobs: 0 Idle, 29 Running, 3 Done [2m34s]
Pythia8 shower jobs: 0 Idle, 28 Running, 4 Done [2m34s]
Pythia8 shower jobs: 0 Idle, 27 Running, 5 Done [2m35s]
Pythia8 shower jobs: 0 Idle, 26 Running, 6 Done [2m36s]
Pythia8 shower jobs: 0 Idle, 25 Running, 7 Done [2m38s]
Pythia8 shower jobs: 0 Idle, 23 Running, 9 Done [2m38s]
Pythia8 shower jobs: 0 Idle, 22 Running, 10 Done [2m38s]
Pythia8 shower jobs: 0 Idle, 21 Running, 11 Done [2m40s]
Pythia8 shower jobs: 0 Idle, 20 Running, 12 Done [2m40s]
Pythia8 shower jobs: 0 Idle, 18 Running, 14 Done [2m42s]
Pythia8 shower jobs: 0 Idle, 17 Running, 15 Done [2m43s]
Pythia8 shower jobs: 0 Idle, 16 Running, 16 Done [2m46s]
Pythia8 shower jobs: 0 Idle, 14 Running, 18 Done [2m46s]
Pythia8 shower jobs: 0 Idle, 12 Running, 20 Done [2m47s]
Pythia8 shower jobs: 0 Idle, 10 Running, 22 Done [2m48s]
Pythia8 shower jobs: 0 Idle, 9 Running, 23 Done [2m49s]
Pythia8 shower jobs: 0 Idle, 8 Running, 24 Done [2m49s]
Pythia8 shower jobs: 0 Idle, 7 Running, 25 Done [2m51s]
Pythia8 shower jobs: 0 Idle, 6 Running, 26 Done [2m54s]
Pythia8 shower jobs: 0 Idle, 5 Running, 27 Done [2m59s]
Pythia8 shower jobs: 0 Idle, 4 Running, 28 Done [2m59s]
Pythia8 shower jobs: 0 Idle, 3 Running, 29 Done [3m01s]
Pythia8 shower jobs: 0 Idle, 2 Running, 30 Done [3m02s]
Pythia8 shower jobs: 0 Idle, 1 Running, 31 Done [3m02s]
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Is there any way to prevent this behavior?

Best
Ted

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee
Olivier Mattelaer (olivier-mattelaer) said:
#1

Hi,

I do not see why one job would take much longer than the others, but we are not Pythia8 experts (even though we implemented the running of Py8 in parallel).

However, this does not look like a log from cluster mode; it seems that you are running in multi-core mode.
If you run squeue, do you see MadGraph submitting jobs to the cluster?

Cheers,

Olivier
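
A quick way to perform the check Olivier suggests, assuming standard slurm tooling is on the PATH (the guard is only there so the snippet is harmless on machines without slurm):

```shell
# If MadGraph were really in cluster mode, each Pythia8 shower job
# would appear as a separate entry in the slurm queue under your user.
if command -v squeue >/dev/null 2>&1; then
  squeue -u "$USER"
else
  echo "squeue not available on this machine"
fi
```

In multi-core mode you would instead see only the single job you submitted yourself, with all the Pythia8 work running as local processes inside it.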

teddym (niepanchongsheng) said:
#2

Hi Olivier:
     Yes, I just submit the whole procedure as a single job on the slurm cluster, but MG is still running in multi-core mode. I requested the whole node, so I thought MG would occupy all the cores on it. Is this good practice? If MG has to submit each job to the cluster itself, the jobs may end up waiting in the queue, since we have limited resources and too many jobs on the cluster. Thanks!

Best
Ted
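
The setup Ted describes could look roughly like the following sbatch wrapper. This is a hypothetical sketch: the job name, partition defaults, core count, and MG5 invocation are placeholders, not taken from the thread.

```bash
#!/bin/bash
# Reserve one full node and let MadGraph use all of its cores
# in multi-core mode (MG5's default when run on a single machine).
#SBATCH --job-name=mg5_scan
#SBATCH --nodes=1
#SBATCH --exclusive          # whole node, as described above
#SBATCH --cpus-per-task=32

# run_point.mg5 is a placeholder for the generation script
./bin/mg5_aMC run_point.mg5
```

With this layout, a single straggling Pythia8 sub-job keeps the entire node allocated until it finishes, which is the waste Olivier alludes to below.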

Olivier Mattelaer (olivier-mattelaer) said:
#3

> Is this good practice? If MG has to submit each job to the cluster itself, the jobs may end up waiting in the queue, since we have limited resources and too many jobs on the cluster.

At the same time, each job will request fewer resources (one core per job rather than a full machine), so it will be easier to allocate and you will not waste as many resources (which helps reduce the pressure on the cluster).
So the final answer depends on your cluster and on the type of jobs that typically run on it.

Cheers,

Olivier
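
For reference, switching MG5 from multi-core to cluster mode is done in Cards/me5_configuration.txt; a minimal sketch of the relevant lines (the queue name is a site-specific placeholder):

```text
# 0: single-core, 1: cluster, 2: multi-core
run_mode = 1
cluster_type = slurm
cluster_queue = None   # or the partition appropriate for your site
```

In this mode each Pythia8 shower job is submitted as its own one-core slurm job, which is the trade-off discussed above: smaller, easier-to-schedule allocations at the cost of possible queue waits.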

