"waiting while finishing jobs on cluster 5 0 0 5" repeats forever when using qsub ("default" process)

Asked by Pawin Ittisamai on 2012-02-10

Dear MadGraph Team,

When I submitted a job such as
./bin/generate_events 2 4 TestMulticore_4,
using qsub on an HPC cluster, it never finished. The log file shows

Working on subprocess:
    P3_gg_ttx Launching job ajob1
    P2_gq_lvlq Launching job ajob1
    P2_qq_lvlg Launching job ajob1
    P1_qq_lvl Launching job ajob1
    P3_qq_ttx Launching job ajob1
waiting while finishing jobs on cluster
5 0 0 5
5 0 0 5
5 0 0 5
5 0 0 5
where "5 0 0 5" repeats forever (until the job was killed by the cluster for exceeding its time limit). The process is just the default one that I copied from the Template folder. It ran fine in the interactive mode of the HPC cluster. The qsub script I used is attached at the end of this message.

I'm using MadGraph5 1.3.30.

It would be great if you could tell me either how to fix the problem or where to look further to find its cause. If there is a guide to setting up MG5 on an HPC cluster, please let me know where to find it.

Thank you very much and have a great day!


Below is an example of the qsub script
#!/bin/sh -login

#PBS -l nodes=1:ppn=4,walltime=1:00:00,mem=1gb
#PBS -o /home/qsub_output/RUN_TestTemplate_n1_ppn4
#PBS -M <email address hidden>
#PBS -m abe
#PBS -N MG5_RUN_TestTemplate_1_3_30_n1_ppn4
#PBS -r n

cd ~/Work/MadGraph5_v1_3_30/TestTemplate_n1_ppn4/
./bin/generate_events 2 4 TestTemplate_1_3_30_n1_ppn4
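
One thing that may be worth checking in a script like the one above is that the core count passed to generate_events matches what the scheduler actually allocated. The following is only a minimal sketch, not a fix for the hang: it assumes a Torque/PBS scheduler that sets PBS_NODEFILE with one line per allocated core, and it assumes the second argument of generate_events is the number of cores (as the ppn=4 request suggests). The helper name count_cores is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report how many cores PBS allocated to this job.
# Assumption: Torque/PBS writes one line per allocated core to $PBS_NODEFILE.
count_cores() {
    nodefile="$1"
    if [ -n "$nodefile" ] && [ -r "$nodefile" ]; then
        # Count the non-empty lines in the node file.
        grep -c . "$nodefile"
    else
        # Outside a PBS job (or if the file is unreadable), fall back to 1.
        echo 1
    fi
}

NCORES=$(count_cores "${PBS_NODEFILE:-}")
echo "Allocated cores: $NCORES"

# Inside the job script, the count could then replace the hard-coded 4:
# ./bin/generate_events 2 "$NCORES" TestTemplate_1_3_30_n1_ppn4
```

Printing the count to the job's output file makes it easy to compare the batch environment with the interactive session in which the run worked.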


Hi Pawin,

The cluster management has changed completely between versions 1.3 and 1.4.
So my first advice would be to test with 1.4 rather than 1.3.

I have various ideas about why there is a problem, and they should all be fixed in 1.4.
So could you try it?


