Submitting jobs on a PBS cluster

Asked by Adarsh Pyarelal

Hi,

I was wondering if someone could help with submitting gridpack jobs to a PBS cluster?

Question 1: Is there a difference between making gridpacks and running MG5 on cluster mode?

Question 2: We have three kinds of clusters at the University: HTC, MPI, and SMP. Do you know which cluster I should submit the job to? My guess is HTC, but I wanted to double-check.

Question 3: Do we have to run the command 'qsub' ourselves, or does MadEvent run it for us? That is, can I just ssh into the login node for my cluster and do

$ ./bin/mg5
> generate p p > t t~
> output ttbar
> exit

$ cd ttbar
$ vim Cards/me5_configuration.txt
run_mode = 1 (cluster)
cluster_type = pbs
cluster_queue = standard
cluster_size = 1248
(and do I have to change cluster temp_path and/or cluster_local_path)?

$ mkdir dev; cd dev; mkdir null (since these directories are specified for stderr and stdout in cluster.py)
$ cd ../../
$ vim bin/internal/cluster.py, then insert the following at line 1078:
command = ['qsub','-o', stdout,
                   '-m', 'bea',
                   '-M', '<email address hidden>',
                   '-W', 'group_list=shufang',
                   '-q', 'standard',
                   '-l', 'select=1:ncpus=12:mem=23gb:localscratch=1',
                   '-l', 'jobtype=htc_only',
                   '-l', 'cput=12:0:0',
                   '-l', 'walltime=1:0:0',
                   '-N', me_dir,
                   '-e', stderr,
                   '-V']

Then should I just do

$ ./bin/generate_events?

That doesn't seem to work...but maybe I missed something?

Or, should I do
$ qsub submit.pbs

************** submit.pbs ******************************
#PBS -N madgraph
#PBS -m bea
#PBS -M <email address hidden>
#PBS -W group_list=shufang
#PBS -q standard
#PBS -l select=1:ncpus=12:mem=23gb:localscratch=1
#PBS -l jobtype=htc_only
#PBS -l cput=12:0:0
#PBS -l walltime=1:0:0
module load python
./bin/generate_events -f

*****************************************************

Or, alternatively, should I generate a gridpack according to the instructions on your website here: https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/GridDevelopment,

compile it, then do

$ ./run.sh 1000 37 (for example)

from the login node? Does the run.sh script do the qsub command with the arguments in cluster.py? Or does the run.sh script have to be executed from the PBS batch file like so:

qsub submit.pbs

*** submit.pbs ***

#PBS -N madgraph
...
...
...
./run.sh 1000 37

******************

And if I do generate a gridpack, does it have to run on a single core, or can it use multiple cores on a single node (I gather that the gridpack is designed to run on a single node)? To this end, what should I put in the run_mode in the madevent/Cards/me5_configuration.txt file (after doing tar -zxvf run_01_gridpack.tar.gz)? 0, 1, or 2? Does it even matter?

To clarify, I was able to run MadEvent (not a gridpack) in multi-core mode on a single node with 12 cores on the HTC cluster. I was just wondering if I could speed it up more and use more nodes/cores in a streamlined way. From what I've seen on your websites, it seems like the gridpack method is the way to go, but I am just a bit unclear on the various running options. For example, what does

$ ./bin/generate_events --cluster

do exactly?

Thank you for your time!

-Adarsh

Best Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi,

> Question 1: Is there a difference between making gridpacks and running
> MG5 on cluster mode?

Yes.

When a gridpack is created (typically by running on a cluster), the code runs on a single core in a fully optimised (but card-specific) way.
The gridpack mode is therefore intended for environments where many jobs fail, while the cluster mode is more versatile and better suited to environments with a high job success rate (since if one job completely fails to return, twice by default, the full run needs to be discarded).
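
(For reference, gridpack creation is requested in the run card before generating events. In recent MG5 versions the relevant line looks roughly like the excerpt below; the exact formatting varies between versions, so check your own run_card.dat.)

*** Cards/run_card.dat (excerpt) ***
 .true. = gridpack ! True = setting up the grid pack
*************************************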

> Question 2: We have three kinds of clusters at the University: HTC, MPI,
> and SMP. Do you know which cluster I should submit the job to? My guess
> is HTC, but I wanted to double-check.

Certainly not MPI. I do not know SMP, so I cannot comment on that. HTC should work.

> Question 3: Do we have to run the command 'qsub' ourselves, or does
> MadEvent run it for us? That is, can I just ssh into the login node for
> my cluster and do

MadEvent runs qsub and qstat for you: it relaunches failing jobs, analyses the output, and submits additional jobs until the target number of events is reached.
So no, you do not need to run qsub yourself.
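
Concretely, with the settings from your question (run_mode = 1 and cluster_type = pbs in Cards/me5_configuration.txt), the following, run from the login node, should be all you need:

$ cd ttbar
$ ./bin/generate_events

MadEvent then submits, monitors, and resubmits the individual PBS jobs itself.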

> cluster_size = 1248
> (and do I have to change cluster temp_path and/or cluster_local_path)?

cluster_size is only used by loop-induced processes for the moment.
temp_path can be used if you have disk-access speed issues (None should be fine).
cluster_local_path is a trick to avoid reading lhapdf from the central disk; in principle the default is OK, but on some clusters it makes the run crash and you need to change it to None.
So you can try that.
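
Putting this together, a plausible configuration for your case would be the excerpt below (the two path lines are only needed if you hit the problems mentioned above):

*** Cards/me5_configuration.txt (excerpt) ***
run_mode = 1
cluster_type = pbs
cluster_queue = standard
cluster_temp_path = None
cluster_local_path = None
*********************************************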

> $ mkdir dev; cd dev; mkdir null (since these directories are specified for stderr and stdout in cluster.py)

We use /dev/null, which is the standard Linux path for discarded output. I'm surprised that you need to play with that.

> $ vim bin/internal/cluster.py, then insert the following at line 1078:
> command = ['qsub','-o', stdout,
> '-m', 'bea',
> '-M', '<email address hidden>',
> '-W', 'group_list=shufang',
> '-q', 'standard',
> '-l', 'select=1:ncpus=12:mem=23gb:localscratch=1',
> '-l', 'jobtype=htc_only',
> '-l', 'cput=12:0:0',
> '-l', 'walltime=1:0:0',
> '-N', me_dir,
> '-e', stderr,
> '-V']

Sounds good, but I'm not a PBS expert.
Why do you select ncpus=12? (Each job that MadEvent submits runs on a single core, so requesting 12 cpus per job would likely waste resources.)

> Or, should I do
> $ qsub submit.pbs

This can work too, but only if a compute node is allowed to run qsub (MadEvent, running inside your batch job, will itself call qsub to submit the individual jobs).

> $ ./run.sh 1000 37 (for example)
>
> from the login node? Does the run.sh script do the qsub command with the
> arguments in cluster.py? Or does the run.sh script have to be executed
> from the PBS batch file like so:

Yes, this one needs to be submitted via qsub: run.sh does not call qsub itself, it simply runs the gridpack on whatever machine executes it.
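
A minimal batch script for that could look like the sketch below (the queue and resource lines are site-specific assumptions; ncpus=1 is enough because, as noted below, the gridpack runs on a single core):

*** run_gridpack.pbs (sketch) ***
#PBS -N gridpack_run
#PBS -q standard
#PBS -l select=1:ncpus=1:mem=2gb
#PBS -l walltime=12:0:0
cd $PBS_O_WORKDIR
./run.sh 1000 37
*********************************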

> And if I do generate a gridpack, does it have to run on a single core,
> or can it use multiple cores on a single node (I gather that the
> gridpack is designed to run on a single node)?

It runs on a single core only.
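
If you want to use many cores anyway, a common workaround (a sketch, not a built-in MadGraph feature) is to launch several independent gridpack runs with different random seeds, for instance as a PBS job array. The -J directive and $PBS_ARRAY_INDEX below are PBS Pro syntax; Torque uses -t and $PBS_ARRAYID instead:

*** gridpack_array.pbs (sketch) ***
#PBS -N gridpack_array
#PBS -q standard
#PBS -l select=1:ncpus=1:mem=2gb
#PBS -l walltime=12:0:0
#PBS -J 1-10
cd $PBS_O_WORKDIR
# give each sub-job its own copy of the gridpack so that concurrent
# runs do not overwrite each other's output
mkdir -p job_$PBS_ARRAY_INDEX
tar -zxf run_01_gridpack.tar.gz -C job_$PBS_ARRAY_INDEX
cd job_$PBS_ARRAY_INDEX
# 1000 events per sub-job, with the array index as the random seed
./run.sh 1000 $PBS_ARRAY_INDEX
***********************************

The resulting event files can then be combined afterwards.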

> To this end, what should
> I put in the run_mode in the madevent/Cards/me5_configuration.txt file
> (after doing tar -zxvf run_01_gridpack.tar.gz)? 0, 1, or 2? Does it even
> matter?

It does not matter.

> For example, what does
>
> $ ./bin/generate_events --cluster
>
> do exactly?

The --cluster option overrides the run_mode value and sets it to 1 (cluster mode).
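
So, assuming cluster_type = pbs is already set in Cards/me5_configuration.txt,

$ ./bin/generate_events --cluster

is equivalent to setting run_mode = 1 in that file and then running

$ ./bin/generate_events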

Cheers,

Olivier

Adarsh Pyarelal (adarsh-pyarelal) said :
#2

Ah, sorry, I misunderstood the purpose of /dev/null.

Adarsh Pyarelal (adarsh-pyarelal) said :
#3

Thanks Olivier Mattelaer, that solved my question.