Facing a problem when running MG5 on a cluster

Asked by Sarif Khan

I am using the following script to submit a job to the cluster:

#!/bin/bash
#PBS -l nodes=1:ppn=12
#PBS -N mg2
#PBS -q default
#PBS -j oe
#PBS -e error
#PBS -o output

#mpirun -f $PBS_NODEFILE -np 16 ./a.out
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE > pbsnodes

#./bin/madevent multi_run 100 --laststep=parton --cluster --multicore --nb_core = 5
#./bin/madevent multi_run 200 --laststep=parton --cluster --multicore 24
./bin/madevent multi_run 50 -f
#./bin/madevent multi_run 100

I am also making the following changes:
in the Cards directory I create the files pythia8_card.dat and pythia_card.dat,
set run_mode 1, and make all the other necessary changes that I am aware of.
One more thing: after I submit the job, the pythia_card.dat file is automatically deleted.
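
For reference, the cluster-related settings (also visible in the options dump of the log below) are, assuming I am reading the standard Cards/me5_configuration.txt layout correctly, roughly:

run_mode = 1
cluster_type = pbs
cluster_queue = madgraph
cluster_size = 150
nb_core = 12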

The problem is that if I generate the process p p > W jets,
I do not get the hepmc file; somehow the showering does not happen.
If I instead generate p p > e+ e-, then in the same setup the hepmc file is produced.
Can you please help me with this?

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee
Olivier Mattelaer (olivier-mattelaer) said:
#1

Hi,

First, note that this is not one of the recommended ways to handle the cluster.
The problem is that you send this script to the cluster and then ask MG5aMC to submit jobs on the cluster from there.
This is typically forbidden.
If you want to use multi_run with such a high value, I would advise considering the gridpack mode.
In that mode you do not have the direct interface to pythia8, but it is much more efficient for such a large generation.
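
A rough sketch of that gridpack workflow (the card entry and the names of the tarball and driver script may differ slightly between versions, so please double-check against your run_card):

# in Cards/run_card.dat, switch on gridpack production
 True = gridpack

# generate once; this should produce run_01_gridpack.tar.gz in the process directory
./bin/madevent generate_events run_01 -f

# unpack the gridpack on any node and run it there, no qsub involved
tar xzf run_01_gridpack.tar.gz
./run.sh 10000 42    # <number of events> <random seed>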

> Can you please help me in this regard?

Not with so little information. Do you have the log file stating why it did not run pythia8 (or the py8 log stating why it crashed)?
Without that information, there is strictly nothing I can do.

Cheers,

Olivier


Sarif Khan (sarif) said:
#2

The log file shows the following:
#************************************************************
#* MadGraph5_aMC@NLO/MadEvent *
#* *
#* * * *
#* * * * * *
#* * * * * 5 * * * * *
#* * * * * *
#* * * *
#* *
#* *
#* VERSION 2.5.5 20xx-xx-xx *
#* *
#* The MadGraph5_aMC@NLO Development Team - Find us at *
#* https://server06.fynu.ucl.ac.be/projects/madgraph *
#* *
#************************************************************
#* *
#* Command File for MadEvent *
#* *
#* run as ./bin/madevent.py filename *
#* *
#************************************************************
multi_run 50 -f
Traceback (most recent call last):
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/extended_cmd.py", line 1430, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/extended_cmd.py", line 1384, in onecmd_orig
    return func(arg, **opt)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/madevent_interface.py", line 2498, in do_multi_run
    self.exec_cmd('generate_events %s_%s -f' % (main_name, i), postcmd=False)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/extended_cmd.py", line 1457, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/extended_cmd.py", line 1384, in onecmd_orig
    return func(arg, **opt)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/madevent_interface.py", line 2054, in do_generate_events
    postcmd=False)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/extended_cmd.py", line 1457, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/extended_cmd.py", line 1384, in onecmd_orig
    return func(arg, **opt)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/madevent_interface.py", line 2889, in do_survey
    jobs, P_zero_result = ajobcreator.launch()
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/gen_ximprove.py", line 182, in launch
    self.submit_to_cluster(job_list)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/gen_ximprove.py", line 222, in submit_to_cluster
    return self.submit_to_cluster_no_splitting(job_list)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/gen_ximprove.py", line 252, in submit_to_cluster_no_splitting
    cwd=pjoin(self.me_dir,'SubProcesses' , Pdir))
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/madevent_interface.py", line 5054, in launch_job
    required_output=required_output, **opt)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/cluster.py", line 206, in cluster_submit
    output_files, required_output, nb_submit)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/cluster.py", line 70, in deco_f_store
    id = f(self, **args)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/cluster.py", line 144, in submit2
    required_output=required_output, nb_submit=nb_submit)
  File "/c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/w4j_mg/bin/internal/misc.py", line 371, in deco_f_retry
    raise error.__class__, '[Fail %i times] \n %s ' % (i+1, error)
Exception: [Fail 5 times]
 ['qsub', '-o', '/dev/null', '-N', 'a254c89f2cb206', '-e', '/dev/null', '-V', '-q', 'madgraph'] fails with no such file or directory
                              Run Options
                              -----------
               stdout_level : None

                         MadEvent Options
                         ----------------
     automatic_html_opening : False (user set)
        notification_center : True
          cluster_temp_path : None
             cluster_memory : None
               cluster_size : 150 (user set)
              cluster_queue : madgraph (user set)
                    nb_core : 12 (user set)
               cluster_time : None
                   run_mode : 1 (user set)

                      Configuration Options
                      ---------------------
                text_editor : None
         cluster_local_path : None
      cluster_status_update : (600, 30)
               pythia8_path : /c12scratch/sarif/6_aug_madgraph/mg1/MG5_aMC_v2_5_5/HEPTools/pythia8 (user set)
                  hwpp_path : None (user set)
            pythia-pgs_path : None (user set)
                    td_path : None (user set)
               delphes_path : None (user set)
                thepeg_path : None (user set)
               cluster_type : pbs (user set)
          madanalysis5_path : None (user set)
           cluster_nb_retry : 1
                 eps_viewer : None
                web_browser : None
               syscalc_path : None (user set)
           madanalysis_path : None (user set)
                     lhapdf : lhapdf-config
              f2py_compiler : None
                 hepmc_path : None (user set)
         cluster_retry_wait : 300
           fortran_compiler : None
                auto_update : 7 (user set)
        exrootanalysis_path : None (user set)
                    timeout : 60
               cpp_compiler : None
#************************************************************
#* MadGraph5_aMC@NLO *
#* *
#* * * *
#* * * * * *
#* * * * * 5 * * * * *
#* * * * * *
#* * * *
#* *
#* *
#* VERSION 2.5.5 2017-05-26 *
#* *
#* The MadGraph5_aMC@NLO Development Team - Find us at *
#* https://server06.fynu.ucl.ac.be/projects/madgraph *
#* *
#************************************************************
#* *
#* Command File for MadGraph5_aMC@NLO *
#* *
#* run as ./bin/mg5_aMC filename *
#* *
#************************************************************
set loop_color_flows False
set max_npoint_for_channel 0
set group_subprocesses Auto
set ignore_six_quark_processes False
set loop_optimized_output True
set gauge unitary
set complex_mass_scheme False
import model sm
define p = g u c d s u~ c~ d~ s~
define j = g u c d s u~ c~ d~ s~
define l+ = e+ mu+
define l- = e- mu-
define vl = ve vm vt
define vl~ = ve~ vm~ vt~
define ws = w+ w-
generate p p > ws @0
add process p p > ws j @1
add process p p > ws j j @2
add process p p > ws j j j @3
output w4j_mg

Olivier Mattelaer (olivier-mattelaer) said:
#3

Hi,

It looks like you do not have 'qsub' available on the node on which your program is running,
i.e. the node to which your job was dispatched via qsub does not itself have qsub.
Since you say that it sometimes goes through, my guess is that some nodes of your pool have qsub and some others do not.

The only solution is to change the way you handle the cluster. There is nothing I can do for you about such a problem.
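
One possible way to change it, sketched under the assumption that running in multicore mode on the 12 cores of the allocated node is acceptable for your generation (the flags are taken from your own commented-out lines, so please double-check them):

# confirm whether the allocated node can see qsub at all
command -v qsub || echo "qsub not found on $(hostname)"

# let madevent use the allocated cores directly instead of trying to
# qsub new jobs from inside the job
./bin/madevent multi_run 50 -f --multicore --nb_core=12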

Cheers,

Olivier
