Requirements for gridpack production on an SGE cluster

Asked by johnsurf

Dear MadGraph Team,

I would like to run a large gridpack creation job on an SGE cluster. After setting up the configuration on the head node, which supports Python 2.7.3, I ran ./bin/generate_events with the following options selected in my ./Cards/:

1) In me5_configuration.txt:
web_browser = None
run_mode = 2 (multicore)

2) In run_card.dat
.true. = gridpack

Then ./bin/generate_events runs fine and a gridpack is created.

However, to run on the cluster itself, I made these modifications to me5_configuration.txt:
web_browser = None
run_mode = 1
cluster_type = sge
cluster_queue = all.q (name of the only queue on this sge cluster)

Now the job fails and I'd like to understand the problem. What are the run requirements on the cluster nodes? What version of python is required on the cluster nodes? What configuration changes should I try?

HERE IS THE SUMMARY OF MY ATTEMPT TO RUN IN BATCH MODE ON THE CLUSTER:

This time ./bin/generate_events fails to end properly and the following message appears on the terminal:

>0
Generating gridpack with run name run_03
survey run_03 --accuracy=0.01 --points=2000 --iterations=8 --gridpack=.true.
compile directory
Using random number seed offset = 33
Running Survey
Creating Jobs
Working on SubProcesses
    P0_gq_zq_z_epem
    P0_gq_zq_z_mupmum
    P0_gc_zc_z_epem
    P0_gc_zc_z_mupmum
    P0_qq_zg_z_epem
    P0_qq_zg_z_mupmum
    P0_ccx_zg_z_epem
    P0_ccx_zg_z_mupmum
    P0_qq_z_z_epem
    P0_qq_z_z_mupmum
    P0_ccx_z_z_epem
    P0_ccx_z_z_mupmum
 Idle: 12 Running: 0 Finish: 0

INFO: All jobs finished
Start waiting for update on filesystem. (more info in debug mode)

Command "generate_events " interrupted in sub-command:
"generate_events" with error:
IOError : [Errno 2] No such file or directory: '/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/SubProcesses/P0_gq_zq_z_epem/G1/results.dat'
Please report this bug on https://bugs.launchpad.net/madgraph5
More information is found in '/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/run_03_tag_1_debug.log'.
Please attach this file to your report.
quit

$ ** Message: plugin_get_value 1 (1)
** Message: plugin_get_value 2 (2)
** Message: plugin_get_value 1 (1)
** Message: plugin_get_value 2 (2)
** Message: plugin_get_value 1 (1)
** Message: plugin_get_value 2 (2)
** Message: plugin_get_value 1 (1)
** Message: plugin_get_value 2 (2)
** Message: plugin_get_value 1 (1)
** Message: plugin_get_value 2 (2)

HERE IS THE OUTPUT OF "more run_03_tag_1_debug.log":

#************************************************************
#* MadGraph/MadEvent 5 *
#* *
#* * * *
#* * * * * *
#* * * * * 5 * * * * *
#* * * * * *
#* * * *
#* *
#* *
#* VERSION 5.1.5.9 *
#* *
#* The MadGraph Development Team - Please visit us at *
#* https://server06.fynu.ucl.ac.be/projects/madgraph *
#* *
#************************************************************
#* *
#* Command File for MadEvent *
#* *
#* run as ./bin/madevent.py filename *
#* *
#************************************************************
generate_events
Traceback (most recent call last):
  File "/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/bin/internal/extended_cmd.py", line 819, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/bin/internal/extended_cmd.py", line 812, in onecmd_orig
    return func(arg, **opt)
  File "/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/bin/internal/madevent_interface.py", line 2227, in do_generate_events
    postcmd=False)
  File "/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/bin/internal/extended_cmd.py", line 859, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/bin/internal/extended_cmd.py", line 812, in onecmd_orig
    return func(arg, **opt)
  File "/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/bin/internal/madevent_interface.py", line 2628, in do_survey
    cross, error = sum_html.make_all_html_results(self)
  File "/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/bin/internal/sum_html.py", line 384, in make_all_html_results
    P_comb.add_results(name, pjoin(P_path,name,'results.dat'), mfactor)
  File "/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/bin/internal/sum_html.py", line 124, in add_results
    oneresult.read_results(filepath)
  File "/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/bin/internal/misc.py", line 155, in deco_f_retry
    return f(*args, **opt)
  File "/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/bin/internal/sum_html.py", line 56, in read_results
    for line in open(filepath):
IOError: [Errno 2] No such file or directory: '/storage/5/home/john.smith/NewMadGraph/MadGraph5_v1_5_9/Z0/SubProcesses/P0_gq_zq_z_epem/G1/results.dat'
                              Run Options
                              -----------
               stdout_level : None

                         MadEvent Options
                         ----------------
     automatic_html_opening : False (user set)
          cluster_temp_path : None
              cluster_queue : all.q (user set)
                    nb_core : 8 (user set)
                   run_mode : 1 (user set)

                      Configuration Options
                      ---------------------
                web_browser : None
                text_editor : None
           madanalysis_path : None (user set)
               pythia8_path : None (user set)
            pythia-pgs_path : None (user set)
                    td_path : None (user set)
               delphes_path : None (user set)
                auto_update : 7 (user set)
               cluster_type : sge (user set)
           fortran_compiler : None (user set)
        exrootanalysis_path : None (user set)
                 eps_viewer : None
                    timeout : 60

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee

Pierre Artoisenet (partois) said:
#1

Dear John,

Thanks for your question.
Olivier Mattelaer is away for a week and he asked me to inform
the mg5 users that he will answer mg5 questions next Sunday.

Thanks for your patience,

Pierre

Olivier Mattelaer (olivier-mattelaer) said:
#2

Hi John,

Python is not required on any node.
But the executables compiled on the submitting machine must be able to run on every node.

It looks like your cluster is not well supported by the SGE class that we have in MG5.
Supporting every cluster is roughly impossible, since each cluster is configured in a different way
and requires different information (CPU time, memory requirements, …).

You are in a situation where I can't really help you directly, since I would need a connection to your cluster
and a discussion with your IT team (or to read their instructions on how to use the cluster) in order to understand why
the submission of the job is failing.
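
As a first step on your side, it may help to see how many channels are affected, not only the first one that the traceback reports. The following is a minimal standalone sketch (Python 3, not part of MG5) that lists every SubProcesses/P*/G* directory without a results.dat. The function name find_missing_results is hypothetical, and the directory layout is assumed from the path in your error message.

```python
import glob
import os


def find_missing_results(process_dir):
    """Return the channel directories (relative to process_dir) that
    have no results.dat.

    Assumes the layout seen in the error message:
    <process_dir>/SubProcesses/P*/G*/results.dat
    """
    pattern = os.path.join(process_dir, "SubProcesses", "P*", "G*")
    missing = []
    for channel_dir in sorted(glob.glob(pattern)):
        # Skip plain files that happen to match the G* pattern.
        if not os.path.isdir(channel_dir):
            continue
        if not os.path.isfile(os.path.join(channel_dir, "results.dat")):
            missing.append(os.path.relpath(channel_dir, process_dir))
    return missing
```

Calling find_missing_results on your Z0 directory returns a list of relative paths. If every channel is listed, the jobs most likely never ran on the nodes at all; if only a few are listed, the problem is more likely specific to some nodes or to the shared filesystem.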

If you look at this FAQ:
https://answers.launchpad.net/madgraph5/+faq/2249
you will find all the instructions on how to add support for any type of cluster in MG5. This will help you
understand how the submission is done for SGE clusters and what is not compatible with your specific cluster.
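
Before changing anything in MG5, it can also be worth checking by hand that a plain submission to your queue works. The sketch below (Python 3, standalone) only builds a qsub command line for a trivial test script; it is not the command that the MG5 SGE class issues, and the names build_qsub_command, mg5_test, test.out, test.err and ./test_job.sh are placeholders chosen for illustration. The options used (-q, -cwd, -N, -o, -e) are standard SGE qsub options.

```python
import shlex


def build_qsub_command(script, queue, job_name="mg5_test",
                       stdout="test.out", stderr="test.err"):
    """Build an SGE qsub argument list for a hand-made test job.

    This is an illustration only; it does not reproduce what MG5 submits.
    """
    return [
        "qsub",
        "-q", queue,     # queue name, e.g. the cluster_queue value
        "-cwd",          # run the job in the submission directory
        "-N", job_name,  # job name shown by qstat
        "-o", stdout,    # file receiving the job's standard output
        "-e", stderr,    # file receiving the job's standard error
        script,
    ]


def format_command(args):
    """Join an argument list into a shell-safe command string."""
    return " ".join(shlex.quote(arg) for arg in args)


# Print the command so it can be inspected before running it by hand.
print(format_command(build_qsub_command("./test_job.sh", "all.q")))
```

If such a job stays queued or writes an error to test.err, the messages there are usually the quickest hint about what the cluster expects (for example a mandatory resource request).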

In any case, I'm available for any additional questions that you might have.

Cheers,

Olivier

On Apr 16, 2013, at 2:26 PM, johnsurf <email address hidden> wrote:

> New question #226831 on MadGraph5:
> https://answers.launchpad.net/madgraph5/+question/226831
