Declaring resources on a GE cluster

Asked by Nicolas Deutschmann

Hello,

I am trying to run MG5.1.12 on a GE cluster (CC IN2P3) and I am having difficulty declaring the resources I want to use for my run. More precisely, I want my process folder to be in a location where submitted jobs can write only if the resource "sps=1" is declared at submission.

To attempt to do so, I modified what I thought was the qsub call for the GE cluster in my PROCESS/bin/internal/cluster.py in the following way:

At line 822 you find

a = misc.Popen(['qsub', '-o', stdout,
                '-e', stderr,
                tmp_submit],
               stdout=subprocess.PIPE,
               stderr=subprocess.STDOUT,
               stdin=subprocess.PIPE, cwd=cwd)

Which I replaced by

a = misc.Popen(['qsub -l sps=1', '-o', stdout,
                '-e', stderr,
                tmp_submit],
               stdout=subprocess.PIPE,
               stderr=subprocess.STDOUT,
               stdin=subprocess.PIPE, cwd=cwd)

This is obviously not the correct way to do it: when running generate_events, I get the following error:

Working on SubProcesses
    P0_gg_tptpx_tp_tttx_t_epvlb_t_epvlb_tpx_ttxtx_t_epvlb
Start waiting for update on filesystem. (more info in debug mode)
Command "generate_events " interrupted in sub-command:
"generate_events" with error:
Exception : ['qsub -l sps=1', '-o', '/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/SubProcesses/P0_gg_tptpx_tp_tttx_t_epvlb_t_epvlb_tpx_ttxtx_t_epvlb/log.ajob1', '-e', '/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/SubProcesses/P0_gg_tptpx_tp_tttx_t_epvlb_t_epvlb_tpx_ttxtx_t_epvlb/err.ajob1', '/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/SubProcesses/P0_gg_tptpx_tp_tttx_t_epvlb_t_epvlb_tpx_ttxtx_t_epvlb/tmp_submit'] fails with no such file or directory
Please report this bug on https://bugs.launchpad.net/madgraph5

The debug file reads:

Traceback (most recent call last):
  File "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/bin/internal/extended_cmd.py", line 819, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/bin/internal/extended_cmd.py", line 812, in onecmd_orig
    return func(arg, **opt)
  File "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/bin/internal/madevent_interface.py", line 2300, in do_generate_events
    postcmd=False)
  File "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/bin/internal/extended_cmd.py", line 859, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/bin/internal/extended_cmd.py", line 812, in onecmd_orig
    return func(arg, **opt)
  File "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/bin/internal/madevent_interface.py", line 2679, in do_survey
    run_type='survey on %s (%s/%s)' % (subdir,nb_proc+1,len(subproc)))
  File "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/bin/internal/madevent_interface.py", line 3781, in launch_job
    input_files=input_files, output_files=output_files)
  File "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/bin/internal/cluster.py", line 75, in submit2
    return self.submit(prog, argument, cwd, stdout, stderr, log)
  File "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/bin/internal/misc.py", line 155, in deco_f_retry
    return f(*args, **opt)
  File "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/bin/internal/cluster.py", line 827, in submit
    stdin=subprocess.PIPE, cwd=cwd)
  File "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/bin/internal/misc.py", line 322, in deco_f
    % arg
Exception: ['qsub -l sps=1', '-o', '/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/SubProcesses/P0_gg_tptpx_tp_tttx_t_epvlb_t_epvlb_tpx_ttxtx_t_epvlb/log.ajob1', '-e', '/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/SubProcesses/P0_gg_tptpx_tp_tttx_t_epvlb_t_epvlb_tpx_ttxtx_t_epvlb/err.ajob1', '/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/SubProcesses/P0_gg_tptpx_tp_tttx_t_epvlb_t_epvlb_tpx_ttxtx_t_epvlb/tmp_submit'] fails with no such file or directory
                              Run Options
                              -----------
               stdout_level : None

                         MadEvent Options
                         ----------------
     automatic_html_opening : False (user set)
          cluster_temp_path : None
              cluster_queue : madgraph
                    nb_core : 8 (user set)
                   run_mode : 1 (user set)

                      Configuration Options
                      ---------------------
                web_browser : None
                text_editor : None
           madanalysis_path : None (user set)
               pythia8_path : None (user set)
            pythia-pgs_path : None (user set)
                    td_path : None (user set)
               delphes_path : None (user set)
                auto_update : 7 (user set)
               cluster_type : ge (user set)
           fortran_compiler : None (user set)
        exrootanalysis_path : None (user set)
                 eps_viewer : None
                    timeout : 60
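The "no such file or directory" failure can be reproduced outside MadGraph with a minimal sketch, assuming a POSIX system where `echo` is on the PATH. When Python's `subprocess` receives a command as a list, each element is passed as exactly one argument and no shell splits it, so an element such as `'qsub -l sps=1'` is looked up as a program name. The helper `try_run` is hypothetical; in Python 3 the failed lookup raises `FileNotFoundError` (an `OSError` subclass), which MadGraph's `misc.Popen` wrapper appears to turn into the exception shown above.

```python
import subprocess

def try_run(cmd):
    """Hypothetical helper: run cmd (a list) without a shell.

    Returns the command's stdout, or the string "FileNotFoundError" if the
    program named by cmd[0] cannot be found.
    """
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True,
                                universal_newlines=True)
        return result.stdout
    except FileNotFoundError:
        return "FileNotFoundError"

# Whole command in one element: the OS searches for a program literally
# named "echo hello", which does not exist.
broken = try_run(["echo hello"])
# One element per word: "echo" is found and receives "hello" as its argument.
fixed = try_run(["echo", "hello"])
print(broken)
print(fixed)
```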

So how should I declare this resource?

For the sake of completeness: when I run MG out of the box, the jobs are submitted but end up in an Error state in GE, with the following message:

can't stat() "/sps/hep/lyon/ndeutsch/TPZP_3SSL_14TEV/SubProcesses/P0_gg_tptpx_tp_tttx_t_epvlb_t_epvlb_tpx_ttxtx_t_epvlb/log.ajob1" as stdout_path: Permission denied

Thanks in advance for your help
Nicolas

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Olivier Mattelaer
Best answer: Olivier Mattelaer (olivier-mattelaer) said (#1):

Hi Nicolas,

I have no experience with that cluster. (Johan implemented it for his cluster, but he is no longer in HEP.)

Starting from your implementation, I would have written:

a = misc.Popen(['qsub', '-l', 'sps=1', '-o', stdout,
                '-e', stderr,
                tmp_submit],
               stdout=subprocess.PIPE,
               stderr=subprocess.STDOUT,
               stdin=subprocess.PIPE, cwd=cwd)

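Spaces inside one list element are not split into separate arguments, since no shell is involved (which is safer, for security reasons): 'qsub -l sps=1' is then looked up as the name of the executable itself, hence the "no such file or directory" error.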
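The same fix can be written so that extra Grid Engine options stay in one readable string. This is only a sketch: `build_qsub_command` and its `extra_options` parameter are hypothetical and not part of MadGraph's cluster.py. It relies on `shlex.split` from the Python standard library, which breaks a string into words the way a POSIX shell would, so each option and each value becomes its own list element.

```python
import shlex

def build_qsub_command(stdout, stderr, script, extra_options="-l sps=1"):
    """Hypothetical helper: return the qsub argument list, one word per element.

    extra_options holds any additional GE options as a single string;
    shlex.split turns it into separate arguments (an empty string adds none).
    """
    return (["qsub"] + shlex.split(extra_options)
            + ["-o", stdout, "-e", stderr, script])

cmd = build_qsub_command("log.ajob1", "err.ajob1", "tmp_submit")
print(cmd)
```

The resulting list has the same shape as the corrected call above and could be passed to misc.Popen in place of the literal list.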

But I would rather suggest using the "no shared disk system".
The problem is clearly linked to a missing filesystem (or something like that).

If you define the value cluster_temp_path (in the configuration file),
all the required files will be copied to the node, the job will run locally, and the results will then be pushed back to the main filesystem.

The default (which is used for GE) is to use the cp command to do that.
But it is probably better to use the built-in GE module in your case.
To do that, you need to define the routine submit2;
this is done, for example, for the condor cluster.
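The copy-to-node workflow described above can be sketched in plain Python, assuming node-local scratch space is available. Everything here is hypothetical and only illustrates the idea (copy the inputs in, run locally, copy the outputs back, in the spirit of the default cp-based approach); it is not MadGraph's actual submit2 implementation, and `temp_root` merely stands in for cluster_temp_path.

```python
import os
import shutil
import subprocess
import sys
import tempfile

def run_with_local_copy(cmd, cwd, input_files, output_files, temp_root=None):
    """Hypothetical helper: stage files through a local scratch directory.

    cwd is the job directory on the shared filesystem; temp_root plays the
    role of cluster_temp_path (None lets tempfile choose its default).
    """
    scratch = tempfile.mkdtemp(dir=temp_root)
    try:
        # 1. Copy the required inputs from the shared directory to the node.
        for name in input_files:
            shutil.copy(os.path.join(cwd, name), scratch)
        # 2. Run the job entirely inside the local scratch directory.
        subprocess.run(cmd, cwd=scratch, check=True)
        # 3. Push the results back to the shared directory.
        for name in output_files:
            shutil.copy(os.path.join(scratch, name), cwd)
    finally:
        # Always remove the node-local copy, even if the job failed.
        shutil.rmtree(scratch)

# Usage: a throwaway "shared" directory holding one input file; the job
# simply copies in.txt to out.txt inside the scratch directory.
shared = tempfile.mkdtemp()
with open(os.path.join(shared, "in.txt"), "w") as f:
    f.write("data")
job = [sys.executable, "-c", "import shutil; shutil.copy('in.txt', 'out.txt')"]
run_with_local_copy(job, shared, ["in.txt"], ["out.txt"])
with open(os.path.join(shared, "out.txt")) as f:
    result = f.read()
```

Only the shared directory needs to be readable at the start and writable at the end of the job; the run itself never touches it.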

Cheers,

olivier

On Nov 5, 2013, at 12:11 PM, Nicolas Deutschmann <email address hidden> wrote:

> New question #238680 on MadGraph5:
> https://answers.launchpad.net/madgraph5/+question/238680

Nicolas Deutschmann (ndeutschmann) said (#2):

Hi Olivier,

Thanks for the answer. It seems that your first suggestion (using Popen correctly) did the trick.

Cheers,
Nicolas.