Error when running on a cluster

Asked by teddym on 2018-06-08

Dear MG authors,
   I'm trying to use MG on our cluster. I've modified the me5_configuration.txt file in the Cards directory accordingly, and I launch it with

./bin/madevent generate_events

However, I got the following error message, which is really confusing: was the job submitted or not?

================================================

ClusterManagmentError : [Fail 5 times]
  fail to submit to the cluster:
 Your job 7396533 ("df7d541b905ce2") has been submitted

================================================

The log file follows:

================================================

Traceback (most recent call last):
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/extended_cmd.py", line 1467, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/extended_cmd.py", line 1421, in onecmd_orig
    return func(arg, **opt)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/madevent_interface.py", line 2497, in do_generate_events
    postcmd=False)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/extended_cmd.py", line 1494, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/extended_cmd.py", line 1421, in onecmd_orig
    return func(arg, **opt)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/madevent_interface.py", line 3323, in do_survey
    jobs, P_zero_result = ajobcreator.launch()
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/gen_ximprove.py", line 188, in launch
    self.submit_to_cluster(job_list)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/gen_ximprove.py", line 273, in submit_to_cluster
    return self.submit_to_cluster_no_splitting(job_list)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/gen_ximprove.py", line 303, in submit_to_cluster_no_splitting
    cwd=pjoin(self.me_dir,'SubProcesses' , Pdir))
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/madevent_interface.py", line 5435, in launch_job
    required_output=required_output, **opt)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/cluster.py", line 207, in cluster_submit
    output_files, required_output, nb_submit)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/cluster.py", line 70, in deco_f_store
    id = f(self, **args)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/cluster.py", line 145, in submit2
    required_output=required_output, nb_submit=nb_submit)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/misc.py", line 419, in deco_f_retry
    raise error.__class__, '[Fail %i times] \n %s ' % (i+1, error)
ClusterManagmentError: [Fail 5 times]
 fail to submit to the cluster:
Your job 7396533 ("df7d541b905ce2") has been submitted

                              Run Options
                              -----------
               stdout_level : None

                         MadEvent Options
                         ----------------
     automatic_html_opening : False (user set)
        notification_center : True
          cluster_temp_path : None
             cluster_memory : None
               cluster_size : 8 (user set)
              cluster_queue : theory.q (user set)
                    nb_core : 4 (user set)
               cluster_time : None
                   run_mode : 1 (user set)

                      Configuration Options
                      ---------------------
                text_editor : None
         cluster_local_path : None
      cluster_status_update : (600, 30)
               pythia8_path : None (user set)
                  hwpp_path : None (user set)
            pythia-pgs_path : None (user set)
                    td_path : None (user set)
               delphes_path : None (user set)
                thepeg_path : None (user set)
               cluster_type : pbs (user set)
          madanalysis5_path : None (user set)
           cluster_nb_retry : 1
                 eps_viewer : None
                web_browser : None
               syscalc_path : None (user set)
           madanalysis_path : None (user set)
                     lhapdf : lhapdf-config
              f2py_compiler : None
                 hepmc_path : None (user set)
         cluster_retry_wait : 300
           fortran_compiler : None
                auto_update : 7 (user set)
        exrootanalysis_path : None (user set)
                    timeout : 60
               cpp_compiler : None

=======================================================

Am I doing anything wrong here? Thanks!

Best!

Question information

Language: English
Status: Expired
For: MadGraph5_aMC@NLO
Assignee: No assignee
Last query: 2018-06-28
Last reply: 2018-07-14

Hi,

It seems that your PBS configuration (or version) does not behave as expected.
In this case, the line reporting a successful submission is not in the format that MG expects.
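
To illustrate the mismatch described above, here is a minimal sketch of extracting a job ID from a submission reply. It is not MG's actual code: the function name `parse_job_id` and both regular expressions are assumptions made for illustration. The first pattern accepts a reply of the form `12345.hostname` (assumed here as a PBS-style reply); the second accepts the message shown in the log above.

```python
import re

# Hypothetical patterns, for illustration only (not taken from MG's cluster.py).
# Assumed PBS-style reply: "12345.hostname"
PBS_STYLE = re.compile(r"^(\d+)\.\S+$")
# Reply seen in the log above:
#   Your job 7396533 ("df7d541b905ce2") has been submitted
MESSAGE_STYLE = re.compile(r'^Your job (\d+) \("[^"]*"\) has been submitted')


def parse_job_id(reply):
    """Return the numeric job ID as a string, or raise ValueError."""
    text = reply.strip()
    for pattern in (PBS_STYLE, MESSAGE_STYLE):
        match = pattern.match(text)
        if match:
            return match.group(1)
    raise ValueError("unrecognised submission reply: %r" % text)


if __name__ == "__main__":
    print(parse_job_id('Your job 7396533 ("df7d541b905ce2") has been submitted'))
```

A parser that knows only the first pattern would raise on the second message even though the scheduler accepted the job, which is consistent with a "fail to submit" error that quotes a success message.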

So the only solution is to write your own cluster implementation as a plugin:
https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/Plugin

You will need to know a bit of Python for that, though.

Cheers,

Olivier


teddym (niepanchongsheng) said : #2

Hi Olivier,
    Thanks for the info.
    I've modified the cluster.py file so that it recognizes the submission message returned by the cluster.

   I can now see the jobs using qstat.

  However, when the jobs finish, MG keeps waiting and reports RUNNING: 2 Finished: 0.
 Where should I look to debug this?
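
One place to look, offered as a guess rather than a diagnosis: if the submission message had to be re-parsed, the code that polls job status may also be reading scheduler output in a format it does not expect. The sketch below is not MG's code; `count_states`, the column layout, and the state codes are all assumptions for illustration. It counts the tracked jobs that still appear in a qstat-like listing and treats tracked jobs missing from the listing as finished.

```python
def count_states(listing, tracked_ids):
    """Count tracked jobs by state in a qstat-like listing.

    Assumed (hypothetical) layout: one job per line, job ID in the first
    column, state code in the fifth column. 'r' means running; any other
    code is counted as idle. A tracked job absent from the listing is
    counted as finished.
    """
    counts = {"idle": 0, "running": 0, "finished": 0}
    seen = set()
    for line in listing.splitlines():
        fields = line.split()
        # Skip headers, separators, and jobs we did not submit.
        if len(fields) < 5 or fields[0] not in tracked_ids:
            continue
        seen.add(fields[0])
        if fields[4] == "r":
            counts["running"] += 1
        else:
            counts["idle"] += 1
    counts["finished"] = len(set(tracked_ids) - seen)
    return counts


# Made-up listing in the assumed layout.
SAMPLE = """job-ID  prior    name     user      state  submit/start
---------------------------------------------------------
7396533 0.50000  jobA     someuser  r      06/08/2018
7396534 0.50000  jobB     someuser  qw     06/08/2018
"""

if __name__ == "__main__":
    print(count_states(SAMPLE, ["7396533", "7396534", "7396535"]))
```

Comparing what the real status command prints against what the polling code expects, column by column, is a quick way to check whether the counts can ever move from running to finished.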

Best!

Launchpad Janitor (janitor) said : #3

This question was expired because it remained in the 'Open' state without activity for the last 15 days.