Error when running on a cluster

Asked by teddym on 2018-06-08

Dear MG authors:
   I'm trying to use MG on our cluster. I've modified the me5_configuration.txt file in the Cards directory accordingly, and I launch it with

./bin/madevent generate_events

However, I got the following error message, which is really confusing (has the job been submitted or not?):


ClusterManagmentError : [Fail 5 times]
  fail to submit to the cluster:
 Your job 7396533 ("df7d541b905ce2") has been submitted


The following is the log file:


Traceback (most recent call last):
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 1467, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 1421, in onecmd_orig
    return func(arg, **opt)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 2497, in do_generate_events
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 1494, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 1421, in onecmd_orig
    return func(arg, **opt)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 3323, in do_survey
    jobs, P_zero_result = ajobcreator.launch()
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 188, in launch
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 273, in submit_to_cluster
    return self.submit_to_cluster_no_splitting(job_list)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 303, in submit_to_cluster_no_splitting
    cwd=pjoin(self.me_dir,'SubProcesses' , Pdir))
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 5435, in launch_job
    required_output=required_output, **opt)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 207, in cluster_submit
    output_files, required_output, nb_submit)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 70, in deco_f_store
    id = f(self, **args)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 145, in submit2
    required_output=required_output, nb_submit=nb_submit)
  File "/home/ycwu/Workings/MG5_aMC_v2_6_1/TEDWORK/pp_tt/bin/internal/", line 419, in deco_f_retry
    raise error.__class__, '[Fail %i times] \n %s ' % (i+1, error)
ClusterManagmentError: [Fail 5 times]
 fail to submit to the cluster:
Your job 7396533 ("df7d541b905ce2") has been submitted

                              Run Options
               stdout_level : None

                         MadEvent Options
     automatic_html_opening : False (user set)
        notification_center : True
          cluster_temp_path : None
             cluster_memory : None
               cluster_size : 8 (user set)
              cluster_queue : theory.q (user set)
                    nb_core : 4 (user set)
               cluster_time : None
                   run_mode : 1 (user set)

                      Configuration Options
                text_editor : None
         cluster_local_path : None
      cluster_status_update : (600, 30)
               pythia8_path : None (user set)
                  hwpp_path : None (user set)
            pythia-pgs_path : None (user set)
                    td_path : None (user set)
               delphes_path : None (user set)
                thepeg_path : None (user set)
               cluster_type : pbs (user set)
          madanalysis5_path : None (user set)
           cluster_nb_retry : 1
                 eps_viewer : None
                web_browser : None
               syscalc_path : None (user set)
           madanalysis_path : None (user set)
                     lhapdf : lhapdf-config
              f2py_compiler : None
                 hepmc_path : None (user set)
         cluster_retry_wait : 300
           fortran_compiler : None
                auto_update : 7 (user set)
        exrootanalysis_path : None (user set)
                    timeout : 60
               cpp_compiler : None


Am I doing anything wrong here? Thanks!



It seems that your PBS configuration (or version) does not behave as expected.
In this case, the line reporting that the submission succeeded does not have
the expected format, so the job ID cannot be read from it even though the job was submitted.

So the only solution is to create your own cluster implementation.

You obviously need to know a bit of Python for that.
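
As a rough illustration of what such an implementation has to do at submission time, here is a minimal sketch that extracts the job ID from the text printed by the submit command. It is not the MG5 code: the function name parse_job_id and both patterns are assumptions made for this example. The first pattern assumes a PBS-style reply of the form "<id>.<server>", the second matches the message quoted in this thread.

```python
import re

# Hypothetical patterns, not taken from the MG5 source:
#   1. assumed PBS-style reply, e.g. "7396533.headnode"
#   2. the message reported in this thread
_JOB_ID_PATTERNS = (
    re.compile(r'^\s*(\d+)(?:\.\S+)?\s*$'),
    re.compile(r'Your job (\d+) \(.*?\) has been submitted'),
)


def parse_job_id(submit_output):
    """Return the job ID (as a string) found in the submit command output."""
    for line in submit_output.splitlines():
        for pattern in _JOB_ID_PATTERNS:
            match = pattern.search(line)
            if match:
                return match.group(1)
    # No known format matched: report it instead of guessing an ID.
    raise ValueError('unrecognised submission message: %r' % submit_output)


if __name__ == '__main__':
    print(parse_job_id('Your job 7396533 ("df7d541b905ce2") has been submitted'))
```

The same ID is what a status check would later look for in the scheduler's job list, so the submit and status parts of a custom cluster class need to agree on its format.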





teddym (niepanchongsheng) said : #2

Hi Olivier:
    Thanks for your info.
    I've modified the file to recognize the submission message returned by the cluster, and I can see the jobs using qstat.

    However, when the jobs finish, MG is still waiting, saying RUNNING: 2 Finished: 0.
    Where should I look to fix this?


Launchpad Janitor (janitor) said : #3

This question was expired because it remained in the 'Open' state without activity for the last 15 days.