MG in cluster mode crashes for high number of events

Asked by Alex Feike

Hello,

I am trying to run MG in cluster mode. Everything works for a small number of events (nevents=500), but with a larger number (nevents=50k) MG crashes. Just before the crash it prints the warning ‘cluster.get_job_identifier runs unexpectedly’. The log file is attached below. My cluster settings in mg5_configuration.txt are:
run_mode=1
cluster_type = slurm
cluster_queue = normal
cluster_size = 200

and I use the following process:
generate p p > t t~
launch
set pdlabel lhapdf
set lhaid 27000
set ebeam1 6800
set ebeam2 6800
set nevents 500 (or 50000)

I also ran the exact same process without the cluster settings (with the mg5_configuration.txt lines above commented out), and then everything runs fine. Do you have any idea what causes the problem and how it might be solved?

Thanks and best,
Alex

generate_events 50000
Traceback (most recent call last):
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/extended_cmd.py", line 1544, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/extended_cmd.py", line 1493, in onecmd_orig
    return func(arg, **opt)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/madevent_interface.py", line 2404, in do_generate_events
    self.run_generate_events(switch_mode, args)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/common_run_interface.py", line 7630, in new_fct
    original_fct(obj, *args, **opts)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/madevent_interface.py", line 2643, in run_generate_events
    postcmd=False, printcmd=False)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/extended_cmd.py", line 1573, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/extended_cmd.py", line 1493, in onecmd_orig
    return func(arg, **opt)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/common_run_interface.py", line 1924, in do_systematics
    stdout='/dev/null'
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/various/cluster.py", line 212, in cluster_submit
    output_files, required_output, nb_submit)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/various/cluster.py", line 75, in deco_f_store
    id = f(self, **args)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/various/cluster.py", line 150, in submit2
    required_output=required_output, nb_submit=nb_submit)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/various/misc.py", line 436, in deco_f_retry
    raise error.__class__('[Fail %i times] \n %s ' % (i+1, error))
UnboundLocalError: local variable 'error' referenced before assignment

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Olivier Mattelaer

Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi,

Could you attach the full debug file? It looks like the problem is with the systematics computation when it runs on the cluster (which might not happen for a small number of events).

It also seems that the debug message is not working as expected, due to a bug introduced during the Python 3 transition; you are the first to report it.
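
The UnboundLocalError at the end of the traceback fits a known Python 3 rule: the name bound by `except ... as error` is deleted when the `except` block ends, so using `error` after that block fails, whereas Python 2 kept the name alive. Below is a minimal sketch of that behaviour. The decorators `retry_buggy` and `retry_fixed` and the function `always_fails` are hypothetical illustrations, not the MG5aMC `deco_f_retry` code, whose body is not shown in this thread.

```python
import functools


def retry_buggy(nb_try):
    """Hypothetical retry decorator that reuses `error` after the except block."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            for i in range(nb_try):
                try:
                    return f(*args, **kwargs)
                except Exception as error:
                    pass  # Python 3 deletes `error` when this block ends
            # `error` no longer exists here, so this line raises UnboundLocalError
            raise error.__class__('[Fail %i times] \n %s ' % (i + 1, error))
        return wrapper
    return decorator


def retry_fixed(nb_try):
    """Same idea, but the last exception is stored under a separate name."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            last_error = None
            for i in range(nb_try):
                try:
                    return f(*args, **kwargs)
                except Exception as error:
                    last_error = error  # survives the end of the except block
            raise last_error.__class__(
                '[Fail %i times] \n %s ' % (i + 1, last_error))
        return wrapper
    return decorator


def always_fails():
    """Stand-in for a cluster submission that never succeeds."""
    raise RuntimeError("fail to submit to the cluster")
```

Calling `retry_buggy(2)(always_fails)()` raises UnboundLocalError and hides the real failure, while `retry_fixed(2)(always_fails)()` raises RuntimeError with the original message included.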

One simple workaround might be to run the (post-processing) systematics.py script by hand, rather than integrated within MG5aMC. This will likely avoid the issue.

Cheers,

Olivier
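
For readers who want to try the workaround, here is a minimal sketch of how the manual command could be assembled. The script location (bin/internal/systematics.py), the Events/run_01 directory, the input file name and the --lhapdf_config option are taken from the debug output in reply #3, and the --mur/--muf/--pdf arguments from the run_card above. The helper name `build_systematics_command`, the output file name and the process directory "bug" (from `output bug`) are assumptions for illustration. This thread does not confirm that the script behaves identically when run standalone.

```python
import os
import sys


def build_systematics_command(process_dir, run_name="run_01",
                              output_name="unweighted_events_syst.lhe.gz",
                              lhapdf_config=None):
    """Return (argv, cwd) for a manual systematics.py run. Nothing is executed."""
    script = os.path.join(process_dir, "bin", "internal", "systematics.py")
    run_dir = os.path.join(process_dir, "Events", run_name)
    argv = [
        sys.executable, script,
        "unweighted_events.lhe.gz",  # input events, as named in the debug log
        output_name,                 # hypothetical output file name
        "--mur=0.5,1,2", "--muf=0.5,1,2", "--pdf=errorset",  # from the run_card
    ]
    if lhapdf_config:
        argv.append("--lhapdf_config=" + lhapdf_config)
    return argv, run_dir


if __name__ == "__main__":
    argv, cwd = build_systematics_command("bug")
    print("cd", cwd)
    print(" ".join(argv))
    # To actually launch it: subprocess.run(argv, cwd=cwd, check=True)
```

The sketch only prints the command so it can be inspected first; the --start_event/--stop_event options seen in the debug log are left out because they appear to belong to the per-job splitting on the cluster.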

Alex Feike (afeike) said :
#2

Hi Olivier,
thanks for the fast answer. Attached you'll find the full log file.
Best,
Alex

#************************************************************
#* MadGraph5_aMC@NLO/MadEvent *
#* *
#* * * *
#* * * * * *
#* * * * * 5 * * * * *
#* * * * * *
#* * * *
#* *
#* *
#* VERSION 3.4.1 2022-09-01 *
#* *
#* The MadGraph5_aMC@NLO Development Team - Find us at *
#* https://server06.fynu.ucl.ac.be/projects/madgraph *
#* *
#************************************************************
#* *
#* Command File for MadEvent *
#* *
#* run as ./bin/madevent.py filename *
#* *
#************************************************************
generate_events 50000
Traceback (most recent call last):
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/extended_cmd.py", line 1544, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/extended_cmd.py", line 1493, in onecmd_orig
    return func(arg, **opt)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/madevent_interface.py", line 2404, in do_generate_events
    self.run_generate_events(switch_mode, args)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/common_run_interface.py", line 7630, in new_fct
    original_fct(obj, *args, **opts)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/madevent_interface.py", line 2643, in run_generate_events
    postcmd=False, printcmd=False)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/extended_cmd.py", line 1573, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/extended_cmd.py", line 1493, in onecmd_orig
    return func(arg, **opt)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/interface/common_run_interface.py", line 1924, in do_systematics
    stdout='/dev/null'
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/various/cluster.py", line 212, in cluster_submit
    output_files, required_output, nb_submit)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/various/cluster.py", line 75, in deco_f_store
    id = f(self, **args)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/various/cluster.py", line 150, in submit2
    required_output=required_output, nb_submit=nb_submit)
  File "/home/a/a_feik02/MG5_aMC_v3_4_1/madgraph/various/misc.py", line 436, in deco_f_retry
    raise error.__class__('[Fail %i times] \n %s ' % (i+1, error))
UnboundLocalError: local variable 'error' referenced before assignment
                              Run Options
                              -----------
               stdout_level : 20 (user set)

                         MadEvent Options
                         ----------------
     automatic_html_opening : False (user set)
        notification_center : True
                   run_mode : 1 (user set)
              cluster_queue : normal (user set)
               cluster_time : normal (user set)
               cluster_size : 200 (user set)
             cluster_memory : 200 (user set)
                    nb_core : 40 (user set)
          cluster_temp_path : None

                      Configuration Options
                      ---------------------
               pythia8_path : /home/a/a_feik02/MG5_aMC_v3_4_1/HEPTools/pythia8 (user set)
                  hwpp_path : None (user set)
                thepeg_path : None (user set)
                 hepmc_path : None (user set)
           madanalysis_path : None (user set)
          madanalysis5_path : /home/a/a_feik02/MG5_aMC_v3_4_1/HEPTools/madanalysis5/madanalysis5 (user set)
            pythia-pgs_path : None (user set)
                    td_path : None (user set)
               delphes_path : None (user set)
        exrootanalysis_path : None (user set)
               syscalc_path : None (user set)
                 rivet_path : None
                  yoda_path : None
                     lhapdf : /home/a/a_feik02/LHAPDF-6.5.3/bin/lhapdf-config (user set)
                 lhapdf_py2 : None
                 lhapdf_py3 : /home/a/a_feik02/LHAPDF-6.5.3/bin/lhapdf-config (user set)
                    timeout : 60
              f2py_compiler : None
          f2py_compiler_py2 : None
          f2py_compiler_py3 : None
                web_browser : None
                 eps_viewer : None
                text_editor : None
           fortran_compiler : None
               cpp_compiler : None
                auto_update : 7 (user set)
               cluster_type : slurm (user set)
      cluster_status_update : (600, 30)
           cluster_nb_retry : 1
         cluster_local_path : None
         cluster_retry_wait : 300
#************************************************************
#* MadGraph5_aMC@NLO *
#* *
#* * * *
#* * * * * *
#* * * * * 5 * * * * *
#* * * * * *
#* * * *
#* *
#* *
#* VERSION 3.4.1 2022-09-01 *
#* *
#* The MadGraph5_aMC@NLO Development Team - Find us at *
#* https://server06.fynu.ucl.ac.be/projects/madgraph *
#* *
#************************************************************
#* *
#* Command File for MadGraph5_aMC@NLO *
#* *
#* run as ./bin/mg5_aMC filename *
#* *
#************************************************************
set group_subprocesses Auto
set ignore_six_quark_processes False
set low_mem_multicore_nlo_generation False
set complex_mass_scheme False
set include_lepton_initiated_processes False
set gauge unitary
set loop_optimized_output True
set loop_color_flows False
set max_npoint_for_channel 0
set default_unset_couplings 99
set max_t_for_channel 99
set zerowidth_tchannel True
set nlo_mixed_expansion True
generate p p > t t~
define p = g u c d s u~ c~ d~ s~
define j = g u c d s u~ c~ d~ s~
define l+ = e+ mu+
define l- = e- mu-
define vl = ve vm vt
define vl~ = ve~ vm~ vt~
output bug
######################################################################
## PARAM_CARD AUTOMATICALY GENERATED BY MG5 ####
######################################################################
###################################
## INFORMATION FOR MASS
###################################
BLOCK MASS #
      5 4.700000e+00 # mb
      6 1.730000e+02 # mt
      15 1.777000e+00 # mta
      23 9.118800e+01 # mz
      25 1.250000e+02 # mh
      1 0.000000e+00 # d : 0.0
      2 0.000000e+00 # u : 0.0
      3 0.000000e+00 # s : 0.0
      4 0.000000e+00 # c : 0.0
      11 0.000000e+00 # e- : 0.0
      12 0.000000e+00 # ve : 0.0
      13 0.000000e+00 # mu- : 0.0
      14 0.000000e+00 # vm : 0.0
      16 0.000000e+00 # vt : 0.0
      21 0.000000e+00 # g : 0.0
      22 0.000000e+00 # a : 0.0
      24 8.041900e+01 # w+ : cmath.sqrt(mz__exp__2/2. + cmath.sqrt(mz__exp__4/4. - (aew*cmath.pi*mz__exp__2)/(gf*sqrt__2)))
###################################
## INFORMATION FOR SMINPUTS
###################################
BLOCK SMINPUTS #
      1 1.325070e+02 # aewm1
      2 1.166390e-05 # gf
      3 1.300009e-01 # as (note that parameter not used if you use a pdf set)
###################################
## INFORMATION FOR YUKAWA
###################################
BLOCK YUKAWA #
      5 4.700000e+00 # ymb
      6 1.730000e+02 # ymt
      15 1.777000e+00 # ymtau
###################################
## INFORMATION FOR DECAY
###################################
DECAY 6 1.491500e+00 # wt
DECAY 23 2.441404e+00 # wz
DECAY 24 2.047600e+00 # ww
DECAY 25 6.382339e-03 # wh
DECAY 1 0.000000e+00 # d : 0.0
DECAY 2 0.000000e+00 # u : 0.0
DECAY 3 0.000000e+00 # s : 0.0
DECAY 4 0.000000e+00 # c : 0.0
DECAY 5 0.000000e+00 # b : 0.0
DECAY 11 0.000000e+00 # e- : 0.0
DECAY 12 0.000000e+00 # ve : 0.0
DECAY 13 0.000000e+00 # mu- : 0.0
DECAY 14 0.000000e+00 # vm : 0.0
DECAY 15 0.000000e+00 # ta- : 0.0
DECAY 16 0.000000e+00 # vt : 0.0
DECAY 21 0.000000e+00 # g : 0.0
DECAY 22 0.000000e+00 # a : 0.0

#*********************************************************************
# MadGraph5_aMC@NLO *
# *
# run_card.dat MadEvent *
# *
# This file is used to set the parameters of the run. *
# *
# Some notation/conventions: *
# *
# Lines starting with a '# ' are info or comments *
# *
# mind the format: value = variable ! comment *
# *
# To display more options, you can type the command: *
# update to_full *
#*********************************************************************
#
#*********************************************************************
# Tag name for the run (one word) *
#*********************************************************************
  tag_1 = run_tag ! name of the run
#*********************************************************************
# Number of events and rnd seed *
# Warning: Do not generate more than 1M events in a single run *
#*********************************************************************
  50000 = nevents ! Number of unweighted events requested
  0 = iseed ! rnd seed (0=assigned automatically=default))
#*********************************************************************
# Collider type and energy *
# lpp: 0=No PDF, 1=proton, -1=antiproton, 2=elastic photon of proton,*
# +/-3=PDF of electron/positron beam *
# +/-4=PDF of muon/antimuon beam *
#*********************************************************************
  1 = lpp1 ! beam 1 type
  1 = lpp2 ! beam 2 type
  6800.0 = ebeam1 ! beam 1 total energy in GeV
  6800.0 = ebeam2 ! beam 2 total energy in GeV
# To see polarised beam options: type "update beam_pol"

#*********************************************************************
# PDF CHOICE: this automatically fixes alpha_s and its evol. *
# pdlabel: lhapdf=LHAPDF (installation needed) [1412.7420] *
# iww=Improved Weizsaecker-Williams Approx.[hep-ph/9310350] *
# eva=Effective W/Z/A Approx. [21yy.zzzzz] *
# none=No PDF, same as lhapdf with lppx=0 *
#*********************************************************************
  lhapdf = pdlabel ! PDF set
  27000 = lhaid ! if pdlabel=lhapdf, this is the lhapdf number
# To see heavy ion options: type "update ion_pdf"
#*********************************************************************
# Renormalization and factorization scales *
#*********************************************************************
  False = fixed_ren_scale ! if .true. use fixed ren scale
  False = fixed_fac_scale ! if .true. use fixed fac scale
  91.188 = scale ! fixed ren scale
  91.188 = dsqrt_q2fact1 ! fixed fact scale for pdf1
  91.188 = dsqrt_q2fact2 ! fixed fact scale for pdf2
  -1 = dynamical_scale_choice ! Choose one of the preselected dynamical choices
  1.0 = scalefact ! scale factor for event-by-event scales

#*********************************************************************
# Type and output format
#*********************************************************************
  False = gridpack !True = setting up the grid pack
  -1.0 = time_of_flight ! threshold (in mm) below which the invariant livetime is not written (-1 means not written)
  average = event_norm ! average/sum. Normalization of the weight in the LHEF
# To see MLM/CKKW merging options: type "update MLM" or "update CKKW"

#*********************************************************************
#
#*********************************************************************
# Phase-Space Optimization strategy (basic options)
#*********************************************************************
  0 = nhel ! using helicities importance sampling or not.
                             ! 0: sum over helicity, 1: importance sampling
  1 = sde_strategy ! default integration strategy (hep-ph/2021.00773)
                             ! 1 is old strategy (using amp square)
        ! 2 is new strategy (using only the denominator)
# To see advanced option for Phase-Space optimization: type "update psoptim"
#*********************************************************************
# Generation bias, check the wiki page below for more information: *
# 'cp3.irmp.ucl.ac.be/projects/madgraph/wiki/LOEventGenerationBias' *
#*********************************************************************
  None = bias_module ! Bias type of bias, [None, ptj_bias, -custom_folder-]
  {} = bias_parameters ! Specifies the parameters of the module.
#
#*******************************
# Parton level cuts definition *
#*******************************
  0.0 = dsqrt_shat ! minimal shat for full process
#
#
#*********************************************************************
# BW cutoff (M+/-bwcutoff*Gamma) ! Define on/off-shell for "$" and decay
#*********************************************************************
  15.0 = bwcutoff ! (M+/-bwcutoff*Gamma)
#*********************************************************************
# Standard Cuts *
#*********************************************************************
# Minimum and maximum pt's (for max, -1 means no cut) *
#*********************************************************************
  {} = pt_min_pdg ! pt cut for other particles (use pdg code). Applied on particle and anti-particle
  {} = pt_max_pdg ! pt cut for other particles (syntax e.g. {6: 100, 25: 50})
#
# For display option for energy cut in the partonic center of mass frame type 'update ecut'
#
#*********************************************************************
# Maximum and minimum absolute rapidity (for max, -1 means no cut) *
#*********************************************************************
  {} = eta_min_pdg ! rap cut for other particles (use pdg code). Applied on particle and anti-particle
  {} = eta_max_pdg ! rap cut for other particles (syntax e.g. {6: 2.5, 23: 5})
#*********************************************************************
# Minimum and maximum DeltaR distance *
#*********************************************************************
#*********************************************************************
# Minimum and maximum invariant mass for pairs *
#*********************************************************************
  {} = mxx_min_pdg ! min invariant mass of a pair of particles X/X~ (e.g. {6:250})
  {'default': False} = mxx_only_part_antipart ! if True the invariant mass is applied only
                       ! to pairs of particle/antiparticle and not to pairs of the same pdg codes.
#*********************************************************************
# Inclusive cuts *
#*********************************************************************
  0.0 = ptheavy ! minimum pt for at least one heavy final state
#*********************************************************************
# maximal pdg code for quark to be considered as a light jet *
# (otherwise b cuts are applied) *
#*********************************************************************
  4 = maxjetflavor ! Maximum jet pdg code
#*********************************************************************
#
#*********************************************************************
# Store info for systematics studies *
# WARNING: Do not use for interference type of computation *
#*********************************************************************
  True = use_syst ! Enable systematics studies
#
  systematics = systematics_program ! none, systematics [python], SysCalc [depreceted, C++]
  ['--mur=0.5,1,2', '--muf=0.5,1,2', '--pdf=errorset'] = systematics_arguments ! see: https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/Systematics#Systematicspythonmodule

Olivier Mattelaer (olivier-mattelaer) said :
#3

For future reference, here is the debug information that I see on a SLURM cluster where I managed to reproduce the issue:

Start waiting for update. (more info in debug mode)
fail to do <function SLURMCluster.submit at 0x7ff8dc1e03b0> function with <madgraph.various.cluster.SLURMCluster object at 0x7ff8dc040d90>, /opt/sw/arch/easybuild/2019b/software/Python/3.7.4-GCCcore-8.3.0/bin/python3, ['/home/users/o/m/omatt/mg5amcnlo/PROC_SMEFTatNLO-NLO_cdp_0/bin/internal/systematics.py', 'unweighted_events.lhe.gz', './tmp_0_unweighted_events.lhe.gz', '--mur=0.5,1,2', '--muf=0.5,1,2', '--pdf=errorset', '--start_event=0', '--stop_event=25000', '--result=./log_sys_0.txt', '--lhapdf_config=/home/users/o/m/omatt/mg5amcnlo/HEPTools/lhapdf6_py3/bin/lhapdf-config'], /home/users/o/m/omatt/mg5amcnlo/PROC_SMEFTatNLO-NLO_cdp_0/Events/run_01, /dev/null, None, None args. 1 try on a max of 5 (20 waiting time)
error is fail to submit to the cluster:
stdout: b'sbatch: error: This does not look like a batch script. The first\nsbatch: error: line must start with #! followed by the path to an interpreter.\nsbatch: error: For instance: #!/bin/sh\n'
stderr None
and occurred at :Traceback (most recent call last):
  File "/home/users/o/m/omatt/mg5amcnlo/madgraph/various/misc.py", line 420, in deco_f_retry
    return f(*args, **opt)
  File "/home/users/o/m/omatt/mg5amcnlo/madgraph/various/cluster.py", line 1720, in submit
    % ('stdout: %s\nstderr %s' %(output[0],output[1])))
madgraph.various.cluster.ClusterManagmentError: fail to submit to the cluster:
stdout: b'sbatch: error: This does not look like a batch script. The first\nsbatch: error: line must start with #! followed by the path to an interpreter.\nsbatch: error: For instance: #!/bin/sh\n'
stderr None
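
The sbatch message above says that the submitted file must begin with `#!` followed by an interpreter path. As a rough illustration of that requirement only, the sketch below checks a file for a shebang and writes a small shell wrapper around a command. The helpers `looks_like_batch_script` and `write_wrapper` are hypothetical; this is not how MG5aMC or the linked patch handles the submission.

```python
import os
import tempfile


def looks_like_batch_script(path):
    """Return True if the file starts with the two bytes '#!'."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"#!"


def write_wrapper(path, command):
    """Write a minimal shell script that starts with a shebang line."""
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
        handle.write(command + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A plain Python file without a shebang, standing in for the payload.
        payload = os.path.join(tmp, "payload.py")
        with open(payload, "w") as handle:
            handle.write("print('hello')\n")
        # A wrapper script that calls the payload through an interpreter.
        wrapper = os.path.join(tmp, "job.sh")
        write_wrapper(wrapper, "python3 payload.py")
        print(looks_like_batch_script(payload))
        print(looks_like_batch_script(wrapper))
```

A file that fails this check would trigger the same kind of complaint from sbatch as the one quoted in the debug output.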

Olivier Mattelaer (olivier-mattelaer) said (best answer):
#4

Here is the patch: https://github.com/mg5amcnlo/mg5amcnlo/commit/2d5fc5e9bdad69aa1379018e3442297fc2065546
Thanks a lot for reporting this.

Cheers,

Olivier

Alex Feike (afeike) said :
#5

Thanks Olivier Mattelaer, that solved my question.