Calculating cross section on condor

Asked by Iram Haque

Dear MadGraph team,

I am using MadGraph to calculate the cross sections for p p > iota0 > h eta0 [noborn=QCD] and p p > h eta0 [noborn=QCD] using the TRSM model (https://gitlab.com/apapaefs/twosinglet), where iota0 and eta0 are the additional Higgs bosons of the TRSM.

My setup is as follows: for each set of model parameters for which I want to calculate a cross section, a job is submitted to HTCondor. I have 18 such sets of model parameters. Here is the problem: for some of the parameter sets MadGraph crashes, and the sets that crash are not consistent; sometimes a few sets crash, and other times a few more. Below I have attached an example of a crash. Occasionally I also get a different error message, but I think it is better to begin with the error message I receive most often:

#************************************************************
#* MadGraph5_aMC@NLO/MadEvent *
#* *
#* * * *
#* * * * * *
#* * * * * 5 * * * * *
#* * * * * *
#* * * *
#* *
#* *
#* VERSION 3.5.3 2023-12-23 *
#* *
#* The MadGraph5_aMC@NLO Development Team - Find us at *
#* https://server06.fynu.ucl.ac.be/projects/madgraph *
#* *
#************************************************************
#* *
#* Command File for MadEvent *
#* *
#* run as ./bin/madevent.py filename *
#* *
#************************************************************
generate_events run_01
Traceback (most recent call last):
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/madevent/sum_html.py", line 307, in read_results
    self.axsec, self.xerru, self.xerrc, self.nevents, self.nw,\
ValueError: not enough values to unpack (expected 10, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/interface/extended_cmd.py", line 1546, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/interface/extended_cmd.py", line 1495, in onecmd_orig
    return func(arg, **opt)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/interface/madevent_interface.py", line 2403, in do_generate_events
    self.run_generate_events(switch_mode, args)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/interface/common_run_interface.py", line 7701, in new_fct
    original_fct(obj, *args, **opts)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/interface/madevent_interface.py", line 2601, in run_generate_events
    self.exec_cmd('refine %s' % nb_event, postcmd=False)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/interface/extended_cmd.py", line 1575, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/interface/extended_cmd.py", line 1495, in onecmd_orig
    return func(arg, **opt)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/interface/madevent_interface.py", line 3638, in do_refine
    self.monitor(run_type='All job submitted for refine number %s' % self.nb_refine,
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/interface/madevent_interface.py", line 5764, in monitor
    self.cluster.wait(self.me_dir, update_status, update_first=update_first)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/various/cluster.py", line 846, in wait
    six.reraise(self.fail_msg[0], self.fail_msg[1], self.fail_msg[2])
  File "/cvmfs/sft.cern.ch/lcg/views/LCG_104c_ATLAS_5/x86_64-el9-gcc13-opt/lib/python3.9/site-packages/six.py", line 719, in reraise
    raise value
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/various/cluster.py", line 674, in worker
    returncode = exe(*arg, **opt)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/madevent/gen_ximprove.py", line 1643, in combine_iteration
    grid_calculator, cross, error = self.combine_grid(Pdir, G, step)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/madevent/gen_ximprove.py", line 704, in combine_grid
    one_result = grid_calculator.add_results_information(fsock)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/madevent/combine_grid.py", line 161, in add_results_information
    return self.results.add_results(fname,finput)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/madevent/sum_html.py", line 434, in add_results
    oneresult.read_results(filepath)
  File "/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/madgraph5amc/3.5.3.atlas4-8cb97/x86_64-el9-gcc13-opt/madgraph/madevent/sum_html.py", line 311, in read_results
    log = pjoin(os.path.dirname(filepath), 'log.txt')
  File "/cvmfs/sft.cern.ch/lcg/releases/Python/3.9.12-9a1bc/x86_64-el9-gcc13-opt/lib/python3.9/posixpath.py", line 152, in dirname
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not TextIOWrapper
                              Run Options
                              -----------
               stdout_level : 10 (user set)

                         MadEvent Options
                         ----------------
     automatic_html_opening : False (user set)
        notification_center : True
                   run_mode : 2
              cluster_queue : None (user set)
               cluster_time : None (user set)
               cluster_size : 100
             cluster_memory : 100 (user set)
                    nb_core : 64 (user set)
          cluster_temp_path : None

                      Configuration Options
                      ---------------------
               pythia8_path : None (user set)
                  hwpp_path : None (user set)
                thepeg_path : None (user set)
                 hepmc_path : None (user set)
           madanalysis_path : None (user set)
          madanalysis5_path : None (user set)
            pythia-pgs_path : None (user set)
                    td_path : None (user set)
               delphes_path : None (user set)
        exrootanalysis_path : None (user set)
               syscalc_path : /cvmfs/sft.cern.ch/lcg/releases/MCGenerators/syscalc/1.1.7-5d583/x86_64-el9-gcc13-opt (user set)
                 rivet_path : None
                  yoda_path : None
                     lhapdf : /cvmfs/sft.cern.ch/lcg/releases/MCGenerators/lhapdf/6.5.3-3fa11/x86_64-el9-gcc13-opt/bin/lhapdf-config (user set)
                 lhapdf_py2 : /cvmfs/sft.cern.ch/lcg/releases/MCGenerators/lhapdf/6.5.3-3fa11/x86_64-el9-gcc13-opt/bin/lhapdf-config (user set)
                 lhapdf_py3 : /cvmfs/sft.cern.ch/lcg/releases/MCGenerators/lhapdf/6.5.3-3fa11/x86_64-el9-gcc13-opt/bin/lhapdf-config (user set)
                    timeout : 60
              f2py_compiler : None
          f2py_compiler_py2 : None
          f2py_compiler_py3 : None
                web_browser : None
                 eps_viewer : None
                text_editor : None
           fortran_compiler : None
               cpp_compiler : None
                auto_update : 0 (user set)
               cluster_type : condor
      cluster_status_update : (600, 30)
           cluster_nb_retry : 1
         cluster_local_path : None
         cluster_retry_wait : 300
#************************************************************
#* MadGraph5_aMC@NLO *
#* *
#* * * *
#* * * * * *
#* * * * * 5 * * * * *
#* * * * * *
#* * * *
#* *
#* *
#* VERSION 3.5.3 2023-12-23 *
#* *
#* The MadGraph5_aMC@NLO Development Team - Find us at *
#* https://server06.fynu.ucl.ac.be/projects/madgraph *
#* *
#************************************************************
#* *
#* Command File for MadGraph5_aMC@NLO *
#* *
#* run as ./bin/mg5_aMC filename *
#* *
#************************************************************
set group_subprocesses Auto
set ignore_six_quark_processes False
set low_mem_multicore_nlo_generation False
set complex_mass_scheme False
set include_lepton_initiated_processes False
set gauge unitary
set loop_optimized_output True
set loop_color_flows False
set max_npoint_for_channel 0
set default_unset_couplings 99
set max_t_for_channel 99
set zerowidth_tchannel True
set nlo_mixed_expansion True
set stdout_level DEBUG
convert model /eos/user/i/ihaque/MadgraphResonVsNonReson/MadgraphReson\
VsNonReson/X170_S30/twosinglet-master/loop_sm_twoscalar
import model /eos/home-i/ihaque/MadgraphResonVsNonReson/MadgraphResonV\
sNonReson/X170_S30/twosinglet-master/loop_sm_twoscalar
define l+ = e+ mu+
define l- = e- mu-
define vl = ve vm vt
define vl~ = ve~ vm~ vt~
define p = g u c d s u~ c~ d~ s~
define j = g u c d s u~ c~ d~ s~
generate p p > iota0 > eta0 h [noborn=QCD]
output /eos/user/i/ihaque/MadgraphResonVsNonReson/MadgraphResonVsNonRe\
son/X170_S30/nevents10000_ATLAS4/outputMadgraph_nevents10000_ATLAS4_pp\
_iota0_eta0h_0
######################################################################
## PARAM_CARD AUTOMATICALY GENERATED BY MG5 ####
######################################################################
###################################
## INFORMATION FOR BSM
###################################
BLOCK BSM #
      6 1.352000e+00 # a12 (theta_hs)
      7 1.175000e+00 # a13 (theta_hx)
      8 -4.070000e-01 # a23 (theta_sx)
      9 9.761594e-01 # stheta1
      10 9.226898e-01 # stheta2
      11 -3.958562e-01 # stheta3
      215 2.170548e-01 # ctheta1
      216 3.855431e-01 # ctheta2
      217 9.183125e-01 # ctheta3
      212 2.460000e+02 # v
      213 1.200000e+02 # vs
      214 8.900000e+02 # vx
      12 8.368398e-02 # r11
      13 9.756992e-01 # r21
      14 -2.025044e-01 # r31
      15 -5.960099e-01 # kap111
      16 -1.029799e-01 # kap112
      17 3.576466e+00 # kap122
      18 2.931906e+01 # kap222
      19 2.201481e-03 # kap1111
      20 6.157933e-03 # kap1112
      21 1.007775e-02 # kap1122
      22 1.448803e-02 # kap1222
      23 3.081989e-02 # kap2222
      24 -8.014084e+01 # kap133
      25 -1.143242e+01 # kap113
      26 -2.606138e+01 # kap123
      27 -9.142208e+01 # kap333
      28 -3.334769e+01 # kap233
      29 -2.896805e+01 # kap223
      30 2.988305e-02 # kap1113
      31 1.454584e-01 # kap1133
      32 2.546192e-01 # kap1333
      33 1.470536e-01 # kap3333
      34 -1.564697e-02 # kap2223
      35 9.056002e-02 # kap2333
      36 4.985816e-02 # kap2233
      37 1.315392e-01 # kap1233
      38 3.279717e-02 # kap1223
      39 5.346905e-02 # kap1123
###################################
## INFORMATION FOR LOOP
###################################
BLOCK LOOP #
      1 9.118800e+01 # mu_r
###################################
## INFORMATION FOR MASS
###################################
BLOCK MASS #
      5 4.700000e+00 # mb
      6 1.730000e+02 # mt
      15 1.777000e+00 # mta
      23 9.118800e+01 # mz
      25 3.000000e+01 # mh
      99925 1.250900e+02 # meta
      99926 1.700000e+02 # miota
      1 0.000000e+00 # d : 0.0
      2 0.000000e+00 # u : 0.0
      3 0.000000e+00 # s : 0.0
      4 0.000000e+00 # c : 0.0
      11 0.000000e+00 # e- : 0.0
      12 0.000000e+00 # ve : 0.0
      13 0.000000e+00 # mu- : 0.0
      14 0.000000e+00 # vm : 0.0
      16 0.000000e+00 # vt : 0.0
      21 0.000000e+00 # g : 0.0
      22 0.000000e+00 # a : 0.0
      24 8.041900e+01 # w+ : cmath.sqrt(mz__exp__2/2. + cmath.sqrt(mz__exp__4/4. - (aew*cmath.pi*mz__exp__2)/(gf*sqrt__2)))
###################################
## INFORMATION FOR SMINPUTS
###################################
BLOCK SMINPUTS #
      1 1.325070e+02 # aewm1
      2 1.166390e-05 # gf
      3 1.300000e-01 # as
###################################
## INFORMATION FOR YUKAWA
###################################
BLOCK YUKAWA #
      5 4.700000e+00 # ymb
      6 1.730000e+02 # ymt
      15 1.777000e+00 # ymtau
###################################
## INFORMATION FOR DECAY
###################################
DECAY 6 1.491500e+00 # wt
DECAY 23 2.441404e+00 # wz
DECAY 24 2.047600e+00 # ww
DECAY 25 6.211668e-06 # wh
DECAY 99925 3.887075e-03 # weta
DECAY 99926 7.036750e-02 # wiota
DECAY 1 0.000000e+00 # d : 0.0
DECAY 2 0.000000e+00 # u : 0.0
DECAY 3 0.000000e+00 # s : 0.0
DECAY 4 0.000000e+00 # c : 0.0
DECAY 5 0.000000e+00 # b : 0.0
DECAY 11 0.000000e+00 # e- : 0.0
DECAY 12 0.000000e+00 # ve : 0.0
DECAY 13 0.000000e+00 # mu- : 0.0
DECAY 14 0.000000e+00 # vm : 0.0
DECAY 15 0.000000e+00 # ta- : 0.0
DECAY 16 0.000000e+00 # vt : 0.0
DECAY 21 0.000000e+00 # g : 0.0
DECAY 22 0.000000e+00 # a : 0.0
###################################
## INFORMATION FOR QNUMBERS 82
###################################
BLOCK QNUMBERS 82 # gh
      1 0 # 3 times electric charge
      2 1 # number of spin states (2s+1)
      3 8 # colour rep (1: singlet, 3: triplet, 8: octet)
      4 1 # particle/antiparticle distinction (0=own anti)
###################################
## INFORMATION FOR QNUMBERS 99925
###################################
BLOCK QNUMBERS 99925 # eta0
      1 0 # 3 times electric charge
      2 1 # number of spin states (2s+1)
      3 1 # colour rep (1: singlet, 3: triplet, 8: octet)
      4 0 # particle/antiparticle distinction (0=own anti)
###################################
## INFORMATION FOR QNUMBERS 99926
###################################
BLOCK QNUMBERS 99926 # iota0
      1 0 # 3 times electric charge
      2 1 # number of spin states (2s+1)
      3 1 # colour rep (1: singlet, 3: triplet, 8: octet)
      4 0 # particle/antiparticle distinction (0=own anti)

#*********************************************************************
# MadGraph5_aMC@NLO *
# *
# run_card.dat MadEvent *
# *
# This file is used to set the parameters of the run. *
# *
# Some notation/conventions: *
# *
# Lines starting with a '# ' are info or comments *
# *
# mind the format: value = variable ! comment *
# *
# To display more options, you can type the command: *
# update to_full *
#*********************************************************************
#
#*********************************************************************
# Tag name for the run (one word) *
#*********************************************************************
  tag_1 = run_tag ! name of the run
#*********************************************************************
# Number of events and rnd seed *
# Warning: Do not generate more than 1M events in a single run *
#*********************************************************************
  10000 = nevents ! Number of unweighted events requested
  0 = iseed ! rnd seed (0=assigned automatically=default))
#*********************************************************************
# Collider type and energy *
# lpp: 0=No PDF, 1=proton, -1=antiproton, *
# 2=elastic photon of proton/ion beam *
# +/-3=PDF of electron/positron beam *
# +/-4=PDF of muon/antimuon beam *
#*********************************************************************
  1 = lpp1 ! beam 1 type
  1 = lpp2 ! beam 2 type
  6500.0 = ebeam1 ! beam 1 total energy in GeV
  6500.0 = ebeam2 ! beam 2 total energy in GeV
# To see polarised beam options: type "update beam_pol"

#*********************************************************************
# PDF CHOICE: this automatically fixes alpha_s and its evol. *
# pdlabel: lhapdf=LHAPDF (installation needed) [1412.7420] *
# iww=Improved Weizsaecker-Williams Approx.[hep-ph/9310350] *
# eva=Effective W/Z/A Approx. [2111.02442] *
# edff=EDFF in gamma-UPC [eq.(11) in 2207.03012] *
# chff=ChFF in gamma-UPC [eq.(13) in 2207.03012] *
# none=No PDF, same as lhapdf with lppx=0 *
#*********************************************************************
  nn23lo1 = pdlabel ! PDF set
  230000 = lhaid ! if pdlabel=lhapdf, this is the lhapdf number
# To see heavy ion options: type "update ion_pdf"
#*********************************************************************
# Renormalization and factorization scales *
#*********************************************************************
  False = fixed_ren_scale ! if .true. use fixed ren scale
  False = fixed_fac_scale ! if .true. use fixed fac scale
  91.188 = scale ! fixed ren scale
  91.188 = dsqrt_q2fact1 ! fixed fact scale for pdf1
  91.188 = dsqrt_q2fact2 ! fixed fact scale for pdf2
  -1 = dynamical_scale_choice ! Choose one of the preselected dynamical choices
  1.0 = scalefact ! scale factor for event-by-event scales

#*********************************************************************
# Type and output format
#*********************************************************************
  False = gridpack !True = setting up the grid pack
  -1.0 = time_of_flight ! threshold (in mm) below which the invariant livetime is not written (-1 means not written)
  average = event_norm ! average/sum. Normalization of the weight in the LHEF
# To see MLM/CKKW merging options: type "update MLM" or "update CKKW"

#*********************************************************************
#
#*********************************************************************
# Phase-Space Optimization strategy (basic options)
#*********************************************************************
  1 = nhel ! using helicities importance sampling or not.
                             ! 0: sum over helicity, 1: importance sampling
  2 = sde_strategy ! default integration strategy (hep-ph/2021.00773)
                             ! 1 is old strategy (using amp square)
                             ! 2 is new strategy (using only the denominator)
# To see advanced option for Phase-Space optimization: type "update psoptim"
#*********************************************************************
# Customization (custom cuts/scale/bias/...) *
# list of files containing fortran function that overwrite default *
#*********************************************************************
        = custom_fcts ! List of files containing user hook function
#*******************************
# Parton level cuts definition *
#*******************************
  0.0 = dsqrt_shat ! minimal shat for full process
#
#
#*********************************************************************
# BW cutoff (M+/-bwcutoff*Gamma) ! Define on/off-shell for "$" and decay
#*********************************************************************
  15.0 = bwcutoff ! (M+/-bwcutoff*Gamma)
#*********************************************************************
# Standard Cuts *
#*********************************************************************
# Minimum and maximum pt's (for max, -1 means no cut) *
#*********************************************************************
  {} = pt_min_pdg ! pt cut for other particles (use pdg code). Applied on particle and anti-particle
  {} = pt_max_pdg ! pt cut for other particles (syntax e.g. {6: 100, 25: 50})
#
# For display option for energy cut in the partonic center of mass frame type 'update ecut'
#
#*********************************************************************
# Maximum and minimum absolute rapidity (for max, -1 means no cut) *
#*********************************************************************
  {} = eta_min_pdg ! rap cut for other particles (use pdg code). Applied on particle and anti-particle
  {} = eta_max_pdg ! rap cut for other particles (syntax e.g. {6: 2.5, 23: 5})
#*********************************************************************
# Minimum and maximum DeltaR distance *
#*********************************************************************
#*********************************************************************
# Minimum and maximum invariant mass for pairs *
#*********************************************************************
  {} = mxx_min_pdg ! min invariant mass of a pair of particles X/X~ (e.g. {6:250})
  {'default': False} = mxx_only_part_antipart ! if True the invariant mass is applied only
                       ! to pairs of particle/antiparticle and not to pairs of the same pdg codes.
#*********************************************************************
# Inclusive cuts *
#*********************************************************************
  0.0 = ptheavy ! minimum pt for at least one heavy final state
#*********************************************************************
# maximal pdg code for quark to be considered as a light jet *
# (otherwise b cuts are applied) *
#*********************************************************************
  4 = maxjetflavor ! Maximum jet pdg code
#*********************************************************************
#
#*********************************************************************
# Store info for systematics studies *
# WARNING: Do not use for interference type of computation *
#*********************************************************************
  True = use_syst ! Enable systematics studies
#
  systematics = systematics_program ! none, systematics [python], SysCalc [depreceted, C++]
  ['--mur=0.5,1,2', '--muf=0.5,1,2', '--pdf=errorset'] = systematics_arguments ! see: https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/Systematics#Systematicspythonmodule

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee
Olivier Mattelaer (olivier-mattelaer) said:
#1

So from the MadGraph side, you run on a single machine with 64 cores, and the handling of condor is done on your side, correct?
In that case, it is important that each point of your scan is run in its own directory (or that you use a condor option to prevent two jobs from running in the same directory at the same time).
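
For illustration only, a per-point MG5 command file along these lines (the scan directory and card paths below are placeholders, not taken from this thread) would keep every parameter set in its own output directory:

import model ./twosinglet-master/loop_sm_twoscalar
generate p p > iota0 > eta0 h [noborn=QCD]
# one distinct output directory per parameter set (placeholder path):
output /some/scratch/area/scan/point_07
launch
# param_card prepared for this point only (placeholder path):
/some/scratch/area/scan/param_card_point_07.dat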

The issue is unfortunately not clear, since it is the Fortran code that crashes. Python detects the crash, but it then fails when trying to include the log file with the explanation of the bug (maybe due to cvmfs).

Is it correct that the crash occurs in the "refine" stage? If yes, this would explain the missing log, since in general we do not keep such log files at that stage: crashes related to the physics do not normally happen there. In that case, the issue might be related to a condor limitation such as the amount of RAM available, which can be made worse if you did not request 64 CPUs from condor while asking MG to run with that many threads. This would also be consistent with the randomness of your crashes.

So do you have the line
request_cpus = 64
in your condor submit script?
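
For reference, a minimal condor submit description along those lines might look like the sketch below (the executable, file names and memory value are placeholders, not from this thread); the key points are requesting as many CPUs as MadGraph's nb_core setting and giving every parameter point its own working directory:

# placeholder sketch: one condor job per parameter point, each in its own directory
universe       = vanilla
# wrapper script (placeholder name) that runs ./bin/generate_events for one point
executable     = run_point.sh
arguments      = $(point)
# match the number of cores MadGraph is asked to use (nb_core = 64)
request_cpus   = 64
# placeholder value; increase if the refine step runs out of memory
request_memory = 8 GB
# a distinct working directory per parameter set
initialdir     = scan/point_$(point)
output         = condor_$(point).out
error          = condor_$(point).err
log            = condor_$(point).log
queue point from points.txt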

Cheers,

Olivier

Iram Haque (i-haque) said (last edit):
#2

Dear Olivier,

Thank you very much for your help. I tried your suggestion; unfortunately, it did not work.

I then tried changing the job output directory from EOS to our AFS area (I am not sure if you are familiar with this kind of setup), and now I get no errors at all. I will try a couple more runs to make sure this is the way to go.

I will come back to you if I get any more errors (and I will mark this as solved once I have done a couple more runs).

In the meantime, thank you for taking the time to look at my issue!
