MG 2.6.5 Reweight gets killed

Asked by Giacomo

Dear MadGraph experts,

I'm trying to generate a computationally expensive process, namely VBS production of a WV pair in the semileptonic final state, with EFT contributions from 14 operators from SMEFTsim [1].
As the number of diagrams grows quickly with the addition of EFT operators, I decided to run the codegen and integrate steps sampling from the SM phase space, and then to change the process at the reweight step to include all the operators.
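For context, leading-order matrix-element reweighting rescales each event weight by the ratio of the squared matrix elements of the new and the original hypothesis, evaluated at the same phase-space point. A minimal sketch of that relation follows; the function name and the numbers are purely illustrative and are not part of MadGraph:

```python
def reweight(w_orig, me2_orig, me2_new):
    """Return the new event weight w_new = w_orig * |M_new|^2 / |M_orig|^2."""
    if me2_orig <= 0.0:
        raise ValueError("original squared matrix element must be positive")
    return w_orig * me2_new / me2_orig


# Illustrative numbers only: an SM-sampled event reweighted to another hypothesis.
print(reweight(w_orig=0.5, me2_orig=2.0e-6, me2_new=3.0e-6))
```

This is why the log below shows two matrix elements being generated: one for the original process (NP=0) and one for the target process (NP=1).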
My process definition for the first two steps is the following ($ t t~ was added just for testing purposes):

import model SMEFTsim_U35_MwScheme_UFO-cW_cHWB_cHDD_cHbox_cHW_cHl1_cHl3_cHq1_cHq3_cqq1_cqq11_cqq31_cqq3_cll1_massless_mod
define p = u u~ d d~ s s~ c c~ b b~
define j = p
generate p p > e- ve~ j j j j $ t t~ NP=0 SMHLOOP=0 QCD=0
output emVjj_ewk_dim6 -nojpeg

At the reweight step I load the following card (only the first lines are shown, as all the reweight points have the same syntax):
change helicity False
change rwgt_dir rwgt
change process p p > e- ve~ j j j j $ t t~ NP=1 SMHLOOP=0 QCD=0

# SM rwgt_1
launch
   set SMEFT 2 0
   set SMEFT 7 0
   set SMEFT 9 0
   set SMEFT 4 0

Both codegen and integrate run smoothly; however, the compilation gets killed while computing the second matrix element, probably due to a system overload:

INFO: Idle: 0, Running: 1, Completed: 1699 [ 27h 18m ]
INFO: Idle: 0, Running: 1, Completed: 1699 [ 27h 23m ]
INFO: Idle: 0, Running: 1, Completed: 1699 [ 27h 28m ]
INFO: Idle: 0, Running: 1, Completed: 1699 [ 27h 33m ]
INFO: Idle: 0, Running: 1, Completed: 1699 [ 27h 38m ]
INFO: All jobs finished
INFO: Idle: 0, Running: 0, Completed: 1700 [ 27h 43m ]
  === Results Summary for run: pilotrun tag: tag_1 ===

     Cross-section : 0.09049 +- 0.0001506 pb
     Nb of events : 0

INFO: End survey
[...]
Do you want to edit a card (press enter to bypass editing)?
/------------------------------------------------------------\
| 1. reweight : reweight_card.dat |
\------------------------------------------------------------/
 you can also
   - enter the path to a valid card.
 [0, done, 1, reweight, enter path][90s to answer]
>INFO: Extracting the banner ...
INFO: process: p p > e- ve~ j j j j
INFO: options: $ t t~ NP=0 SMHLOOP=0 QCD=0
INFO: Running Reweighting
change helicity False
change rwgt_dir rwgt
change process p p > e- ve~ j j j j $ t t~ NP=1 SMHLOOP=0 QCD=0
launch
INFO: detected model: SMEFTsim_U35_MwScheme_UFO-cW_cHWB_cHDD_cHbox_cHW_cHl1_cHl3_cHq1_cHq3_cqq1_cqq11_cqq31_cqq3_cll1_massless_mod. Loading...
INFO: generating the square matrix element for reweighting
INFO: generate p p > e- ve~ j j j j $ t t~ NP=0 SMHLOOP=0 QCD=0 ;
INFO: Done 2912
INFO: generating the square matrix element for reweighting (second model and/or processes)
INFO: generate p p > e- ve~ j j j j $ t t~ NP=1 SMHLOOP=0 QCD=0 ;
/local-scratch/gboldrin/gp3/bin/MadGraph5_aMCatNLO/Utilities/gridpack_helpers.sh: line 76: 466127 Done echo "0"
     466128 Killed | ./bin/madevent --debug reweight pilotrun

Is this behaviour expected? Is there a way to run the reweight step in a less expensive way, perhaps at the cost of computation time?

Thank you, I'm happy to provide additional information if needed.
Best

-----------------------------------------------------
[1]
http://gboldrin.web.cern.ch/gboldrin/generators/SMEFTsim_U35_MwScheme_UFO.tar.gz

[2]

RUN CARD

#*********************************************************************
# MadGraph5_aMC@NLO *
# *
# run_card.dat MadEvent *
# *
# This file is used to set the parameters of the run. *
# *
# Some notation/conventions: *
# *
# Lines starting with a '# ' are info or comments *
# *
# mind the format: value = variable ! comment *
#*********************************************************************
#
#*******************
# Running parameters
#*******************
#
#*********************************************************************
# Tag name for the run (one word) *
#*********************************************************************
  tag_1 = run_tag ! name of the run
#*********************************************************************
# Run to generate the grid pack *
#*********************************************************************
  .false. = gridpack !True = setting up the grid pack
#*********************************************************************
# Number of events and rnd seed *
# Warning: Do not generate more than 1M events in a single run *
# If you want to run Pythia, avoid more than 50k events in a run. *
#*********************************************************************
  1000 = nevents ! Number of unweighted events requested
      0 = iseed ! rnd seed (0=assigned automatically=default))
#*********************************************************************
# Collider type and energy *
# lpp: 0=No PDF, 1=proton, -1=antiproton, 2=photon from proton, *
# 3=photon from electron *
#*********************************************************************
        1 = lpp1 ! beam 1 type
        1 = lpp2 ! beam 2 type
     6500 = ebeam1 ! beam 1 total energy in GeV
     6500 = ebeam2 ! beam 2 total energy in GeV
#*********************************************************************
# Beam polarization from -100 (left-handed) to 100 (right-handed) *
#*********************************************************************
        0 = polbeam1 ! beam polarization for beam 1
        0 = polbeam2 ! beam polarization for beam 2
#*********************************************************************
# PDF CHOICE: this automatically fixes also alpha_s and its evol. *
#*********************************************************************
 'lhapdf' = pdlabel ! PDF set
 $DEFAULT_PDF_SETS = lhaid
 $DEFAULT_PDF_MEMBERS = reweight_PDF
#*********************************************************************
# Renormalization and factorization scales *
#*********************************************************************
 F = fixed_ren_scale ! if .true. use fixed ren scale
 F = fixed_fac_scale ! if .true. use fixed fac scale
 91.1880 = scale ! fixed ren scale
 91.1880 = dsqrt_q2fact1 ! fixed fact scale for pdf1
 91.1880 = dsqrt_q2fact2 ! fixed fact scale for pdf2
 1 = scalefact ! scale factor for event-by-event scales
#*********************************************************************
# Matching - Warning! ickkw > 1 is still beta
#*********************************************************************
 0 = ickkw ! 0 no matching, 1 MLM, 2 CKKW matching
 1 = highestmult ! for ickkw=2, highest mult group
 1 = ktscheme ! for ickkw=1, 1 Durham kT, 2 Pythia pTE
 1 = alpsfact ! scale factor for QCD emission vx
 F = chcluster ! cluster only according to channel diag
 F = pdfwgt ! for ickkw=1, perform pdf reweighting
 5 = asrwgtflavor ! highest quark flavor for a_s reweight
 T = clusinfo ! include clustering tag in output
 3.0 = lhe_version ! Change the way clustering information pass to shower.
#*********************************************************************
#**********************************************************
#
#**********************************************************
# Automatic ptj and mjj cuts if xqcut > 0
# (turn off for VBF and single top processes)
#**********************************************************
   F = auto_ptj_mjj ! Automatic setting of ptj and mjj
#**********************************************************
#
#**********************************
# BW cutoff (M+/-bwcutoff*Gamma)
#**********************************
  15 = bwcutoff ! (M+/-bwcutoff*Gamma)
#**********************************************************
# Apply pt/E/eta/dr/mij cuts on decay products or not
# (note that etmiss/ptll/ptheavy/ht/sorted cuts always apply)
#**********************************************************
   T = cut_decays ! Cut decay products
#*************************************************************
# Number of helicities to sum per event (0 = all helicities)
# 0 gives more stable result, but longer run time (needed for
# long decay chains e.g.).
# Use >=2 if most helicities contribute, e.g. pure QCD.
#*************************************************************
   0 = nhel ! Number of helicities used per event
#*******************
# Standard Cuts
#*******************
#
#*********************************************************************
# Minimum and maximum pt's (for max, -1 means no cut) *
#*********************************************************************
 10 = ptj ! minimum pt for the jets
  0 = ptb ! minimum pt for the b
  0 = pta ! minimum pt for the photons
  5 = ptl ! minimum pt for the charged leptons
  0 = misset ! minimum missing Et (sum of neutrino's momenta)
  0 = ptheavy ! minimum pt for one heavy final state
 1.0 = ptonium ! minimum pt for the quarkonium states
 -1 = ptjmax ! maximum pt for the jets
 -1 = ptbmax ! maximum pt for the b
 -1 = ptamax ! maximum pt for the photons
 -1 = ptlmax ! maximum pt for the charged leptons
 -1 = missetmax ! maximum missing Et (sum of neutrino's momenta)
#*********************************************************************
# Minimum and maximum E's (in the center of mass frame) *
#*********************************************************************
  0 = ej ! minimum E for the jets
  0 = eb ! minimum E for the b
  0 = ea ! minimum E for the photons
  0 = el ! minimum E for the charged leptons
 -1 = ejmax ! maximum E for the jets
 -1 = ebmax ! maximum E for the b
 -1 = eamax ! maximum E for the photons
 -1 = elmax ! maximum E for the charged leptons
#*********************************************************************
# Maximum and minimum absolute rapidity (for max, -1 means no cut) *
#*********************************************************************
 6.5 = etaj ! max rap for the jets
 -1 = etab ! max rap for the b
 2.5 = etaa ! max rap for the photons
 -1 = etal ! max rap for the charged leptons
 0.6 = etaonium ! max rap for the quarkonium states
   0 = etajmin ! min rap for the jets
   0 = etabmin ! min rap for the b
   0 = etaamin ! min rap for the photons
   0 = etalmin ! main rap for the charged leptons
#*********************************************************************
# Minimum and maximum DeltaR distance *
#*********************************************************************
 0.01 = drjj ! min distance between jets
 0.01 = drbb ! min distance between b's
 0.01 = drll ! min distance between leptons
 0.01 = draa ! min distance between gammas
 0.01 = drbj ! min distance between b and jet
 0.01 = draj ! min distance between gamma and jet
 0.4 = drjl ! min distance between jet and lepton
 0.01 = drab ! min distance between gamma and b
 0.4 = drbl ! min distance between b and lepton
 0.01 = dral ! min distance between gamma and lepton
 -1 = drjjmax ! max distance between jets
 -1 = drbbmax ! max distance between b's
 -1 = drllmax ! max distance between leptons
 -1 = draamax ! max distance between gammas
 -1 = drbjmax ! max distance between b and jet
 -1 = drajmax ! max distance between gamma and jet
 -1 = drjlmax ! max distance between jet and lepton
 -1 = drabmax ! max distance between gamma and b
 -1 = drblmax ! max distance between b and lepton
 -1 = dralmax ! maxdistance between gamma and lepton
#*********************************************************************
# Minimum and maximum invariant mass for pairs *
# WARNING: for four lepton final state mmll cut require to have *
# different lepton masses for each flavor! *
#*********************************************************************
 40 = mmjj ! min invariant mass of a jet pair
 40 = mmbb ! min invariant mass of a b pair
 0 = mmaa ! min invariant mass of gamma gamma pair
 0 = mmll ! min invariant mass of l+l- (same flavour) lepton pair
 -1 = mmjjmax ! max invariant mass of a jet pair
 -1 = mmbbmax ! max invariant mass of a b pair
 -1 = mmaamax ! max invariant mass of gamma gamma pair
 -1 = mmllmax ! max invariant mass of l+l- (same flavour) lepton pair
#*********************************************************************
# Minimum and maximum invariant mass for all letpons *
#*********************************************************************
  0 = mmnl ! min invariant mass for all letpons (l+- and vl)
 -1 = mmnlmax ! max invariant mass for all letpons (l+- and vl)
#*********************************************************************
# Minimum and maximum pt for 4-momenta sum of leptons *
#*********************************************************************
 0 = ptllmin ! Minimum pt for 4-momenta sum of leptons(l and vl)
 -1 = ptllmax ! Maximum pt for 4-momenta sum of leptons(l and vl)
#*********************************************************************
# Inclusive cuts *
#*********************************************************************
 0 = xptj ! minimum pt for at least one jet
 0 = xptb ! minimum pt for at least one b
 0 = xpta ! minimum pt for at least one photon
 0 = xptl ! minimum pt for at least one charged lepton
#*********************************************************************
# Control the pt's of the jets sorted by pt *
#*********************************************************************
 0 = ptj1min ! minimum pt for the leading jet in pt
 0 = ptj2min ! minimum pt for the second jet in pt
 0 = ptj3min ! minimum pt for the third jet in pt
 0 = ptj4min ! minimum pt for the fourth jet in pt
 -1 = ptj1max ! maximum pt for the leading jet in pt
 -1 = ptj2max ! maximum pt for the second jet in pt
 -1 = ptj3max ! maximum pt for the third jet in pt
 -1 = ptj4max ! maximum pt for the fourth jet in pt
 0 = cutuse ! reject event if fails any (0) / all (1) jet pt cuts
#*********************************************************************
# Control the pt's of leptons sorted by pt *
#*********************************************************************
 0 = ptl1min ! minimum pt for the leading lepton in pt
 0 = ptl2min ! minimum pt for the second lepton in pt
 0 = ptl3min ! minimum pt for the third lepton in pt
 0 = ptl4min ! minimum pt for the fourth lepton in pt
 -1 = ptl1max ! maximum pt for the leading lepton in pt
 -1 = ptl2max ! maximum pt for the second lepton in pt
 -1 = ptl3max ! maximum pt for the third lepton in pt
 -1 = ptl4max ! maximum pt for the fourth lepton in pt
#*********************************************************************
# Control the Ht(k)=Sum of k leading jets *
#*********************************************************************
 0 = htjmin ! minimum jet HT=Sum(jet pt)
 -1 = htjmax ! maximum jet HT=Sum(jet pt)
 0 = ihtmin !inclusive Ht for all partons (including b)
 -1 = ihtmax !inclusive Ht for all partons (including b)
 0 = ht2min ! minimum Ht for the two leading jets
 0 = ht3min ! minimum Ht for the three leading jets
 0 = ht4min ! minimum Ht for the four leading jets
 -1 = ht2max ! maximum Ht for the two leading jets
 -1 = ht3max ! maximum Ht for the three leading jets
 -1 = ht4max ! maximum Ht for the four leading jets
#***********************************************************************
# Photon-isolation cuts, according to hep-ph/9801442 *
# When ptgmin=0, all the other parameters are ignored *
# When ptgmin>0, pta and draj are not going to be used *
#***********************************************************************
   0 = ptgmin ! Min photon transverse momentum
 0.4 = R0gamma ! Radius of isolation code
 1.0 = xn ! n parameter of eq.(3.4) in hep-ph/9801442
 1.0 = epsgamma ! epsilon_gamma parameter of eq.(3.4) in hep-ph/9801442
 .true. = isoEM ! isolate photons from EM energy (photons and leptons)
#*********************************************************************
# WBF cuts *
#*********************************************************************
 0 = xetamin ! minimum rapidity for two jets in the WBF case
 0 = deltaeta ! minimum rapidity for two jets in the WBF case
#*********************************************************************
# KT DURHAM CUT *
#*********************************************************************
 -1 = ktdurham
 0.4 = dparameter
#*********************************************************************
# maximal pdg code for quark to be considered as a light jet *
# (otherwise b cuts are applied) *
#*********************************************************************
 5 = maxjetflavor ! Maximum jet pdg code
#*********************************************************************
# Jet measure cuts *
#*********************************************************************
 0 = xqcut ! minimum kt jet measure between partons
#*********************************************************************
#
#*********************************************************************
# Store info for systematics studies *
# WARNING: If use_syst is T, matched Pythia output is *
# meaningful ONLY if plotted taking matchscale *
# reweighting into account! *
#*********************************************************************
   T = use_syst ! Enable systematics studies
#
#**************************************

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee

Olivier Mattelaer (olivier-mattelaer) said :
#1

Do you know why it is killed?

The reweighting creates a lot of files and needs to compile them one by one.
This setup phase is quite time-consuming and typically puts stress on the disk.
It should not put much stress on the number of threads or on RAM, so those should typically not be an issue.
So my best guess is that it is killed because of a walltime limit.
In that case, the solution might be to run a dedicated job for the creation of the reweighting directories.

Cheers,

Olivier

Giacomo (giacbold) said :
#2

Hi Olivier.

Unfortunately I do not know why the process was killed (is there a way to find out?).
We are running the reweighting locally on a machine with 48 cores and 4 GB of RAM per core (192 GB of RAM in total). As far as I know we do not have walltime limits: I have had local jobs running for weeks without getting killed, while the reweight step runs for only 10 h before being killed by the OS.
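On Linux, one common way to check whether the kernel's out-of-memory (OOM) killer ended a job is to search the kernel log (for example the output of dmesg, which may require elevated privileges) for "Killed process" entries. A minimal sketch, assuming the usual kernel message wording, which can differ between kernel versions; the sample line is made up for illustration and is not taken from this machine:

```python
import re

# Matches entries such as "Out of memory: Killed process 1234 (python) ...".
# The wording is an assumption based on common Linux kernels.
OOM_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_text):
    """Return (pid, command) pairs for every OOM-kill entry found in the text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(kernel_log_text)]


# Made-up sample line; in practice the text would come from the kernel log.
sample = "[12345.678] Out of memory: Killed process 4242 (python) total-vm:52428800kB"
print(find_oom_kills(sample))
```

If the PID of the reweight job shows up in such an entry, the job was killed for memory reasons rather than by a walltime limit.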

I do not understand what you mean by running a dedicated job for the creation of the reweight directories. I tried running ./bin/madevent --debug reweight pilotrun standalone, and it still breaks while computing the second matrix element. I also do not see how splitting the creation of the first and second directories would help in this case, but maybe I misunderstood.

I monitored the reweight job by sampling its RSS memory every 3 seconds. If my calculations are correct, it takes up to 50 GB of memory ("ps -o rss -p 3483163", where the number is the PID of the reweight job), and it gets killed just before the 50 GB limit (the last sample is 49.9953 GB, to be precise).
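The sampling described above can be sketched as follows. This is an illustration, not the script actually used: it assumes a ps that accepts "-o rss=" (header suppressed) and reports the RSS in kB, and it assumes 1 GB = 1024**2 kB for the conversion. The function names are hypothetical.

```python
import subprocess
import time


def rss_kb_to_gb(rss_kb):
    """Convert a resident set size in kB (as printed by ps) to GB (1 GB = 1024**2 kB)."""
    return rss_kb / 1024.0 ** 2


def read_rss_kb(pid):
    """Return the RSS of a process in kB, or None if the process no longer exists."""
    result = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True,
    )
    output = result.stdout.strip()
    return int(output) if output else None


def monitor(pid, interval_s=3.0):
    """Print the RSS in GB every interval_s seconds until the process exits."""
    while True:
        rss_kb = read_rss_kb(pid)
        if rss_kb is None:
            break
        print(f"{time.strftime('%H:%M:%S')} {rss_kb_to_gb(rss_kb):.4f} GB")
        time.sleep(interval_s)
```

Calling monitor(3483163) would follow the PID quoted above until the job exits or is killed.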

You can see a plot here:
http://gboldrin.web.cern.ch/gboldrin/gridpacks/rss_reweighting.pdf

Do you expect such a large RSS usage?

Best,
Giacomo

Olivier Mattelaer (olivier-mattelaer) said :
#3

Hi,

I wanted to test this today, but it looks like I do not have the restriction card that you use.
Can you copy it here or send it to me at <email address hidden> (mentioning the question number: 701194)?

Compilation is done differently in more recent versions, so while I have never faced this particular issue, it might (or might not) help to test with a more recent version.

Olivier

Giacomo (giacbold) said :
#4

Hi Olivier,

Thanks for looking deeper into this. The restrict card can be found at http://gboldrin.web.cern.ch/gboldrin/generators/restrict_cW_cHWB_cHDD_cHbox_cHW_cHl1_cHl3_cHq1_cHq3_cqq1_cqq11_cqq31_cqq3_cll1_massless_mod.dat (I'll also send an email).

I probably forgot to mention that I'm using MG v2_6_5, which is quite old, so testing with a more recent version could be useful.
I'll also try this in the next few days.

Best,
Giacomo

Giacomo (giacbold) said :
#5

Hello,

Any news from this side?

Best,
Giacomo

Olivier Mattelaer (olivier-mattelaer) said :
#6

I was waiting for your test with the supported version (2.9.9)

Cheers,

Olivier


Giacomo (giacbold) said :
#7

Hi Olivier,

I just tested it and it still crashes in the same way. However, it crashed earlier than with 2.6.5, and the slopes are different.
The comparison plot is here:

http://gboldrin.web.cern.ch/gboldrin/gridpacks/rss_reweighting_299.pdf

Cheers,
Giacomo

Giacomo (giacbold) said :
#8

Hello, any news on how to bypass this problem?

Cheers,
Giacomo

Olivier Mattelaer (olivier-mattelaer) said :
#9

I guess the only solution that we can offer is to split the generation according to the various initial/final states.
The issue seems to be that you have so many initial/final states, and so many diagrams for each of them, that the reweighting code cannot create a single executable with that many diagrams (or, more exactly, that you hit some RAM limitation for this process).
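One possible reading of this suggestion is to replace the single "p p" generation by one generation per pair of initial-state partons. A minimal sketch under that assumption, which only builds the "generate" lines as text; the parton list, final state and options are copied from the process card in the question. Whether ordered or unordered pairs are needed, and which combinations MadGraph accepts (combinations with no contributing diagram are expected to be rejected), is not confirmed in this thread and would have to be checked.

```python
from itertools import product

# Parton content of "p" as defined in the original process card.
PARTONS = ["u", "u~", "d", "d~", "s", "s~", "c", "c~", "b", "b~"]

FINAL_STATE = "e- ve~ j j j j"
OPTIONS = "$ t t~ NP=0 SMHLOOP=0 QCD=0"


def split_by_initial_state(partons=PARTONS):
    """Return one 'generate' line per ordered pair of initial-state partons."""
    return [
        f"generate {a} {b} > {FINAL_STATE} {OPTIONS}"
        for a, b in product(partons, repeat=2)
    ]


lines = split_by_initial_state()
print(len(lines))
print(lines[0])
```

Each line would then go into its own process card and output directory, so that the reweighting only has to build the matrix elements of one initial state at a time.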

Another option is to see if you can simplify your model.
I do see that your reweight point has additional parameters/couplings set to zero.
Using a restricted model that reduces the number of diagrams/subprocesses (and therefore the amount of RAM needed) might help to make it through.

Cheers,

Olivier

Giacomo (giacbold) said :
#10

Dear Olivier,

By splitting the generation, do you mean setting the grouping of subprocesses to False instead of Auto? Or literally splitting flavours/charges into separate generations?

For the second option, I am indeed switching to a reweight card that gives the relative path to a restriction card with fewer operators involved. This way works:

change helicity False
change rwgt_dir rwgt
change process u d~ > w+ NP=1 SMHLOOP=0 QCD=0

# cW=-1 rwgt_2
change rwgt_dir rwgt/rwgt_cW
change model SMEFTsim_U35_MwScheme_UFO_b_massless-cW_massless
launch

# cll1=1 rwgt_3
change rwgt_dir ../../../rwgt_cll1
change model SMEFTsim_U35_MwScheme_UFO_b_massless-cll1_massless
launch

However, for each weight point MG recomputes the initial ME (the SM one in this example). Is there any option to avoid the creation of the rw_me directories apart from the first one? At the moment I'm modifying the reweight interface.
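A card like the one above can be written programmatically when many operators are involved. A minimal sketch, assuming the per-operator model name follows the "<prefix>-<operator>_massless" pattern seen above and that each point gets its own rwgt/rwgt_<operator> directory (the directory layout is an assumption modelled on the first block); the function name is hypothetical:

```python
def build_reweight_card(operators, process, model_prefix):
    """Build reweight-card text with one 'change model' + 'launch' block per operator."""
    lines = [
        "change helicity False",
        "change rwgt_dir rwgt",
        f"change process {process}",
        "",
    ]
    for op in operators:
        lines += [
            f"# {op}",
            f"change rwgt_dir rwgt/rwgt_{op}",  # assumed directory layout
            f"change model {model_prefix}-{op}_massless",
            "launch",
            "",
        ]
    return "\n".join(lines)


card = build_reweight_card(
    operators=["cW", "cll1"],
    process="u d~ > w+ NP=1 SMHLOOP=0 QCD=0",
    model_prefix="SMEFTsim_U35_MwScheme_UFO_b_massless",
)
print(card)
```

This only generates the card text; it does not avoid the repeated creation of the initial matrix element asked about above.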

Thank you,
Giacomo

Olivier Mattelaer (olivier-mattelaer) said :
#11

Hi,

>
> by splitting the generation you mean setting the grouping of
> Subprocesses to False instead of Auto? Or literally splitting flavours /
> charges in various generations?

The second; the first is actually automatic for all standalone types of output.

> However for each weight point MG recomputes the initial ME (SM from the example). Is there any option to avoid the creation of the rw_me directories apart from the first one? At the moment I'm modifying the reweight interface.

I'm not aware of any option in that direction

Cheers,

Olivier

