Combining LHE events from gridpacks with different jet multiplicities

Asked by He He

Dear experts,

Recently I produced a MadGraph gridpack with the CMS genproductions repository, trying to generate pp > ZZ processes with different jet multiplicities and use FxFx merging with PYTHIA, similar to what is described at http://amcatnlo.web.cern.ch/amcatnlo/FxFx_merging.htm.

generate p p > z z [QCD] @0
add process p p > z z j [QCD] @1
add process p p > z z j j [QCD] @2

However, I realized I used the "generate" syntax instead of "add process" for the last two processes, and it seems that only the @2 process was generated. Could you confirm this is indeed the case? I therefore plan to produce another gridpack with the 01j processes (or separate gridpacks) and combine the events. Since my workflow is to produce the LHE events, decay the Z's with MadSpin, and perform showering with the CMS interface, I would like an LHE file appropriately set up for showering and merging, as if it had been created from a single inclusive gridpack with the above processes.

Could you give me some instructions on how to achieve this and things to pay attention to?

Thanks,
He

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: He He
Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi,

In principle you should apply an unweighting procedure when merging the two samples.
Since you have most likely generated relatively too many 2j events, you should generate enough 0/1j events that the maximum weight used in this unweighting is the per-event weight of the 2j sample (that way you are guaranteed to keep 100% of the events of the 2j sample).

Beyond that optimization point, this is a standard unweighting procedure.
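
(For concreteness, here is a minimal sketch of the sample-size condition this implies; it is an added illustration only, and the cross-section numbers are placeholders, not values quoted at this point in the thread.)

import math

# Per-event weight of sample i is sigma_i / N_i (sum normalisation).
# To keep 100% of the 2j events, the 2j per-event weight must be the maximum:
#     sigma_01j / N_01j <= sigma_2j / N_2j
sigma_01j, sigma_2j = 45.3, 21.0   # placeholder absolute cross sections in pb
N_2j = 20000                        # 2j events already generated

# smallest 0/1j sample size for which no 2j event would be rejected
N_01j_min = math.ceil(N_2j * sigma_01j / sigma_2j)
print(N_01j_min)                    # ~43143 with these placeholder numbers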

Cheers,

Olivier

He He (hehe-launchpad) said :
#2

Hi Olivier,

Thanks for the suggestions. I looked at XWGTUP in the LHE file for the 2j sample, and the events appear to be unweighted, as confirmed by thread [1]. If I understand correctly, are you suggesting that we first reweight the 01j sample (with enough events generated) and the 2j sample by giving each sample's events a weight of the corresponding xsec/nevents (as you suggested in [2]), and then mix and unweight them? For the unweighting I am considering the procedure outlined in your example code [3].

At NLO, since there are negative weights, how should the reweighting/unweighting procedures in [2] and [3] be modified?

Thanks,
He

[1] https://answers.launchpad.net/mg5amcnlo/+question/238828
[2] https://answers.launchpad.net/mg5amcnlo/+question/184781
[3] https://answers.launchpad.net/mg5amcnlo/+question/291534 #10

Olivier Mattelaer (olivier-mattelaer) said :
#3

Yes,

Here is the equivalent algorithm with negative weights: what we do is generate unweighted events for the absolute value of the function and preserve the sign of the weight.

Cheers,

Olivier

import random

# toy integrand; stands in for the (possibly negative) event weight function
fct = lambda x: 2.0 * x - 0.5
Ntry = 100000

all_try = []
max_f = 0.0
cross = 0.0

# perform the Monte-Carlo integration
for i in range(Ntry):
    x = random.random()
    f = fct(x) / Ntry
    all_try.append((x, f))
    max_f = max(max_f, abs(f))   # track the maximum of |weight|
    cross += f

# perform the unweighting: accept with probability |f|/max_f,
# and keep the sign of the weight of each accepted event
accepted_pos = []
accepted_neg = []

for x, f in all_try:
    if random.random() < abs(f) / max_f:
        if f > 0:
            accepted_pos.append(x)  # keep weight positive
        else:
            accepted_neg.append(x)  # keep weight negative

He He (hehe-launchpad) said :
#4

Hi Olivier,

Thanks for the prompt response! I also noticed the quantity "total abs(cross section)" in the event generation printout. So to reweight the NLO events and mix them (before performing the unweighting as you suggested), I suppose that for the events in each sample we just divide this abs(xsec) by nevents and then attach the sign of each event's weight?

But it seems that the abs(xsec) is only found in the intermediate results, and the final summary contains only the total cross section. Is there a way to look up or calculate the final abs(xsec)?

I also have some additional questions:
1. To merge the LHE events, I am also concerned about the extra information like headers, init, etc. in the LHE files. For 01j and 2j samples the same run_card is used, and only the process definition is different. How should I properly preserve/modify this extra information in the combined LHE file?

2. For the events in LHE files, do I simply put together the <event></event> blocks from all files and change their XWGTUP? What are the quantities in the #aMCatNLO line (e.g. [*]) and do I need to worry about them?

Thanks,
He

[*]
<event npLO=" -1 " npNLO=" 2 ">
 6 2 +2.1026600e+01 4.79591530e+01 7.54677110e-03 1.04726040e-01
        1 -1 0 0 502 0 +0.0000000000e+00 +0.0000000000e+00 +3.1893773000e+03 3.1893773000e+03 0.0000000000e+00 0.0000e+00 9.0000e+00
       21 -1 0 0 503 502 +0.0000000000e+00 +0.0000000000e+00 -4.4158856000e+01 4.4158856000e+01 0.0000000000e+00 0.0000e+00 9.0000e+00
       23 1 1 2 0 0 +8.3307754000e+01 +1.2537776000e+01 +9.1555491000e+02 9.2393367000e+02 9.1188000000e+01 0.0000e+00 9.0000e+00
       23 1 1 2 0 0 +1.8424099000e+02 -2.8953582000e+01 +7.2449050000e+02 7.5364766000e+02 9.1188000000e+01 0.0000e+00 9.0000e+00
        1 1 1 2 501 0 -2.4740589000e+02 -2.7108275000e+01 +1.4818855000e+03 1.5026408000e+03 0.0000000000e+00 0.0000e+00 9.0000e+00
       21 1 1 2 503 501 -2.0142860000e+01 +4.3524081000e+01 +2.3287470000e+01 5.3314038000e+01 0.0000000000e+00 0.0000e+00 9.0000e+00
#aMCatNLO 1 0 0 1 6 0.48590494E+03 0.00000000E+00 -9 7 0 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00
<rwgt>

Olivier Mattelaer (olivier-mattelaer) said :
#5

> But it seems that the abs(xsec) is only found in the intermediate results, and the final summary contains only the total cross section. Is there a way to look up or calculate the final abs(xsec)?

Just keep the events already written in the lhe-file.

> 1. To merge the LHE events, I am also concerned about the extra information like headers, init, etc. in the LHE files. For 01j and 2j samples the same run_card is used, and only the process definition is different. How should I properly preserve/modify this extra information in the combined LHE file?

This really depends on which tool is used after running MG5aMC and what information it uses from that part.
If you are worried about that, the best is likely to keep the two samples separate for any of the tools coming after (most tools are able to handle multiple input files), or to generate another sample with those two processes merged and re-use that banner (probably just changing the number of events).

> 2. For the events in LHE files, do I simply put together the <event></event> blocks from all files and change their XWGTUP? What are the quantities in the #aMCatNLO line (e.g. [*]) and do I need to worry about them?

As far as I know, you should just keep those lines intact.
But you will obviously have to change the lines following <rwgt>.
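
(A minimal sketch of what changing XWGTUP and the lines following <rwgt> could look like for one <event> block; this is an added illustration assuming the standard LHEF v3 layout written by MG5aMC, not an official tool.)

import re

def rescale_event(event_lines, factor):
    """Rescale XWGTUP and every <wgt> value of one <event>...</event> block by
    'factor', leaving the particle lines and the #aMCatNLO line intact.
    'event_lines' is the list of lines (without trailing newlines) of the block."""
    out = []
    info_line_done = False
    for line in event_lines:
        stripped = line.strip()
        if not info_line_done and stripped and not stripped.startswith('<'):
            # event information line: NUP IDPRUP XWGTUP SCALUP AQEDUP AQCDUP
            fields = stripped.split()
            fields[2] = '%+.7e' % (float(fields[2]) * factor)
            out.append(' ' + ' '.join(fields))
            info_line_done = True
        elif stripped.startswith('<wgt'):
            # rescale the value inside <wgt id='...'> value </wgt>
            out.append(re.sub(r'>\s*(\S+)\s*</wgt>',
                              lambda m: '> %+.7e </wgt>' % (float(m.group(1)) * factor),
                              line))
        else:
            out.append(line)
    return out

Here 'factor' would be the ratio of the new nominal weight to the old one for the sample the event came from, as discussed in the next message.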

He He (hehe-launchpad) said :
#6

Hi Olivier,

Thanks again for the detailed and helpful suggestions.

a) Regarding abs(xsec):

> Just keep the events already written in the lhe-file.
Are you suggesting that I just use the value read from the event XWGTUP (with event_norm = average in the run_card) and from the init block?

I guess this is a side question. My main concern was that the total xsec can have different values in different places: in my 20K-event 2j sample generation, the printout summary total xsec is 4.174e+00 +- 1.6e-02 pb, while the gridpack result, the intermediate results (after the line "Updating the number of unweighted events per channel"), and the LHE init block all have 4.197e+00 +- 1.6e-02 pb. I was wondering whether this is expected, which value is more accurate, and whether a value of abs(xsec) different from the gridpack prediction is also calculated and stored during event generation.

This is of interest because thread [1] seems to indicate that for ~10K events, the latter can give a more accurate prediction.

b) Regarding merging and headers:

1. For the unweighting, I reweighted each sample's events to <sign> abs(xsec)/nevents, then mixed the samples and applied the accept/reject method (see the sketch after [3] below). I then chose the nominal weight of the unweighted sample to be newNomWgt = <sign> (abs(xsec_01j)+abs(xsec_2j)), and scaled the other weights in <rwgt> accordingly by newNomWgt/oldNomWgt[01j or 2j].

2. I compared the two samples' headers [2] and there are only a few differences, in the proc_card, run_card, and <init>. So in the new header I combined the proc_card processes, changed nevents to the final number of unweighted events (which may not be necessary), and combined the <init> content into [3]. Putting the merged file into MadSpin and then PYTHIA seems to work, and I am trying to produce a small sample directly with MadGraph to verify the header.

Does the above look reasonable, or am I making an obvious mistake?

Thanks,
He

[1] https://answers.launchpad.net/mg5amcnlo/+question/664751

[2]
(The output sample names should be ignored here...)

diff 01jHeader 2jHeader
74,76c74,75
< generate p p > z z [QCD] @0
< add process p p > z z j [QCD] @1
< output ZZTo4L012j_5f_NLO_FXFX -nojpeg
---
> generate p p > z z j j [QCD] @2
> output ZZTo4L01j_5f_NLO_FXFX -nojpeg
114c113
< 50000 = nevents
---
> 20000 = nevents
116c115
< 3125 = nevt_job ! Max number of events per job in event generation.
---
> 1250 = nevt_job ! Max number of events per job in event generation.
1727,1729c1726,1727
< 2212 2212 0.65000000E+04 0.65000000E+04 -1 -1 306000 306000 -4 2
< 0.92088857E+01 0.26946695E-01 0.45340621E+02 1
< 0.14947483E+02 0.29093684E-01 0.45340621E+02 0
---
> 2212 2212 0.65000000E+04 0.65000000E+04 -1 -1 306000 306000 -4 1
> 0.41972794E+01 0.16191095E-01 0.21026600E+02 2

[3]
2212 2212 0.65000000E+04 0.65000000E+04 -1 -1 306000 306000 -4 3
0.41972794E+01 0.16191095E-01 0.21026600E+02 2
0.92088857E+01 0.26946695E-01 0.45340621E+02 1
0.14947483E+02 0.29093684E-01 0.45340621E+02 0
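
(In pseudo-code, step 1 above amounts to something like the following sketch, where an event is abstracted to a (sign, payload) pair; this is an added illustration of the procedure described, not the actual script, and the <rwgt> rescaling is not shown.)

import random

def mix_and_unweight(events_01j, events_2j, abs_xsec_01j, abs_xsec_2j):
    # per-event |weight| in the sum normalisation: abs(xsec)/nevents
    w_01j = abs_xsec_01j / len(events_01j)
    w_2j = abs_xsec_2j / len(events_2j)
    max_w = max(w_01j, w_2j)

    merged = []
    for events, w in ((events_01j, w_01j), (events_2j, w_2j)):
        for sign, payload in events:
            # accept/reject against the maximum per-event weight
            if random.random() < w / max_w:
                merged.append((sign, payload))
    random.shuffle(merged)  # interleave the two processes before showering

    # nominal weight per accepted event in the average normalisation,
    # as in step 1: <sign> * (abs(xsec_01j) + abs(xsec_2j))
    new_nom = abs_xsec_01j + abs_xsec_2j
    return [(sign * new_nom, payload) for sign, payload in merged]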

Olivier Mattelaer (olivier-mattelaer) said :
#7

I like to use the normalization where the sum of the weights is equal to the cross-section, because in that case, for the unweighting, you only have to replace the original weight by the max_weight that you are using.
If you do that and do not force a fixed number of events in your unweighted file, then you do not have to play with the total cross-section.

Another strategy is inspired by the average-weight way to interpret the weight, where you simply replace the weight by the sum of the cross-sections.
It is actually interesting to do the math here and see the relation.
(Here I follow the core idea of the appendix of the reweighting paper, where I present one proof of the unweighting/reweighting procedure.)

So obviously we have
sigma_tot = sigma_1 + sigma_2
In the average normalization you would have N_1 events in sample 1 and N_2 events in sample 2, so
sigma_tot = 1/N_1 \sum_1 sigma_1 + 1/N_2 \sum_2 sigma_2
which you can change to the sum normalization:
sigma_tot = \sum_1 (sigma_1/N_1) + \sum_2 (sigma_2/N_2)

Then you can define X = max(sigma_1/N_1, sigma_2/N_2)
and rewrite the above equation as
sigma_tot = \sum_1 (sigma_1/N_1)/X*X + \sum_2 (sigma_2/N_2)/X*X

Then you use (sigma_1/N_1)/X as the probability to keep/reject an event; call these probabilities P_1 and P_2:
sigma_tot = \sum_1 P_1*X + \sum_2 P_2*X

You then reduce your sample size by keeping a given event according to the probability P_1 (or P_2).
Assuming that in this procedure you keep n_1 and n_2 events from each sample (by construction you have either n_1 = N_1 or n_2 = N_2),
sigma_tot \simeq n_1*X + n_2*X = (n_1 + n_2)*X
So in other words you should have X \simeq sigma_tot/(n_1+n_2)

In order to rewrite that expression in terms of an average over the cross-section:
sigma_tot = 1/(n_1+n_2) \sum_{1+2} (n_1 + n_2)*X
sigma_tot \simeq 1/(n_1+n_2) \sum_{1+2} sigma_tot

The important point is that this relation between X and sigma_tot is not exact; they only converge to each other in the infinite-sample limit. So it is a choice which one you take. I personally prefer to keep the max-weight, but this is a pure personal preference and it is not used everywhere in the code.
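
(A tiny numerical check of this relation, with placeholder numbers of the same order as those quoted elsewhere in this thread; this is an added illustration, not part of the original reply.)

import random

random.seed(1)
sigma_1, N_1 = 45.3, 50000   # placeholder cross section (pb) and sample size
sigma_2, N_2 = 21.0, 20000

X = max(sigma_1 / N_1, sigma_2 / N_2)   # maximum per-event weight
P_1 = (sigma_1 / N_1) / X               # keep probabilities
P_2 = (sigma_2 / N_2) / X

# accept/reject on both samples
n_1 = sum(random.random() < P_1 for _ in range(N_1))
n_2 = sum(random.random() < P_2 for _ in range(N_2))

# X and sigma_tot/(n_1+n_2) should agree up to statistical fluctuations
print(X, (sigma_1 + sigma_2) / (n_1 + n_2))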

> 2. I compared the two samples' headers [2] and there are only a few
> differences in the proc_card, run_card, and <init>. So in the new header
> I combined the proc_card processes, changed nevents to the final
> unweighted events (may not be necessary), and combined <init> content to
> [3]. Putting the merged file into MadSpin and then PYTHIA seems to work,
> and I am trying to produce a small sample directly with MadGraph to
> verify the header.

sounds reasonable.

He He (hehe-launchpad) said :
#8

Hi Olivier,

Thanks for the detailed illustration; I also read the appendix A.1 "Unweighting" section of the reweighting paper. I have been thinking in the sum-normalization picture when doing the unweighting, where my (absolute) sigma_01j = 0.45340621E+02 pb, N1 = 50K, sigma_2j = 0.21026600E+02 pb, N2 = 20K, so that P1 = 0.86 and P2 = 1, and the numbers of accepted events are as expected. When it comes to assigning the final weight, I use the average normalization since it is the convention in the original LHE files, as also indicated by IDWTUP = -4 in the init block (#6 [3]), according to the description of IDWTUP in [*].

In the init block (#6 [3]) I should probably also update the XMAXUP of the 3 processes to the final weight to be safe, although according to [*] in the case of IDWTUP = -4 neither XSECUP nor XMAXUP needs to be known or supplied.

Thanks a lot for all the helpful explanations and suggestions!

He

[*] https://cds.cern.ch/record/517157/files/0109068.pdf