How are the weights stored in the LHE, HepMC and ROOT files related?

Asked by Ewan McCulloch on 2017-08-21

I have generated the following process "p p > t t~ j" and have simulated to detector level using Pythia8 and Delphes. I have systematics turned on and haven't altered the cards. The generation runs without error.

In the interactive mode I get something like this printed to the screen after the systematics calculations:

INFO: #***************************************************************************
#
# original cross-section: 570.07463
# scale variation: +47% -29.5%
# central scheme variation: + 0% -44.8%
# PDF variation: +4.56% -4.56%
#
# dynamical scheme # 1 : 447.083 +43.7% -28.4% # \sum ET
# dynamical scheme # 2 : 373.656 +41% -27.4% # \sum\sqrt{m^2+pt^2}
# dynamical scheme # 3 : 526.728 +45.7% -29.1% # 0.5 \sum\sqrt{m^2+pt^2}
# dynamical scheme # 4 : 314.881 +38.8% -26.4% # \sqrt{\hat s}
#***************************************************************************

This cross-section is what I see as the weight in the LHE file "unweighted_events.lhe.gz":

"<event>
  5 1 +5.7007463e+02 1.91672000e+02 7.54677100e-03 1.15696200e-01"
                ^^^^^^^^^^^^ this number here is the cross-section and sits in the Event Weight position (where XWGTUP sits as stated in https://arxiv.org/pdf/hep-ph/0609017.pdf)

There then follows a section which appears to list many more cross-section values, perhaps computed with different PDFs. It looks like the following:

"<rwgt>
  <wgt id='1'> +7.7929291e+02 </wgt>
  <wgt id='2'> +5.6311575e+02 </wgt>
  ....
  ....
  ....
  <wgt id='145'> +5.6236188e+02 </wgt>
 </rwgt>"
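
For anyone wanting to inspect these numbers programmatically, here is a minimal Python sketch (the function name is mine, and the parsing assumes exactly the event layout shown above):

```python
import re

def read_event_weights(event_text):
    """Extract the central weight (XWGTUP, third entry on the line
    after <event>) and the <rwgt> variation weights from the text of
    a single <event> block."""
    lines = event_text.strip().splitlines()
    central = float(lines[1].split()[2])
    variations = {
        m.group(1): float(m.group(2))
        for m in re.finditer(r"<wgt id='([^']+)'>\s*(\S+)\s*</wgt>", event_text)
    }
    return central, variations

# The snippet above, abridged to two variation weights:
event = """<event>
  5 1 +5.7007463e+02 1.91672000e+02 7.54677100e-03 1.15696200e-01
<rwgt>
  <wgt id='1'> +7.7929291e+02 </wgt>
  <wgt id='145'> +5.6236188e+02 </wgt>
 </rwgt>
</event>"""
central, variations = read_event_weights(event)
```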

Looking now at the HEPMC file I see on the "E - General GenEvent Information" line (as described in http://lcgapp.cern.ch/project/simu/HepMC/206/HepMC2_user_manual.pdf) there are lists of weights. Here is an example of this line for one event in the HEPMC file I have created:

"E 9 -1 -1.0000000000000000e+00 -1.0000000000000000e+00 -1.0000000000000000e+00 0 0 1008 1 2 0 146 1.1346873000000000e+01 8.0753825999999993e+00 ........... ............ "

There are many more weights after the two shown here (the last two numbers) and they are on the whole around 5.7.

Finally the ROOT file generated by Delphes has an event weights container. When I plot a histogram for the parameter Event.Weight I see that there is a distribution of weights centred at 6 with a FWHM of around 1.5.

I am trying to understand how these weights are all linked. I can see that in the LHE file the numerous weights assigned to each event are centred around the cross-section printed during the systematics calculation, while in the HEPMC file they are centred around the cross-section/100. The ROOT file also seems to show weights centred around the cross-section/100.

So far all I've done is describe my output; thank you for bearing with me. I don't think there is any problem with it, but I have very little idea of how these weights are to be used and whether they are physical. So far I have just been normalising the number of events passing selection cuts in an analysis I've written using the cross-section provided by the systematics calculation (the one I showed at the top), and have ignored the numerous other weights stored in the various files. Could any of the experts here tell me whether I should be using these weights, or whether my current approach is fine? (I am just doing a basic higgsino LSP analysis; the process presented here is just a test process.)
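
Concretely, the normalisation I describe amounts to something like this (a minimal sketch; the luminosity and event counts below are made up):

```python
def expected_events(sigma_pb, luminosity_invpb, n_pass, n_total):
    """Scale the raw cut efficiency by cross-section times luminosity
    to get the expected number of selected events."""
    efficiency = n_pass / n_total
    return sigma_pb * luminosity_invpb * efficiency

# e.g. 570.07463 pb, 100 fb^-1 = 1.0e5 pb^-1, 1200 of 10000 events pass the cuts
n = expected_events(570.07463, 1.0e5, 1200, 10000)
```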

Many thanks,
Ewan

Question information

Language:
English
Status:
Solved
For:
MadGraph5_aMC@NLO
Assignee:
No assignee
Solved by:
Olivier Mattelaer
Solved:
2017-08-22
Last query:
2017-08-22
Last reply:
2017-08-22
Ewan McCulloch (ewanmcc) said : #1

As a quick aside: I have also run this process with jet matching turned on and with an xqcut of 100 GeV. As I am not using an "add process p p > t t~ j j" feature, I do not understand how I could be double counting, so I expected the cross-section to be the same as without jet matching. However, during the parton-level systematics calculation I get the output:

INFO: #***************************************************************************
#
# original cross-section: 123.82121
# scale variation: +36.4% -24.8%
# central scheme variation: + 0% -44.4%
# PDF variation: +5.43% -5.43%
#
# dynamical scheme # 1 : 84.9558 +32.5% -23% # \sum ET
# dynamical scheme # 2 : 78.7175 +31.7% -22.6% # \sum\sqrt{m^2+pt^2}
# dynamical scheme # 3 : 103.633 +34.6% -24% # 0.5 \sum\sqrt{m^2+pt^2}
# dynamical scheme # 4 : 68.8968 +30.2% -21.8% # \sqrt{\hat s}
#***************************************************************************

This is a very different cross-section, and the cross-section stated after the Pythia8 jet-matching step does not change this. I thought Pythia8 and MadGraph would split the phase space of the jet between them, so that together they would cover the same jet phase space that MadGraph covers when jet matching is turned off. Additionally, half of the generated events are assigned zero weight in the HEPMC and Delphes ROOT files.

To summarise:
When running a process such as p p > t t~ j, why does jet matching affect the cross-section, and ought I to use jet matching or not?

Ewan McCulloch (ewanmcc) said : #2

I think I misunderstood: I would have agreement in the cross-sections between p p > t t~ j (no jet matching) and p p > t t~ & p p > t t~ j (with jet matching). So I should expect differences in cross-section when using jet matching without adding p p > t t~, as jet matching will still impose a qcut on the jet, i.e. I am restricting the phase space I am working with, so the cross-section is reduced. Sorry for the confusion here; I think I have cleared up my question from the above comment, but the original question still stands.

Many thanks,
Ewan

Olivier Mattelaer said : #3

Dear Ewan,

As you can see from

#***************************************************************************
#
# original cross-section: 570.07463
# scale variation: +47% -29.5%
# central scheme variation: + 0% -44.8%
# PDF variation: +4.56% -4.56%
#
# dynamical scheme # 1 : 447.083 +43.7% -28.4% # \sum ET
# dynamical scheme # 2 : 373.656 +41% -27.4% # \sum\sqrt{m^2+pt^2}
# dynamical scheme # 3 : 526.728 +45.7% -29.1% # 0.5 \sum\sqrt{m^2+pt^2}
# dynamical scheme # 4 : 314.881 +38.8% -26.4% # \sqrt{\hat s}
#***************************************************************************

You have a huge theoretical uncertainty on your cross-section. Such uncertainty needs to propagate correctly through your analysis in order to have a reasonable prediction of your signal/background.

Therefore, you need to know how this uncertainty propagates when you apply some cuts. The scale and PDF uncertainties do not behave in the same way over the full phase space, and therefore applying cuts modifies the above theoretical uncertainty.
In order to be able to compute this theoretical error after the analysis cuts, we provide the weight for each PDF replica and each scale choice (possible central values plus the variation around them), such that you can recompute the above numbers after those cuts.
That is the reason for these weights.
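
A minimal sketch of that recomputation (the function names and toy numbers are mine; it assumes the MadGraph default where the average of each weight over all generated events gives the corresponding cross-section):

```python
def varied_cross_sections(passing_events, n_total):
    """passing_events: list of dicts mapping weight id -> per-event weight,
    containing only the events that survive the analysis cuts.
    Returns, per weight id, sum(weights)/n_total, i.e. the after-cut
    cross-section for that scale/PDF variation."""
    totals = {}
    for ev in passing_events:
        for wid, w in ev.items():
            totals[wid] = totals.get(wid, 0.0) + w
    return {wid: s / n_total for wid, s in totals.items()}

def scale_envelope(xsecs, central_id, scale_ids):
    """Relative up/down scale uncertainty from the envelope of the
    scale-variation cross-sections around the central one."""
    central = xsecs[central_id]
    values = [xsecs[i] for i in scale_ids]
    return (max(values) - central) / central, (min(values) - central) / central

# Toy example with made-up weights: two of four generated events pass the cuts
passing = [{'c': 570.0, 's1': 840.0, 's2': 400.0},
           {'c': 570.0, 's1': 835.0, 's2': 405.0}]
xsecs = varied_cross_sections(passing, n_total=4)
up, down = scale_envelope(xsecs, 'c', ['s1', 's2'])
```

For the PDF side, one would combine the replica cross-sections following the prescription of the PDF set instead of taking the envelope.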

To recompute the PDF error, you have to follow the instructions of the PDF set. This specific method of combination is typically available in LHAPDF (but not always), and therefore I would advise using LHAPDF for that.
For the scale uncertainty, one typically takes the envelope of the values.

The normalization of these weights is not fixed by the convention (at least for the LHE file). The default in MadGraph is that the average of the weights returns the cross-section, but you can change this convention via the run_card.
The conventions at the HEPMC/Delphes level are unknown to me (but they should be the same as for the central weight).
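
A small sketch of the two common conventions (the function name is mine; 'average' and 'sum' correspond, if I recall correctly, to values of the run_card's event_norm parameter):

```python
def cross_section_from_weights(weights, event_norm='average'):
    """Recover the cross-section from the per-event central weights
    under the two common LHE normalisation conventions."""
    if event_norm == 'average':
        return sum(weights) / len(weights)   # MadGraph default
    if event_norm == 'sum':
        return sum(weights)
    raise ValueError(event_norm)

# For an unweighted sample every event carries XWGTUP = sigma:
sigma = cross_section_from_weights([570.07463] * 1000, 'average')
```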

For the matching/merging:
The splitting of the phase space between the parton shower (Pythia) and the matrix-element generator (MadGraph) is what we call the matching/merging method. So without that method, there is clear double counting, since we do not split the phase space.

On top of that, even if you have a single multiplicity, you can get quite different results with ickkw on or off, since turning it on changes the scale choice associated with the events (each interaction will have its own scale). And since you have a huge theoretical uncertainty, you can expect a huge impact from this procedure, which in itself has nothing to do with the presence/absence of double counting due to higher multiplicities.

Cheers,

Olivier

Ewan McCulloch (ewanmcc) said : #4

Hi Olivier, thanks for the response. I think I understand your answer for the most part; I have a couple of follow-up questions though. In the case of a process like p p > t t~ j, I can either not use matching/merging (there is only a single multiplicity here, correct?) or use it and set the scale appropriately. Will I only get a physical cross-section if I use the matching/merging procedure? Thank you for making clear the use of the weights in the LHE file; that makes a lot of sense.

Many thanks,
Ewan

Ewan McCulloch (ewanmcc) said : #5

Thanks Olivier Mattelaer, that solved my question.