Cross-sections differ a lot between runs with different input random numbers

Asked by Yingjie Wei

Hi Madgraph experts:

    These days I'd like to generate some EFT samples for off-shell HZZ->4l, specifically for the operator cpG (the SM*cpG interference term). This SM*cpG term is calculated by reweighting SM*ctp.

    But I ran into a problem with the cross-section (i.e. the integrated cross-section).

    I generated two samples with different input random numbers, 123456 and 54321. Apart from the random number, all the settings are the same.

    But I got two different cross-sections:

    (1) 0.000135413 pb
    (2) -9.75099e-05 pb

which differ quite a lot!

Do you think this is reasonable? I thought the cross-section of a process is fixed, i.e. well-determined.

Another question: what is the relationship between the "integrated cross-section" of all events and the "weight" of each event? How are they calculated? Is there a document or paper where I can read about the details?

Thanks so much in advance!!!

Please let me know if you need further information.

Yingjie

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Yingjie Wei
Olivier Mattelaer (olivier-mattelaer) said :
#1

1) What is the statistical error in both cases?
Since you seem to be computing an interference term, the integrand is not positive definite, and the statistical error can be very large, which could make the two computations compatible.

2) What version of MG5aMC are you using?
And which value of sde_strategy is used (in the run_card)?

Cheers,

Olivier

> On 7 Jul 2021, at 05:05, Yingjie Wei <email address hidden> wrote:
>
> New question #697884 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/697884

Yingjie Wei (yingjiewei) said:
#2

Hi Olivier:

     Thanks so much for the prompt reply.

1) Sorry, my bad, I forgot to add the errors. But they seem small and cannot cover the difference between the two cross-sections.

   (1) 0.000135412826126 +- -4.60332772588e-05 pb
   (2) -9.75098707212e-05 +- -7.00729049367e-06 pb

2) The MadGraph version I used is 2.9.3.

Here is the part of the run_card including sde_strategy:

  1 = nhel          ! using helicities importance sampling or not.
                    ! 0: sum over helicity, 1: importance sampling
  2 = sde_strategy  ! default integration strategy (hep-ph/2021.xxxxx)
                    ! 1 is old strategy (using amp square)
                    ! 2 is new strategy (using only the denominator)

Cheers,
Yingjie
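To put a number on "the error cannot cover the difference", one can express the gap between the two runs in units of their combined error. This is a minimal Python sketch; it assumes the two runs are statistically independent, that the quoted errors behave like Gaussian standard deviations, and that the minus signs printed in front of the errors can simply be dropped (i.e. they are magnitudes).

```python
import math


def pull(x1: float, e1: float, x2: float, e2: float) -> float:
    """|x1 - x2| in units of the combined standard error, assuming the two
    estimates are independent and their errors are Gaussian."""
    return abs(x1 - x2) / math.sqrt(e1 ** 2 + e2 ** 2)


# Values quoted above (pb); abs() drops the minus sign printed on the errors.
xsec1, err1 = 0.000135412826126, abs(-4.60332772588e-05)
xsec2, err2 = -9.75098707212e-05, abs(-7.00729049367e-06)

p = pull(xsec1, err1, xsec2, err2)
print(f"difference = {p:.1f} combined standard errors")
```

With the quoted numbers this comes out at roughly 5 standard deviations. That nominal tension only means something if the error estimates themselves are reliable, which is not guaranteed for a non-positive-definite integrand.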

Olivier Mattelaer (olivier-mattelaer) said :
#3

In my view they are quite large (especially since they are only an estimate of the error, and such an estimator is not guaranteed to converge).

You can try sde_strategy=1 to see if it helps (I really doubt it will). But the only real option is likely to increase the number of events.
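Increasing the number of events helps only slowly, because the Monte Carlo error of a converged estimate scales as 1/sqrt(N). A rough Python sketch of the resulting estimate, assuming that scaling holds and using run (1)'s numbers; the target error `err_target` is a hypothetical value chosen for illustration, not something from the thread.

```python
import math


def events_needed(n_current: int, err_current: float, err_target: float) -> int:
    """Events needed to reach err_target, assuming the Monte Carlo error
    scales as 1/sqrt(N) with everything else (integrand, grids) unchanged."""
    return math.ceil(n_current * (err_current / err_target) ** 2)


# Run (1): 10k events, error estimate ~4.6e-05 pb.
# err_target = 1e-05 pb is a hypothetical goal, not a number from the thread.
n = events_needed(10_000, 4.60332772588e-05, 1.0e-05)
print(n)
```

Halving the error costs four times as many events; for run (1), a 1e-05 pb error would need roughly 210k events, about 20 times the current sample.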

FYI, integration of interference terms is not something we have under control, and we actually have no idea how to integrate them efficiently/correctly in general. So you might also be in a case that is too complex for our integrator to handle as needed.

Cheers,

Olivier

> On 7 Jul 2021, at 08:30, Yingjie Wei <email address hidden> wrote:

Yingjie Wei (yingjiewei) said:
#4

Hi Olivier:

    Thanks,

     To be honest, for both samples mentioned above I generated 10k events (I think that's big enough?).

     I will try the "sde_strategy=1" and let you know.

     Another point: you mentioned a case "too complex for our integrator to handle as needed". Yes, I think so.

     My case is a bit complex. I generate off-shell gg->4l for the operator cpG with m4l from 130 GeV to 2000 GeV. The event distribution (i.e. the differential cross-section) in m4l has a positive part and a negative part. You may have a look at: https://cernbox.cern.ch/index.php/s/hBsaxin0E0eBRLD .
     The integrated cross-sections of these two parts cancel out! I was wondering how MadGraph integrates the phase space: over 130 GeV to 2000 GeV as a single region, or in slices like 130 GeV to 131 GeV, 131 GeV to 132 GeV, etc.?

     From the plot (https://cernbox.cern.ch/index.php/s/hBsaxin0E0eBRLD), the distributions of the two samples with different input random numbers seem consistent with each other. But I'd like to combine them (I have 100 samples with different integrated cross-sections), and I am not sure whether such a combination works.

Cheers,
Yingjie

Olivier Mattelaer (olivier-mattelaer) said :
#5

Hi,

10k is large enough when the integrand is positive definite.
When the integrand is not positive definite, the fraction of negative events matters a lot.
If you have ~50% negative events, then 10k is a very small sample size.
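One way to see why the negative fraction matters is the Kish effective sample size, N_eff = (sum of w)^2 / (sum of w^2). This is a standard statistics rule of thumb, not a quantity MG5aMC reports; the Python sketch below assumes a hypothetical unweighted sample in which every event has weight +w or -w, where it reduces to N_eff = N(1 - 2f)^2 for a negative fraction f.

```python
def effective_sample_size(weights: list[float]) -> float:
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2 if s2 > 0 else 0.0


def unweighted_sample(n_events: int, neg_fraction: float, w: float = 1.0) -> list[float]:
    """Hypothetical unweighted sample: each event has weight +w or -w."""
    n_neg = round(n_events * neg_fraction)
    return [-w] * n_neg + [w] * (n_events - n_neg)


for frac in (0.0, 0.1, 0.3, 0.45, 0.5):
    n_eff = effective_sample_size(unweighted_sample(10_000, frac))
    print(f"negative fraction {frac:.2f}: N_eff = {n_eff:.0f}")
```

For 10k events, a 45% negative fraction leaves an effective sample of only about 100 events, and at exactly 50% the sum of weights carries no information at all.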

What MG5aMC does is integrate the absolute value of the integrand
(both for the optimization and for the event generation),
then keep track of whether each contribution is negative or positive, to return the physical (correct) value of the weight for each event.
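This procedure (integrate |f|, unweight on |f|, then restore the sign) also answers the earlier question about how event weights relate to the integrated cross-section: with unweighted events every weight has the same magnitude, and the cross-section is the sum of the signed weights. The toy below illustrates this on a hypothetical one-dimensional integrand with a sign change. It is only a sketch of the idea: it uses plain uniform sampling and hit-or-miss unweighting, whereas MG5aMC's integrator is adaptive and multi-channel.

```python
import random


def f(x: float) -> float:
    """Hypothetical toy integrand on [0, 1]: negative below x = 0.3,
    positive above. Its exact integral is 0.5 - 0.3 = 0.2."""
    return x - 0.3


def signed_unweighted_events(n_points: int, seed: int = 123456) -> list[float]:
    """Toy version of 'integrate |f|, keep track of the sign'."""
    rng = random.Random(seed)
    values = [f(rng.random()) for _ in range(n_points)]

    # Step 1: Monte Carlo estimate of the integral of |f| over [0, 1]
    # (plain uniform sampling; a real integrator adapts its grid here).
    abs_integral = sum(abs(v) for v in values) / n_points

    # Step 2: hit-or-miss unweighting on |f|, accepting each point with
    # probability |f| / max|f|; the sign of f is remembered per event.
    f_max = max(abs(v) for v in values)
    signs = [1.0 if v > 0 else -1.0
             for v in values if rng.random() < abs(v) / f_max]

    # Step 3: every accepted event gets the same magnitude, with its sign,
    # so that the signed weights sum to the estimate of the integral of f.
    w = abs_integral / len(signs)
    return [s * w for s in signs]


weights = signed_unweighted_events(200_000)
xsec = sum(weights)
neg_fraction = sum(1 for w in weights if w < 0) / len(weights)
print(f"sum of weights = {xsec:.4f} (exact 0.2), negative fraction = {neg_fraction:.3f}")
```

Here about 15% of the events come out negative, and the sum of weights reproduces the exact integral within statistical error. Positive and negative regions cancel only in the final sum, which is why a nearly vanishing total needs many events.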

> From the plot: https://cernbox.cern.ch/index.php/s/hBsaxin0E0eBRLD , it seems the distributions of two samples with different input random numbers are consistent with each other. But I'd like to combine them (as I have 100 samples with different integrated cross-sections). I am not sure whether this works (the combination) or not?

So here I would suggest first generating a much bigger sample, such that you start to have a decent statistical error on your nearly vanishing cross-section; then you can think about combining them by simply merging them. With such a level of error, you will need to do some secondary unweighting, which is more complex.
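In its simplest form, merging means concatenating the events of independent runs and rescaling the weights so that their sum gives the average of the individual cross-sections. A minimal Python sketch with hypothetical toy weights, assuming each input sample's signed weights sum to its own cross-section (normalization conventions vary; in MG5aMC this depends on the run_card's event_norm setting). It does not cover the secondary unweighting mentioned above.

```python
def merge_samples(samples: list[list[float]]) -> list[float]:
    """Concatenate K independent signed-weight samples, dividing every weight
    by K so that the merged sum of weights is the average of the K
    per-sample cross-sections (assumes each sample's weights sum to its
    own cross-section)."""
    k = len(samples)
    return [w / k for sample in samples for w in sample]


# Hypothetical toy samples (weights in pb); their sums are 1.0 and 0.0.
sample_a = [1.0, 1.0, -1.0]
sample_b = [2.0, -2.0]
merged = merge_samples([sample_a, sample_b])
print(len(merged), sum(merged))  # 5 events, cross-section 0.5 = (1.0 + 0.0) / 2
```

Because the K runs are independent, the statistical error on this average shrinks roughly as 1/sqrt(K), provided each run's error estimate is itself trustworthy.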

Cheers,

Olivier

> On 7 Jul 2021, at 09:20, Yingjie Wei <email address hidden> wrote:

Yingjie Wei (yingjiewei) said :
#6

Hi Olivier:

    Thanks so much for all the help!

    In practice, it took me about 12 hours to generate 10k events (with 200 CPU cores). Generating a much larger single sample might not be feasible.

    But for my case, would it be OK to generate the positive part (300 GeV to 2000 GeV) and the negative part (130 GeV to 300 GeV) separately?
    I am not sure whether 10k events for the purely negative part would be OK.

Really thanks!
Yingjie

Olivier Mattelaer (olivier-mattelaer) said :
#7

I do not see how this helps.
You will have the same issue: you will need to subtract the two contributions, but you still have to add their statistical errors.
But obviously you can give it a try.
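The point about subtracting the two contributions can be made concrete: the central values partly cancel, but the independent errors add in quadrature, so the relative error on the sum grows as the cancellation gets stronger. A short Python sketch with hypothetical numbers chosen only for illustration:

```python
import math


def combine_opposite_sign(sigma_pos: float, err_pos: float,
                          sigma_neg: float, err_neg: float) -> tuple[float, float]:
    """Sum of two independently generated contributions; errors add in quadrature."""
    total = sigma_pos + sigma_neg
    err = math.sqrt(err_pos ** 2 + err_neg ** 2)
    return total, err


# Hypothetical numbers (pb): each part known to 1%, but they nearly cancel.
total, err = combine_opposite_sign(1.00e-3, 1.0e-5, -0.95e-3, 1.0e-5)
print(f"total = {total:.2e} +- {err:.2e} pb (relative error {err / abs(total):.0%})")
```

Each part is known to 1%, yet the sum is uncertain at the 28% level, so splitting the generation does not remove the underlying cancellation problem.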

Are you doing a loop-induced process? That is my best guess as to why your code is so slow. If so, you are indeed in an extremely complex case to integrate, since phase-space integration for loop-induced processes is quite complex on its own (and on top of that, evaluating a single matrix element is quite expensive).

Cheers,

Olivier

> On 7 Jul 2021, at 10:01, Yingjie Wei <email address hidden> wrote:

Yingjie Wei (yingjiewei) said :
#8

Hi Olivier,

     Yeah, I see... (I will just give it a try. I was wondering whether generating the positive and negative parts separately would make the two integrated cross-sections, one positive and one negative, better determined, even though their sum still fluctuates.)

    You are absolutely right. I am doing a loop-induced process, which is quite slow. That's why I am trying to generate many samples and combine them.

    Thanks so much for the help above! It is super helpful for me to understand this better.

Cheers,
Yingjie