Consequences of same iseed in different directories

Asked by Jared Barron on 2019-10-22

Hi,

I've generated a large BSM simulation sample on a cluster in the following way. For one process, I have created many working directories, each with different values set in the param_card.dat (e.g. 100 directories, each with a different mass for some BSM particle). The cluster I'm using allocates CPUs in nodes of 40 cores at a time, so for each of these directories I have submitted 20 serial jobs, each generating 50k events, running MadGraph in multicore mode. (As an aside: should I take the warning to limit event generation to 50k events seriously if Pythia is being run?) Within each directory, I've checked that the iseed changes for each run, as it should. However, the iseed for, say, run_01 is the same in every directory. Does that mean that my final event samples for each point in parameter space will be statistically dependent on each other? I've looked at previous questions on similar topics that indicate this might be a problem, but I am unsure in my case, since the different directories have different model parameter values. Thank you!

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Olivier Mattelaer
Solved: 2019-10-23
Last query: 2019-10-23
Last reply: 2019-10-23

> On 22 Oct 2019, at 23:37, Jared Barron <email address hidden> wrote:
>
> New question #685336 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/685336

Hi Jared,

I do not think that anyone can give a clear answer in this case. It depends heavily on the Feynman diagrams involved and on the benchmark points that you consider. You can go anywhere from zero correlation to full correlation.
(For example, if you generate p p > t t~ QED=0 and change the mass of the Z, then obviously you will have full correlation, since the Z does not enter the process at all.)

If you do p p > Z > t t~, then one should expect fully statistically independent samples. However, one still needs to be careful, since you might have the exact same distribution in the M*-MZ distribution (assuming fixed width). This can be problematic if you use such samples to train a neural network, for example (you might miss the fact that you are overtraining).
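A toy illustration of the point about the mass offset, written as a minimal sketch and not as MadGraph code. It assumes a simple Breit-Wigner (Cauchy) line shape sampled by inverse CDF. The function name `toy_resonance_masses` and all numerical values are hypothetical. With a shared seed and a fixed width, the generated masses differ between the two benchmark points, but the offsets from the resonance mass coincide event by event.

```python
import math
import random


def toy_resonance_masses(mass, width, seed, n_events):
    """Draw invariant masses from a Cauchy line shape via inverse-CDF sampling.

    Toy model only: the real generator uses a far more elaborate
    phase-space integration than a single uniform number per event.
    """
    rng = random.Random(seed)
    return [
        mass + 0.5 * width * math.tan(math.pi * (rng.random() - 0.5))
        for _ in range(n_events)
    ]


# Two benchmark points with different resonance masses but the same seed.
sample_a = toy_resonance_masses(mass=91.0, width=2.5, seed=21, n_events=5)
sample_b = toy_resonance_masses(mass=120.0, width=2.5, seed=21, n_events=5)

# A third sample at the second benchmark point with a different seed.
sample_c = toy_resonance_masses(mass=120.0, width=2.5, seed=22, n_events=5)

# Offsets from the resonance mass, event by event.
offsets_a = [m - 91.0 for m in sample_a]
offsets_b = [m - 120.0 for m in sample_b]
offsets_c = [m - 120.0 for m in sample_c]

for a, b, c in zip(offsets_a, offsets_b, offsets_c):
    print(f"{a:+.6f}  {b:+.6f}  {c:+.6f}")
```

In this toy, the first two columns agree up to floating-point rounding while the third does not, which is the kind of hidden structure a neural network could pick up on.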

Now if you look at a process like p p > t t~ QED<=2 and check the integration channel of the Z, you have an interference term with the QCD sample. This means that for every Z mass you will likely get changes to the grid at each iteration, which then lead to a snowball effect that should reduce the correlation.
The same is true for the QCD integration channel, where the noise due to the Z should also lead to a snowball effect that reduces the correlation.

Even in those simple examples, it is not clear how independent such samples are.

Cheers,

Olivier
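Since the degree of correlation is hard to predict, one way to sidestep the issue is to give every (directory, run) pair its own seed before generation. The sketch below is not part of the original answer. It assumes the run_card.dat uses the usual `value = name ! comment` layout for the iseed line, and the helper names `set_iseed` and `seed_for` as well as the numbering scheme are hypothetical.

```python
import re

# Matches a line such as "  0 = iseed ! rnd seed (...)" (assumed layout).
ISEED_LINE = re.compile(r"^([ \t]*)(\d+)([ \t]*=[ \t]*iseed\b.*)$", re.MULTILINE)


def set_iseed(run_card_text, seed):
    """Return the run_card text with the iseed value replaced by `seed`."""
    new_text, n_subs = ISEED_LINE.subn(
        lambda m: f"{m.group(1)}{seed}{m.group(3)}", run_card_text
    )
    if n_subs != 1:
        raise ValueError(f"expected exactly one iseed line, found {n_subs}")
    return new_text


def seed_for(directory_index, run_index, runs_per_directory=20):
    """Hypothetical scheme: a distinct positive seed per (directory, run)."""
    return 1 + directory_index * runs_per_directory + run_index


# Small stand-in for the relevant part of a run_card.dat.
example_card = (
    "  50000 = nevents ! Number of unweighted events requested\n"
    "  0 = iseed ! rnd seed (0=assigned automatically=default))\n"
)

updated = set_iseed(example_card, seed_for(directory_index=3, run_index=0))
print(updated)
```

In practice the text would be read from and written back to each directory's run_card.dat. With 100 directories and 20 runs each, this scheme yields 2000 distinct seeds.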

Jared Barron (jbarron) said : #3

Thanks Olivier. Do you know of any way I could check whether two given samples are independent? It shouldn't end up mattering too much for the work I'm doing, but it would be useful to know for the future whether I should worry about this.
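The thread does not give an established method for this. As a rough heuristic only, one could pair events by their position in the two samples and compute the Pearson correlation of some observable. The sketch below uses toy Gaussian data in place of real events. It assumes that event i in one sample can meaningfully be paired with event i in the other, which may not hold once events from different integration channels are combined. The function names are hypothetical. A value near zero does not prove independence, while a value far from zero is a warning sign.

```python
import math
import random


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def toy_observable(shift, seed, n_events):
    """Toy stand-in for an observable read from an event file."""
    rng = random.Random(seed)
    return [shift + rng.gauss(0.0, 1.0) for _ in range(n_events)]


# Same seed, different "parameter" (shift): strongly correlated pairs.
same_seed = pearson_correlation(
    toy_observable(0.0, seed=7, n_events=10000),
    toy_observable(5.0, seed=7, n_events=10000),
)

# Different seeds: the paired correlation should be close to zero.
diff_seed = pearson_correlation(
    toy_observable(0.0, seed=7, n_events=10000),
    toy_observable(5.0, seed=8, n_events=10000),
)

print(f"same seed: {same_seed:.3f}, different seeds: {diff_seed:.3f}")
```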

Jared Barron (jbarron) said : #4

Thanks Olivier Mattelaer, that solved my question.

No, I do not know how to evaluate that.