Using reweight feature on cluster

Asked by Kenneth Long on 2014-05-15

Hi Experts,

I'm interested in running the reweight feature of MadGraph (https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/Reweight) on a cluster. I would like to run ~30 different reweight parameters over a 100,000 event file. As discussed here, a simple way would be to run a single reweight simultaneously in many directories, but this is very inefficient. I have also considered dividing the file into 1000-event pieces and running through the full range of parameters on those.

A similar question is addressed here: https://answers.launchpad.net/mg5amcnlo/+question/244512

"The way to split the efficiently work in parallel is probably to make all the reweighting in one go.
and parallel on the event. This is more efficient since like that you evaluate N+1 matrix element and not 2*N
(i.e. you evaluate only once the original matrix-element)

On line 418 of madgraph/interface/reweight_interface.py
you have the way to launch multiple estimator differing by a param_card so this can be done.
This also shows that expect for that input parameter, you can always use the same directory if you want."
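
To make the N+1 versus 2*N counting in the quote concrete, here is a small arithmetic sketch. The function name and arguments are made up for illustration; the numbers used in the note below (30 hypotheses, 100,000 events) come from the question above.

```python
def matrix_element_evaluations(n_hypotheses, n_events, single_pass):
    """Count matrix-element evaluations for a reweighting campaign.

    single_pass=True : all hypotheses in one go, so the original matrix
                       element is evaluated once per event (N + 1).
    single_pass=False: one independent run per hypothesis, so the original
                       is re-evaluated in every run (2 * N).
    """
    per_event = n_hypotheses + 1 if single_pass else 2 * n_hypotheses
    return per_event * n_events
```

With 30 hypotheses and 100,000 events this gives 3,100,000 evaluations in one go against 6,000,000 for independent runs, i.e. a bit less than a factor of two.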

Your suggestion seems to be to simultaneously run the reweights through the entire sample of events. This seems ideal, but I am having difficulty understanding how this is implemented in the reweight_interface.py file. Could you elaborate on this?

Thanks,

Kenneth

Question information

Language:
English
Status:
Solved
For:
MadGraph5_aMC@NLO
Assignee:
No assignee
Solved by:
Olivier Mattelaer
Solved:
2014-05-16
Last query:
2014-05-16
Last reply:
2014-05-15

Dear Kenneth,

I honestly think that the simplest way in your case is to make ~30 independent runs, each of them running on your full sample,
and then write one script that merges the various event files into a single one (such a script is very easy to write).
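
A merging script along these lines could look like the sketch below. It is only an illustration under stated assumptions: every file is assumed to contain the same events in the same order, and each event is assumed to carry its weights as `<wgt id='...'> value </wgt>` lines inside an `<rwgt>` block. The actual layout written by your MadGraph version may differ, so check one output file first. The function name is hypothetical.

```python
import re

# Assumed layout (check against your files): one <event>...</event> block per
# event, with reweighting weights stored as <wgt id='...'> lines inside <rwgt>.
EVENT_RE = re.compile(r"<event>.*?</event>", re.DOTALL)
WGT_RE = re.compile(r"<wgt id='[^']*'>[^<]*</wgt>")


def merge_event_files(texts):
    """Merge the <wgt> entries of several event-file strings, event by event.

    texts[0] is used as the base; the weights found in the other files are
    appended to the <rwgt> block of the matching event. Returns the list of
    merged event blocks.
    """
    base_events = EVENT_RE.findall(texts[0])
    others = [EVENT_RE.findall(text) for text in texts[1:]]
    for events in others:
        if len(events) != len(base_events):
            raise ValueError("files do not contain the same number of events")

    merged = []
    for i, event in enumerate(base_events):
        extra = []
        for events in others:
            extra.extend(WGT_RE.findall(events[i]))
        if extra:
            # Insert the additional weights just before the closing </rwgt>.
            event = event.replace("</rwgt>", "\n".join(extra) + "\n</rwgt>", 1)
        merged.append(event)
    return merged
```

The header and banner of the first file would still have to be written in front of the merged events, and each run needs a distinct weight id so the merged entries stay distinguishable.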

Computing more than one hypothesis at the same time requires a fair amount of work to handle everything coherently.
Since this code is extremely fast, I don’t think it is worth it, but here are more details:

1) You need to split the function do_launch (madgraph/interface/reweight_interface.py, line 212)
so that it only creates the param_card (probably rename it to do_create_rw_card)
and have a way to create param_card_X.dat, keeping track of which card you are at.
Here the simplest is to define an instance attribute self.nb_reweight which is increased each time you add a card.

Also be careful that the banner is written in a coherent way.

3) You need to create a new function (I would call this one do_reweight) which actually does the real work
(so this is the rest of the previous do_launch function, starting at line 282).

4) You need to modify the function calculate_weight so that it returns the list of weights and not only a single one.
This should do the trick:
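out = []
# index 0 is the original param_card: evaluate it only once per event
w_orig = self.calculate_matrix_element(event, 0)
# assumption: the new cards are indexed 1..self.nb_reweight
for i in range(1, self.nb_reweight + 1):
    w = self.calculate_matrix_element(event, i)
    out.append(w / w_orig)
return out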

5) In calculate_matrix_element, you need to associate each index with the corresponding param_card
(this should be easy as well).

6) In the function do_reweight, you need to change the way the weights are added to the file, since you need to handle a list as the output
of the previous routine and not a single number anymore.
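
The steps above can be sketched with a toy class. This is not the MadGraph code: the class, the `matrix_element` callable and the way cards are stored as plain dictionaries are all assumptions made for illustration. Only the names do_create_rw_card, do_reweight, calculate_weight, calculate_matrix_element and nb_reweight are taken from the steps. The sketch also assumes that index 0 is the original card and that the new cards are numbered from 1.

```python
class ToyReweighter:
    """Toy stand-in for the reweight interface (hypothetical, not MadGraph)."""

    def __init__(self, matrix_element, original_card):
        # matrix_element(event, card) -> float is a hypothetical callable.
        self.matrix_element = matrix_element
        self.nb_reweight = 0             # number of reweight cards added so far
        self.cards = {0: original_card}  # index 0 = original param_card

    def do_create_rw_card(self, card):
        """Steps 1 and 5: register one more card and remember its index."""
        self.nb_reweight += 1
        self.cards[self.nb_reweight] = card
        return "param_card_%d.dat" % self.nb_reweight

    def calculate_matrix_element(self, event, index):
        """Step 5: the index selects the param_card used for the evaluation."""
        return self.matrix_element(event, self.cards[index])

    def calculate_weight(self, event):
        """Step 4: return one weight per hypothesis, not a single number."""
        w_orig = self.calculate_matrix_element(event, 0)  # evaluated once
        return [
            self.calculate_matrix_element(event, i) / w_orig
            for i in range(1, self.nb_reweight + 1)
        ]

    def do_reweight(self, events):
        """Steps 3 and 6: do the real work and keep a list of weights per event."""
        return [(event, self.calculate_weight(event)) for event in events]
```

A usage example with a made-up matrix element that scales like the coupling squared: `rw = ToyReweighter(lambda ev, card: card["g"] ** 2 * ev, {"g": 1.0})`, then `rw.do_create_rw_card({"g": 2.0})` for each hypothesis, and finally `rw.do_reweight(events)`.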

I think that’s basically it for computing more than one weight on one node.

Using multiple nodes goes deeper, since the multicore classes that we use are not designed to
return a value.
So you need to modify the file madgraph/various/cluster.py
in order to have a way to recover the output values in a coherent way.
That’s definitely possible, but it is more complicated than the modifications described above.
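
To illustrate what "recovering the output values in a coherent way" means, here is a generic sketch built on Python's standard concurrent.futures module. It does not use or modify madgraph/various/cluster.py, and all function names are hypothetical. The events are split into chunks, each chunk is submitted as one job, and the returned weight lists are reassembled in the original event order.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(events, chunk_size):
    """Cut the event list into consecutive pieces of at most chunk_size."""
    return [events[i:i + chunk_size] for i in range(0, len(events), chunk_size)]


def reweight_chunk(chunk, weight_fn):
    """Job run by one worker: compute the list of weights for each event."""
    return [weight_fn(event) for event in chunk]


def reweight_parallel(events, weight_fn, chunk_size, executor):
    """Submit one job per chunk and collect the results in event order."""
    chunks = split_into_chunks(events, chunk_size)
    futures = [executor.submit(reweight_chunk, chunk, weight_fn) for chunk in chunks]
    results = []
    # Iterating over the futures in submission order keeps the event order,
    # whatever order the jobs actually finish in.
    for future in futures:
        results.extend(future.result())
    return results
```

Any concurrent.futures executor can be passed in, for example `with ThreadPoolExecutor(max_workers=4) as pool: reweight_parallel(events, weight_fn, 1000, pool)`. For CPU-bound work a ProcessPoolExecutor would be the natural choice, provided weight_fn is a picklable top-level function. A batch cluster would need its own submit-and-collect mechanism in place of the executor.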

Cheers,

Olivier

On May 15, 2014, at 6:06 PM, Kenneth Long <email address hidden> wrote:

> New question #248730 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/248730

Kenneth Long (kdlong) said : #2

Thanks Olivier Mattelaer, that solved my question.