Skip/speed up zipping Pythia files possible?

Asked by William Shepherd on 2019-11-04

In starting significant pseudodata generation on a cluster, we're finding that two phases of the simulation take a surprisingly large amount of time: "merging results from the split PY8 runs..." and "storing pythia8 files of previous run". I was poking around at the code which performs that merging, and it appears to be removing some leading and some following lines using sed, at least in the version I'm looking at. A simple test of achieving the same task with the head and tail commands was at least an order of magnitude faster; is there some impediment to using these commands as opposed to sed in this context?

Also, for our current analysis we are not actually exploiting information in the PY8 .hepmc file, and thus delete it shortly after the run is completed; is there a straightforward way to bypass the gzip call that is taking so long at the "storing pythia8 files..." juncture? This would significantly speed up our usage of the program; it seems that these two steps take an order of magnitude more time than the rest of the program when running in cluster mode, because they are both serial processes.

Thanks for any insights!

Question information

Language:
English
Status:
Answered
For:
MadGraph5_aMC@NLO
Assignee:
Valentin Hirschi
Last query:
2019-11-05
Last reply:
2019-11-09

Regarding sed versus head/tail, I'm pretty sure that no one has tried to optimise this.
Could you open a merge request, or provide your patch?

For the hepmc file, the easiest approach would be to ask Pythia8 not to generate it (I guess that you already customise the py8 mode, since this is currently our only output of py8/...). For that, you should refer to the PY8 manual and put the associated flag in pythia8_card.dat.

Cheers,

Olivier

Hi Olivier,

I've only tested it on a single file so far. I'll work out what is needed to get this running on two example files and post the proposed code here when I have it.

The trouble with just not requesting the hepmc is that it's needed for Delphes, so I can't simply turn it off at the py8 step in the card; the 'storing pythia8 files...' step is outside any process straightforwardly controlled by cards at this time. I don't know what changes would be necessary to produce a switch that tells the code whether or not to gzip those files after they've been used by Delphes, which is why I asked the question.

Best,

Will

Ah, OK.

So you do use the hepmc; I was missing that point.
OK, I see the point now.
I will think about how to add such a switch.
My first thought would be to have a hidden entry in the run_card, but this is a bit unnatural.
A second idea would be to have a special analysis mode for that:
analysis=Delphes/nohepmc

Cheers,

Olivier
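
As a rough illustration of the kind of switch discussed here, the sketch below shows a helper that either gzips a finished HepMC file, leaves it uncompressed, or deletes it. It is not taken from the MadGraph5_aMC@NLO code: the function name store_hepmc and the options compress and keep are hypothetical stand-ins for whatever a run_card entry or analysis mode would eventually control.

```python
import gzip
import os
import shutil


def store_hepmc(path, compress=True, keep=True):
    """Archive, keep, or discard a finished HepMC file.

    Hypothetical helper: 'compress' and 'keep' stand in for whatever
    switch a run_card entry or analysis mode would expose.
    Returns the path of the stored file, or None if it was deleted.
    """
    if not keep:
        # The file was only needed as Delphes input, so drop it outright.
        os.remove(path)
        return None
    if not compress:
        # Skip the slow serial gzip step and leave the file untouched.
        return path
    gz_path = path + ".gz"
    # Stream the file through gzip so it is never held in memory at once.
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path
```

With keep=False the call returns immediately after deleting the file, which corresponds to the use case described above where the hepmc output is only needed as input to Delphes.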

> On 5 Nov 2019, at 22:27, William Shepherd <email address hidden> wrote:
>
> Question #685621 on MadGraph5_aMC@NLO changed:
> https://answers.launchpad.net/mg5amcnlo/+question/685621
>
> Status: Answered => Open

I have had success with the following two lines (to replace the sed command in madevent_interface.py):

os.system(' '.join(['head','-n','-1',hepmc_file,'|','tail','-n','+'+str(n_head),'>','tmpfile']))
os.system(' '.join(['mv','tmpfile',hepmc_file]))

However, I hear rumors that the OSX version of head doesn't have this functionality. I also haven't done detailed timing comparisons inside the Python code yet.

head seems to have the -n option on macOS (at least on Mojave).

Thanks,

Olivier


Hi Olivier,

My understanding isn't that head is missing the option -n entirely, but rather that at least some versions of head refuse to take negative arguments to that flag.

Best,

Will
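
One way to sidestep the portability question is to do the trimming in Python itself, so that no particular head implementation is required. The sketch below is only an illustration and has not been timed against sed or head/tail. The function name trim_hepmc is hypothetical; n_head is assumed to have the same meaning as in the tail -n +N call above, that is, the 1-based number of the first line to keep.

```python
import os
import shutil
import tempfile


def trim_hepmc(path, n_head):
    """Keep lines n_head..(last - 1) of 'path', rewriting it in place.

    Mirrors 'head -n -1 FILE | tail -n +N': output starts at line
    n_head (1-based) and the final line is dropped.
    """
    dir_name = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src, tempfile.NamedTemporaryFile(
        "wb", dir=dir_name, delete=False
    ) as dst:
        # Skip the first n_head - 1 lines.
        for _ in range(n_head - 1):
            if not src.readline():
                break
        # Write each line one step late so the last line is never written.
        previous = None
        for line in src:
            if previous is not None:
                dst.write(previous)
            previous = line
        tmp_name = dst.name
    # Replace the original only after the trimmed copy is complete.
    shutil.move(tmp_name, path)
```

The temporary file gets a unique name in the same directory as the input, rather than a fixed name such as tmpfile, so two merges running in the same working directory should not overwrite each other's intermediate output.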
