Skip/speed up zipping Pythia files possible?

Asked by William Shepherd

In starting significant pseudodata generation on a cluster, we're finding that two phases of the simulation take a surprisingly large amount of time: "merging results from the split PY8 runs..." and "storing pythia8 files of previous run". I was poking around at the code which performs that merging, and it appears to be removing some leading and some following lines using sed, at least in the version I'm looking at. A simple test of achieving the same task with the head and tail commands was at least an order of magnitude faster; is there some impediment to using these commands as opposed to sed in this context?

Also, for our current analysis we are not actually exploiting information in the PY8 .hepmc file, and thus delete it shortly after the run is completed; is there a straightforward way to bypass the gzip call that is taking so long at the "storing pythia8 files..." juncture? This would significantly speed up our usage of the program; it seems that these two steps take an order of magnitude more time than the rest of the program when running in cluster mode, because they are both serial processes.

Thanks for any insights!

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: Valentin Hirschi
Solved by: William Shepherd

Olivier Mattelaer (olivier-mattelaer) said :
#1

For sed versus head/tail, I'm pretty sure that no one has tried to optimise this.
Could you open a merge request or provide your patch?

For the hepmc file, the easiest option would be to ask pythia8 not to generate it (I guess you already customise the py8 mode, since this is currently our only output of py8/...). For that, refer to the PY8 manual and put the associated flag in pythia8_card.dat.

Cheers,

Olivier

William Shepherd (will-shepherd) said :
#2

Hi Olivier,

I've only tested it on a single file; I'll work out what is needed to get this to work on two example files and post the proposed code here when I have it.

The trouble with just not requesting the hepmc is that it's needed for Delphes, so I can't simply turn it off at the py8 step in the card; the 'storing pythia8 files...' step is outside any process straightforwardly controlled by cards at this time. I don't know what changes would be necessary to produce a switch that tells the code whether or not to gzip those files after they've been used by Delphes, which is why I asked the question.

Best,

Will

Olivier Mattelaer (olivier-mattelaer) said :
#3

Ah, OK.

So you do use the hepmc file; I was missing that point.
I see the issue now.
I will think about how to add such a switch.
My first thought would be a hidden entry in the run_card, but this is a bit unnatural.
A second idea would be to have a special analysis mode for that:
analysis=Delphes/nohepmc

Cheers,

Olivier

> On 5 Nov 2019, at 22:27, William Shepherd <email address hidden> wrote:
>
> Question #685621 on MadGraph5_aMC@NLO changed:
> https://answers.launchpad.net/mg5amcnlo/+question/685621
>
> Status: Answered => Open

William Shepherd (will-shepherd) said :
#4

I have had success with the following two lines (to replace the sed command in madevent_interface.py):

os.system(' '.join(['head','-n','-1',hepmc_file,'|','tail','-n','+'+str(n_head),'>','tmpfile']))
os.system(' '.join(['mv','tmpfile',hepmc_file]))

However, I hear rumors that the OSX version of head doesn't have this functionality. I haven't done detailed timing comparisons inside the python code yet either.
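
For comparison, the same trimming can be sketched in pure Python, which sidesteps the question of which head options a given platform supports. This is only an illustration, not the code in madevent_interface.py: the function name strip_lines and its arguments are hypothetical, and no timing against sed or head/tail has been done here. Note that `tail -n +N` starts output at line N, so the shell pipeline above drops N-1 leading lines, whereas n_head below is the number of lines dropped.

```python
from collections import deque
import os
import tempfile


def strip_lines(path, n_head, n_tail):
    """Drop the first n_head and the last n_tail lines of `path`, in place.

    Hypothetical helper for illustration; not part of MadGraph5_aMC@NLO.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    with open(path, "rb") as src, tempfile.NamedTemporaryFile(
            mode="wb", dir=directory, delete=False) as dst:
        # Skip the leading lines (stop early if the file is shorter).
        for _ in range(n_head):
            if not src.readline():
                break
        # Hold back the most recent n_tail lines so they are never written.
        pending = deque()
        for line in src:
            pending.append(line)
            if len(pending) > n_tail:
                dst.write(pending.popleft())
        tmp_name = dst.name
    os.replace(tmp_name, path)
```

The file is streamed line by line, so memory use stays at roughly n_tail lines regardless of the size of the hepmc file.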

Olivier Mattelaer (olivier-mattelaer) said :
#5

head seems to have the -n option on Mac (at least on Mojave).

Thanks,

Olivier


William Shepherd (will-shepherd) said :
#6

Hi Olivier,

My understanding isn't that head is missing the option -n entirely, but rather that at least some versions of head refuse to take negative arguments to that flag.

Best,

Will

Olivier Mattelaer (olivier-mattelaer) said :
#7

So the use of tail/head has been implemented (testing first whether the syntax is supported, since not all Unix systems support it).
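
A support test of this kind could look roughly like the sketch below. It is not the check actually implemented in MadGraph5_aMC@NLO; the function name head_supports_negative_count is hypothetical, and the probe simply feeds two lines to `head -n -1` and checks that only the first comes back.

```python
import subprocess


def head_supports_negative_count():
    """Return True if `head -n -1` drops the last line of its input.

    Hypothetical probe for illustration; not the MG5aMC implementation.
    """
    try:
        result = subprocess.run(
            ["head", "-n", "-1"],
            input=b"first\nsecond\n",
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # No usable `head` executable at all: treat as unsupported.
        return False
    return result.returncode == 0 and result.stdout == b"first\n"
```

A caller would use the head/tail pipeline when this returns True and fall back to sed (or a pure Python loop) otherwise.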

In addition, you can edit the py8 card and set the output to autoremove;
instead of tarring the output at the end, it will then remove it.
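
For scripted runs, the card edit could be done along the lines of the sketch below. Several things here are assumptions rather than facts from this thread: that the relevant entry in pythia8_card.dat is named HEPMCoutput:file, that entries have the form `key = value`, and that comment lines start with `!` or `#`. The helper set_card_option is hypothetical; check the card shipped with your version before relying on it.

```python
def set_card_option(card_text, key, value):
    """Return card_text with `key = value` set, appending the entry if absent.

    Hypothetical helper; the card syntax handled here is an assumption.
    """
    lines = card_text.splitlines()
    new_entry = "%s = %s" % (key, value)
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments (assumed to start with '!' or '#') and non-entries.
        if stripped.startswith(("!", "#")) or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip().lower() == key.lower():
            lines[i] = new_entry
            break
    else:
        # Key not found anywhere: add it at the end of the card.
        lines.append(new_entry)
    return "\n".join(lines) + "\n"


# Example on an in-memory card; the key name and old value are assumed.
example_card = "! sample card\nHEPMCoutput:file = hepmc.gz\n"
updated_card = set_card_option(example_card, "HEPMCoutput:file", "autoremove")
```

As noted later in the thread, a version without this feature treats autoremove as an output path rather than a setting.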

Cheers,

Olivier


William Shepherd (will-shepherd) said :
#8

Hi Olivier,

I just tried this autoremove feature in the latest non-beta download, and it was treated as a path rather than a setting. Has this only been implemented in beta so far?

Thanks,

Will

Olivier Mattelaer (olivier-mattelaer) said :
#9

Yes indeed.

You can find a link to the version with this feature on this page:
https://bazaar.launchpad.net/~mg5core1/mg5amcnlo/2.6.8/revision/298

Cheers,

Olivier


William Shepherd (will-shepherd) said :
#10

Looks good to me; I look forward to these becoming standard features, as they'll speed up our computing quite noticeably. Thanks for your help.

Will

William Shepherd (will-shepherd) said :
#11

One follow-up on this; it's a very nice feature, speeding up our analysis pipeline notably, especially when running in a more parallel environment. It would be beneficial to the rest of the world if this feature were documented in the pythia8_card.dat file comments though, along with the others, I imagine.

Olivier Mattelaer (olivier-mattelaer) said :
#12

Done in the development version.