Readonly gridpack mode costs more time in "madevent"

Asked by Congqiao Li

Dear experts,

I am working on the validation of the readonly gridpack mode for a CMS study. I recently noticed that in readonly gridpack mode, more calculation is done in the "madevent" step than in normal gridpack mode, which costs more time. I can reproduce this issue in a w+0123jet LO gridpack produced by V2.6.1 and V2.7.2 [1], but no longer in the latest version V3.1.0 [2].

Since gridpacks made with older versions may be reused in the experiment, I would like to understand what causes the issue and how it was fixed.
As far as I can tell, the "madevent" job is launched by a script "ajob1" in each subprocess. In readonly mode, the program releases "ajob1" and launches it from *an empty* directory, i.e. one with no "G*" subfolders. However, I noticed that some files in the "G*" folder, e.g. "ftn25", may be used during the "madevent" step: if I manually launch "ajob1" when the "ftn25" file already exists in the "G*" folder, the job finishes much faster [3] (which I think is the situation in normal gridpack mode). I am not sure how the input file "ftn25" is used, but because it is produced at the gridpack generation stage, I wonder whether it is somehow useful when generating events.

Thus I have the following questions:
1) What causes the extra calculation in readonly gridpack mode, and what was changed in the latest version to solve the issue?
2) What exactly is "ftn25", and what role does it play in a "madevent" process?
3) For gridpacks already produced with previous versions, is it possible to solve this with a patch as well? Are there other physics problems in old-version gridpacks in readonly mode w.r.t. normal mode (e.g. different kinematic distributions)?

Thank you very much for your time and answers.

Best regards,
Congqiao

----
Some materials I use:
[1] V2.7.2 gridpack: https://coli.web.cern.ch/coli/tmp/.210430-195717_mgtest/w3jet_mg272_fix.tar.gz
[2] V3.1.0 gridpack: https://coli.web.cern.ch/coli/tmp/.210430-195717_mgtest/w3jet_mg310.tar.gz
[3] a standalone "ajob1" test (taken from a CMS gridpack, V2.6.1): https://coli.web.cern.ch/coli/tmp/.210430-195717_mgtest/standalone_ajob1_test.tar.gz
Untar it in a folder and launch "./ajob1"; the first run takes ~2 min to finish. Launch it a second time (now G117/ftn25 exists) and it takes only ~20 s.
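
The two-pass timing comparison in [3] can be scripted. This is only a minimal sketch: the helper name time_command is hypothetical (not part of MadGraph), and it assumes the tarball has been unpacked and that "./ajob1" is executable in the chosen directory.

```python
import subprocess
import time


def time_command(cmd, cwd="."):
    """Run cmd once, discarding its output; return (elapsed seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, result.returncode


def compare_two_runs(cmd, cwd="."):
    """Run the same command twice in a row and return both timings.

    For the test in [3], the first run starts without G*/ftn25 and the
    second run finds the file left behind by the first one.
    """
    first, _ = time_command(cmd, cwd)
    second, _ = time_command(cmd, cwd)
    return first, second
```

In the unpacked test folder one would call compare_two_runs(["./ajob1"]) and compare the two numbers with the ~2 min and ~20 s quoted above.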

***Update in question***
The issue can in fact be reproduced in the latest V3.1.0, so it is not yet fixed in the published version, as detailed in #3.
For an update of [2] please see [2']
[2'] 3.1.0 gridpack: https://coli.web.cern.ch/coli/tmp/.210430-195717_mgtest/w3jet_mg310_2.tar.gz

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Congqiao Li

This question was reopened

Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi,

This was fixed a while ago (2 years ago?) after the issue was identified by CMS and reported to us.
The issue is indeed that the integration grid (the ftn25 file) was not correctly reused in read-only mode.
A patch was already provided to CMS 2 years ago (I guess one can find the question or bug report on Launchpad).
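
To illustrate why losing the integration grid costs time, here is a toy sketch, not MadEvent's actual algorithm and not the ftn25 file format. It assumes a one-dimensional peaked integrand and a VEGAS-style grid with equal-probability bins; all names (integrand, adapted_edges, sample_weights) are hypothetical. A run that starts from a flat grid sees much larger weight fluctuations than one that starts from a grid already adapted to the integrand, so it needs more points (or extra adaptation iterations) to reach the same precision.

```python
import random
import statistics


def integrand(x):
    # Toy sharply peaked function on [0, 1], standing in for a matrix element.
    return 1.0 / ((x - 0.5) ** 2 + 1e-4)


def adapted_edges(n_bins, n_scan=10000):
    """Bin edges chosen so that each bin holds an equal share of the integral.

    A fine deterministic scan stands in for the grid adaptation that would
    happen at gridpack generation time.
    """
    step = 1.0 / n_scan
    cumulative = [0.0]
    for i in range(n_scan):
        cumulative.append(cumulative[-1] + integrand((i + 0.5) * step) * step)
    total = cumulative[-1]
    edges = [0.0]
    j = 0
    for k in range(1, n_bins):
        target = total * k / n_bins
        while cumulative[j + 1] < target:
            j += 1
        # Linear interpolation inside scan cell j.
        frac = (target - cumulative[j]) / (cumulative[j + 1] - cumulative[j])
        edges.append((j + frac) * step)
    edges.append(1.0)
    return edges


def sample_weights(edges, n_points, rng):
    """Pick a bin uniformly, then a point uniformly inside it; return the weights."""
    n_bins = len(edges) - 1
    weights = []
    for _ in range(n_points):
        b = rng.randrange(n_bins)
        lo, hi = edges[b], edges[b + 1]
        x = lo + (hi - lo) * rng.random()
        # The sampling density at x is 1 / (n_bins * (hi - lo)).
        weights.append(integrand(x) * n_bins * (hi - lo))
    return weights


rng = random.Random(1)
flat_edges = [i / 50 for i in range(51)]
cold = sample_weights(flat_edges, 5000, rng)         # no stored grid
warm = sample_weights(adapted_edges(50), 5000, rng)  # stored grid reused

for label, w in (("flat grid", cold), ("adapted grid", warm)):
    print(label, "estimate:", statistics.mean(w), "weight spread:", statistics.stdev(w))
```

Both runs estimate the same integral; only the spread of the weights differs, which is what drives the number of points needed.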

Cheers,

Olivier

> On 1 May 2021, at 11:45, Congqiao Li <email address hidden> wrote:
>
> New question #696856 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/696856

Olivier Mattelaer (olivier-mattelaer) said :
#2

Sorry for the long delay on this.

Could you test this patch:

=== modified file 'Template/LO/bin/internal/restore_data'
--- Template/LO/bin/internal/restore_data 2020-04-12 20:45:22 +0000
+++ Template/LO/bin/internal/restore_data 2021-05-28 18:32:54 +0000
@@ -59,11 +59,13 @@
  done
  for j in $1_ftn26.gz ; do
      if [[ -e $j ]]; then
- rm -f ftn26 >& /dev/null
+ rm -f ftn26 ftn25 >& /dev/null
   rm -f $1_ftn26 >& /dev/null
   gunzip $j
   cp $1_ftn26 ftn26
   gzip $1_ftn26
+ ln -s ftn26 ftn25 # usefull for readonly gridpack
+
      fi
  done
  cd ../

It does not seem to change anything on my laptop, but maybe I used a bad example where the run time is dominated by other factors.
Note that this patch will produce warnings for normal gridpacks. I think they are harmless, but one would need to double check.

Olivier

Congqiao Li (colizz) said :
#3

Thank you very much Olivier! I am starting to launch the test.

Congqiao Li (colizz) said :
#4

Hi Olivier,

After testing the patch (using the latest v3.1.0), it seems to me the issue still occurs.
Please see the result on slide 3 of the attached PDF [1].

More specifically, I also tested an "incorrect" case in which I use "readonly" mode without running "restore_data". The results are identical to those obtained with "restore_data", both the output LHE and the run time. So my understanding is that in "readonly" mode the code still fails to use the integration grid from the gridpack.

This result can be reproduced by following the description on slide 3. Thank you for your time and further investigation.

[1] https://coli.web.cern.ch/coli/tmp/.210430-195717_mgtest/gentest_v2.pdf

Olivier Mattelaer (olivier-mattelaer) said :
#5

OK yes, you can scrap the previous patch.
On my laptop the following one seems to have a significant impact. Can you cross-check?

=== modified file 'madgraph/madevent/gen_ximprove.py'
--- madgraph/madevent/gen_ximprove.py 2021-03-09 12:29:32 +0000
+++ madgraph/madevent/gen_ximprove.py 2021-05-29 19:28:48 +0000
@@ -1903,6 +1903,9 @@
                     'packet': None,
                     }

+            if self.readonly:
+                basedir = pjoin(os.path.dirname(__file__), '..','..','SubProcesses', info['P_dir'], info['directory'])
+                info['base_directory'] = basedir

             jobs.append(info)
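
To see where the patched 'base_directory' ends up pointing, the path arithmetic can be tried in isolation. This is a sketch under assumptions: the helper names grid_directory and has_grid are hypothetical, the layout assumes gen_ximprove.py sits two levels below the process root (e.g. in bin/internal, like restore_data above), and the P_dir value in the usage note is made up for illustration.

```python
import os
from os.path import join as pjoin


def grid_directory(module_file, p_dir, directory):
    """Same join as in the patch: two levels above the module file,
    then SubProcesses/<P_dir>/<directory>, normalised for readability."""
    return os.path.normpath(
        pjoin(os.path.dirname(module_file), '..', '..',
              'SubProcesses', p_dir, directory)
    )


def has_grid(module_file, p_dir, directory):
    """True if an ftn25 file is present in the resolved G* directory."""
    return os.path.isfile(pjoin(grid_directory(module_file, p_dir, directory), 'ftn25'))
```

For example, grid_directory('/gridpack/bin/internal/gen_ximprove.py', 'P0_example', 'G117') resolves to the G117 folder under SubProcesses/P0_example of that gridpack, which is where the ftn25 file from the generation stage would be looked for.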

Congqiao Li (colizz) said :
#6

Thank you Olivier! This patch works well and solves the time discrepancy issue in "readonly" mode.

Congqiao Li (colizz) said :
#7

Hi Olivier,

A quick question: will this fix be included in future MadGraph versions (2.9.x and 3.1.x)?

Many thanks.

Congqiao Li (colizz) said :
#8

Sorry, I just saw that it has been fixed since versions 2.9.4 and 3.1.1. Thank you!