Timeouts in newer MG5_aMC Versions — Configurable?

Asked by Zachary Marshall

Hi there,

We've been seeing an increasing number of timeouts when running MG5_aMC recently. This is correlated with our moves to newer versions, but I've not been able to identify a killer moment at which the timeouts started occurring. There are two places we've seen them so far.

1) In model loading, around L3705 of madgraph/interface/common_run_interface.py:

             AskforEditCard.update_dependent(interface, interface.me_dir, card, path, timer=20)

2) In LHAPDF loading, in get_lhapdf_version_static of madgraph/interface/common_run_interface.py, which seems to ultimately be called from here:

        self.do_update('dependent', timer=20)

We're trying to track down the source of these timeouts on our side, to see if anything has changed, but I wanted to ask two questions of you all in the meantime. First: has anything changed to do with these timeouts or their treatment in recent releases? Second: could the timeouts be made configurable (rather than hard-coded to 20) so that we can make them a bit longer in systems that need a little extra time?

Thanks,
Zach

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee

Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi,

So there are several reasons:
1) In older versions of the code, this function did not depend on LHAPDF, so the timeout was only triggered for slow/big models (and was handled correctly: stop the function, warn the user, and continue without that optional feature).
2) We did not expect LHAPDF to be so slow that the timeout would occur within an LHAPDF call, so the timeout signal was/is not yet intercepted correctly if it fires during such a call.
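
For readers unfamiliar with the "stop the function, warn the user, continue" pattern described in point 1), here is a minimal Python sketch of a signal-based timeout. It is an illustration only: the names `TimeOutError`, `run_with_timeout` and `slow_step` are hypothetical and this is not MG5aMC's actual code. It assumes a Unix system and a call made from the main thread, since it relies on `SIGALRM`.

```python
import signal
import time


class TimeOutError(Exception):
    """Raised when the wrapped call exceeds its time budget (hypothetical name)."""


def run_with_timeout(func, timeout, default=None):
    """Run func(); if it takes longer than `timeout` seconds, warn and return `default`."""

    def _on_alarm(signum, frame):
        # The handler interrupts whatever is running by raising an exception.
        raise TimeOutError()

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        return func()
    except TimeOutError:
        print("WARNING: optional step took too long, skipping it.")
        return default
    finally:
        # Always cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def slow_step():
    """Stand-in for a slow optional step such as loading a large model."""
    time.sleep(2)
    return "done"
```

In this sketch the exception has to propagate from wherever the alarm fires up to the `except` clause. If the alarm fires inside a nested call that handles exceptions differently, the clean "warn and continue" path may not be reached, which is one plausible reading of the situation described in point 2).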

In the previous thread about this issue, I provided you with a patch to avoid the problem (I am not sure which solution I picked: either allowing infinite time for LHAPDF to react, or changing the exception handling so that it cancels the call to LHAPDF if the timeout occurs).

> Second: could the timeouts be made configurable (rather than hard-coded to 20) so that we can make them a bit longer in systems that need a little extra time?

I do not see the point of making that timeout configurable. Either you care about the feature, and then you have to run it explicitly (in which case it runs without a timer), or you do not really care about it, and then the default timer is reasonable. Having a mix of the two modes is obviously possible, but I do not see the point.

I do understand that the crash due to a super slow LHAPDF is an issue, but the patch should solve it without having to introduce an additional parameter.

Zachary Marshall (zach-marshall) said :
#2

Thanks Olivier. The point about LHAPDF is clear; what about the model loading? Should we also disable these?

I'd be ok with a top-level switch to disable them (and then it's up to us to do something sensible) — wouldn't that be nicer than our privately patching MG5_aMC to disable them for each new version?

Thanks,
Zach

Olivier Mattelaer (olivier-mattelaer) said :
#3

Which version do you want to do?
Always disable the call to the function, or disable the timer?

I guess always disabling the call to the function can indeed make sense in your case,
since this function is mainly for people who provide inconsistent BSM information in the param_card.

Cheers,

Olivier

> On 19 Oct 2023, at 17:40, Zachary Marshall <email address hidden> wrote:
>
> Question #708223 on MadGraph5_aMC@NLO changed:
> https://answers.launchpad.net/mg5amcnlo/+question/708223

Zachary Marshall (zach-marshall) said :
#4

We're in MG5_aMC 3.5.1 at the moment, but if you just put it into the main branch we'll get it eventually.

I would be inclined to disable the timer — the call was timing out when loading a restrict card. If we're loading a restrict card that is well-tested, is it in fact safe to just disable the checks? If so, that'd also be fine with me! We can make sure restrict cards used in production are well tested.

Thanks,
Zach

Olivier Mattelaer (olivier-mattelaer) said :
#5

Hi Zach,

To be clear, this type of check verifies that the relations between the parameters of the param_card follow the rules of the UFO model.
For the SM, for example, we check that the W mass is set correctly according to gauge invariance, i.e. according to the standard function that gives the W mass for given values of G_F, 1/a_EW and MZ.
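
As an illustration of the kind of relation being checked, here is a minimal Python sketch of the tree-level Standard Model expression for the W mass in terms of G_F, 1/a_EW and MZ. The function names, the tolerance and the numerical inputs are illustrative assumptions, not MG5aMC's actual implementation.

```python
import math


def tree_level_mw(mz, gf, aewm1):
    """Tree-level W mass (GeV) from MZ (GeV), G_F (GeV^-2) and 1/alpha_EW."""
    aew = 1.0 / aewm1
    inner = mz**4 / 4.0 - (aew * math.pi * mz**2) / (gf * math.sqrt(2.0))
    return math.sqrt(mz**2 / 2.0 + math.sqrt(inner))


def w_mass_is_consistent(card_mw, mz, gf, aewm1, rel_tol=1e-6):
    """True if the W mass written in the card matches the derived value."""
    return math.isclose(card_mw, tree_level_mw(mz, gf, aewm1), rel_tol=rel_tol)


# Illustrative inputs (assumed values, not read from any real param_card).
MZ, GF, AEWM1 = 91.1876, 1.16637e-5, 127.9
derived_mw = tree_level_mw(MZ, GF, AEWM1)
print(f"derived MW = {derived_mw:.4f} GeV")
print("card value 80.4 consistent?", w_mass_is_consistent(80.4, MZ, GF, AEWM1))
```

A card that sets the W mass by hand to a value that does not follow from the other three inputs is the sort of inconsistency this check is meant to flag.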

On the MG5aMC side, those checks are irrelevant and not needed, since the relations between all the parameters will anyway be enforced at the Fortran level and will therefore (always) be correct when doing the computation.

The main points of those checks are
1) to make the user aware of the inconsistencies in the param_card,
2) to indicate how those inconsistencies are handled (which should be 100% clear anyway),
3) to update the param_card to fix those inconsistencies, so that correct/consistent information is passed to programs like pythia/...

Given that
1) such checks can be quite CPU/IO intensive for some models (since they force us to (re)load the UFO, convert it to the internal MG format, apply the restriction, and then do the check),
2) such checks are useless for MG5aMC (and most of the time for subsequent codes as well),
3) it is not supposed to be our job to check the input,
4) there are cases where the check is not going to work anyway (and the UFO model might not include all the relations anyway),
5) for years we never checked this type of consistency (and before MG5, the computation could even be invalid for invalid input),

yes, fully bypassing those checks makes more sense than removing the timer.
Now, doing that via a patch would be dangerous, since there are cases where
1) the user forces the function to run without a timer in order to update the parameters (useful if they want/need the functionality but it is too slow for the timer),
2) we actually need this type of functionality (for the extended running option of the UFO, supported in MG5aMC since version 3.4, although no such UFO models are publicly available yet). So it is important that we can call this function without a timer for some specific models/situations (and such situations may appear more often in the future).

So my suggestion here is that you do not touch that part of the code, but maybe I am missing a point here.

Cheers,

Olivier

Zachary Marshall (zach-marshall) said :
#6

Hi Olivier,

Everything that you wrote is very clear and helpful; I'm just having trouble understanding what we should therefore do.

You say that we shouldn't touch the code, but this check is timing out regularly for some of our models. Are we simply not able to run those models in MG5_aMC for the time being?

Thanks,
Zach

Olivier Mattelaer (olivier-mattelaer) said :
#7

In principle, the timeout cancels the function and the code continues.

Given your message, I guess this is not what is happening.
So maybe the patch that you are using is creating a new issue.

Let me take a look at what I have done in 3.5.2 to see if indeed the patch (at least the one in the branch) is potentially problematic.
Maybe you can copy-paste the content of the debug file so that I can check what is going on.

Cheers,

Olivier

Zachary Marshall (zach-marshall) said :
#8

Hi Olivier,

Unfortunately (?), there's no debug file, because there's no crash:

INFO: Update the dependent parameter of the param_card.dat
WARNING: update the strong coupling value (alpha_s) to the value from the pdf selected: 0.13
WARNING: The model takes too long to load so we bypass the updating of dependent parameter.
This might create trouble for external program (like MadSpin/shower/...)
The update can be forced without timer by typing 'update dependent' at the time of the card edition

The notion of running 'update dependent' isn't super clear to me, as this is loading the model with a restrict card that times out, not injecting or modifying on the fly a param card. Is it safe to "just" run that again after loading the model to get the same checks without a timer? Do we need to inject that at some other moment?
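
In an unattended job, the warning quoted above could be detected by scanning the captured output, so that the job can decide whether to ignore it or flag the run. The following Python sketch assumes the log has already been captured as text; the marker string is copied from the message above and may differ between MG5_aMC versions, and the function name is hypothetical.

```python
# Fragment of the warning text quoted above (assumed stable; check per version).
BYPASS_MARKER = "we bypass the updating of dependent parameter"


def dependent_update_bypassed(log_text):
    """Return True if the captured log contains the bypass warning."""
    return any(BYPASS_MARKER in line for line in log_text.splitlines())


sample_log = (
    "INFO: Update the dependent parameter of the param_card.dat\n"
    "WARNING: The model takes too long to load so we bypass the updating "
    "of dependent parameter.\n"
)
print(dependent_update_bypassed(sample_log))
```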

Thanks,
Zach

Olivier Mattelaer (olivier-mattelaer) said :
#9

Good that there is no crash; this is then the expected behavior.

>The notion of running 'update dependent' isn't super clear to me, as this is loading the model with a restrict card that times out, not injecting or modifying on the fly a param card.

To force the fix of the dependent parameters, you need to answer this question:

Do you want to edit a card (press enter to bypass editing)?
/------------------------------------------------------------\
| 1. param : param_card.dat |
| 2. run : run_card.dat |
| 3. plot : plot_card.dat |
\------------------------------------------------------------/
 you can also
   - enter the path to a valid card or banner.
   - use the 'set' command to modify a parameter directly.
     The set option works only for param_card and run_card.
     Type 'help set' for more information on this command.
   - call an external program (ASperGE/MadWidth/...).
     Type 'help' for the list of available command
 [0, done, 1, param, 2, run, 3, plot, enter path][90s to answer]
>

with "update dependent".
This is another valid command at that prompt, like "set mass 6 170"
(and it is obviously also supported when scripting).
To make sense, this command needs to be used AFTER editing the other parameters of the param_card (and selecting the PDF in the run_card).

The checks here really depend on the param_card, so they only make sense after the cards have been edited.

Note that running the function by hand will not prevent it from being called later with the default timer (and therefore you will still see the warning, even though in that case you know that we have done our best).

 Cheers,

Olivier

Olivier Mattelaer (olivier-mattelaer) said :
#10

To avoid the warning when you have already run the "update dependent" function yourself (and therefore without the timer),
I have added a variable that tracks whether the function has been run manually or not.

Obviously not critical:
https://github.com/mg5amcnlo/mg5amcnlo/commit/746657dec7a72f04a325c9fb1346eeb4c7114e69

Zachary Marshall (zach-marshall) said :
#11

Hi Olivier,

Ok, thanks for this. What's the expected workflow for production, when we aren't sitting at a terminal typing? Forcing input via piping to stdin is extremely fragile (if you add or change a prompt, it breaks immediately), so I'd really rather avoid that. We add a restrict card, run this thing, and then? Do we check that the output card is the same as the restrict card, and if so ignore the warning for future runs because the restriction is fine?

Thanks again,
Zach

Olivier Mattelaer (olivier-mattelaer) said :
#12

It is indeed not recommended to use piping.

You should use this
./bin/mg5_aMC PATH
and not
./bin/mg5_aMC < PATH

The first one validates each answer step by step: if an entry is not valid for the current question, it is kept for the next question.
We are careful with the interface, so that new interface features do not break backward compatibility.

So, for example, if you already have a directory MYRUN,
the following script:
generate p p > t t~
output MYRUN
launch

would fail with piping (due to the confirmation request for the existing directory) but not if you run in argument mode.
See FAQ #2186: “How to script MG5 run?”.

Zachary Marshall (zach-marshall) said :
#13

Hi Olivier,

Sure — but again, in production we don't have a prompt, so what are you recommending we do to avoid this warning about dependent parameter setting (or should we just ignore it)?

Thanks,
Zach

Olivier Mattelaer (olivier-mattelaer) said :
#14

What do you mean by "you do not have a prompt"?
Running
./bin/mg5_aMC PATH
does not need a "prompt". It is just an executable with a parameter.

If your issue is with using ./bin/mg5_aMC and you prefer using ./bin/generate_events, this is not a problem, since you can also use
 ./bin/generate_events PATH
in the same way. And if you want to edit the cards outside of MG5aMC control, then the only line that you can put in that file is "update dependent".

But if doing that is also problematic, then yes the solution is likely to just ignore it.

Zachary Marshall (zach-marshall) said :
#15

Sorry, let me ask the question in full length — probably more verbose than you need, but I want to be sure it's clear enough.

I asked about two timeouts. For the LHAPDF one the answer is clear — we patch privately and either disable the timeout or provide a custom handling for the exception to continue.

For the model loading timeout, which looks like this:

WARNING: The model takes too long to load so we bypass the updating of dependent parameter.

You said:

To force the fix of the dependent parameters, you need to answer this question...with "update dependent"

This is happening when we use the default restrict card for a model, without editing the param card at all. The first step we run looks something like:

generate p p > t t~
output MYRUN

After that, a user is in principle allowed to modify the param or run cards. The second step (event generation) we run via

 ./bin/generate_events PATH

What I'm not following here is what we should be doing in order to run "update dependent" in that workflow. You say here:

"...the only line that you can put in that file is "update dependent"."

What is "that file"? I don't see that ./bin/generate_events is able to take an input configuration file:

help generate_events
syntax: generate_events [run_name] [options]
-- Launch the full chain of script for the generation of events
   Including possible plotting, shower and detector resolution.
   Those steps are performed if the related program are installed
   and if the related card are present in the Cards directory.
-- local options:
      -f : Use default for all questions.
      --laststep= : argument might be parton/pythia/pgs/delphes and indicate the last level to be run.
      -M : in order to add MadSpin
      -R : in order to add the reweighting module
-- session options:
      Note that those options will be kept for the current session
      --cluster : Submit to the cluster. Current cluster: condor
      --multicore : Run in multi-core configuration
      --nb_core=X : limit the number of core to use to X.

Thanks, and apologies if I'm being dense here; I feel like there's something obvious I'm missing...
Zach

Olivier Mattelaer (olivier-mattelaer) said :
#16

Sorry, that was indeed my error.

Indeed, I did not implement a way to pass a command file to that script.
Passing a command file for event generation is actually possible, but via the more versatile madevent script,
where you can do
./bin/madevent PATH

where PATH is the path to a file. To achieve what you want here, it has to contain the following lines:
launch
update dependent
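
For a production setup, the command file and the call to the madevent script could be generated along the following lines. This is a sketch under assumptions: the helper names and the file layout are hypothetical, the optional "set" lines rely on the statement earlier in the thread that such commands are accepted at the card-edition step, and "update dependent" is placed last because it has to come after any parameter edits.

```python
import subprocess
from pathlib import Path


def build_commands(set_commands=()):
    """Return the text of a madevent command file.

    'update dependent' comes last so that it runs after any card edits.
    """
    lines = ["launch", *set_commands, "update dependent"]
    return "\n".join(lines) + "\n"


def run_madevent(process_dir, command_file):
    """Run ./bin/madevent PATH inside an existing process directory."""
    process_dir = Path(process_dir).resolve()
    executable = process_dir / "bin" / "madevent"
    return subprocess.run(
        [str(executable), str(Path(command_file).resolve())],
        cwd=process_dir,
        check=False,  # let the caller inspect the return code
    )


# Example command file content, with one illustrative card edit.
print(build_commands(["set mass 6 170"]), end="")
```

Writing the returned text to a file and passing that file's path to `run_madevent` would correspond to the `./bin/madevent PATH` call described above.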

Sorry for my confusion,

Olivier
