MadGraph5_aMC@NLO

MadGraph gets very slow with several instances

Asked by Daniel Schieber on 2023-09-07

Hello,
I'm working with MG5 to calculate cross-sections in two-Higgs-doublet models. Since I have a lot of parameter points, I am trying to run several instances of MG5 at the same time with MPI on multiple CPUs. However, as I increase the number of processes, the MadGraph run times (process generation and launch) get much longer. For example, with n=50 the cross-section calculation takes around 30-40 seconds. If I increase to n=500, the same calculations take 200-300 seconds on average. I don't know what is causing this behaviour. I will try to describe my steps in as much detail as possible.
MadGraph is running in run_mode 0 (single machine), which I thought means single core.
Now I start N instances with MPI.
I control MG5 from Python via subprocess (mg5 proc_card.dat).
Each instance generates the process "e+ e- > z h2 h2" and outputs it into its own folder.
Then I launch them with parameter cards taken from spectrum files.
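
The steps above can be sketched as a small Python driver. This is only an illustration under assumptions: the model name, the executable name passed as `mg5_exe`, and the helper names `build_proc_card` and `run_point` are placeholders that do not come from the post, and supplying the parameter card path on the line after `launch` is assumed to match the setup used here. Timing the subprocess call directly keeps MPI communication out of the measurement.

```python
import subprocess
import time
from pathlib import Path


def build_proc_card(model, out_dir, param_card):
    """Return the text of an MG5 command file for one parameter point."""
    lines = [
        f"import model {model}",     # model name is a placeholder
        "generate e+ e- > z h2 h2",  # process quoted in the post
        f"output {out_dir}",         # one output folder per instance
        f"launch {out_dir}",
        str(param_card),             # assumed: card path answers the launch prompt
    ]
    return "\n".join(lines) + "\n"


def run_point(mg5_exe, model, out_dir, param_card, work_dir):
    """Write the command file, run MG5 on it, return (returncode, seconds).

    `mg5_exe` should be an absolute path or a name found on PATH,
    because the subprocess runs with `work_dir` as its working directory.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    card = work_dir / "proc_card.dat"
    card.write_text(build_proc_card(model, out_dir, param_card))

    start = time.perf_counter()
    result = subprocess.run([mg5_exe, card.name], cwd=work_dir,
                            capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return result.returncode, elapsed
```

Logging the elapsed time together with the host name for each point would show whether the slowdown is uniform or tied to particular machines.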

Now if I measure the time spent on the generation and cross-section calculation, it increases a lot with increasing n, even though I would expect the different MPI processes to be independent of each other.
I am using several machines with 20 cores each. I am measuring only the MG5 run times, without the MPI communication (sending cross-sections, parameters, etc.).

Maybe someone can help me to resolve this issue.

Greetings

Daniel

Question information

Language: English

Status: Answered

For: MadGraph5_aMC@NLO

Assignee: No assignee

Last query: 2023-09-07

Last reply: 2023-09-07

Olivier Mattelaer (olivier-mattelaer) said on 2023-09-07:

Hi,

We have never tried to do something like that.

Do you create your directory on a local file system or on an NFS one?
In the second case you are creating a lot of files on a shared filesystem, and your slowdown might be caused by saturating that filesystem.
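
A minimal sketch of one way to keep the many small files off the shared filesystem: run each instance inside a node-local scratch directory and copy the output back in one pass afterwards. It assumes that `$TMPDIR` or `/tmp` lives on a local disk of the compute node, which has to be checked on the cluster in question; the function name `run_in_local_scratch` is made up for illustration.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_local_scratch(command, final_dir, scratch_root=None):
    """Run `command` in a node-local temporary directory, then copy
    everything it produced to `final_dir` in a single pass."""
    # Assumption: $TMPDIR or /tmp is on a local disk of this node.
    scratch_root = scratch_root or os.environ.get("TMPDIR", "/tmp")
    scratch = Path(tempfile.mkdtemp(prefix="mg5_", dir=scratch_root))
    try:
        result = subprocess.run(command, cwd=scratch,
                                capture_output=True, text=True)
        # Touch the shared filesystem only once, after the run.
        shutil.copytree(scratch, final_dir, dirs_exist_ok=True)
    finally:
        # Always clean up the scratch space, even if the run failed.
        shutil.rmtree(scratch, ignore_errors=True)
    return result.returncode
```

Copying the whole scratch directory is the simplest choice; if only the cross-section is needed, copying just that result would reduce the load on the shared filesystem further.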

It could also be that you have a barrier at some level, which means that some computations have to wait until all other nodes have finished before moving on to the next one. You likely need to check that you use non-blocking communication.
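
As an illustration of a layout without barriers between parameter points, the sketch below gives each rank a fixed slice of the points and performs a single collective call at the very end. It is a simplification, not the setup from the question: `comm` is expected to behave like mpi4py's `MPI.COMM_WORLD`, and `compute` is a hypothetical function that maps one parameter point to its result.

```python
def points_for_rank(points, rank, size):
    """Round-robin slice of the parameter points owned by this rank."""
    return points[rank::size]


def run_rank(comm, points, compute):
    """Process this rank's points independently, then collect once.

    `comm` is expected to behave like mpi4py's MPI.COMM_WORLD;
    `compute` is a hypothetical function mapping a point to a result.
    """
    rank, size = comm.Get_rank(), comm.Get_size()
    local = [(p, compute(p)) for p in points_for_rank(points, rank, size)]
    # The only collective call happens after all local work is done,
    # so no rank waits for another between parameter points.
    gathered = comm.gather(local, root=0)
    if rank == 0:
        return [item for chunk in gathered for item in chunk]
    return None


# Usage under mpirun (requires mpi4py):
#   from mpi4py import MPI
#   results = run_rank(MPI.COMM_WORLD, all_points, compute_xsec)
```

A static split like this can leave some ranks idle near the end if run times vary a lot between points; a manager/worker scheme with non-blocking sends and receives avoids that at the cost of more code.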

But those are quite basic suggestions; it is difficult to do better, sorry.

Cheers,

Olivier

Daniel Schieber (themaker845) said on 2023-09-07:

Hello,
Thank you for the quick response.
The filesystem is shared, so it is probably something like that.
I should not have any barriers, and each parameter point is calculated on its own.
I will ask the admin for help.
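
One way to put numbers on the filesystem suspicion is a small-file micro-benchmark run against both the shared directory and a local one. The sketch below is a rough diagnostic under assumptions, not a proper I/O benchmark: the function name `time_small_files`, the file count, and the file size are arbitrary choices.

```python
import os
import tempfile
import time


def time_small_files(directory, n_files=200, size=1024):
    """Create, sync and delete `n_files` small files in `directory`;
    return the elapsed wall-clock time in seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        for i in range(n_files):
            path = os.path.join(tmp, f"f{i}.dat")
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # force the write to reach the filesystem
    # Leaving the `with` block deletes the files, so deletion is timed too.
    return time.perf_counter() - start
```

Calling it once on a local path such as `/tmp` and once on the shared output directory, first from a single process and then from many ranks at once, would show whether the shared filesystem degrades as the instance count grows.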

Greetings

Daniel
