How to efficiently run a MicrOmegas executable across various work nodes simultaneously on a cluster

Asked by Prudhvi Bhattiprolu

Dear MicrOmegas Team,

I have a MicrOmegas executable for a model (MyModel) on the login node of a Linux cluster: ~/micromegas_6.0/MyModel/main. To do a huge parameter-space scan over, say, a million different input points, I am running this executable on 1000 work nodes simultaneously, such that each of the 1000 work nodes scans over 1000 input points. The only issue is that the same executable (located on the login node), when run simultaneously on 1000 work nodes, takes a lot longer to compute than when run on only one machine at a time. For comparison, the run time of the MicrOmegas executable per input point in my model (with fast=1 and VZdecay=VWdecay=0 for the relic abundance computation) is:

Run on only one node at a time: < 1 second
Run simultaneously on 1000 nodes: anywhere from < 1 second to 500 seconds or more!

Although the cases where the run time is 500 seconds are rare, the total runtime is still dominated by these rare occurrences, especially for huge parameter-space scans. I suspect this is because all I/O operations take place on the login node, and while the executable is running on one work node, all the other work nodes are perhaps just waiting for their turn? To fix this, I tried copying the executable from the login node to each of the work nodes, but that doesn't seem to help. So I was wondering if there is a way to fix this issue? Is there a setting for parallel runs that I am missing? If not, would it help to install MicrOmegas and generate the executable locally on each work node for each job? Any help on this would be great!

Please let me know if there is anything else needed from my end! Thanks a lot!

Best,
Prudhvi
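For concreteness, a scan like the one described is often driven through a scheduler array job, with each task computing its own slice of the input file. Below is a minimal sketch assuming SLURM; `points.dat` (one input point per line) and the way `./main` takes its input are hypothetical, not part of the original setup:

```shell
#!/bin/bash
# Hypothetical SLURM array job: 1000 tasks, each scanning 1000 points.
# points.dat and ./main's input convention are placeholders.
#SBATCH --array=0-999

CHUNK=1000
TASK=${SLURM_ARRAY_TASK_ID:-0}        # this task's index, 0..999
START=$(( TASK * CHUNK + 1 ))         # first line of this task's slice
END=$(( START + CHUNK - 1 ))          # last line of this task's slice
echo "task $TASK scans lines $START-$END"

# Feed this task's slice of points to the executable, one per line:
#   sed -n "${START},${END}p" points.dat | while read -r p; do ./main "$p"; done
```

With this layout the slowdown described in the question is not caused by the splitting itself, which is why the answers below focus on what the executable does at run time.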

Question information

Language: English
Status: Answered
For: CalcHEP
Assignee: No assignee
Alexander Pukhov (pukhov) said:
#1

Sorry for such a late response.

I guess the problem is the following. In principle, we assume that one micromegas executable file can be launched in parallel from different points of disk space.

But micromegas generates libraries of matrix elements, which are stored in the directory work/so_generated. And at this point 1000 processes can be waiting for the one which generates a library, if this library is needed by all of them. You should see on the screen a message "NEW PROCESS ..." when a library is generated. Different libraries can be generated simultaneously. But as a rule there are several key reactions which are needed for all model parameters.
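One way to sidestep this contention (once the libraries exist) is to give each job a private copy of the model directory on node-local scratch, so that no two jobs touch the same work/so_generated. This is a sketch under assumptions, not part of the official micromegas workflow; `stage_model` and all paths are hypothetical names:

```shell
#!/bin/bash
# Sketch: copy the model directory (with its pre-generated
# work/so_generated libraries) to node-local scratch, so each job
# loads its own private copies instead of contending over the
# shared filesystem.
stage_model() {                  # stage_model <model_dir>; prints scratch copy
    local src=$1
    local dst="${TMPDIR:-/tmp}/mymodel.$$"
    mkdir -p "$dst"
    cp -r "$src/." "$dst/"       # includes main and work/so_generated
    echo "$dst"
}

# In a job script (paths are placeholders):
#   run_dir=$(stage_model "$HOME/micromegas_6.0/MyModel")
#   cd "$run_dir" && ./main input.dat
#   rm -rf "$run_dir"
```

This only helps if the libraries were already generated once beforehand, which is exactly what the Beps=0 trick below provides.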

  One trick used by people is the following. We can start one session with Beps=0. All libraries will be generated in that one session. Then you should not have problems in subsequent calculations. People have used it for darkOmega and darkOmega2. For darkOmegaN one can have a problem with a huge number of loaded shared libraries.
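On a SLURM-type cluster, this "one warm-up session first" ordering can be enforced with a job dependency, so the 1000-task scan is released only after the single Beps=0 run has finished generating the libraries. A minimal sketch; `warmup.sh` and `scan.sh` are hypothetical job scripts, not files from the micromegas distribution:

```shell
#!/bin/bash
# Sketch: submit one serial warm-up job (main set up to run with
# Beps=0), then release the full scan only after it succeeds.
# warmup.sh and scan.sh are hypothetical job scripts.
submit_with_warmup() {           # usage: submit_with_warmup warmup.sh scan.sh
    local wid
    wid=$(sbatch --parsable "$1") || return 1          # warm-up job id
    sbatch --dependency=afterok:"$wid" --array=0-999 "$2"
}

# submit_with_warmup warmup.sh scan.sh
```

The `afterok` dependency guarantees the scan never starts against a half-populated work/so_generated directory.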

Let me know about your result. It would be nice to get recommendations / improve micromegas for parallel calculations.

Best

     Alexander Pukhov

On 4/5/24 14:56, Prudhvi Bhattiprolu wrote:
> New question #763290 on CalcHEP:
> https://answers.launchpad.net/calchep/+question/763290

Prudhvi Bhattiprolu (prudhvibhattiprolu) said (last edit):
#2

Dear Alexander,

Thank you so much for your response, and I apologize for my late reply.

Setting Beps=0 successfully generated all the necessary libraries in one session, making subsequent computations much faster on the login node. However, I'm still experiencing the same issue when running the executable in parallel across multiple work nodes.

My current plan is to install micromegas and generate the executable locally on each work node at the beginning of the job, then remove it at the end. I will also use the Beps=0 trick to generate all the libraries on each work node separately. This approach may cause a slight overhead (a few minutes) before computations start on each node, but I expect it will be faster overall. I'll share my findings in the next few days.

Thanks again for your help,
Prudhvi
