How to efficiently run a MicrOmegas executable across various work nodes simultaneously on a cluster
Dear MicrOmegas Team,
I have a MicrOmegas executable for a model (MyModel) on the login node of a Linux cluster: ~/micromegas_
Run on only one node at a time: < 1 second
Run simultaneously on 1000 nodes: anywhere from < 1 second to 500 seconds or more!
Although runs that take 500 seconds are rare, they still dominate the total runtime, especially for large parameter-space scans. I suspect this is because all I/O operations take place on the login node, so that while the executable runs on one work node, the other work nodes are simply waiting for their turn. To fix this, I tried copying the executable from the login node to each work node, but that does not seem to help. Is there a way to fix this issue? Is there a setting for parallel runs that I am missing? If not, would it help to install MicrOmegas and generate the executable locally on the work node for each job? Any help would be great!
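To make the staging idea concrete, here is a minimal Python sketch of a per-job wrapper (not the exact script used for the runs above). It copies an executable into a fresh node-local directory, runs it with that directory as the working directory, and times the copy and the run separately, which may help show whether the slow cases come from the shared filesystem. The function name `run_staged` and the `scratch_root` parameter are hypothetical, and nothing here is specific to MicrOmegas. It also assumes the executable needs no other files from the installation tree; if it does, those reads would still go to the shared filesystem.

```python
import shutil
import subprocess
import tempfile
import time


def run_staged(exe_path, args=(), scratch_root=None):
    """Copy exe_path to a fresh local directory, run it there, and return
    (copy_seconds, run_seconds, return_code).

    scratch_root is a hypothetical parameter: pass the node-local scratch
    directory of your cluster. With None, tempfile falls back to $TMPDIR
    or the system default, which may or may not be node-local.
    """
    workdir = tempfile.mkdtemp(prefix="job_", dir=scratch_root)
    try:
        t0 = time.perf_counter()
        # copy2 keeps the permission bits, so the copy stays executable
        local_exe = shutil.copy2(exe_path, workdir)
        t1 = time.perf_counter()
        # cwd=workdir: files written with relative paths land in the local directory
        proc = subprocess.run([local_exe, *args], cwd=workdir)
        t2 = time.perf_counter()
        return t1 - t0, t2 - t1, proc.returncode
    finally:
        # remove the per-job directory even if the run failed
        shutil.rmtree(workdir, ignore_errors=True)
```

Calling `run_staged` once per parameter point and logging the two timings would show whether the rare 500-second cases are spent in the copy step or in the run itself.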
Please let me know if there is anything else needed from my end! Thanks a lot!
Best,
Prudhvi
Question information
- Language: English
- Status: Answered
- For: CalcHEP
- Assignee: No assignee