Large number of threads remain open
Hello everyone, and sorry to bother you again.
I am running MadGraph on a large shared machine, where each user is given a tight limit on the number of open threads (roughly a few hundred).
I hit the thread limit quite quickly if I leave the code running for some time, for example when scanning over a parameter.
I think MadGraph opens new threads frequently and leaves them idle in the background after their job is finished.
Looking at the function MultiCore.worker() in cluster.py, I take it that a thread only terminates if its job ends with an error. If it is given no job at all, or if its job returns normally, the thread is left hanging.
Setting run_mode to 0 does not fix the issue: threads are still opened and left idle, just in smaller numbers.
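In case it helps to reproduce what I see, this is roughly how idle threads show up in a count. It is only a sketch in plain Python, not MadGraph code: the function name and the three dummy threads are made up for illustration, and threading.active_count() only sees Python threads in the current process, so it may not match what the system counts against my limit.

```python
import threading


def count_idle_thread_demo(n_idle=3):
    """Start n_idle threads that only wait, and report the thread count
    before they start, while they sit idle, and after they are released."""
    before = threading.active_count()

    # Each dummy thread blocks on the event, mimicking a worker with no job.
    release = threading.Event()
    idle_threads = [threading.Thread(target=release.wait) for _ in range(n_idle)]
    for t in idle_threads:
        t.start()
    during = threading.active_count()

    # Releasing the event lets every dummy thread return and terminate.
    release.set()
    for t in idle_threads:
        t.join()
    after = threading.active_count()

    return before, during, after


if __name__ == "__main__":
    print(count_idle_thread_demo())
```

The idle threads stay in the count until something explicitly lets them return, which is the behaviour I believe I am seeing.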
My questions are:
1. whether this is the intended behaviour, and
2. whether implementing a watchdog timer, which closes a thread once it has been idle for x seconds, is a good/safe idea in my situation.
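To make point 2 concrete, here is a minimal sketch of the kind of idle timeout I have in mind. It is not the MultiCore.worker() code from cluster.py: the names worker, run_demo and IDLE_TIMEOUT are hypothetical, and I am assuming jobs are handed over through a standard queue.Queue as (function, arguments) pairs.

```python
import queue
import threading

IDLE_TIMEOUT = 0.2  # seconds a worker may stay idle; illustrative value only


def worker(jobs, idle_timeout=IDLE_TIMEOUT):
    """Run jobs from the queue and return once idle for idle_timeout seconds."""
    while True:
        try:
            func, args = jobs.get(timeout=idle_timeout)
        except queue.Empty:
            return  # nothing arrived in time: let the thread terminate
        try:
            func(*args)
        finally:
            jobs.task_done()  # mark the job as handled even if it raised


def run_demo(n_workers=4, n_jobs=8):
    """Submit a few trivial jobs, then check that all workers have exited."""
    jobs = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(jobs,)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i in range(n_jobs):
        jobs.put((results.append, (i * i,)))
    jobs.join()  # wait until every submitted job has been processed
    for t in threads:
        t.join(timeout=5 * IDLE_TIMEOUT)  # workers exit on their own once idle
    still_alive = sum(t.is_alive() for t in threads)
    return sorted(results), still_alive


if __name__ == "__main__":
    print(run_demo())
```

The obvious cost is that a worker which has timed out is gone, so whatever submits jobs would have to start new threads when work arrives later. I do not know whether MadGraph's scheduling assumes the workers stay alive, which is why I am asking whether this is safe.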
Thank you,
Claudio
Question information
- Language: English
- Status: Answered
- Assignee: No assignee