slurm cluster lingering
Hi Olivier,
I am trying to run gridpack on a slurm cluster. I have modified mg5_configurati
cluster_type = slurm
cluster_queue = None #
cluster_size = 150
cluster_
I have commented out the rest (like cluster_local_path and cluster_temp_path).
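For reference, a minimal Python sketch of how one might confirm which cluster_ options are actually active in a configuration file once comments are stripped. The helper name active_options and the sample text are illustrative assumptions, not part of MadGraph; the commented-out value in the sample is hypothetical.

```python
def active_options(text, prefix="cluster_"):
    """Return the uncommented 'key = value' options whose key starts with prefix."""
    opts = {}
    for raw in text.splitlines():
        # Drop everything after '#', so commented-out options are ignored.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith(prefix):
            opts[key] = value
    return opts


# Sample text modelled on the settings quoted above (assumption: the last
# line stands in for one of the commented-out options; its value is made up).
sample = """\
cluster_type = slurm
cluster_queue = None #
cluster_size = 150
# cluster_temp_path = None
"""

opts = active_options(sample)
for key, value in opts.items():
    print(f"{key} = {value}")
```

Note that the values come back as strings, so "None" here is the literal text from the file rather than a Python None.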
The event generation runs normally but then it gets stuck:
Working on SubProcesses
INFO: P1_qq_n2x1pqq
INFO: P1_gg_n2x1pqq
INFO: P1_gq_n2x1pgq
INFO: P1_qq_n2x1pgg
INFO: Idle: 797, Running: 593, Completed: 843 [ 3.8s ]
INFO: Idle: 788, Running: 593, Completed: 852 [ 4.2s ]
INFO: Idle: 674, Running: 590, Completed: 969 [ 35.2s ]
INFO: Idle: 532, Running: 589, Completed: 1112 [ 1m 6s ]
INFO: Idle: 385, Running: 593, Completed: 1255 [ 1m 37s ]
INFO: Idle: 237, Running: 593, Completed: 1403 [ 2m 8s ]
INFO: Idle: 66, Running: 584, Completed: 1583 [ 2m 39s ]
INFO: Idle: 0, Running: 443, Completed: 1790 [ 3m 10s ]
INFO: Idle: 0, Running: 275, Completed: 1958 [ 3m 41s ]
INFO: Idle: 0, Running: 198, Completed: 2035 [ 4m 12s ]
INFO: Idle: 0, Running: 139, Completed: 2094 [ 4m 42s ]
INFO: Idle: 0, Running: 98, Completed: 2135 [ 5m 13s ]
INFO: Idle: 0, Running: 62, Completed: 2171 [ 5m 43s ]
INFO: Idle: 0, Running: 44, Completed: 2189 [ 6m 14s ]
INFO: Idle: 0, Running: 29, Completed: 2204 [ 6m 44s ]
INFO: Idle: 0, Running: 21, Completed: 2212 [ 7m 14s ]
INFO: Idle: 0, Running: 18, Completed: 2215 [ 7m 45s ]
INFO: Idle: 0, Running: 17, Completed: 2216 [ 8m 16s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 8m 46s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 9m 16s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 9m 46s ]
INFO: Start to wait 60s between checking status.
Note that you can change this time in the configuration file.
Press ctrl-C to force the update.
INFO: Idle: 0, Running: 16, Completed: 2217 [ 10m 47s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 11m 47s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 12m 47s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 13m 48s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 14m 48s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 15m 48s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 16m 49s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 17m 49s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 18m 51s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 19m 51s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 20m 51s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 21m 52s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 22m 52s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 23m 52s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 24m 53s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 25m 53s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 26m 53s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 27m 53s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 28m 54s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 29m 54s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 30m 54s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 31m 55s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 32m 55s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 33m 55s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 34m 55s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 35m 56s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 36m 56s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 37m 56s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 38m 56s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 39m 57s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 40m 57s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 41m 57s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 42m 58s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 43m 58s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 44m 58s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 45m 59s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 46m 59s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 47m 59s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 49m 1s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 50m 1s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 51m 2s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 52m 2s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 53m 2s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 54m 4s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 55m 4s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 56m 4s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 57m 4s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 58m 5s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 59m 5s ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 1h 0m ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 1h 1m ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 1h 2m ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 1h 3m ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 1h 4m ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 1h 5m ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 1h 6m ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 1h 7m ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 1h 8m ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 1h 9m ]
INFO: Idle: 0, Running: 16, Completed: 2217 [ 1h 10m ]
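The stall in the log above can be located programmatically. The following is a rough sketch, assuming the status lines keep the "Idle: N, Running: N, Completed: N" layout shown here; parse_status and stalled_since are hypothetical helper names, not MadGraph functions.

```python
import re

# Matches the three counters in a status line such as
# "INFO: Idle: 0, Running: 16, Completed: 2217 [ 9m 16s ]".
STATUS_RE = re.compile(r"Idle:\s*(\d+),\s*Running:\s*(\d+),\s*Completed:\s*(\d+)")


def parse_status(lines):
    """Return a list of (idle, running, completed) tuples, one per status line."""
    statuses = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            statuses.append(tuple(int(g) for g in match.groups()))
    return statuses


def stalled_since(statuses):
    """Return the index where the final unchanged run begins, or None if still changing."""
    if len(statuses) < 2 or statuses[-1] != statuses[-2]:
        return None
    i = len(statuses) - 1
    while i > 0 and statuses[i - 1] == statuses[-1]:
        i -= 1
    return i


# A short excerpt of the log above.
log = [
    "INFO: Idle: 0, Running: 18, Completed: 2215 [ 7m 45s ]",
    "INFO: Idle: 0, Running: 17, Completed: 2216 [ 8m 16s ]",
    "INFO: Idle: 0, Running: 16, Completed: 2217 [ 8m 46s ]",
    "INFO: Idle: 0, Running: 16, Completed: 2217 [ 9m 16s ]",
    "INFO: Idle: 0, Running: 16, Completed: 2217 [ 9m 46s ]",
]

statuses = parse_status(log)
first = stalled_since(statuses)
if first is not None:
    idle, running, completed = statuses[first]
    print(f"No progress since status line {first}: {running} jobs still reported as running")
```

Applied to the full log, this would show the counters frozen at 16 running and 2217 completed from the 8m 46s line onward.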
I suspect the problem is with writing the output, since the same process finished and moved on to the next step when I ran it on my own computer. Have I done anything wrong in the cluster setup? Note that I did not change anything in cluster.py.
Thank you,
Amin
Question information
- Language: English
- Status: Answered
- Assignee: No assignee
This question was reopened