Constant failure/retry in cluster mode
Hello,
I am trying to generate some events in "cluster" mode (multicore mode worked OK),
but I notice that the code keeps resubmitting jobs:
.....
WARNING: resubmit job (for the 2 times)
WARNING: resubmit job (for the 2 times)
WARNING: resubmit job (for the 2 times)
....
(I set the maximum number of retries to 10. Some jobs hit that limit,
but not all of them.)
Assuming that the jobs are resubmitted because they failed,
is there a way to find out the reason for the failure? All I get is:
CRITICAL: Fail to run correctly job 1263680.
with option: {'log': None, 'stdout': None, 'argument': [], 'nb_submit': 10, 'stderr': None, 'prog': 'ajob62', 'output_files': ['G13j'], 'time_check': 1420329234.815836, 'cwd': '/storage/
file missing: /storage/
Fails 10 times
No resubmition.
Thanks,
Valerio
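
The log above suggests that a job is reported as failed when one of its expected outputs (here 'G13j') is not found in the job's working directory once it finishes. The following is a minimal sketch of that kind of check, assuming the 'cwd' and 'output_files' keys shown in the option dictionary. The function name missing_outputs is hypothetical, and this is not the generator's own code.

```python
import os
import tempfile


def missing_outputs(job):
    """Return the expected output paths that do not exist for one job record.

    `job` is assumed to be a dict shaped like the option dictionary printed
    in the log, with a 'cwd' entry and an 'output_files' list.
    """
    cwd = job.get("cwd") or "."
    return [
        os.path.join(cwd, name)
        for name in job.get("output_files", [])
        if not os.path.exists(os.path.join(cwd, name))
    ]


if __name__ == "__main__":
    # Demonstration in a temporary directory standing in for the job's cwd.
    with tempfile.TemporaryDirectory() as tmp:
        job = {"prog": "ajob62", "cwd": tmp, "output_files": ["G13j"]}
        print(missing_outputs(job))   # the output is absent at this point
        os.mkdir(os.path.join(tmp, "G13j"))
        print(missing_outputs(job))   # nothing is missing once it exists
```

Running such a check on the submit node against the real job directory would show whether the output is actually absent there, or only not yet visible when the check is made.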
Question information
- Language: English
- Status: Answered
- Assignee: No assignee