Avoid job-completion emails from LSF ("bsub -o /dev/null -e /dev/null")
Hello,
On LSF systems a default email is sometimes sent at the completion of a job. This can be avoided by adding the
-o /dev/null -e /dev/null options to the bsub submission command.
How can I configure MG5 to pass these options to bsub? I cannot find anything like this in the input/mg5_
Thanks a lot for your support,
Roberto
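For illustration, the kind of submission command the question is after can be sketched in Python as follows; the helper build_bsub_command and the queue name '8nm' are hypothetical examples, not part of MG5:

```python
# Hypothetical sketch (not MG5 code): build an LSF bsub argument list that
# sends the job's stdout and stderr to /dev/null, so LSF has nothing to mail
# back at job completion.
def build_bsub_command(job_name, stdout='/dev/null', stderr='/dev/null',
                       queue=None):
    """Return the argument list for an LSF bsub submission."""
    command = ['bsub', '-J', job_name, '-o', stdout, '-e', stderr]
    if queue:
        command.extend(['-q', queue])
    return command

print(build_bsub_command('myjob', queue='8nm'))
# ['bsub', '-J', 'myjob', '-o', '/dev/null', '-e', '/dev/null', '-q', '8nm']
```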
Question information
- Language: English
- Status: Solved
- Assignee: No assignee
- Solved by: Roberto Franceschini
#1
https:/
Cheers,
Olivier
On Jul 15, 2014, at 12:03 PM, Roberto Franceschini <email address hidden> wrote:
> [quoted text of the question above elided]
#2
Dear Olivier, thanks a lot for pointing out this resource in the FAQ, which is very interesting in itself and might be useful in the future independently of the issue at hand today.
I have tried to modify the code in a couple of different ways, but I am afraid I am missing something crucial ...
If I understand correctly I have to modify the cluster.py file, which I did. I have added a value for the stdout and stderr, so that, if I understand the code, bsub will get the -o /dev/null -e /dev/null options.
After modifying cluster.py I removed the .pyo file so that it gets regenerated as needed (I am not python-able; I hope I understand correctly what's going on with the creation of the .pyo file).
Here is my function:
@multiple_try()
def submit(self, prog, argument=[], cwd=None, stdout=None, stderr=None,
           log=None, required_output=[], nb_submit=0):
    """Submit the job prog to an LSF cluster"""
    ############ I PUT THIS PART #############
    if stdout is None:
        stdout = '/dev/null'
    if stderr is None:
        stderr = '/dev/null'
    ############ I PUT THIS PART #############
    me_dir = os.path.realpath(os.path.join(cwd, prog)).rsplit('/SubProcesses', 1)[0]
    me_dir = misc.digest(me_dir)[-14:]
    if not me_dir[0].isalpha():
        me_dir = 'a' + me_dir[1:]
    text = ""
    command = ['bsub ', '-J', me_dir]
    if cwd is None:
        cwd = os.getcwd()
    else:
        text = " cd %s;" % cwd
    if stdout and isinstance(stdout, str):
        command.extend(['-o', stdout])
    if stderr and isinstance(stdout, str):
        command.extend(['-e', stderr])
    elif stderr == -2:  # -2 is subprocess.STDOUT
        pass
    if log is None:
        log = '/dev/null'
    text += prog
    if argument:
        text += ' ' + ' '.join(argument)
    if self.cluster_queue and self.cluster_queue != 'None':
        command.extend(['-q', self.cluster_queue])
    a = misc.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                   stdin=subprocess.PIPE, cwd=cwd)
    output = a.communicate(text)[0]
    # Job <nnnn> is submitted to default queue <normal>.
    try:
        id = output.split('>', 1)[0].split('<')[1]
    except Exception:
        raise ClusterManagmentError('fail to submit to the cluster: %s' % output)
    if not id.isdigit():
        raise ClusterManagmentError('fail to submit to the cluster: %s' % output)
    return id
I have generated a simple process to test this out and the thing is stuck at
INFO: Running Survey
Creating Jobs
Working on SubProcesses
P0_qq_qq
Start waiting for update. (more info in debug mode)
and in the end gives
Command "generate_events -f" interrupted with error:
Exception: [Fail 5 times]
['bsub ', '-J', 'b630eab49a7e1e', '-o', '/dev/null', '-e', '/dev/null', '-q', '2nd'] fails with no such file or directory
I am not sure that my "trick" of giving stdout='/dev/null' (and similarly stderr) a value makes sense, but at least the options seem to reach the part of the code that writes the bsub command.
I have also tried to just replace bsub with "bsub -o /dev/null -e /dev/null" but it gives the same error.
Did I miss some steps or put the -o /dev/null -e /dev/null in the wrong place?
Thanks a lot for your help,
Roberto
#3
command = ['bsub ', '-J', me_dir] does not work due to the space after bsub. I do not know if this makes sense, but I have removed that space and now it works.
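An illustrative aside: the "no such file or directory" error arises because, when Popen gets an argument list, the first element is looked up on PATH verbatim, so 'bsub ' (with a trailing space) names a different, nonexistent program. A minimal Python 3 sketch, using echo as a stand-in for bsub:

```python
import subprocess

def run_ok(argv):
    """Try to launch argv; report whether the executable was found."""
    try:
        subprocess.Popen(argv, stdout=subprocess.DEVNULL).wait()
        return True
    except FileNotFoundError:
        # Raised when the first element of argv is not an executable on PATH.
        return False

print(run_ok(['echo', 'hi']))   # True
print(run_ok(['echo ', 'hi']))  # False: "echo " (trailing space) is not found
```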
#4
mmmm ... It seems I thought it was done a bit too early ...
Jobs run fine and the options -e /dev/null and -o /dev/null seem to have been passed correctly: the jobs run and I no longer get the emails I was getting before.
However, twice now something funny has happened at the moment of "Combine".
INFO: Combining Events
WARNING: resubmit job (for the 1 times)
CRITICAL: Fail to run correctly job 546672709.
with option: {'log': None, 'stdout': '/afs/cern.
file missing: /afs/cern.
Fails 1 times
No resubmition.
CRITICAL: Fail to run correctly job 546672709.
with option: {'log': None, 'stdout': '/afs/cern.
file missing: /afs/cern.
Fails 1 times
No resubmition.
Start waiting for update. (more info in debug mode)
Is my edit of the code messing up the combine call, or is this just bad luck?
Thanks for helping!
Roberto
#5
Hi Roberto,
> However it's already two times that something funny happens at the
> moment of "Combine".
> Is my edit of the code messing up with the combine call or this is just
> bad luck?
Yeah, this is systematic.
The problem is that the combine step uses the stdout of the program to pass some information to python.
So I check that this file exists. Since you force stdout to always be /dev/null, the expected file does not exist…
A better change to do what you want is the following: change
    def submit(self, prog, argument=[], cwd=None, stdout=None, stderr=None, log=None,
to:
    def submit(self, prog, argument=[], cwd=None, stdout='/dev/null', stderr='/dev/null', log=None,
I think that should be enough and work in most situations, but it might fail depending on how the function is called.
Another way to do it, which is probably safer, would be to replace
    if stdout and isinstance(stdout, str):
        command.extend(['-o', stdout])
by
    if stdout and isinstance(stdout, str):
        command.extend(['-o', stdout])
    else:
        command.extend(['-o', '/dev/null'])
and
    if stderr and isinstance(stdout, str):
        command.extend(['-e', stderr])
    elif stderr == -2: # -2 is subprocess.STDOUT
        pass
by
    if stderr and isinstance(stdout, str):
        command.extend(['-e', stderr])
    elif stderr == -2: # -2 is subprocess.STDOUT
        pass
    else:
        command.extend(['-e', '/dev/null'])
Cheers,
Olivier
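As a standalone sketch (with hypothetical helper names, not the actual MG5 code), the "safer" variant above boils down to falling back to /dev/null only when the caller did not ask for a specific file, so steps like combine_events that read back their stdout file keep working:

```python
# Hypothetical sketch of the safer fallback logic: honor an explicitly
# requested stdout/stderr file, and only default to /dev/null otherwise.
def output_options(stdout=None, stderr=None):
    """Build the -o/-e part of a bsub argument list."""
    command = []
    if stdout and isinstance(stdout, str):
        command.extend(['-o', stdout])      # caller wants this file kept
    else:
        command.extend(['-o', '/dev/null'])  # nobody reads it: no email
    if stderr and isinstance(stderr, str):
        command.extend(['-e', stderr])
    elif stderr == -2:  # -2 is subprocess.STDOUT: merged into stdout
        pass
    else:
        command.extend(['-e', '/dev/null'])
    return command

print(output_options())                      # ['-o', '/dev/null', '-e', '/dev/null']
print(output_options(stdout='results.txt'))  # ['-o', 'results.txt', '-e', '/dev/null']
```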
On Jul 17, 2014, at 12:07 AM, Roberto Franceschini <email address hidden> wrote:
> [quoted text of comment #4 above elided]
#6
Thanks Olivier Mattelaer, that solved my question.
#7
Hi Olivier, thanks a lot for the suggestions on how to change the code. I tested it with a new process and it worked perfectly.
I am now proceeding to try it out on a process for which I already had an MG folder; I guess I have to replace cluster.pyo in that folder ...
Thanks again,
Roberto
#8
Just the cluster.py file should be enough.
Cheers,
Olivier
On Jul 19, 2014, at 10:32 PM, Roberto Franceschini <email address hidden> wrote:
> [quoted text of comment #7 above elided]
#9
Hi Olivier, thanks a lot for all your support. I have checked that for processes that have already been generated it suffices to change the file ./bin/internal/
I was wondering if the "combine" job can be controlled in more detail. It seems to be a job of quite a different nature than the "computation" ones; for instance, in my case it typically takes longer than most of the other jobs (maybe this is specific to the simple 2->2 process uu > uu that I am using for testing).
So I was wondering if I can change cluster.py so that the combine job is submitted to a different queue than the numerical jobs.
I have inspected the file cluster.py more closely, but I do not see any distinction in the submit function that would treat different types of executables differently.
I do not know whether this question is best asked in a separate thread; you tell me. If so, I am happy to open it.
Cheers,
Roberto
#10
Hi Roberto,
The combine_events job does a lot of I/O operations, and its computational time rises as N**2, where N is the number of events.
>
> I have given a closer inspection to the file cluster.py but I do not see
> any distinction in the function submit to treat differently different
> types of commands for the executable to be run.
Indeed we do not have any distinction.
I guess that the simplest way is to add an if statement on the name of the executable and change the queue accordingly.
Cheers,
Olivier
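A minimal sketch of such an if statement, with hypothetical helper and queue names ('8nm' and '1nd' are placeholders for your site's LSF queues, not MG5 defaults):

```python
# Hypothetical sketch: route the combine job to a different LSF queue than
# the numerical integration jobs, based on the executable name.
def choose_queue(prog, default_queue='8nm', combine_queue='1nd'):
    """Pick an LSF queue from the name of the executable being submitted."""
    if 'combine' in str(prog):
        return combine_queue  # combine jobs tend to run longer (N**2 in events)
    return default_queue

print(choose_queue('../bin/madevent'))  # '8nm'
print(choose_queue('combine_events'))   # '1nd'
```

In cluster.py's submit this would replace the unconditional use of self.cluster_queue when building the '-q' option.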
On Jul 22, 2014, at 11:56 AM, Roberto Franceschini <email address hidden> wrote:
> [quoted text of comment #9 above elided]
#11
Hi Olivier, thanks for the hint on the complexity of combine_events.
If I understand correctly, you are saying that 100K events take 100 times longer to combine than 10K, and that the time needed to combine a 100K-event run is almost the same regardless of the complexity of the process, i.e. a 10K-event run of pp>jj takes the same time as 10K events of pp>jjj when it comes to combine_events. Is this the case?
Best,
Roberto
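The quadratic expectation can be spelled out as a back-of-the-envelope sketch (not a measured benchmark):

```python
# If combine time grows as N**2 in the number of events N, then scaling the
# event count by a factor k multiplies the combine time by k**2.
def combine_time_ratio(n_new, n_old):
    """Expected ratio of combine times under t(N) proportional to N**2."""
    return (n_new / n_old) ** 2

print(combine_time_ratio(100_000, 10_000))  # 100.0: 10x the events, 100x the time
```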
#12
Hi,
I never ran precise timings on this part of the code.
But yes, this is what I expect.
Cheers,
Olivier
On Jul 23, 2014, at 1:11 AM, Roberto Franceschini <email address hidden> wrote:
> [quoted text of comment #11 above elided]