Error in combine_events step on condor cluster

Asked by Kenneth Long

Hi Experts,

I am encountering an error that has been addressed several times on this forum, for example https://bugs.launchpad.net/mg5amcnlo/+bug/1071765 and https://answers.launchpad.net/mg5amcnlo/+question/235824, where running generate_events on a cluster fails during the combine_events step. I am using a condor cluster that does not have read access to the directory I am submitting from:

Fail to read the number of unweighted events in the combine.log file
  === Results Summary for run: run_04 tag: tag_1 ===

     Cross-section : 0.2613 +- 0.0008017 pb
     Nb of events : 0

I have been ignoring the issue for a while now because a simple fix is to run combine_events and store_events locally; they are quick enough that running them on the cluster is not really worthwhile anyway. (The easiest solution I found is to change the calls to survey and refine in do_generate_events of madevent_interface.py to survey --cluster and refine --cluster and leave cluster mode off; even after the jobs fail, I can recover them by running combine_events run_xx and store_events run_xx.) I was under the impression that this had been fixed in the newest version, based on information in a related bug report. However, I still get the issue.

Alternatively I see that setting

cluster_temp_path = <directory writable but not readable by condor>

in me5_configuration.txt allows the combine_events step to work without any issue, though the comments suggest it should not have any effect for the condor cluster. Can you explain more what this setting does and why it fixes the problem?

I also encounter a related error. When running generate_events I get the error:

INFO: Combining Events
Command "generate_events -f" interrupted with error:
KeyError : 'cluster_tmp_path'
Please report this bug on https://bugs.launchpad.net/madgraph5
More information is found in '/nfs_scratch/kdlong/wpz0jet/run_03_tag_1_debug.log'.
Please attach this file to your report.
quit

The debug file shows:

generate_events -f
Traceback (most recent call last):
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/extended_cmd.py", line 879, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/extended_cmd.py", line 872, in onecmd_orig
    return func(arg, **opt)
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/madevent_interface.py", line 1939, in do_generate_events
    self.exec_cmd('combine_events', postcmd=False)
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/extended_cmd.py", line 919, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/extended_cmd.py", line 872, in onecmd_orig
    return func(arg, **opt)
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/madevent_interface.py", line 2513, in do_combine_events
    if self.options['run_mode'] ==1 and self.options['cluster_tmp_path']:
KeyError: 'cluster_tmp_path'

From this I believe there is a typo in self.options['cluster_tmp_path']. Changing the key to cluster_temp_path (which is the variable name as defined in me5_configuration.txt) fixes the issue. I downloaded the code using bzr branch lp:madgraph5 under the impression that I would get the latest stable version. I imagine this is not the case and I actually got a development version?
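The failure is easy to reproduce in isolation with a plain dictionary: the options are populated from me5_configuration.txt, where the variable is spelled cluster_temp_path, so looking it up under the misspelled key raises exactly the KeyError above. (A minimal sketch, not MadGraph's actual code; the dictionary below is a stand-in for the real options object.)

```python
# Minimal reproduction of the misspelled-key bug (hypothetical dict,
# standing in for MadGraph's self.options).
options = {'run_mode': 1, 'cluster_temp_path': None}  # spelling from me5_configuration.txt

try:
    # The buggy check in do_combine_events used the misspelled key:
    on_cluster = options['run_mode'] == 1 and options['cluster_tmp_path']
except KeyError as err:
    print('KeyError:', err)  # raised because the stored key is 'cluster_temp_path'

# The one-character fix: use the same spelling as the configuration file.
fixed = options['run_mode'] == 1 and bool(options['cluster_temp_path'])
print(fixed)  # False here, since cluster_temp_path is unset
```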

Thanks!

Kenneth

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee
Olivier Mattelaer (olivier-mattelaer) said:
#1

Hi,

In cluster mode, if cluster_temp_path is set, the combining of events is done locally; otherwise it is run on the cluster.
This is true for all clusters (including condor). For a condor cluster, this is actually the only effect of that option.
For other clusters, this option also modifies the way jobs handle I/O operations.
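The behaviour described above can be sketched as a small predicate (an illustration of the stated logic only; the option names follow me5_configuration.txt, while the function name is made up):

```python
def combine_runs_locally(options):
    """True when combine_events should run on the submitting machine.

    Per the explanation above: in cluster mode (run_mode == 1), setting
    cluster_temp_path forces the combining of events to run locally;
    if it is unset, the combine job is submitted to the cluster instead.
    """
    return options['run_mode'] == 1 and bool(options.get('cluster_temp_path'))

print(combine_runs_locally({'run_mode': 1, 'cluster_temp_path': None}))        # False
print(combine_runs_locally({'run_mode': 1, 'cluster_temp_path': '/scratch'}))  # True
```

This also explains the observation in the question: pointing cluster_temp_path at a directory the condor workers cannot read still works, because on condor the option only moves the combine step off the cluster entirely.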

> From this I believe there is a typo in self.options['cluster_tmp_path']. Changing the key to cluster_temp_path (which is the variable name as defined in me5_configuration.txt) fixes the issue.

Thanks, I will fix this for our next version.

Cheers,

Olivier

On 03 Mar 2015, at 02:21, Kenneth Long <email address hidden> wrote:

> New question #263102 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/263102
>
