Error in combine_events step on condor cluster

Asked by Kenneth Long

Hi Experts,

I am encountering an error that has been addressed several times on this forum, for example https://bugs.launchpad.net/mg5amcnlo/+bug/1071765 and https://answers.launchpad.net/mg5amcnlo/+question/235824, where running generate_events on a cluster fails during the combine_events step. I am using a condor cluster that does not have read access to the directory I am submitting from:

Fail to read the number of unweighted events in the combine.log file
  === Results Summary for run: run_04 tag: tag_1 ===

     Cross-section : 0.2613 +- 0.0008017 pb
     Nb of events : 0

I have been ignoring the issue for a while now because a simple fix is to run combine_events and store_events locally; they are quick enough that running them on the cluster is not really worthwhile anyway. (The easiest solution I found is to change the calls to survey and refine in do_generate_events of madevent_interface.py to survey --cluster and refine --cluster and leave cluster mode off; even after the jobs fail, I can recover them by running combine_events run_xx and store_events run_xx.) I was under the impression that this had been fixed in the newest version, based on information in a related bug report. However, I still get the issue.

Alternatively I see that setting

cluster_temp_path = <directory writable but not readable by condor>

in me5_configuration.txt allows the combine_events step to work without any issue, though the comments suggest it should not have any effect for the condor cluster. Can you explain more what this setting does and why it fixes the problem?

I also encounter a related error. When running generate_events I get the error:

INFO: Combining Events
Command "generate_events -f" interrupted with error:
KeyError : 'cluster_tmp_path'
Please report this bug on https://bugs.launchpad.net/madgraph5
More information is found in '/nfs_scratch/kdlong/wpz0jet/run_03_tag_1_debug.log'.
Please attach this file to your report.
quit

The debug file shows:

generate_events -f
Traceback (most recent call last):
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/extended_cmd.py", line 879, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/extended_cmd.py", line 872, in onecmd_orig
    return func(arg, **opt)
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/madevent_interface.py", line 1939, in do_generate_events
    self.exec_cmd('combine_events', postcmd=False)
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/extended_cmd.py", line 919, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/extended_cmd.py", line 872, in onecmd_orig
    return func(arg, **opt)
  File "/nfs_scratch/kdlong/wpz0jet/bin/internal/madevent_interface.py", line 2513, in do_combine_events
    if self.options['run_mode'] ==1 and self.options['cluster_tmp_path']:
KeyError: 'cluster_tmp_path'

From this I believe there is a typo in self.options['cluster_tmp_path']. Changing the key to cluster_temp_path (which is the variable name as defined in me5_configuration.txt) fixes the issue. I downloaded the code using bzr branch lp:madgraph5 under the impression that I would get the latest stable version. I imagine this is not the case and I actually got a development version?
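The failure is easy to reproduce in isolation with a plain dictionary: the options are populated from me5_configuration.txt, where the variable is spelled cluster_temp_path, so looking it up under the misspelled key raises exactly the KeyError above. (A minimal sketch, not MadGraph's actual code; the dictionary below is a stand-in for the real options object.)

```python
# Minimal reproduction of the misspelled-key bug (hypothetical dict,
# standing in for MadGraph's self.options).
options = {'run_mode': 1, 'cluster_temp_path': None}  # spelling from me5_configuration.txt

try:
    # The buggy check in do_combine_events used the misspelled key:
    on_cluster = options['run_mode'] == 1 and options['cluster_tmp_path']
except KeyError as err:
    print('KeyError:', err)  # raised because the stored key is 'cluster_temp_path'

# The one-character fix: use the same spelling as the configuration file.
fixed = options['run_mode'] == 1 and bool(options['cluster_temp_path'])
print(fixed)  # False here, since cluster_temp_path is unset
```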

Thanks!

Kenneth

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee
Olivier Mattelaer (olivier-mattelaer) said:
#1

Hi,

In cluster mode, if cluster_temp_path is set, the combining of events is done locally; otherwise it is run on the cluster.
This is true for all clusters (including condor). For a condor cluster, this is actually the only effect of that option.
For other clusters, this option also modifies the way jobs handle I/O operations.
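The behaviour described above can be sketched as a small predicate (an illustration of the stated logic only; the option names follow me5_configuration.txt, while the function name is made up):

```python
def combine_runs_locally(options):
    """True when combine_events should run on the submitting machine.

    Per the explanation above: in cluster mode (run_mode == 1), setting
    cluster_temp_path forces the combining of events to run locally;
    if it is unset, the combine job is submitted to the cluster instead.
    """
    return options['run_mode'] == 1 and bool(options.get('cluster_temp_path'))

print(combine_runs_locally({'run_mode': 1, 'cluster_temp_path': None}))        # False
print(combine_runs_locally({'run_mode': 1, 'cluster_temp_path': '/scratch'}))  # True
```

This also explains the observation in the question: pointing cluster_temp_path at a directory the condor workers cannot read still works, because on condor the option only moves the combine step off the cluster entirely.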

> From this I believe there is a typo in self.options['cluster_tmp_path']. Changing the key to cluster_temp_path (which is the variable name as defined in me5_configuration.txt) fixes the issue.

Thanks, I will fix this for our next version.

Cheers,

Olivier

On 03 Mar 2015, at 02:21, Kenneth Long <email address hidden> wrote:

> New question #263102 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/263102
>
