Madgraph jobs are not submitted to condor cluster

Asked by safa gaid

Hello,

I am trying to submit MadGraph jobs to the HTCondor cluster on lxplus at CERN. My cluster-mode settings are:

              run_mode : 1 (user set)
              cluster_queue : madgraph (user set)
               cluster_time : madgraph (user set)
               cluster_size : 150 (user set)
               cluster_memory : 150 (user set)
                    nb_core : 1 (user set)
          cluster_temp_path : None

However, before the events are generated, I get this error:

File "/afs/cern.ch/user/s/sgaid/MG5_aMC_v3_2_0/madgraph/various/misc.py", line 433, in deco_f_retry
    raise error.__class__('[Fail %i times] \n %s ' % (i+1, error))
UnboundLocalError: local variable 'error' referenced before assignment
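
A note on reading this traceback: in Python 3, the name bound by `except ... as error` is deleted as soon as the `except` block ends, so a retry loop that tries to re-raise `error` after the loop hits an UnboundLocalError instead of reporting the real failure. The sketch below reproduces that pattern. It is a minimal illustration, not MadGraph's actual code: the `retry` decorator and `always_fails` function are hypothetical, and whether line 433 of misc.py fails for exactly this reason is an assumption based only on the `raise` line shown above.

```python
import functools


def retry(nb_try=3):
    """Hypothetical retry decorator (NOT MadGraph's actual implementation)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for i in range(nb_try):
                try:
                    return func(*args, **kwargs)
                except OSError as error:
                    # Python 3 unbinds `error` when this block ends.
                    pass
            # `error` no longer exists here, so evaluating it raises
            # UnboundLocalError instead of the intended exception.
            raise error.__class__('[Fail %i times] \n %s ' % (i + 1, error))
        return wrapper
    return decorator


@retry(nb_try=2)
def always_fails():
    # Stand-in for a cluster submission command that keeps failing.
    raise OSError("submission command failed")


def run_demo():
    """Return the name of the exception that actually escapes the decorator."""
    try:
        always_fails()
    except UnboundLocalError as exc:
        return type(exc).__name__
    return "no error"
```

If this is indeed the mechanism, the UnboundLocalError only masks the underlying problem: some call (for example the job submission itself) failed on every retry, and that original failure is what needs to be diagnosed. The exact wording of the message depends on the Python version.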

The script was working before; now that I am using MG5_aMC v3.2.0, I get this error.

Could you please clarify this point for me?

Thanks and best regards,

Safa.

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee
Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi,

> cluster_queue : madgraph (user set)
> cluster_time : madgraph (user set)
> cluster_size : 150 (user set)
> cluster_memory : 150 (user set)

Do you have a slurm queue called madgraph on lxplus? This sounds surprising.
Also, cluster_time is unlikely to be correctly set to "madgraph".
What is your cluster_type? Do you have some PLUGIN customization in place for LXPLUS?

Then, what do you mean by "working before"? Can you check what the above configuration was in that working case, and which version of the code you were using?

The next question is: why use 3.2.0?
We only support the latest release (3.5.3); for users who want a stable version, we also offer a long-term stable release (2.9.x). So if you would like to keep a fixed version, we advise sticking to the 2.9.x release (for which we regularly provide bug fixes, but only that).

Cheers,

Olivier

> On 23 Jan 2024, at 10:20, safa gaid <email address hidden> wrote:
>
> New question #709075 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/709075

safa gaid (safagaidsafa) said :
#2

Hello,

Thank you for your prompt reply.

I checked, and we do have SLURM installed on lxplus at CERN, but I have never used it before. Do you suggest that I use it instead?

I still get the same error. These are the modifications I made:

I am now using release 2.9.5, and in the settings I commented out the following:

> cluster_queue : madgraph (user set)
> cluster_time : madgraph (user set)
> cluster_size : 150 (user set)
> cluster_memory : 150 (user set)

I did not mention in my previous message that I wrote the commands to a command file and ran it using:

./../bin/mg5_aMC mycmnd

Any other suggestions, please?

Best,

Safa

Olivier Mattelaer (olivier-mattelaer) said :
#3

Hi,

I would say that you are using a plugin to interact with your cluster.
The parameters
> cluster_time : madgraph (user set)
> cluster_memory : 150 (user set)

are not defaults in MG5aMC, so they need to be defined by a plugin.
In any case, setting "cluster_time" to "madgraph" is likely a mistake on your side, since it is a weird value for a "time".

In the same spirit, the value of "cluster_queue" should correspond to a condor partition (or a queue, or whatever the nomenclature for your cluster is), and it is unlikely that lxplus has one with that name.

Cheers,

Olivier

safa gaid (safagaidsafa) said :
#4

Dear Olivier,

Thank you for your reply.

I changed the cluster settings as you suggested, but I still cannot submit the jobs. I guess the problem is not with MadGraph but with Condor.
I will contact IT support at CERN to see whether the problem is with the Condor version on lxplus or whether I am doing something wrong in Condor.

Best,
Safa.
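
For anyone debugging a similar setup, a first check on the Condor side is whether the standard HTCondor command-line tools are visible from the shell that launches MadGraph. The snippet below is a minimal sketch of such a check. The helper name `find_condor_tools` is hypothetical, and the assumption that `condor_submit` and `condor_q` are the relevant commands on a given site should be confirmed with the local documentation.

```python
import shutil


def find_condor_tools(tools=("condor_submit", "condor_q")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_condor_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If a tool is reported as not found, job submission from that environment cannot work regardless of the MadGraph cluster settings.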
