How to set --nb_core=1 as the default in cluster mode (Condor)

Asked by Roberto Franceschini

Hello,
here we have a Condor cluster that works only if I explicitly request a single core by using

./bin/generate_events runname --nb_core=1
or launch --nb_core=1 in interactive mode.

I wanted to make --nb_core=1 the default in the mg5_configuration.txt file, so I put

nb_core = 1 #
cluster_type = condor
cluster_queue = madgraph #

However, when I start the event generation it gets stuck; it works only when I explicitly put --nb_core=1 on the command line (as if nb_core = 1 in the file were ignored).

So here is my question: what is the difference between --nb_core=1 on the command line and in mg5_configuration.txt? How can I make --nb_core=1 the default, as if I had put it on the command line?

Thanks for your help.
Roberto

Question information

Language:
English
Status:
Solved
For:
MadGraph5_aMC@NLO
Assignee:
No assignee
Solved by:
Roberto Franceschini
Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi Roberto,

I think that you are using the cluster mode in a weird way.
In cluster mode (run_mode = 1), you just have to run
./bin/generate_events runname
and that script acts as a cluster manager: it controls the jobs, analyzes their results and submits additional jobs when appropriate.
All jobs submitted to Condor automatically run on a single CPU.

If you specify --nb_core=1, you automatically switch to local submission (a multicore configuration with only one core).

In a cluster configuration, however, nb_core is used to choose how many local cores may be used to compile the code.
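To make the distinction above concrete, here is a minimal Python sketch of the decision described in this reply. It is a hypothetical model, not MadGraph's actual logic: the function effective_mode and its arguments are illustrative names, and the rule that an explicit --nb_core wins even over --cluster is taken from the behaviour reported later in this thread.

```python
# Hypothetical model of the run-mode choice described in this thread.
# Not MadGraph's code; all names here are illustrative.
from typing import Optional

RUN_MODES = {0: "single machine", 1: "cluster", 2: "multicore"}


def effective_mode(config: dict, cli_nb_core: Optional[int] = None,
                   cli_cluster: bool = False) -> str:
    """Return where the jobs run.

    config      -- values from mg5_configuration.txt (must contain run_mode)
    cli_nb_core -- value passed as --nb_core on the command line, or None
    cli_cluster -- True if --cluster was passed on the command line
    """
    if cli_nb_core is not None:
        # An explicit --nb_core switches to local (multicore) submission,
        # even when --cluster is also given.
        return f"local multicore ({cli_nb_core} core(s))"
    if cli_cluster:
        return "cluster"
    # Without command-line overrides, run_mode from the file decides.
    # In cluster mode, nb_core from the file only limits how many local
    # cores are used to compile the code; batch jobs use one CPU each.
    return RUN_MODES[config["run_mode"]]


if __name__ == "__main__":
    cfg = {"run_mode": 1, "nb_core": 1}
    print(effective_mode(cfg))                                   # cluster
    print(effective_mode(cfg, cli_nb_core=1, cli_cluster=True))  # local multicore (1 core(s))
```

In this model, nb_core = 1 in the file never changes where the jobs run, which is consistent with the file setting not reproducing the effect of the command-line flag.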

Cheers,

Olivier

On Jun 20, 2014, at 10:21 PM, Roberto Franceschini <email address hidden> wrote:

> New question #250529 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/250529
> --
> You received this question notification because you are an answer
> contact for MadGraph5_aMC@NLO.

Roberto Franceschini (franceschini-roberto) said :
#2

Hi Olivier, thanks a lot for your explanation of what these options are supposed to mean.

Let me describe the 'phenomenology', and you tell me if I am doing something weird.

I have a configuration file mg5_configuration.txt as follows:
#! Default Running mode
#! 0: single machine/ 1: cluster / 2: multicore
run_mode = 1 #

#! Cluster Type [pbs|sge|condor|lsf|ge|slurm] Use for cluster run only
#! And cluster queue
cluster_type = condor
cluster_queue = madgraph #

Then I try to run jobs on the cluster. If I run

./bin/generate_events run_name --cluster --nb_core=1 -f

it gives a fully working run, whereas if I run

./bin/generate_events run_name --cluster -f

it does not manage to launch the jobs and complete the calculation. This is why I wanted to have --nb_core=1 as the default in the mg5_configuration.txt file.

Can this be put in the mg5_configuration.txt file so that I can simply run

./bin/generate_events run_name -f

My attempts so far failed.
Thanks for helping,
Roberto

Olivier Mattelaer (olivier-mattelaer) said :
#3

Hi Roberto,

The line
> cluster_queue = madgraph #

is unlikely to work for your cluster (and is probably the reason why it doesn’t work).
This is the value we use for our cluster, but it is site-specific, so ask your IT team which queue you have to use.
If they tell you that there is no queue system (which is possible), then just set that parameter to None.

> Can this be put in the mg5_configuration.txt file so that I can simply
> run
>
> ./bin/generate_events run_name -f

Yes, you can.

Cheers,

Olivier

On Jun 23, 2014, at 4:41 PM, Roberto Franceschini <email address hidden> wrote:

> Question #250529 on MadGraph5_aMC@NLO changed:
> https://answers.launchpad.net/mg5amcnlo/+question/250529
>
> Status: Answered => Open

Roberto Franceschini (franceschini-roberto) said :
#4

Hi Olivier, I think that the successful runs I had were actually run locally in single-CPU mode and not sent to the cluster. I only realized it now that I launched a larger number of events and saw the process "madevent" at the top of "top". I was previously fooled by seeing entries in the Condor list of submitted jobs, because the jobs that did not go well were still hanging around in the cluster.

I am waiting to learn the queue name, and I will put it in the configuration file as soon as possible.

However, let me point out something. I asked for 1M u u > u u events, a very simple process, by running
./bin/generate_events runname --cluster --nb_core=1 -f

Despite the --cluster flag, I see that madevent runs locally... Is this the intended behaviour? Am I the only one seeing this?

I am sorry if this is turning into a string of not-so-intelligent questions about how to run on the cluster... I am a newbie with Condor...

Cheers,
Roberto

Roberto Franceschini (franceschini-roberto) said :
#5

It turns out that we have no queue name, so now my mg5_configuration.txt reads:

#! Default Running mode
#! 0: single machine/ 1: cluster / 2: multicore
run_mode = 1 #

#! Cluster Type [pbs|sge|condor|lsf|ge|slurm] Use for cluster run only
#! And cluster queue
cluster_type = condor
cluster_queue = None #
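The configuration above can be sanity-checked with a small Python reader. This is a minimal sketch, assuming the simple "key = value # comment" layout shown in this thread; parse_mg5_config is a hypothetical helper, not part of MadGraph, and it would mis-handle any value that itself contains '#'.

```python
# Minimal, hypothetical reader for lines in the style of mg5_configuration.txt.
# It is NOT MadGraph's own parser; it only mimics the format shown in this thread.


def parse_mg5_config(text: str) -> dict:
    settings = {}
    for raw in text.splitlines():
        # Drop everything after '#': full-line '#!' comments and trailing '#'.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value == "None":
            settings[key] = None          # e.g. cluster_queue = None
        elif value.isdigit():
            settings[key] = int(value)    # e.g. run_mode = 1
        else:
            settings[key] = value         # e.g. cluster_type = condor
    return settings


example = """\
#! Default Running mode
#! 0: single machine/ 1: cluster / 2: multicore
run_mode = 1 #

#! Cluster Type [pbs|sge|condor|lsf|ge|slurm] Use for cluster run only
#! And cluster queue
cluster_type = condor
cluster_queue = None #
"""

if __name__ == "__main__":
    # Prints {'run_mode': 1, 'cluster_type': 'condor', 'cluster_queue': None}
    print(parse_mg5_config(example))
```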

Before I go and test it (at the risk of harming the cluster), may I ask how I can see the JDL files that MadGraph generates for submission to Condor?

It turns out that our JDL needs to contain "universe = vanilla", if that makes sense to you.

Thanks a lot for following this not-so-amusing question on Condor setup.
Roberto

Olivier Mattelaer (olivier-mattelaer) said :
#6

Hi Roberto,

The code that creates the submission file is in
./bin/internal/cluster.py
(the original file is madgraph/various/cluster.py)

At line 732 you have the code for the Condor submission.

So you can add the line “raise Exception” after line 775 to stop the code once the submission file has been created, and then look at it (it creates a local file named submit_condor).
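A minimal Python sketch of the same "stop after writing the submit file" idea, under stated assumptions: the submit description below is an illustrative HTCondor file built from standard submit commands (universe, executable, arguments, output, error, log, queue), not the template that cluster.py actually writes, and write_condor_submit, StopBeforeSubmit and the job script name are hypothetical.

```python
# Hypothetical sketch of the "stop after writing the submit file" trick.
# The template is illustrative, NOT the one MadGraph's cluster.py writes.
import tempfile
from pathlib import Path


class StopBeforeSubmit(Exception):
    """Raised on purpose so the submit file can be inspected before submission."""


def write_condor_submit(executable: str, arguments: str, workdir: Path,
                        stop_before_submit: bool = True) -> Path:
    """Write an illustrative HTCondor submit description to workdir/submit_condor."""
    lines = [
        "universe = vanilla",          # the requirement Roberto's site has
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue 1",
    ]
    submit_file = workdir / "submit_condor"
    submit_file.write_text("\n".join(lines) + "\n")
    if stop_before_submit:
        # Same idea as adding `raise Exception` right after the file is
        # written: in a real submission routine this stops execution
        # before condor_submit would be called.
        raise StopBeforeSubmit(f"submit file written to {submit_file}")
    return submit_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        try:
            write_condor_submit("./job.sh", "runname", Path(tmp))
        except StopBeforeSubmit as stop:
            print(stop)
            print((Path(tmp) / "submit_condor").read_text())
```

Raising a dedicated exception class rather than a bare Exception makes the intentional stop easy to tell apart from a genuine error.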

> It turns out that our JDL needs to contain "universe = vanilla" ... if
> that makes sense to you.

Yes, this is a pretty common requirement, so it is already included.

Cheers,

Olivier

On Jun 23, 2014, at 7:41 PM, Roberto Franceschini <email address hidden> wrote:

> Question #250529 on MadGraph5_aMC@NLO changed:
> https://answers.launchpad.net/mg5amcnlo/+question/250529

Roberto Franceschini (franceschini-roberto) said :
#7

Hi Olivier, thanks for your input.

Finally I have MG working with the local Condor configuration, without needing the nb_core option. So the issue is solved.

Let me add that, at least in my case, the file submit_condor stays in the folder; maybe that is some kind of accident. I have indeed checked that it contains the universe = vanilla line.
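To check a leftover submit_condor file for the vanilla universe without reading it by eye, a hypothetical Python helper could look like the one below. It assumes the usual "command = value" layout of HTCondor submit files; HTCondor treats command names case-insensitively, and lower-casing the value as well is a simplification made here.

```python
# Hypothetical helper: report the universe declared in an HTCondor submit
# description such as the submit_condor file left in the run folder.
from pathlib import Path
from typing import Optional


def condor_universe(submit_text: str) -> Optional[str]:
    """Return the value of the first 'universe' command, lower-cased, or None."""
    for raw in submit_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and lines without an assignment (e.g. 'queue').
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() == "universe":   # command names are case-insensitive
            return value.lower()        # lower-casing the value is a simplification
    return None


def uses_vanilla_universe(submit_file: Path) -> bool:
    """True if the submit file (e.g. submit_condor) requests the vanilla universe."""
    return condor_universe(submit_file.read_text()) == "vanilla"
```

For example, uses_vanilla_universe(Path("submit_condor")) run from the process directory would return True for a file containing "universe = vanilla".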

Sometimes it gets stuck on "combine events" even though I am only doing small runs of 40K events, but I guess it's just a matter of waiting.

Thanks again for your support.
Best,
Roberto