cluster run timing not improving after increasing CPU power

Asked by Arian Abrahantes

Dear MG-Team:

I have run two tests of my calculation (both share identical cards):

1- SGE cluster, a single node with 4 CPUs.
2- SGE cluster, about 50 CPUs.

I have to say that I noticed the number of split jobs increased because at some point many CPUs became available in the cluster (other users' tasks finished), and my new runs went from 500 queued, pushing and running 4 jobs on a single node (case 1), to 4000 across all nodes (case 2).

The point is: how is it possible that both calculations ended up with similar running times? Both take about 1 hour 30 minutes, comparing 1 and 2. Apparently splitting the calculation did not improve the speed, which, at least to me, sounds odd.

From this I imagine that the splitting of a run into several jobs depends on the cluster status (MG checks this by some means) and that it propagates at some rate with calculation-slot/CPU availability. This makes me think that if at some point other users send jobs to the cluster, my calculation would take even longer in case 2 than in case 1.

Is this a signature of configuration/loading problems of my cluster?

Would there be an issue with inhomogeneities in the cluster (10x4CPU machines + 12x2CPU machines)? Should I use queues of homogeneous machines?

Is there a possibility to limit the number of jobs ("ajob") per run? If the split is too fine, losing one job may damage the full calculation.

If splitting did not improve the speed, wouldn't it be better to create as many "ajob" scripts as there are CPUs in the cluster? My case 2 would then have created all 50 jobs, submitted them at once, and finished quickly.

Furthermore, I will test the same configuration on my laptop (Mac, 2.4 GHz Intel Core 2 Duo) to check the running time. I believe I already did this (though I am not sure), and the running time was roughly the same (1h+ for a calculation).

In any case I'd like to help improve this, so let me know if there is anything I can test on my cluster.

best regards,

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Arian Abrahantes

Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi Arian,

What is your test process?

The jobs are in fact done in different steps:
1) survey
2) refine 1
3) refine 2
4) store
5) pythia
6) pgs-delphes

Each of those steps can start only when the previous one is fully finished.
This explains why all jobs are not sent to the cluster at the same time.
Steps 1-2-3 have a predefined splitting (quite a small splitting, in fact),
while 5-6 use only one node of the cluster (in principle it is possible to split this part into multiple jobs, but we never did it).
(For step 4 I'm not 100% sure, but this part is in fact limited by disk load more than by any possible splitting.)

So having a larger cluster will help for steps 1 to 3.
Step 6 is quite often very long.
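
To make this concrete, here is a toy illustration (my own sketch, not MG code, and with invented timings) of why extra slots mainly help the parallel steps 1-3 while the serial tail stays the same:

# Naive model: the parallel work (survey/refine) divides over the slots,
# while the serial work (store, pythia, pgs/delphes) is paid in full.
# The 60/45 minute figures are invented, only to show the shape of the effect.
def wall_time(n_slots, parallel_work=60.0, serial_work=45.0):
    return parallel_work / n_slots + serial_work

for n in (4, 32, 50):
    print(n, "slots ->", round(wall_time(n), 1), "minutes")
# 4 slots -> 60.0 minutes
# 32 slots -> 46.9 minutes
# 50 slots -> 46.2 minutes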

>I have to say that I noticed the number of split jobs increased because at some point many CPUs became available in the cluster (other users' tasks finished), and my new runs went from 500 queued, pushing and running 4 jobs on a single node (case 1), to 4000 across all nodes (case 2).

This is simply the point at which one level finishes and the next level is submitted.

>From this I imagine that the splitting of a run into several jobs depends on the cluster status (MG checks this by some means) and that it propagates at some rate with calculation-slot/CPU availability. This makes me think that if at some point other users send jobs to the cluster, my calculation would take even longer in case 2 than in case 1.

No, the splitting is universal and does not depend on the cluster (in fact it's the same even on a single machine).

>Is this a signature of configuration/loading problems of my cluster?

Probably not, but it might be (50 old machines can be slower than 5 fast ones).

>Would there be an issue with inhomogeneities in the cluster (10x4CPU machines + 12x2CPU machines)? Should I use queues of homogeneous machines?

I'm not an expert on SGE clusters, but I would say no.

>Is there a possibility to limit the number of jobs ("ajob") per run? If the split is too fine, losing one job may damage the full calculation.

The splitting being universal, no; the only way to limit the number of jobs would be to merge them just before submission to the cluster
(I mean in cluster.py).
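
As a purely hypothetical sketch (this is not the existing cluster.py code; the file names and the group size are assumptions), such a merge could look like:

# Hypothetical sketch: bundle several "ajob" shell scripts into one wrapper
# so that a single SGE job runs them one after the other.
import os
import stat
import subprocess

def submit_bundle(ajob_paths, wrapper_name="ajob_bundle.sh"):
    # Write one wrapper script that runs each ajob in turn, then submit it.
    with open(wrapper_name, "w") as wrapper:
        wrapper.write("#!/bin/bash\n")
        for path in ajob_paths:
            wrapper.write("bash %s\n" % path)
    os.chmod(wrapper_name, stat.S_IRWXU)   # make the wrapper executable
    subprocess.check_call(["qsub", wrapper_name])

# e.g. merge the ajobs in groups of 10 before submission:
# for i in range(0, len(all_ajobs), 10):
#     submit_bundle(all_ajobs[i:i+10], "bundle_%d.sh" % i)

The trade-off is the one you point out: fewer, longer jobs mean less scheduling overhead, but losing one of them costs more work.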

>In any case I'd like to help improve this, so let me know if there is anything I can test on my cluster.

I'd say check the time taken by each of the steps, and look at the time taken by each of the jobs submitted.
Then we can probably do a more detailed analysis.
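
For a first pass, even just comparing the timestamps of the files that the jobs write tells you a lot. A small sketch (not part of MG; the glob pattern is an assumption, adapt it to wherever your jobs write their logs, e.g. somewhere under SubProcesses/):

# Report how many job log files exist and over what time span they finished,
# based on their modification times.
import glob
import os

def report(pattern="SubProcesses/*/*.log"):
    times = sorted(os.path.getmtime(f) for f in glob.glob(pattern))
    if len(times) < 2:
        print("fewer than two files matched", pattern)
        return
    span_min = (times[-1] - times[0]) / 60.0
    print("%d files, finished over %.1f minutes" % (len(times), span_min))

report()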

Cheers,

Olivier

Arian Abrahantes (arian-abrahantes) said :
#2

Hi Olivier,

I'll try to give more input, following the order of your questions.

> What is your test process?
>
>
pp>gogo, go>tt~ne , go>tb~nc

where ne and nc are all the neutralinos and charginos, so it is actually triple top.

> The jobs are in fact done in different steps:
> 1) survey
> 2) refine 1
> 3) refine 2
> 4) store
> 5) pythia
> 6) pgs-delphes
>
>
R: My calculation stops at parton level, so steps 5 and 6 are ruled out.

> Each of those steps can start only when the previous one is fully finished.
> This explains why all jobs are not sent to the cluster at the same time.
> Steps 1-2-3 have a predefined splitting (quite a small splitting, in fact),
>

R: When you say "predefined splitting", what is the actual meaning? See below.

1- On my laptop there were 512 jobs (64*8) for surveying with gg as the initial state and probably 2944 (368*8) with qq. I say probably because I am doing the test on the laptop as I write this post, and I have checked that for the first qq channel it put 368 jobs in the queue, with 7 other quark-antiquark configurations still to go which, I presume, for symmetry reasons should be the same. So maybe 2944 + 512 = 3456.

2- On the cluster (8x4CPU; I opted for a single queue of quad-core machines) there are:

Idle: 2237 Running: 31 Finished: 2212

for a total of 4480 jobs in the survey.

So unless jobs on my laptop scale differently (you can judge that), there are about 1000 more jobs submitted to the cluster queue.

> This is simply the point at which one level finishes and the next level is
> submitted.
>
>
R: Well, in this case I could argue as posted before: there is no level change; they are two different runs with different CPU availability.

> No, the splitting is universal and does not depend on the cluster (in fact
> it's the same even on a single machine)
>
>
R: I gave you the numbers above, so probably I scaled the laptop jobs wrongly.

> >Is this a signature of configuration/loading problems of my cluster?
>
> Probably not, but it might be (50 old machines can be slower than 5 fast
> ones)
>
>
R: In the previous post, the 4-CPU machine is one of the eight I am using right now; they are all the same machines. So the previous 4-CPU-slot run definitely used 8 times fewer slots than my current run, which after two hours has not finished the survey.

> The splitting being universal, no; the only way to limit the number of jobs
> would be to merge them just before submission to the cluster
> (I mean in cluster.py).
>
>
R: Uhmmm... this is a very nice idea...

> I'd say check the time taken by each of the steps, and look at the time
> taken by each of the jobs submitted.
> Then we can probably do a more detailed analysis.
>
>
R: Surveying on the cluster, using a queue of 8x4 CPUs, completes about 15 jobs on average between each MG status print (the print interval is the one you coded, something like 20 or 30 seconds), with very high slot occupancy (on average 28 slots in use). It has not finished after more than two hours. Yesterday's run, submitting only 4 jobs to one of these machines, took about 1h 30m to do steps 1, 2 and 3. But, as I told you, the number of queued jobs was much smaller.

There is something awkward here.

I am pretty sure I am running the same cards all the time, since I have a script to run over several param_cards without changing the proc_card.


cheers,

arian


Arian Abrahantes (arian-abrahantes) said :
#3

Dear Olivier, I came across a completely different issue while working on this, and it has to do with the shrinking number of queued jobs. I am in the funny situation where my script logic is (see the sketch right after the list below):

1- create the folder of the run
2- ./bin/newprocess
3- replace the param_card
4- ./bin/generate_event
5- retrieve the final cross section
6- repeat from 3 until all my param cards are done
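
For reference, here is a rough sketch of that loop (the card file names, the Cards/param_card.dat path and the way the cross section is retrieved are placeholders to adapt to your setup; the bin scripts are the ones named in the list above):

# Scan over several param_cards within the same process directory.
import shutil
import subprocess

param_cards = ["param_card_1.dat", "param_card_2.dat"]   # placeholder names

subprocess.check_call(["./bin/newprocess"])               # step 2
for card in param_cards:
    shutil.copy(card, "Cards/param_card.dat")             # step 3
    subprocess.check_call(["./bin/generate_event"])       # step 4 (batch-mode options omitted)
    # step 5: retrieve the final cross section for this run,
    # e.g. by grepping the run summary (left as a placeholder here)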

With this logic the first run, as stated above, creates 4480 jobs in the survey;
however, the second pass (different param_card) creates 1984 jobs in the survey,
the third param_card 640 jobs,
and the next param_card 608.

My cross section should peak around the second param_card, as it does.

Now, after following this job-queue issue closely, I am not sure whether the final result is right or wrong, or whether this is intended to be a feature in MG.

I will close this thread temporarily and provide you with a test script, if needed, so maybe you could reproduce it on a different cluster and let me know.

cheers,

arian