Regarding setting the number of cores for a MadGraph run

Asked by Disha Bhatia on 2021-02-14

Hello,

I am using a machine which has 64 cores. However, when running MadGraph, it currently uses only the first 3 cores,
and the rest are left unused.

49066 abhaya 20 0 137028 85084 10468 R 100.0 0.5 14:43.52 madevent
49088 abhaya 20 0 134892 82776 10540 R 100.0 0.5 14:43.45 madevent
49085 abhaya 20 0 137032 84620 10720 R 100.0 0.5 14:43.44 madevent
  702 abhaya 20 0 42348 4148 3160 R 0.7 0.0 0:00.04 top
 1513 root 20 0 19788 772 512 S 0.3 0.0 474:06.69 irqbalance
    1 root 20 0 185348 4964 3376 S 0.0 0.0 4:25.04 systemd
    2 root 20 0 0 0 0 S 0.0 0.0 0:09.35 kthreadd
    4 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
    7 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 mm_percpu_wq
    8 root 20 0 0 0 0 S 0.0 0.0 55:20.77 ksoftirqd/0

Is there a way in which we can increase the number of cores for the operation so that the jobs run faster?

In the mg5_configuration file, the run_mode has been set equal to 2.

Thanks,
Disha

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Disha Bhatia
Solved: 2021-02-14
Last query: 2021-02-14
Last reply: 2021-02-14

Hi,

Our parallelization is based on multi-processing, not multi-threading, so it is possible that
a simple task will not be sped up by having 64 cores available.
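
To illustrate the point, here is a rough Python model (this is not MadGraph code; the function name and the numbers are illustrative assumptions). With process-based parallelism each job occupies one core, so the number of busy cores can never exceed the number of jobs that are ready to run:

```python
def busy_cores(n_jobs: int, n_cores: int) -> int:
    """Cores kept busy when each independent job runs as one single-core process."""
    if n_jobs < 0 or n_cores < 1:
        raise ValueError("n_jobs must be >= 0 and n_cores must be >= 1")
    # A job cannot be split across cores, so idle cores stay idle
    # whenever there are fewer jobs than cores.
    return min(n_jobs, n_cores)


# Illustrative numbers only: few jobs versus many jobs on a 64-core machine.
for n_jobs in (3, 200):
    print(n_jobs, "jobs ->", busy_cores(n_jobs, 64), "busy cores")
```

In this simplified picture, a run that is split into only 3 jobs keeps 3 cores busy no matter how many cores the machine has.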

You can set nb_core to 64 to be sure that we correctly identify that you have 64 cores available.
That said, this should already be detected correctly.
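
As a minimal sketch of how one might set these options in the configuration file, the Python helper below rewrites a `key = value` line in a text such as mg5_configuration.txt. The helper name `set_option` and the sample text are assumptions made for illustration; the sketch assumes one setting per line, with unset options appearing commented out as `# key = default`.

```python
import re


def set_option(config_text: str, key: str, value: str) -> str:
    """Return config_text with `key = value` set.

    Assumes one setting per line, optionally commented out with a
    leading '#'. Lines starting with '#!' (descriptions) are not touched.
    """
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE
    )
    new_line = f"{key} = {value}"
    if pattern.search(config_text):
        # Replace only the first matching line, uncommenting it if needed.
        return pattern.sub(lambda _match: new_line, config_text, count=1)
    # Key not present at all: append it at the end of the file.
    return config_text.rstrip("\n") + "\n" + new_line + "\n"


# Hypothetical excerpt of a configuration file (an assumption, not the real file).
sample = "#! Running mode\n# run_mode = 2\n\n#! Number of cores\n# nb_core = None\n"
updated = set_option(set_option(sample, "run_mode", "2"), "nb_core", "64")
print(updated)
```

Writing the returned text back to the file is left out on purpose, so the sketch can be tried without touching a real installation.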

After that, you can play with hidden parameters to force the submission of a larger number of processes, but
how to configure them depends on the type of run that you do (LO, NLO_PS, fNLO, MadWeight, loop-induced, ...).

From your executable, I guess that you are doing LO or loop-induced.
Since loop-induced is very aggressive about submitting a large number of processes, I guess your issue is with LO.

Since 2.9.0, we have a hidden section in the run_card, available via the command
"update psoptim"

You will then have this block in the run_card:
#*********************************************************************
# Phase-Space Optim (advanced)
#*********************************************************************
   0 = job_strategy ! see appendix of 1507.00020 (page 26)
   0 = hard_survey ! force to have better estimate of the integral at survey for difficult mode like interference
   -1.0 = tmin_for_channel ! limit the non-singular reach of --some-- channel of integration related to T-channel diagram (value between -1 and 0), -1 is no impact
   -1 = survey_splitting ! for loop-induced control how many core are used at survey for the computation of a single iteration.
   2 = survey_nchannel_per_job ! control how many Channel are integrated inside a single job on cluster/multicore
   -1 = refine_evt_by_job ! control the maximal number of events for the first iteration of the refine (larger means less jobs)
   -O = global_flag ! fortran optimization flag use for the all code
     = aloha_flag ! fortran optimization flag for aloha function. Suggestions: '-ffast-math'
    = matrix_flag ! fortran optimization flag for matrix.f function. Suggestions: '-O3'

which should allow you to modify the pattern of job submission to match your needs.

Setting job_strategy to "2" will switch you to the loop-induced mode of job submission, where the number of submitted jobs will blow up (along with the associated overhead), allowing you to better saturate the cores that you have available.
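
For scripted setups, the change can be sketched in Python as a text edit of the run_card, which uses `value = name ! comment` lines as in the block above. The helper name `set_run_card_param` is hypothetical and this is only one possible way to edit the card, not part of MadGraph:

```python
import re


def set_run_card_param(card_text: str, name: str, value: str) -> str:
    """Set `name` in run_card-style text made of `value = name ! comment` lines."""
    pattern = re.compile(
        r"^(?P<indent>[ \t]*)[^=\n!#]*?[ \t]*=[ \t]*"
        + re.escape(name)
        + r"(?P<rest>[ \t]*(?:!.*)?)$",
        re.MULTILINE,
    )
    # Keep the indentation and the trailing comment, swap only the value.
    new_text, n_changed = pattern.subn(
        lambda m: f"{m['indent']}{value} = {name}{m['rest']}", card_text, count=1
    )
    if n_changed == 0:
        raise KeyError(f"parameter {name!r} not found in the card")
    return new_text


# Two lines copied from the psoptim block shown above.
card = (
    "   0 = job_strategy ! see appendix of 1507.00020 (page 26)\n"
    "   2 = survey_nchannel_per_job ! control how many Channel are integrated"
    " inside a single job on cluster/multicore\n"
)
print(set_run_card_param(card, "job_strategy", "2"))
```

The helper raises a KeyError when the parameter is missing, which is what happens if the hidden block has not yet been added to the card.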

Cheers,

Olivier

> On 14 Feb 2021, at 10:25, Disha Bhatia <email address hidden> wrote:
>
> New question #695530 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/695530

Disha Bhatia (dishabhatia1989) said : #2

Thank you very much for the detailed response. This helps.