How to launch parallel jobs with different random seeds?

Asked by Ans

I am trying to run MadGraph (via madminer, https://madminer.readthedocs.io/en/latest/) and in the run card I have set iseed to 0. Since I want to parallelise, I launch multiple jobs of 1k events each; so that the jobs don't overwrite each other, each job saves its output to a different folder (folder1, folder2, ...).

Each job is launched one second after the previous one, but it seems that MadGraph doesn't set the random seed from the time. Even though iseed = 0, I see in the LHE files produced that the seed is always the same: 27.

Can I make MadGraph set the seed from the time?

Otherwise, how can I parallelise the simulation?

I could generate a different run card for each job, each with a new random seed, but hopefully there's a more elegant solution.
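For concreteness, the brute-force version of that idea looks roughly like the sketch below (the template file name and folder layout are mine, and the iseed line format should be checked against the actual run card):

    import re
    from pathlib import Path

    # "run_card_template.dat" is a name I made up for a copy of the card
    template = Path("run_card_template.dat").read_text()

    for job in range(1, 11):
        seed = 1000 + job  # any scheme giving a distinct seed per job
        # rewrite the "<value> = iseed" line; check the exact format of
        # your run_card version before trusting this pattern
        card = re.sub(r"^\s*\d+\s*=\s*iseed.*$",
                      f" {seed} = iseed ! rnd seed",
                      template, flags=re.MULTILINE)
        outdir = Path(f"folder{job}")
        outdir.mkdir(exist_ok=True)
        (outdir / "run_card.dat").write_text(card)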

What are your suggestions?

Olivier Mattelaer (olivier-mattelaer) said: #1

Hi,

We have plenty of parallelization methods available, dedicated to various types of cluster, machine, and job size.
The default mode of MG5aMC, for example, uses all the threads available on your single machine.
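For reference, that multicore mode is steered by input/mg5_configuration.txt; an illustrative excerpt (the values shown here are examples, not defaults):

    # running mode: 0 = single machine, 1 = cluster, 2 = multicore
    run_mode = 2
    # number of cores to use (comment out to use all available)
    nb_core = 4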

We also have the gridpack approach, which is targeted at the event generation of very large samples.
The idea is to store the optimization of the code and then generate, N times, a small number of events on a single core/thread.
In your case you do not store the results of the "survey", which might hurt you quite a lot in terms of efficiency.

We cannot use a random seed based on time, since a lot of physicists believe strongly in reproducibility.
On top of that, since the range of seeds allowed in our code is quite limited, I prefer to keep the code as it is.

One solution to your problem is to copy the file SubProcesses/randinit from the previous directory to the next, so that the code creating the actual seed takes all the directories into account. I have never heard of anyone doing that, though.
(Typically we run in "multi_run" mode and set quite different initial seeds in the different directories for this kind of multi-directory run.)
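For illustration only, the randinit chaining would look something like this sketch, assuming the runs happen strictly one after the other and using made-up directory names:

    import shutil

    # hypothetical process directories, generated one after the other
    dirs = [f"PROC_dir{i}" for i in range(1, 6)]
    for prev, nxt in zip(dirs, dirs[1:]):
        # after finishing the run in `prev`, propagate its seed state
        # forward so `nxt` continues from where `prev` left off
        shutil.copy(f"{prev}/SubProcesses/randinit",
                    f"{nxt}/SubProcesses/randinit")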

Cheers,

Olivier

Ans (ansans) said: #2

Hi Olivier,

Thanks for the reply. I'm happy to do it the recommended way if possible. Currently I submit 200 jobs to my cluster (4 cores each); each job does one run_multiple(), and its output is saved in a different directory. To be sure of having different random seeds, each job generates a new run_card.dat with a different random seed, jobNumber*20 + 10.
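As a quick arithmetic check, that scheme gives 200 distinct seeds spaced 20 apart (the spacing presumably leaves room for any seed offsets applied internally within a run):

    # seeds for jobs 0..199 under seed = jobNumber*20 + 10
    seeds = [job * 20 + 10 for job in range(200)]
    assert len(set(seeds)) == 200                              # all distinct
    assert all(b - a == 20 for a, b in zip(seeds, seeds[1:]))  # gap of 20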

I'm not sure if this could cause problems. Is this similar to the recommended strategy?

I launch the jobs in parallel with no idea which directories will be written first (it depends on which job runs first on my cluster), so I'm not sure I can copy something from one directory to another serially.

Olivier Mattelaer (olivier-mattelaer) said: #3

Hi,

This means that you create 200 different directories? That sounds like quite a bad strategy.

Why not use the cluster mode included in our code? Is your cluster not supported?
Depending on your cluster configuration, you might of course need to customise the submission script. But this will allow you to create a single directory and let our code handle the parallelization (in that case you can directly request a large number of events, something like 500k).

For information, we support the major job schedulers (SLURM, Condor, PBS) and many others.

If for some reason this is not a good strategy, then you can move to the gridpack strategy.
There you use the cluster in the above mode to create all the output of the machine-learning part of the code; then you can unpack that code directly on the node (on local disk, obviously), generate N events on a single thread, and move only the events file to the scratch area (and either let your job scheduler clean the local area or do it yourself).
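A sketch of what one such gridpack job could look like (all file and directory names below are assumptions to adapt to your site; once MadGraph has produced a gridpack tarball, its run.sh script takes the number of events and the random seed as arguments):

    import pathlib
    import shutil
    import subprocess
    import tarfile

    node_dir = pathlib.Path("/tmp/mg_job")          # node-local disk
    scratch = pathlib.Path("/scratch/user/events")  # shared scratch area
    node_dir.mkdir(parents=True, exist_ok=True)

    with tarfile.open("run_01_gridpack.tar.gz") as tar:
        tar.extractall(node_dir)                    # unpack on the node

    job = 7                                         # e.g. from the scheduler
    nevents, seed = 1000, job * 20 + 10
    subprocess.run(["./run.sh", str(nevents), str(seed)],
                   cwd=node_dir, check=True)

    # move only the event file back; the output name may differ by version
    shutil.move(str(node_dir / "events.lhe.gz"),
                str(scratch / f"events_{seed}.lhe.gz"))
    shutil.rmtree(node_dir)                         # clean the local area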

Cheers,

Olivier

Ans (ansans) said: #4

Hi Olivier,

I've been trying to use the cluster-mode strategy, but I'm not completely sure how it works. In input/mg5_configuration.txt I set:
run_mode = 1
cluster_type = ge
cluster_queue = mc_longlasting

(My cluster uses the Grid Engine batch system, where jobs are submitted with the "qsub" command; the queue for long jobs on multiple cores (4) is the one mentioned above.)

I tested it on very few events and found that it just runs interactively as usual; it doesn't submit any jobs to the cluster.

You mentioned that I will need to customise the "submission script", where do I find it?

I will need to simulate something like 1M events for each parameter point (I tweak the couplings and simulate a few datasets, each with ~1M events).
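One quick thing I could check, assuming MG5 submits by shelling out to qsub, is whether qsub is visible at all from the host where I launch MG5:

    import shutil

    # if this prints None, the submit command cannot be found, which
    # would explain a silent fallback to local running (an assumption
    # about the failure mode, not a documented guarantee)
    print(shutil.which("qsub"))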

Olivier Mattelaer (olivier-mattelaer) said: #5

Hi,

The easiest method is to modify the file madgraph/various/cluster.py and edit the class "GECluster" (line 1506) to fit the constraints of your cluster.
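Purely for orientation, the change usually amounts to adjusting how the qsub command line is built. A minimal standalone sketch of a Grid Engine submitter follows; this is not the actual GECluster code, and the queue and parallel-environment flags are assumptions for a typical site:

    import subprocess

    def submit_to_ge(script_path, queue="mc_longlasting", slots=4):
        # typical site-specific customisation: select the queue and
        # request a parallel environment; exact flags depend on the site
        cmd = ["qsub", "-q", queue, "-pe", "smp", str(slots),
               "-cwd", script_path]
        out = subprocess.run(cmd, capture_output=True, text=True,
                             check=True)
        return out.stdout.strip()  # usually reports the job id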

Cheers,

Olivier
