Slurm cluster cluster_time setting
Hi,
I set up MG5 with:

    cluster_type = slurm
    cluster_queue = sixhour

in the input/mg5_configure file, and in the cluster.py file I added:

    command = ['sbatch', '-o', stdout,
               '-J', me_dir,
               '-t', "6:00:00",
               '-e', stderr, prog] + argument

using "-t" to set the cluster time. However, it does not work well; the "run_02_ debug log shows:

    MadEvent Options
    ----------------
    automatic_
    notification_center : True
    cluster_temp_path : None
    cluster_memory : None
    cluster_size : 100
    cluster_queue : sixhour (user set)
    nb_core : 16 (user set)
    cluster_time : None
    run_mode : 1 (user set)

The cluster_time is set to None.
Do you know how to set the time properly?
If I already have a madgraph output, do you know how to set the cluster time in this folder?
Thank you!
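For reference, here is a self-contained sketch of how the modified command list in cluster.py expands into an actual sbatch invocation. All the variable values below (stdout, stderr, me_dir, prog, argument) are placeholders for illustration, not what MadGraph actually passes:

```python
# Sketch: how the command list edited in cluster.py maps to an sbatch
# call. The variable names mirror cluster.py; the values are placeholders.
stdout = "job.out"
stderr = "job.err"
me_dir = "my_process"
prog = "./ajob1"
argument = ["0", "9", "10"]

command = ['sbatch', '-o', stdout,
           '-J', me_dir,
           '-t', "6:00:00",   # hard-coded wall-time limit
           '-e', stderr, prog] + argument

print(" ".join(command))
# prints: sbatch -o job.out -J my_process -t 6:00:00 -e job.err ./ajob1 0 9 10
```

Joining the list like this makes it easy to eyeball whether the "-t" flag actually reaches sbatch.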
Question information
- Language: English
- Status: Answered
- Assignee: No assignee
#1
You have two cluster.py files.
I guess you modified the wrong one (which one is used depends on how you run the code):
- madgraph/
- bin/internal/
Otherwise, you can also use the PLUGIN approach to define your own cluster class, which is cleaner.
Cheers,
Olivier
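A very rough sketch of what such a plugin might look like, based on example 3 of the plugin documentation. The module layout, the `new_cluster` registration dict, the base-class name, and how the extra arguments get spliced in are all assumptions; check the documentation for the real API:

```python
# PLUGIN/timed_slurm/__init__.py -- hypothetical plugin sketch.
# Everything here (class names, the new_cluster dict, the attribute
# carrying extra sbatch arguments) is an assumption; see the MG5aMC
# plugin documentation, example 3, for the actual interface.
try:
    from madgraph.various.cluster import SLURMCluster  # inside MG5aMC
except ImportError:
    class SLURMCluster(object):  # stub so the sketch reads standalone
        name = 'slurm'

class TimedSLURMCluster(SLURMCluster):
    """A slurm cluster that always submits with a wall-time limit."""
    name = 'timed_slurm'
    # Extra sbatch arguments; the real class would splice these into
    # its submit command (assumption).
    extra_submit_args = ['-t', '6:00:00']

# MG5aMC plugins register new cluster types via a dict like this,
# after which "set cluster_type timed_slurm" would select it:
new_cluster = {'timed_slurm': TimedSLURMCluster}
```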
> On 12 May 2020, at 00:43, Li <email address hidden> wrote:
> New question #690648 on MadGraph5_aMC@NLO
#2
Hi Olivier,
I checked that both cluster files are changed (I changed madgraph/
Do you think I should use another option like "--time" instead of "-t"?
Or do you think I should add a line "cluster_time = 6:00:00" in input/mg5_
What do you mean by the PLUGIN approach?
One more thing puzzles me: I don't know whether I wrote the right command until some error occurs and a debug.log file is produced. Is there any way to print the configuration information before I get the debug.log file?
Thank you!
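For what it is worth, the configuration-file route asked about above would look something like the following. The exact file name (the path in the post is truncated) and the accepted time format are assumptions:

```
# in the MG5 configuration file under input/ (exact file name assumed)
cluster_type = slurm
cluster_queue = sixhour
cluster_time = 6:00:00
```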
#3
Hi,
> I checked that both cluster files are changed (I changed madgraph/
Yes and no: when you do "output", the first file is copied into the second.
But after that they are different and will stay different.
> Do you think I should use another option like "--time" instead of "-t"?
For me, "-t" is working as expected there.
> What do you mean by the PLUGIN approach?
https:/
In particular, example 3.
> Is there any way to print the configuration information before I get the debug.log file?
If you use the hardcoding approach (with either "-t" or "--time" hardcoded in cluster.py), I do not understand why you need that.
But you can see that information by typing "display options".
Cheers,
Olivier
#4
Hi Olivier,
Thank you!
> Yes and no: when you do "output", the first file is copied into the second. But after that they are different and will stay different.
Yes, and I mean that in both of them I used '-t', "6:00:00". Is it possible I added it in the wrong place? I added it at line 1673, after '-J', me_dir.
> For me, "-t" is working as expected there.
I am using MG5 2.7.1.2 with the command above, but cluster_time is still None in the debug.log file. I also tried "--time=6:00:00" instead of '-t', "6:00:00"; it still runs, but I don't know whether that solves the problem.
> https:/
> In particular, example 3.
It seems a good way, though I don't really understand it yet...
> If you use the hardcoding approach (with either "-t" or "--time" hardcoded in cluster.py), I do not understand why you need that.
I cannot log in to the cluster right now because it is under maintenance, so I cannot try "display options" yet, but I will later. The reason I need it is that I used "-t" and it doesn't work.
So I am quite confused: if '-t', "6:00:00" is the right command, why does the debug.log file still show cluster_time as None?
#5
Hi,
> Is it possible I added it in the wrong place? I added it at line 1673, after '-J', me_dir.
Sounds good to me. That is what I did recently, and it was working.
Are you sure that your issue is related to that?
When doing that, you fully bypass the variable cluster_time, so I would not worry about it...
Cheers,
Olivier
#6
Hi Olivier,
I see.
Using "display options", cluster_time is set to 6 hours, and no matter what I do in bin/internal/
But I tried with "-t", "0:00:10" in the bin/internal/
INFO: Idle: 3, Running: 59, Completed: 15 [ 0.16s ]
INFO: Idle: 3, Running: 59, Completed: 15 [ 0.25s ]
INFO: Idle: 0, Running: 43, Completed: 34 [ 30.4s ]
DEBUG: Job 10235593: missing output:
DEBUG: Job 10235594: missing output:
DEBUG: Job 10235595: missing output:
DEBUG: Job 10235596: missing output:
DEBUG: Job 10235597: missing output:
DEBUG: Job 10235598: missing output:
DEBUG: Job 10235605: missing output:
I will check whether there is another reason...
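As a side note on why a "0:00:10" test is so aggressive: slurm time specifications parse roughly as follows. This is a simplified sketch covering only the colon-separated forms; the sbatch man page documents the full grammar, which also accepts bare minutes and days-hours forms:

```python
def slurm_time_to_seconds(spec):
    """Seconds for a slurm time spec like '6:00:00', '0:00:10' or
    '2-12:00:00' (days-hours:minutes:seconds). Simplified sketch:
    covers only H:M:S and M:S forms, with an optional leading 'days-'."""
    days = 0
    if '-' in spec:
        day_part, spec = spec.split('-', 1)
        days = int(day_part)
    parts = [int(p) for p in spec.split(':')]
    while len(parts) < 3:          # left-pad 'M:S' up to 'H:M:S'
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return (days * 24 + hours) * 3600 + minutes * 60 + seconds

print(slurm_time_to_seconds("6:00:00"))   # 21600 seconds -- the sixhour limit
print(slurm_time_to_seconds("0:00:10"))   # 10 seconds -- jobs die almost at once
```

With a 10-second limit, slurm kills every job long before it finishes, which matches the "missing output" messages above and confirms that the hardcoded "-t" flag does reach sbatch.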
#7
BTW, here is one example error: it has a "resubmit", which I think means one sixhour job died, which is weird. But the "options" do not specify the cluster time.
Working on SubProcesses
INFO: P2_gg_wpgqq_wp_lvl
INFO: P2_gq_wpggq_wp_lvl
INFO: P2_gq_wpqqq_wp_lvl
INFO: P2_qq_wpgqq_wp_lvl
INFO: P2_qq_wpggg_wp_lvl
INFO: P1_gg_wpqq_wp_lvl
INFO: P1_gq_wpgq_wp_lvl
INFO: P1_qq_wpqq_wp_lvl
INFO: P1_qq_wpgg_wp_lvl
INFO: Idle: 3, Running: 34, Completed: 40 [ 1s ]
INFO: Idle: 0, Running: 35, Completed: 42 [ 1.1s ]
INFO: Idle: 0, Running: 28, Completed: 49 [ 31.3s ]
INFO: Idle: 0, Running: 21, Completed: 56 [ 1m 1s ]
INFO: Idle: 0, Running: 10, Completed: 67 [ 1m 32s ]
INFO: Idle: 0, Running: 8, Completed: 69 [ 2m 3s ]
INFO: Idle: 0, Running: 7, Completed: 70 [ 2m 33s ]
INFO: Idle: 0, Running: 4, Completed: 73 [ 3m 3s ]
INFO: Idle: 0, Running: 3, Completed: 74 [ 3m 33s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 4m 3s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 4m 34s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 5m 4s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 5m 34s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 6m 4s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 6m 34s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 7m 4s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 7m 35s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 8m 5s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 8m 35s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 9m 5s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 9m 35s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 10m 5s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 10m 36s ]
INFO: Idle: 0, Running: 1, Completed: 76 [ 11m 6s ]
WARNING: resubmit job (for the 1 times)
INFO: Idle: 1, Running: 0, Completed: 77 [ 11m 36s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 12m 6s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 12m 36s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 13m 7s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 13m 37s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 14m 7s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 14m 37s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 15m 7s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 15m 37s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 16m 7s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 16m 38s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 17m 8s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 17m 38s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 18m 8s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 18m 38s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 19m 8s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 19m 38s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 20m 9s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 20m 39s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 21m 9s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 21m 39s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 22m 9s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 22m 40s ]
INFO: Idle: 0, Running: 1, Completed: 77 [ 23m 10s ]
CRITICAL: Fail to run correctly job 10175751.
with option: {'log': None, 'stdout': None, 'argument': ['0', '9', '10'], 'nb_submit': 1, 'stderr': None, 'prog': '/temp/
file missing: /temp/30day/
Fails 1 times
No resubmition.
INFO: All jobs finished
INFO: Idle: 0, Running: 0, Completed: 78 [ 23m 40s ]
INFO: End survey
refine 50000
Creating Jobs
INFO: Refine results to 50000
INFO: Generating 50000.0 unweighted events.
Error when reading /temp/30day/
Command "import command run.txt" interrupted in sub-command:
"multi_run 40" with error:
IOError : [Errno 2] No such file or directory: '/temp/
Please report this bug on https:/
More information is found in '/temp/
Please attach this file to your report.
#8
You should check this directory:
/temp/
and look at the associated log file; you might find more information there about why the job fails.
Cheers,
Olivier
#9
Hi Olivier,
I just checked the file:
Process in group number 2
A PDF is used, so alpha_s(MZ) is going to be modified
Old value of alpha_s from param_card: 0.11799999999999999
New value of alpha_s from PDF cteq6l1: 0.13000000000000000
Warning! ptj set to xqcut= 30.000000000000000 to improve integration efficiency
Note that this might affect non-radiated jets,
e.g. from decays. Use cut_decays=F in run_card.
Warning! mmjj set to xqcut= 30.000000000000000 to improve integration efficiency
Note that this might affect non-radiated jets,
e.g. from decays. Use cut_decays=F in run_card.
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
Define smin to 1008100.0000000000
******
* MadGraph/MadEvent *
* -------
* http://
* http://
* http://
* -------
* *
* PARAMETER AND COUPLING VALUES *
* *
******
External Params
-----
mdl_MB = 4.7000000000000002
mdl_MT = 173.00000000000000
mdl_MTA = 1.7769999999999999
mdl_MZ = 91.188000000000002
mdl_MH = 125.00000000000000
aEWM1 = 132.50700000000001
mdl_Gf = 1.1663900000000
aS = 0.11799999999999999
mdl_ymb = 4.7000000000000002
mdl_ymt = 173.00000000000000
mdl_ymtau = 1.7769999999999999
mdl_WT = 1.4915000000000000
mdl_WZ = 2.4414039999999999
mdl_WW = 2.0476000000000001
mdl_WH = 6.3823389999999
Internal Params
-----
mdl_conjg__CKM3x3 = 1.0000000000000000
mdl_CKM3x3 = 1.0000000000000000
mdl_conjg__CKM1x1 = 1.0000000000000000
mdl_complexi = (0.000000000000
mdl_MZ__exp__2 = 8315.2513440000002
mdl_MZ__exp__4 = 69143404.913893804
mdl_sqrt__2 = 1.4142135623730951
mdl_MH__exp__2 = 15625.000000000000
mdl_aEW = 7.5467711139788
mdl_MW = 80.419002445756163
mdl_sqrt__aEW = 8.6872153846781
mdl_ee = 0.30795376724436879
mdl_MW__exp__2 = 6467.2159543705357
mdl_sw2 = 0.22224648578577766
mdl_cw = 0.88190334743339216
mdl_sqrt__sw2 = 0.47143025548407230
mdl_sw = 0.47143025548407230
mdl_g1 = 0.34919219678733299
mdl_gw = 0.65323293034757990
mdl_vev = 246.21845810181637
mdl_vev__exp__2 = 60623.529110035903
mdl_lam = 0.12886910601690263
mdl_yb = 2.6995554250465
mdl_yt = 0.99366614581500623
mdl_ytau = 1.0206617000654
mdl_muH = 88.388347648318430
mdl_I1x33 = (2.699555425046
mdl_I2x33 = (0.993666145815
mdl_I3x33 = (0.993666145815
mdl_I4x33 = (2.699555425046
mdl_ee__exp__2 = 9.4835522759998
mdl_sw__exp__2 = 0.22224648578577769
mdl_cw__exp__2 = 0.77775351421422245
Internal Params evaluated point by point
-----
mdl_sqrt__aS = 0.34351128074635334
mdl_G__exp__2 = 1.4828317324943823
Couplings of sm
-----
GC_10 = -0.12177E+01 0.00000E+00
GC_11 = 0.00000E+00 0.12177E+01
GC_12 = 0.00000E+00 0.14828E+01
GC_100 = 0.00000E+00 0.46191E+00
Collider parameters:
------
Running at P P machine @ 27000.000000000000 GeV
PDF set = cteq6l1
alpha_s(Mz)= 0.1300 running at 1 loops.
alpha_s(Mz)= 0.1300 running at 1 loops.
Renormalization scale set on event-by-event basis
Factorization scale set on event-by-event basis
getting user params
Enter number of events and max and min iterations:
Number of events and iterations 1000 5 3
Enter desired fractional accuracy:
Desired fractional accuracy: 0.10000000000000001
Enter 0 for fixed, 2 for adjustable grid:
Suppress amplitude (0 no, 1 yes)?
Using suppressed amplitude.
Exact helicity sum (0 yes, n = number/event)?
Explicitly summing over helicities
Enter Configuration Number:
Running Configuration Number: 10
Not subdividing B.W.
Attempting mappinvarients 1 7
Completed mapping 7
about to integrate 13 1000 5 3 13 1
Using non-zero grid deformation.
13 dimensions 1000 events 13 invarients 5 iterations 1 config(s), (0.99)
Using h-tuple random number sequence.
Error opening grid
Using Uniform Grid! 28
Using uniform alpha 1.0000000000000000
Grid defined OK
Set CM energy to 27000.00
Mapping Graph 10 to config 10
Setting BW -1 1 80.419002445756163
Setting grid 2 0.12346E-05 1
Setting grid 3 0.12346E-05 1
Setting grid 4 0.12346E-05 1
Transforming s_hat 1/s 12 1.3828532235939
10 1 2 3 4 5 6 7 8 9 10 11 12 13
Masses: 0.000E+00 0.000E+00 0.000E+00 0.000E+00 0.000E+00 0.000E+00 0.000E+00
Using random seed offsets 10 : 3
with seed 60
Ranmar initialization seeds 14790 9438
******
* You are using the DiscreteSampler module *
* part of the MG5_aMC framework *
* Author: Valentin Hirschi *
******
Particle 3 4 5 6 7
Et > 10.0 0.0 30.0 30.0 30.0
E > 0.0 0.0 0.0 0.0 0.0
Eta < 2.5 -1.0 5.0 5.0 5.0
xqcut: 0.0 0.0 30.0 30.0 30.0
d R # 3 > -0.0 0.0 0.0 0.0 0.0
d R # 4 > -0.0 -0.0 0.0 0.0 0.0
d R # 5 > -0.0 -0.0 -0.0 0.0 0.0
d R # 6 > -0.0 -0.0 -0.0 -0.0 0.0
s min # 3> 0.0 0.0 0.0 0.0 0.0
s min # 4> 0.0 0.0 0.0 0.0 0.0
s min # 5> 0.0 0.0 0.0 900.0 900.0
s min # 6> 0.0 0.0 0.0 0.0 900.0
xqcutij # 3> 0.0 0.0 0.0 0.0 0.0
xqcutij # 4> 0.0 0.0 0.0 0.0 0.0
xqcutij # 5> 0.0 0.0 0.0 30.0 30.0
xqcutij # 6> 0.0 0.0 30.0 0.0 30.0
No points passed cuts!
Loosen cuts or increase max_events 5000000
Deleting file events.lhe
ls status:
events.lhe
grid_information
input_app.txt
results.dat
run_01_0_log.txt
run_01_1_log.txt
run_01_2_log.txt
run_01_3_log.txt
run1_app.log
Best,
Li