p p > t t~ b b~ randomly failing
Hi everyone,
I'm currently trying to run jobs at NLO using the process p p > t t~ b b~, but they randomly fail around 70% of the time. The error given is always as follows:
INFO: Idle: 41, Running: 8, Completed: 0 [ current time: 09h55 ]
ttbb1/SubProces
WARNING: program ttbb1/SubProces
ttbb1/SubProces
date: write error: Broken pipe
INFO: Idle: 41, Running: 7, Completed: 1 [ 6m 39s ]
ttbb1/SubProces
INFO: Idle: 41, Running: 6, Completed: 2 [ 6m 39s ]
ttbb1/SubProces
INFO: Idle: 41, Running: 5, Completed: 3 [ 6m 40s ]
ttbb1/SubProces
INFO: Idle: 41, Running: 4, Completed: 4 [ 6m 40s ]
ttbb1/SubProces
INFO: Idle: 41, Running: 3, Completed: 5 [ 6m 40s ]
ttbb1/SubProces
date: write error: Broken pipe
INFO: Idle: 41, Running: 2, Completed: 6 [ 6m 40s ]
ttbb1/SubProces
INFO: Idle: 41, Running: 0, Completed: 8 [ 6m 40s ]
And the error info in SubProcesses/
#######
# #
# You are using OneLOop-3.6 #
# #
# for the evaluation of 1-loop scalar 1-, 2-, 3- and 4-point functions #
# #
# author: Andreas van Hameren <email address hidden> #
# date: 18-02-2015 #
# #
# Please cite #
# A. van Hameren, #
# Comput.Phys.Commun. 182 (2011) 2427-2438, arXiv:1007.4716 #
# A. van Hameren, C.G. Papadopoulos and R. Pittau, #
# JHEP 0909:106,2009, arXiv:0903.4665 #
# in publications with results obtained with the help of this program. #
# #
#######
ERROR in OneLOop dilog2_r: r1,r2 = .24721463961207
ERROR in OneLOop dilog2_r: r1,r2 = .24721463961207
ERROR in OneLOop dilog2_r: r1,r2 = .24721463961207
ERROR in OneLOop dilog2_r: r1,r2 = .15609855898036
ERROR in OneLOop dilog2_r: r1,r2 = .15609855898036
ERROR in OneLOop dilog2_r: r1,r2 = .15858222042568
ERROR in OneLOop dilog2_r: r1,r2 = .15939439287578
ERROR in OneLOop dilog2_r: r1,r2 = .10118590118950
ERROR in OneLOop dilog2_r: r1,r2 = .16999994070893
...
The strange thing is, all these jobs are IDENTICAL other than the values of iseed, which I manually change when I run multiple jobs. I have been using a cluster run by the SLURM job manager, submitting 10 identical jobs with the parameters:
10 nodes
4 tasks per core
6000 MB memory per core
gcc version 4.8.2
and my bash script includes:
generate p p > t t~ b b~ [QCD]
output ttbb1
launch ttbb1
set nevents 50000
set iseed 10*928475
I have played around with these parameters hundreds of times, yet the result is always the same: around 3 jobs completed successfully, and the last 7 failed with the above errors. I don't see how this error can be so random. Any ideas?
Thanks,
Zack
Question information
- Language: English
- Status: Solved
- Assignee: Hua-Sheng Shao
- Solved by: Zack
#1 |
Dear Zack,
Which version of the code are you using?
Best regards,
Rikkert
#2 |
Hi Rikkert,
Thanks for the reply! I'm using the newest version of aMC@NLO, 2.3.0.
Thanks,
Zack
#3 |
Dear Zack,
Errors of the type:
ERROR in OneLOop dilog2_r: r1,r2 = .24721463961207
can be ignored; they are due to using a new version of the OneLOop library which prints this line rather than ignoring it.
Can you try to see if there is another error? Something like:
grep -i error ttbb1/SubProces
If this doesn't tell us anything, can you explicitly check the log*.txt files in the
ttbb1/SubProces
directory for any other errors? From what you copied above, it might very well be that this was the problematic one.
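A minimal sketch of the search suggested above, written in Python so that the harmless OneLOop dilog2_r lines are filtered out automatically. The function names, the extra "received signal" pattern, and the `root` argument are assumptions for illustration; the actual directory is whatever SubProcesses folder holds your log*.txt files.

```python
import re
from pathlib import Path

# Message that the thread says can be ignored.
BENIGN = re.compile(r"ERROR in OneLOop dilog2_r")
# Case-insensitive "error", plus the gfortran signal banner (an added assumption).
SUSPECT = re.compile(r"error|received signal", re.IGNORECASE)


def interesting_error_lines(lines):
    """Return lines that look like errors, skipping the benign OneLOop ones."""
    return [
        line.rstrip("\n")
        for line in lines
        if SUSPECT.search(line) and not BENIGN.search(line)
    ]


def scan_logs(root):
    """Map every log*.txt file below `root` to its non-benign error lines."""
    hits = {}
    for path in sorted(Path(root).rglob("log*.txt")):
        with open(path, errors="replace") as handle:
            found = interesting_error_lines(handle)
        if found:
            hits[str(path)] = found
    return hits
```

Calling `scan_logs(...)` on the SubProcesses directory of the run would then list only the files worth opening by hand.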
Best regards,
Rikkert
#4 |
Hi Rikkert,
So you were right, I searched all those log.txt files and found this error in P0_uxu_
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x2B1533DB32E7
#1 0x2B1533DB38EE
#2 0x2B15347DD69F
#3 0x2B15347DD625
#4 0x2B15347DEE04
#5 0x2B153481B536
#6 0x2B1534820E65
#7 0x2B15348239B2
#8 0x8E5A89
#9 0x8E497F
#10 0x8E3F99
#11 0x8E5E9F
#12 0x888326
#13 0x88B9A2
#14 0x884526
#15 0x50027A
#16 0x5012CB
#17 0x4F2631
#18 0x4D0117
#19 0x4D29A4
#20 0x4D2D88
#21 0x4AB9E1
#22 0x46E63B
#23 0x46F537
#24 0x4B579B
#25 0x4B130B
#26 0x4B7C18
#27 0x4BB7EA
#28 0x2B15347C9D5C
#29 0x4096D8
Time in seconds: 400
Also, I did a quick Google search on the error received in the job's output, "non zero status: 134."; this also corresponds to a SIGABRT error in C++, although I have no idea how you'd fix that.
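The link between status 134 and SIGABRT follows from the usual shell convention that a process killed by signal N is reported with exit status 128 + N. The sketch below decodes such a status; the helper name is hypothetical, and the signal numbering is taken from the machine it runs on (SIGABRT is 6 on Linux, hence 134).

```python
import signal


def describe_exit_status(status):
    """Decode a shell-style exit status into a signal name where applicable."""
    if status > 128:
        signum = status - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            # No signal with that number on this platform.
            return f"unknown signal {signum}"
    return f"exit code {status}"
```

On a Linux node, `describe_exit_status(134)` should therefore name the same signal that appears in the Fortran backtrace.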
Thanks,
Zack
#5 |
Dear Zack,
Can you try running without IREGI? To do this, you'll have to change in ttbb1/SubProces
#MLReductionLib
!1|4|3|2
to
#MLReductionLib
1|4|2
(You don't need to recompile the code)
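If the same change has to be applied to many cards, it could be scripted. This is only a sketch, assuming the card layout shown above: a line `#MLReductionLib` followed directly by the value line, where a leading `!` marks an untouched default. The function name and the `#OtherParam` entry in the usage example are hypothetical.

```python
def set_madloop_param(card_text, name, value):
    """Return card_text with the line after '#<name>' replaced by `value`.

    Assumes the value sits on the line immediately following the '#<name>'
    line, as in the snippet quoted in this thread.
    """
    lines = card_text.splitlines()
    for index, line in enumerate(lines[:-1]):
        if line.strip() == f"#{name}":
            lines[index + 1] = value
            return "\n".join(lines) + "\n"
    raise KeyError(f"parameter {name!r} not found in card")


# Usage on an in-memory card; '#OtherParam' is a made-up second entry.
card = "#MLReductionLib\n!1|4|3|2\n#OtherParam\n!5\n"
updated = set_madloop_param(card, "MLReductionLib", "1|4|2")
```

Working on the text rather than on the file keeps the sketch independent of where the card lives; reading and writing the actual MadLoopParams.dat is left to the caller.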
The best check is to rerun only the one channel that gave the problem before. Execute
../madevent_mintMC < input_app.txt
from within the ttbb1/SubProces
> MLReductionLib = 1|4|2
Let me know if this works.
Best regards,
Rikkert
#6 |
Hi Rikkert,
So I did what you suggested and removed IREGI from the mix and re-ran just that one channel. This completed successfully with no errors.
Then just to test it out, I edited out the IREGI again and re-launched the entire job interactively from MG5. It then failed again, and upon inspecting the MadLoopParams.dat card afterwards, I found that the value for MLReductionLib had somehow changed back to default. I repeated this process multiple times to make sure, and every time it always went back to its old settings WITH IREGI.
So what does this mean? Is this something that I would be able to fix?
Thanks,
Zack
#7 |
Dear Zack,
I think that in that case you also need to change it in Cards/MadLoopPa
Best,
Rikkert
#8 |
Hi Rikkert,
Alright thanks! Will running all my jobs without IREGI make much of a difference? I looked it up in the aMC@NLO manual, and I guess it's a library for tensor integral reduction? And if I edit it out, it will just go ahead and use the other 3 libraries instead, correct?
Also, do you know of any way to edit the value for MLReductionLib from the SubProcesses/
Thanks,
Zack
#9 |
Let me forward this to Valentin, because he knows this better.
Best,
Rikkert
#11 |
Hi Zack,
About editing MadLoopParams.dat, what happens is the following:
A) The jobs indeed use the MadLoopParams.dat card located inside each P* folder in SubProcesses.
B) When using the set command in the dynamic MG5 interface, only the file MadLoopParams.dat in the Cards directory is changed.
This is intentional. At the beginning of the run (when the run is launched from the MadGraph5 interface, as should always be done except when debugging particular channels as you have been doing), MG5 automatically writes a new MadLoopParams.dat card in each P* folder, reflecting the modifications made to the card in the Cards directory. This allows us to change all the default parameters (those you didn't touch and which are still prefixed with an exclamation mark) to those thought to be most appropriate for the process at hand.
Long story short, you are right that the 'set' command doesn't directly change the card in the P* directories, but it effectively does so when you actually launch the run.
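To see why a hand-edited P* card reverts at launch, it can help to compare the value in the Cards copy with the values in the P* copies before starting the run. The sketch below does that on card text held in memory. It assumes the two-line layout quoted earlier in the thread (`#MLReductionLib` followed by the value, with `!` marking a default); the function names and the folder labels in the usage example are hypothetical.

```python
def read_madloop_param(card_text, name):
    """Return (value, is_default) for the line following '#<name>'."""
    lines = [line.strip() for line in card_text.splitlines()]
    for key, raw in zip(lines, lines[1:]):
        if key == f"#{name}":
            # A leading '!' marks a parameter still at its default.
            return raw.lstrip("!"), raw.startswith("!")
    raise KeyError(f"parameter {name!r} not found in card")


def out_of_sync(cards_text, copies, name="MLReductionLib"):
    """List the labels of P* copies whose value differs from the Cards one."""
    wanted, _ = read_madloop_param(cards_text, name)
    return [
        label
        for label, text in copies.items()
        if read_madloop_param(text, name)[0] != wanted
    ]


# Usage: 'P0_a' and 'P0_b' are placeholder folder labels.
cards = "#MLReductionLib\n1|4|2\n"
copies = {
    "P0_a": "#MLReductionLib\n!1|4|3|2\n",
    "P0_b": "#MLReductionLib\n1|4|2\n",
}
stale = out_of_sync(cards, copies)
```

Any folder reported here would simply be overwritten from the Cards copy at the next launch, which matches the behaviour described above.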
Coming back to the IREGI issue, what would be really helpful is if you could re-run the 'P0_uxu_tbbxtx/GF2' channel locally but with IREGI this time and tell us which phase-space point triggers the issue.
In order to do this, please change the file 'BinothLHA.f'. Around line 160, find the following:
call sloopmatrixhel_
$ ,tolerance,
and change it so as to add the following lines above the 'call sloopmatrixhel(
write (*,*) '=== START virtual computation monitoring ==='
call getpoles(
call write_mom(p)
write (*,*) '=== END virtual computation monitoring ==='
call sloopmatrixhel_
$ ,tolerance,
You can then recompile by running:
cd <full_path>
export madloop=true
make madevent_mintMC
(if you use lhapdf you have to do "export lhapdf_
I assume that you are not using a customized version of fastjet, otherwise you would need to export the variable 'fastjet_config' similarly as above.
Then you can run the GF2 channel exactly like you did before, with
../madevent_mintMC < input_app.txt
from within the GF2 folder.
You should normally be able to reproduce the crash again, but this time you will be able to read off the PS point which was attempted with IREGI before the crash. Could you report this PS point to us here?
Thanks for your efforts in helping us improve the stability of our code.
Best,
PS: What matters is mostly the PS point, so if any of the lines above are problematic at compilation time, just limit yourself to 'call write_mom(p)' as it is the crucial information to report here.
#12 |
Hi Valentin,
Thanks for the reply! I'm glad to help any way I can. I did what you said, and here is what I got:
...
(A bunch of lines of OneLOop dilog2 stuff with Virtual Computation Monitoring seemingly working fine)
...
ERROR in OneLOop dilog2_r: r1,r2 = .26851131751105
ERROR in OneLOop dilog2_r: r1,r2 = .54738430704262
ERROR in OneLOop dilog2_r: r1,r2 = .81850439097397
ERROR in OneLOop dilog2_r: r1,r2 = .98519452961588
ERROR in OneLOop dilog2_r: r1,r2 = .98519452961588
=== START virtual computation monitoring ===
mu_r = 503.54576119702290
alpha_S = 9.4545587558498
1/eps**2 expected from MadFKS= -7.650666901004
1/eps expected from MadFKS= 4.9344541310788
Phase space point:
--
E | px | py | pz | m
0.
0.
0.
0.
0.
0.
Four-momentum conservation sum:
0.
---
=== END virtual computation monitoring ===
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .35931084536695
ERROR in OneLOop dilog2_r: r1,r2 = .50116362486379
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .22410152491753
ERROR in OneLOop dilog2_r: r1,r2 = .34275743390657
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
ERROR in OneLOop dilog2_r: r1,r2 = .37442138709574
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
ERROR in OneLOop dilog2_r: r1,r2 = .37442138709536
ERROR in OneLOop dilog2_r: r1,r2 = .52290355814090
ERROR in OneLOop dilog2_r: r1,r2 = .30028176032301
ERROR in OneLOop dilog2_r: r1,r2 = .83839197252289
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
ERROR in OneLOop dilog2_r: r1,r2 = .65332686378803
ERROR in OneLOop dilog2_r: r1,r2 = .37442138709553
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
ERROR in OneLOop dilog2_r: r1,r2 = .52290355814090
ERROR in OneLOop dilog2_r: r1,r2 = .35404399688536
ERROR in OneLOop dilog2_r: r1,r2 = .32896836378326
ERROR in OneLOop dilog2_r: r1,r2 = .32896836378326
ERROR in OneLOop dilog2_r: r1,r2 = .41919598626144
ERROR in OneLOop dilog2_r: r1,r2 = .60139634983655
ERROR in OneLOop dilog2_r: r1,r2 = .77827709053811
ERROR in OneLOop dilog2_r: r1,r2 = .26145177907045
ERROR in OneLOop dilog2_r: r1,r2 = .41130892068788
ERROR in OneLOop dilog2_r: r1,r2 = .77827709053811
ERROR in OneLOop dilog2_r: r1,r2 = .91538443102788
ERROR in OneLOop dilog2_r: r1,r2 = .91538443102788
ERROR in OneLOop dilog2_r: r1,r2 = .46894986352228
ERROR in OneLOop dilog2_r: r1,r2 = .32896836378326
ERROR in OneLOop dilog2_r: r1,r2 = .77827709053811
ERROR in OneLOop dilog2_r: r1,r2 = .91538443102788
ERROR in OneLOop dilog2_r: r1,r2 = .46894986352181
ERROR in OneLOop dilog2_r: r1,r2 = .56025381229382
ERROR in OneLOop dilog2_r: r1,r2 = .32126214532350
ERROR in OneLOop dilog2_r: r1,r2 = .89827711341738
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .35931084536695
ERROR in OneLOop dilog2_r: r1,r2 = .50116362486379
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .22410152491753
ERROR in OneLOop dilog2_r: r1,r2 = .34275743390657
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
ERROR in OneLOop dilog2_r: r1,r2 = .37442138709574
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
ERROR in OneLOop dilog2_r: r1,r2 = .37442138709536
ERROR in OneLOop dilog2_r: r1,r2 = .52290355814090
ERROR in OneLOop dilog2_r: r1,r2 = .30028176032301
ERROR in OneLOop dilog2_r: r1,r2 = .83839197252289
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
ERROR in OneLOop dilog2_r: r1,r2 = .65332686378803
ERROR in OneLOop dilog2_r: r1,r2 = .37442138709553
ERROR in OneLOop dilog2_r: r1,r2 = .16448418189163
ERROR in OneLOop dilog2_r: r1,r2 = .66709464903267
ERROR in OneLOop dilog2_r: r1,r2 = .18307688620557
*** glibc detected *** ../madevent_mintMC: double free or corruption (out): 0x000000000b700b40 ***
======= Backtrace: =========
/lib64/
/lib64/
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
/lib64/
../madevent_
======= Memory map: ========
00400000-00af5000 r-xp 00000000 9a0:2bdd6 144669860016793138 /sfs/lustre/
00cf4000-00d34000 rw-p 006f4000 9a0:2bdd6 144669860016793138 /sfs/lustre/
00d34000-0a706000 rw-p 00000000 00:00 0
0b647000-0b984000 rw-p 00000000 00:00 0 [heap]
2af99a23c000-
2af99a25c000-
2af99a45b000-
2af99a45c000-
2af99a45d000-
2af99a45e000-
2af99a549000-
2af99a748000-
2af99a750000-
2af99a752000-
2af99a768000-
2af99a87d000-
2af99aa7d000-
2af99aaa3000-
2af99ab26000-
2af99ad25000-
2af99ad26000-
2af99ad27000-
2af99ad3c000-
2af99af3c000-
2af99af3d000-
2af99af3e000-
2af99af79000-
2af99b178000-
2af99b179000-
2af99b303000-
2af99b503000-
2af99b507000-
2af99b508000-
7fff8da3a000-
7fff8dbff000-
ffffffffff60000
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x2AF99A7812E7
#1 0x2AF99A7818EE
#2 0x2AF99B1AB69F
#3 0x2AF99B1AB625
#4 0x2AF99B1ACE04
#5 0x2AF99B1E9536
#6 0x2AF99B1EEE65
#7 0x2AF99B1F19B2
#8 0x8E5DA9
#9 0x8E4C9F
#10 0x8E42B9
#11 0x8E61BF
#12 0x888646
#13 0x88BCC2
#14 0x884846
#15 0x50059A
#16 0x5015EB
#17 0x4F2951
#18 0x4D0437
#19 0x4D2CC4
#20 0x4D30A8
#21 0x4ABCB1
#22 0x46E63B
#23 0x46F537
#24 0x4B5ABB
#25 0x4B162B
#26 0x4B7F38
#27 0x4BBB0A
#28 0x2AF99B197D5C
#29 0x4096D8
Aborted
Is this what you're looking for? If not let me know, or if you want me to do anything additional. I have all this output piped into a text file too if you want that, I just don't see anywhere on this page to attach additional files?
Thanks,
Zack
#13 |
Hi Zack,
I meant to put the line:
write (*,*) '=== END virtual computation monitoring ==='
*after* the call to MadLoop, i.e.
call sloopmatrixhel_
$ ,tolerance,
write (*,*) '=== END virtual computation monitoring ==='
This is to be sure that the crash comes from within MadLoop (i.e. the 'END virtual comp...' line does not appear in the log).
It already seems clear that this is the case, but I just want to be 100% sure.
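With the END marker printed after the MadLoop call, the check described above can be automated: if the last START marker in a log has no END marker after it, the process died during that virtual computation. The sketch below is one way to do this, assuming the two marker strings are written exactly as in the Fortran lines quoted in this thread; the function name is hypothetical.

```python
START = "=== START virtual computation monitoring ==="
END = "=== END virtual computation monitoring ==="


def last_monitoring_block(log_lines):
    """Return (lines, finished) for the final monitoring block in a log.

    `lines` holds everything printed after the last START marker, up to the
    matching END marker or the end of the log. `finished` is False when no
    END marker followed, i.e. the run stopped inside that computation.
    """
    block, inside, finished = [], False, False
    for line in log_lines:
        if START in line:
            block, inside, finished = [], True, False
        elif END in line and inside:
            inside, finished = False, True
        elif inside:
            block.append(line.rstrip("\n"))
    return block, finished
```

For a crashed channel, the returned lines contain the scale, the expected poles and the phase-space point that was being evaluated, which is the information requested here.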
Also, it seems that the 'pz' component of the momenta specification is cropped and ends with a $ symbol. I suppose this is a feature of your editor. If this is indeed the case could you send me the full specification of that last PS point?
#14 |
Hi Valentin,
Alright I think I got it right this time, let me know if it needs to be altered at all. Here is the output:
ERROR in OneLOop dilog2_r: r1,r2 = .56025381229382
ERROR in OneLOop dilog2_r: r1,r2 = .32126214532350
ERROR in OneLOop dilog2_r: r1,r2 = .89827711341738
=== END virtual computation monitoring ===
=== START virtual computation monitoring ===
mu_r = 177.70013629524848
alpha_S = 0.10805585305410952
1/eps**2 expected from MadFKS= -2.189514217359
1/eps expected from MadFKS= -2.014507200618
Phase space point:
--
E | px | py | pz | m
0.
0.
0.
0.
0.
0.
Four-momentum conservation sum:
-0.
---
ERROR in OneLOop dilog2_r: r1,r2 = .16624250630229
ERROR in OneLOop dilog2_r: r1,r2 = .16624250630229
ERROR in OneLOop dilog2_r: r1,r2 = .82073614092109
ERROR in OneLOop dilog2_r: r1,r2 = .82073614092109
ERROR in OneLOop dilog2_r: r1,r2 = .16624250630229
ERROR in OneLOop dilog2_r: r1,r2 = .82073614092109
ERROR in OneLOop dilog2_r: r1,r2 = .32530045079029
ERROR in OneLOop dilog2_r: r1,r2 = .32523948708655
ERROR in OneLOop dilog2_r: r1,r2 = .32091304232165
ERROR in OneLOop dilog2_r: r1,r2 = .32962672212495
ERROR in OneLOop dilog2_r: r1,r2 = .16624250630229
ERROR in OneLOop dilog2_r: r1,r2 = .16624250630229
ERROR in OneLOop dilog2_r: r1,r2 = .82073614092109
ERROR in OneLOop dilog2_r: r1,r2 = .82073614092109
ERROR in OneLOop dilog2_r: r1,r2 = .16624250630229
ERROR in OneLOop dilog2_r: r1,r2 = .82073614092109
ERROR in OneLOop dilog2_r: r1,r2 = .32091304232165
ERROR in OneLOop dilog2_r: r1,r2 = .32962672212495
ERROR in OneLOop dilog2_r: r1,r2 = .16624250630229
ERROR in OneLOop dilog2_r: r1,r2 = .16624250630229
ERROR in OneLOop dilog2_r: r1,r2 = .82073614092109
ERROR in OneLOop dilog2_r: r1,r2 = .82073614092109
ERROR in OneLOop dilog2_r: r1,r2 = .16624250630229
ERROR in OneLOop dilog2_r: r1,r2 = .82073614092109
ERROR in OneLOop dilog2_r: r1,r2 = .32530045079029
ERROR in OneLOop dilog2_r: r1,r2 = .32523948708655
ERROR in OneLOop dilog2_r: r1,r2 = .32091304232165
*** glibc detected *** ../madevent_mintMC: double free or corruption (out): 0x000000000af7eb40 ***
======= Backtrace: =========
/lib64/
/lib64/
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
../madevent_
/lib64/
../madevent_
======= Memory map: ========
00400000-00af5000 r-xp 00000000 9a0:2bdd6 144669860016796217 /sfs/lustre/
00cf4000-00d34000 rw-p 006f4000 9a0:2bdd6 144669860016796217 /sfs/lustre/
00d34000-0a706000 rw-p 00000000 00:00 0
0aec5000-0b202000 rw-p 00000000 00:00 0 [heap]
2ad1b3643000-
2ad1b3663000-
2ad1b3862000-
2ad1b3863000-
2ad1b3864000-
2ad1b3865000-
2ad1b3950000-
2ad1b3b4f000-
2ad1b3b57000-
2ad1b3b59000-
2ad1b3b6f000-
2ad1b3c84000-
2ad1b3e84000-
2ad1b3eaa000-
2ad1b3f2d000-
2ad1b412c000-
2ad1b412d000-
2ad1b412e000-
2ad1b4143000-
2ad1b4343000-
2ad1b4344000-
2ad1b4345000-
2ad1b4380000-
2ad1b457f000-
2ad1b4580000-
2ad1b470a000-
2ad1b490a000-
2ad1b490e000-
2ad1b490f000-
7fffa386b000-
7fffa39cd000-
ffffffffff60000
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x2AD1B3B882E7
#1 0x2AD1B3B888EE
#2 0x2AD1B45B269F
#3 0x2AD1B45B2625
#4 0x2AD1B45B3E04
#5 0x2AD1B45F0536
#6 0x2AD1B45F5E65
#7 0x2AD1B45F89B2
#8 0x8E5D59
#9 0x8E4C4F
#10 0x8E4269
#11 0x8E616F
#12 0x8885F6
#13 0x88BC72
#14 0x8847F6
#15 0x50054A
#16 0x50159B
#17 0x4F2901
#18 0x4D03E7
#19 0x4D2C74
#20 0x4D3058
#21 0x4ABC53
#22 0x46E63B
#23 0x46F537
#24 0x4B5A6B
#25 0x4B15DB
#26 0x4B7EE8
#27 0x4BBABA
#28 0x2AD1B459ED5C
#29 0x4096D8
Aborted
#15 |
Thanks for the details.
So basically for the process 'u~ u > t b b~ t~' and the phase space point:
mu_r = 177.70013629524848
alpha_S = 0.10805585305410952
0.1777003392540
0.1777003392540
0.1730002240129
0.4700127610796
0.4700121659678
0.1730002052246
IREGI crashes. (I forgot to ask you to printout the helicity configuration picked, but that is not too relevant here).
What baffles me about this PS point is how soft it is; I'm not sure how such a soft kinematic configuration gets probed. I suppose it is bound to happen when throwing sufficiently many points, and this would explain why the issue only happens randomly.
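The "bound to happen with enough points" argument can be made quantitative with a simple model: if each phase-space point independently has a small probability p of hitting a crashing configuration, a job evaluating N points fails with probability 1 - (1 - p)^N. The sketch below implements this relation and its inverse. Both the independence assumption and any value of N are illustrative only; the thread does not say how many virtual evaluations a job performs.

```python
def job_failure_probability(p_point, n_points):
    """Probability that at least one of n_points independent points crashes."""
    return 1.0 - (1.0 - p_point) ** n_points


def per_point_probability(job_failure, n_points):
    """Per-point crash probability implied by an observed job failure rate."""
    return 1.0 - (1.0 - job_failure) ** (1.0 / n_points)
```

Under this model, even a very rare bad point yields a large job failure rate once N is large, while individual jobs with different seeds still succeed or fail seemingly at random.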
I tried to reproduce this issue locally on my Mac and, unfortunately, even though the result returned by IREGI is completely unstable, it doesn't crash. This isn't too surprising, however, because the Mac architecture is typically less sensitive to these memory issues than Linux distributions.
Huasheng is the author of IREGI and has a CERN account (so that he can test this directly in the same environment), so I'll forward this issue to him. Sorry for all the bouncing.
In the meantime, you can simply disable IREGI for now, as instructed by Rikkert.
Thanks again for reporting this and for your help resolving this issue.
#16 |
Of course, I'm glad I could help. If you need anything else at all from my side, just let me know.
Also, thank you so much to both Valentin and Rikkert for your help in fixing this problem which I've been trying to deal with for quite some time now. I appreciate it!
Thanks,
Zack