launch ending with a non zero status

Asked by Christian

Hello,

I tried simulating p p > t t~ t t~ [QCD] at NLO in fixed-order mode with req_acc_FO = 0.00005.

After roughly 8 days of running, I got this warning: launch ends with non zero status



Terminal
INFO: Idle: 26, Running: 12, Completed: 38 [ 9h 30m ]
INFO: Idle: 25, Running: 12, Completed: 39 [ 9h 30m ]
INFO: Idle: 24, Running: 12, Completed: 40 [ 9h 34m ]
INFO: Idle: 23, Running: 12, Completed: 41 [ 9h 34m ]
INFO: Idle: 22, Running: 12, Completed: 42 [ 9h 34m ]
INFO: Idle: 21, Running: 12, Completed: 43 [ 10h 23m ]
INFO: Idle: 20, Running: 12, Completed: 44 [ 10h 23m ]
INFO: Idle: 19, Running: 12, Completed: 45 [ 10h 23m ]
INFO: Idle: 18, Running: 12, Completed: 46 [ 10h 23m ]
INFO: Idle: 17, Running: 12, Completed: 47 [ 10h 44m ]
INFO: Idle: 16, Running: 12, Completed: 48 [ 10h 44m ]
INFO: Idle: 15, Running: 12, Completed: 49 [ 12h 13m ]
INFO: Idle: 14, Running: 12, Completed: 50 [ 12h 13m ]
INFO: Idle: 13, Running: 12, Completed: 51 [ 12h 14m ]
INFO: Idle: 12, Running: 12, Completed: 52 [ 12h 17m ]
INFO: Idle: 11, Running: 12, Completed: 53 [ 12h 17m ]
INFO: Idle: 10, Running: 12, Completed: 54 [ 12h 17m ]
INFO: Idle: 9, Running: 12, Completed: 55 [ 12h 36m ]
INFO: Idle: 8, Running: 12, Completed: 56 [ 12h 57m ]
INFO: Idle: 7, Running: 12, Completed: 57 [ 12h 58m ]
INFO: Idle: 6, Running: 12, Completed: 58 [ 13h 6m ]
INFO: Idle: 5, Running: 12, Completed: 59 [ 13h 7m ]
INFO: Idle: 4, Running: 12, Completed: 60 [ 13h 7m ]
INFO: Idle: 3, Running: 12, Completed: 61 [ 14h 10m ]
INFO: Idle: 2, Running: 12, Completed: 62 [ 14h 19m ]
INFO: Idle: 1, Running: 12, Completed: 63 [ 14h 20m ]
INFO: Idle: 0, Running: 12, Completed: 64 [ 14h 20m ]
INFO: Idle: 0, Running: 11, Completed: 65 [ 14h 26m ]
INFO: Idle: 0, Running: 10, Completed: 66 [ 14h 26m ]
INFO: Idle: 0, Running: 9, Completed: 67 [ 14h 26m ]
INFO: Idle: 0, Running: 8, Completed: 68 [ 14h 28m ]
INFO: Idle: 0, Running: 6, Completed: 70 [ 14h 30m ]
INFO: Idle: 0, Running: 5, Completed: 71 [ 14h 39m ]
INFO: Idle: 0, Running: 4, Completed: 72 [ 14h 48m ]
INFO: Idle: 0, Running: 3, Completed: 73 [ 14h 55m ]
INFO: Idle: 0, Running: 0, Completed: 76 [ 14h 59m ]
sum of cpu time of last step: 0 second
INFO: Current results:
Total cross section: 9.565e-03 +- 5.7e-06 pb
INFO: Refining results, step 2
INFO: Idle: 65, Running: 12, Completed: 0 [ current time: 16h12 ]
INFO: Idle: 64, Running: 12, Completed: 1 [ 84h 29m ]
INFO: Idle: 63, Running: 12, Completed: 2 [ 187h 17m ]
/home/pazs/Desktop/MG5_aMC_v3_5_1/november4tops/SubProcesses/PO_gg_tttxtx/ajob1: line 38: 3798992 Aborted (core dumped) ../madevent_mintFO > log.txt <input_app.txt 2>&1
WARNING: program /home/pazs/Desktop/MG5_aMC_v3_5_1/november4tops/SubProcesses/PO_gg_tttxtx/ajob1 3 all 1 2 launch ends with non zero status: 134. Stop all computation
INFO: Idle: 63, Running: 11, Completed: 3 [ 187h 17m ]
INFO: Idle: 63, Running: 10, Completed: 4 [ 187h 17m ]
INFO: Idle: 63, Running: 9, Completed: 5 [ 187h 17m ]
INFO: Idle: 63, Running: 8, Completed: 6 [ 187h 17m ]
INFO: Idle: 63, Running: 7, Completed: 7 [ 187h 17m ]
INFO: Idle: 63, Running: 6, Completed: 8 [ 187h 17m ]
INFO: Idle: 63, Running: 5, Completed: 9 [ 187h 17m ]
INFO: Idle: 63, Running: 4, Completed: 10 [ 187h 17m ]
INFO: Idle: 63, Running: 3, Completed: 11 [ 187h 17m ]
INFO: Idle: 63, Running: 2, Completed: 12 [ 187h 17m ]
INFO: Idle: 63, Running: 1, Completed: 13 [ 187h 17m ]
INFO: Idle: 63, Running: 0, Completed: 14 [ 187h 17m ]

It didn't format the way I expected, but I think the important part is:

"/home/pazs/Desktop/MG5_aMC_v3_5_1/november4tops/SubProcesses/PO_gg_tttxtx/ajob1: line 38: 3798992 Aborted (core dumped) ../madevent_mintFO > log.txt <input_app.txt 2>&1
WARNING: program /home/pazs/Desktop/MG5_aMC_v3_5_1/november4tops/SubProcesses/PO_gg_tttxtx/ajob1 3 all 1 2 launch ends with non zero status: 134. Stop all computation"

My plan now is to see whether changing the number of random seeds or loosening the requested accuracy helps. The accuracy is currently set to req_acc_FO = 0.00005.

I wanted to ask if there are any additional suggestions, since this simulation takes a little bit longer than the others.

Best regards

Christian

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Olivier Mattelaer
Best Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi,

I guess that your jobs were cancelled by some external signal.
Googling exit code 134 confirms this: it means that the code stopped because it received a signal to abort (134 = 128 + 6, and signal 6 is SIGABRT).

If you are running on a cluster, it is likely related to a walltime limit or similar.
If you are running locally, it might be due to the amount of RAM used.
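As a rough illustration of how the exit status maps to a signal, here is a minimal Python sketch. It assumes the usual shell convention on Linux that a status above 128 means "killed by signal (status - 128)"; the helper name describe_exit_status is made up for this example and is not part of MadGraph5_aMC@NLO.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate a shell-style exit status into a readable description.

    Assumes the common shell convention: status > 128 means the process
    was terminated by signal number (status - 128).
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"unknown signal {signum}"
        return f"terminated by {name}"
    return f"exited normally with code {status}"


# Status reported in the warning above.
print(describe_exit_status(134))
```

This only tells you which signal ended the job, not who sent it or why. For that you would still need to look at log.txt in the failing SubProcesses directory, or at the cluster or system logs.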

In any case, requesting such precision does not really make sense.
Also remember that asking for a 10 times more accurate computation means that the job will run 100 times longer (10 squared), since the statistical error of a Monte Carlo integration only decreases as the inverse square root of the number of sampled points. Four-top production at NLO is already quite a beast in terms of complexity, so it does not surprise me too much that it requires more than 8 days of computing.
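To make the scaling concrete, here is a small Python sketch of the quadratic rule of thumb. It is only an estimate under the assumption that the run time is dominated by statistics (error proportional to 1/sqrt(N)); the function name runtime_factor and the starting value of 0.001 are illustrative assumptions, while 0.00005 is the value from the question.

```python
def runtime_factor(current_acc: float, target_acc: float) -> float:
    """Estimate how much longer a run takes when tightening the accuracy.

    Assumes a purely statistical error that falls like 1/sqrt(N), so the
    number of points (and roughly the run time) grows with the square of
    the accuracy ratio.
    """
    if current_acc <= 0 or target_acc <= 0:
        raise ValueError("accuracies must be positive")
    return (current_acc / target_acc) ** 2


# A 10 times tighter accuracy costs about 100 times more run time.
print(runtime_factor(1e-3, 1e-4))

# Going from an assumed 0.001 to the requested req_acc_FO = 0.00005.
print(runtime_factor(1e-3, 5e-5))
```

The same function can be read the other way round: loosening the target by a factor of 2 should cut the run time by roughly a factor of 4, which is usually a much cheaper fix than waiting for the tighter run.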

Not sure why you ask for so much precision, but maybe doing a biased generation will be more efficient.

Cheers,

Olivier

Christian (chris0990) said :
#2

Hi,

Thank you for your fast answer. The reason I tried to run this with a higher precision is that I am making multiple runs for p p > ... and I want to keep the same settings for all of them.

At some point I had the problem that the statistical error was too high (for the transverse momentum of the top in pair production), so tightening req_acc_FO seemed like the best idea.
I started with 1 permil (0.001), then 1/10,000 (0.0001), but the errors in my plots were still too big, which is why I ended up with 1/20,000 (0.00005).

I wanted to keep the settings the same for all the runs in my thesis, but I guess that I will have to change them for the 4tops production.

Thank you

Christian

Christian (chris0990) said :
#3

Thanks Olivier Mattelaer, that solved my question.