aMC@NLO on condor: Log Errors for failed jobs

Asked by senka duric

Dear experts,
I am trying to generate events in aMC@NLO for the wz+2jets[QCD] process on condor. Some jobs fail with a message that has already been mentioned on Launchpad:

-------------------------------->
CRITICAL: Fail to run correctly job 4908087.
...
            file missing: /.../SubProcesses/P0_gdx_wptaptamuxg/GF28/results.dat
            Fails 1 times
            No resubmition.
<--------------------------------

I have already run aMC@NLO for other processes on condor and everything was fine, so I don't think the problem lies in the condor cluster settings. If I look at the log file in the corresponding directory for this failed job (SubProcesses/P0_gdx_wptaptamuxg/GF28), I find this:

----------------------------->
...
accumulated results ABS integral = 0.2997E-03 +/- 0.2390E-04 ( 7.975 %)
accumulated results Integral = 0.1265E-03 +/- 0.1972E-04 ( 15.590 %)
accumulated results Virtual = 0.1434E-06 +/- 0.1689E-05 ( ******* %)
accumulated results Virtual ratio = 0.1532E+01 +/- 0.2702E+00 ( 17.638 %)
accumulated results ABS virtual = 0.1265E-04 +/- 0.1685E-05 ( 13.322 %)
accumulated results Born*ao2pi = 0.1337E-05 +/- 0.1667E-06 ( 12.475 %)
accumulated result Chi^2 per DoF = 0.4322E+01
update virtual fraction to: 0.016 -1.926
  1: 0 1 2 3 4
 ------- iteration 4
 Update # PS points (even): 10240 --> 10240
 Error 0 in kinematics_driver: fks variables
  -1.2487951322589195E-009 0.98703393655168381
Time in seconds: 1749
<----------------------------

In another example, wz+01jets with aMC@NLO but using a BSM model, some jobs also fail, and the log file contains:

--------------------------->
########################################################################890799495182314
#*--- POLES CANCELLED ---- * #
#**********************You are using OneLOop-3.4****** #
#TRIED 100800 PS POINTS AND ONLY 0 GAVE A NON-ZERO INT#GRAND.
# for the evaluation of 1-loop scalar 1-, 2-, 3- and 4-point functions #
# --------------------------------- #
# author: Andreas van Hameren <email address hidden> #
# date: 02-01-2014000000002 #
#mdl_CWWWL2 = 0.0000000000000000 #
# Please cite 0.0000000000000000 #
# A. van Hameren,0000000000000 #
# Comput.Phys.Commun. 182 (2011) 2427-2438, arXiv:1007.4716 #
# A. van Hameren, C.G. Papadopoulos and R. Pittau, #
# JHEP 0909:106,2009, arXiv:0903.4665 #
# in publications with results obtained with the help of this program. #
#mdl_CphiBL2 = 0.0000000000000000 #
########################################################################
 alpha_s value used for the virtuals is (for the first PS point): 0.11890799495182314
 ---- POLES CANCELLED ----9
 ERROR: INTEGRAL APPEARS TO BE ZERO.
 TRIED 100800 PS POINTS AND ONLY 0 GAVE A NON-ZERO INTEGRAND.
Time in seconds: 19569999999999999

<-------------------------
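To find every channel that failed in either of these ways, rather than inspecting directories one by one, a small script can walk SubProcesses, list channel directories with no results.dat, and report which of the two error strings quoted above appears in their files. This is a minimal sketch assuming the P*/G* layout seen in the path from the first error; the function names are hypothetical and it does not assume any particular log file name.

```python
from pathlib import Path

# Error strings quoted in this thread; extend as needed.
ERROR_MARKERS = (
    "Error 0 in kinematics_driver",
    "INTEGRAL APPEARS TO BE ZERO",
)


def find_failed_channels(process_dir):
    """Return channel directories (SubProcesses/P*/G*) that have no results.dat."""
    subprocesses = Path(process_dir) / "SubProcesses"
    missing = []
    # Assumed layout, taken from the path in the error: SubProcesses/<P dir>/<G dir>
    for channel in sorted(subprocesses.glob("P*/G*")):
        if channel.is_dir() and not (channel / "results.dat").exists():
            missing.append(channel)
    return missing


def error_markers_in(channel):
    """Return the known error strings found in any file directly inside `channel`."""
    found = set()
    for path in channel.iterdir():
        if path.is_file():
            text = path.read_text(errors="ignore")
            found.update(m for m in ERROR_MARKERS if m in text)
    return sorted(found)


def report(process_dir):
    """Print each failed channel together with the error strings found in it."""
    for channel in find_failed_channels(process_dir):
        markers = error_markers_in(channel) or ["no known error string"]
        print(f"{channel}: {', '.join(markers)}")
```

Call report() with the process directory (the path elided as /.../ in the error message above) to get one line per failed channel.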

What is the problem and how can I fix this?
Thank you in advance!

Greetings,
Senka

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: marco zaro
marco zaro (marco-zaro) said :
#1

Dear Senka,
you have two kinds of errors.
The first is
 Error 0 in kinematics_driver: fks variables
  -1.2487951322589195E-009 0.98703393655168381
This is a check that fails because the first of the two numbers, which should be the fraction of energy carried by the extra parton, is negative.
However, it is extremely small, so it is most likely negative because of some numerical instability.
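For reference, in the FKS subtraction formalism the energy variable of the extra parton i is defined in the partonic centre-of-mass frame as below. This is the standard definition from the FKS literature; that xi_i_fks in the code uses exactly this normalization is an assumption, not checked here.

```latex
\xi_i = \frac{2E_i}{\sqrt{\hat{s}}}, \qquad 0 \le \xi_i \le \xi_{\max}
```

Since sqrt(s) > 0, a negative xi_i would correspond to E_i < 0, an unphysical momentum. The logged value of -1.2487951322589195E-009 therefore sits just below the soft edge xi_i = 0.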
I am also subscribing Rikkert and Paolo to this question so that they are aware of the problem.
One solution I would propose is to change
montecarlocounter.f (line 1256 in 2.3.1) to allow xi_i_fks to be less than 0, but greater than some technical threshold, like -1e-6 or so.
Paolo, Rik, would that be fine?
Would it be possible to discard such a point a priori?

The second error occurs because some integration channels give zero.
Since you say you are using a BSM model, this may be because some Feynman diagrams are proportional to couplings that you have set to zero.
This problem should be solved by setting those couplings to a very small, non-zero number...
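A toy model shows why a channel can return exactly zero. If every diagram in a channel is proportional to a coupling, setting that coupling to 0 makes every sampled phase-space point vanish, which is what the "TRIED 100800 PS POINTS AND ONLY 0 GAVE A NON-ZERO INTEGRAND" message reports. The integrand and function names below are hypothetical and do not reflect MadGraph internals.

```python
import random


def toy_integrand(x, coupling):
    """Hypothetical channel whose squared amplitude scales as coupling**2."""
    shape = 1.0 + x * x  # arbitrary, strictly positive phase-space dependence
    return coupling ** 2 * shape


def count_nonzero_points(coupling, n_points=100800, seed=1):
    """Count sampled points with a non-zero integrand, mimicking the check in the log."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_points) if toy_integrand(rng.random(), coupling) != 0.0)


for c in (0.0, 1e-5):
    print(f"coupling={c}: {count_nonzero_points(c)} non-zero points")
```

With coupling 0 no point is non-zero, while with 1e-5 every point is. In this toy the channel is suppressed by coupling**2 = 1e-10, so a tiny coupling keeps the integrator running while contributing negligibly. Whether that holds in a real model depends on how the coupling enters; interference terms, for example, scale linearly.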

Please let us know if you have further problems...
Cheers,

Marco

senka duric (senka-duric) said :
#2

Hi Marco,
thank you for your reply.
For the second problem, I ran with all parameters set to a non-zero value, like 0.00001, and now all jobs finish successfully.
For the first problem, I am not sure what the suggestion is. Should I allow it to be smaller than 0, but bigger than -1e-6? What would this do, and is it safe?

Senka

Paolo Torrielli (paolo-torrielli) said :
#3

Hi Senka,

> For the first problem, I am not sure what the suggestion is.
> Should I allow it to be smaller than 0, but bigger than -1e-6?
> What would this do, and is it safe?

I would modify the code in the following way:
  if xi_i_fks < -1e-6: stop
  if -1e-6 <= xi_i_fks < 0: set it to 0 and go on, maybe printing a message
  if xi_i_fks >= 0: go on normally
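The three cases can be written as a small function. This is a language-agnostic sketch in Python, not a patch: the real change would go into the Fortran source (montecarlocounter.f), and the function name sanitize_xi_i_fks and the use of a warning for the "printing a message" case are assumptions made for illustration.

```python
import warnings

# Technical threshold from the rule above (assumed to be configurable).
XI_TOLERANCE = 1e-6


def sanitize_xi_i_fks(xi_i_fks, tol=XI_TOLERANCE):
    """Hypothetical helper: stop, clamp to zero, or pass xi_i_fks through unchanged."""
    if xi_i_fks < -tol:
        # Genuinely out of range: treat it as a real error and stop.
        raise ValueError(f"xi_i_fks out of range: {xi_i_fks!r}")
    if xi_i_fks < 0.0:
        # Tiny negative value from numerical instability: clamp and report it.
        warnings.warn(f"clamping xi_i_fks={xi_i_fks!r} to 0")
        return 0.0
    # Normal case: the energy fraction is already non-negative.
    return xi_i_fks
```

With the value from the log, sanitize_xi_i_fks(-1.2487951322589195e-9) returns 0.0 and emits a warning, while a value such as -1e-3 still stops with an error. Marco's alternative of discarding the point a priori would instead signal the caller to skip it; which option is safer depends on how the rest of the code treats xi_i_fks = 0, which this sketch does not model.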

xi_i_fks is a rescaled energy, so allowing it to be (even slightly)
negative would crash the code somewhere else.

I’m surprised by this error anyway, as I’ve never seen the
FKS variable outside its range before.

Cheers.
Paolo
