IOError : [Errno 2] No such file or directory: SubProcesses/P0_gg_h3z/G1/results.dat For loop induced process

Asked by Khawla Jaffel

Hello experts,

I keep seeing this error every time I run MG 2.6.5 with run_mode = 1 on Condor, and only for loop-induced processes.

INFO: Idle: 1, Running: 0, Completed: 3 [ 3m 12s ]
INFO: Idle: 1, Running: 0, Completed: 3 [ 4m 13s ]
INFO: Idle: 1, Running: 0, Completed: 3 [ 5m 13s ]
INFO: Idle: 1, Running: 0, Completed: 3 [ 6m 13s ]
INFO: All jobs finished
INFO: Idle: 0, Running: 0, Completed: 4 [ 7m 17s ]
Error when reading /eos/home-k/kjaffel/ZAAnalysis_run2/ZAPrivateProduction/genproductions/bin/MadGraph5_aMCatNLO/HToZATo2L2B_500p00_300p00_1p00_ggH_TuneCP5_13TeV_pythia8/HToZATo2L2B_500p00_300p00_1p00_ggH_TuneCP5_13TeV_pythia8_gridpack/work/processtmp/SubProcesses/P0_gg_h3z/G1/results.dat

Command "generate_events pilotrun" interrupted with error:
IOError : [Errno 2] No such file or directory: '/eos/home-k/kjaffel/ZAAnalysis_run2/ZAPrivateProduction/genproductions/bin/MadGraph5_aMCatNLO/HToZATo2L2B_500p00_300p00_1p00_ggH_TuneCP5_13TeV_pythia8/HToZATo2L2B_500p00_300p00_1p00_ggH_TuneCP5_13TeV_pythia8_gridpack/work/processtmp/SubProcesses/P0_gg_h3z/G1/results.dat'
Please report this bug on https://bugs.launchpad.net/mg5amcnlo
More information is found in '/eos/home-k/kjaffel/ZAAnalysis_run2/ZAPrivateProduction/genproductions/bin/MadGraph5_aMCatNLO/HToZATo2L2B_500p00_300p00_1p00_ggH_TuneCP5_13TeV_pythia8/HToZATo2L2B_500p00_300p00_1p00_ggH_TuneCP5_13TeV_pythia8_gridpack/work/processtmp/pilotrun_tag_1_debug.log'.
Please attach this file to your report.

In `/work/processtmp/SubProcesses/P0_gg_h3z/` I do have:

G1_{1, 2, 3, 4}/results.dat
so perhaps something goes wrong when these partial results are merged ("hadd-ed") into the final one.

I then switched to run_mode = 0, to avoid split jobs,
but I get this warning:

Single-core mode not supported for loop-induced processes.
Beware that MG5aMC now changes your runtime options to a multi-core mode with only one active core.

Working on SubProcesses
INFO: P0_gg_h3z
INFO: Idle: 1, Running: 0, Completed: 0 [ current time: 13h38 ]
INFO: Idle: 0, Running: 0, Completed: 1 [ 56s ]
INFO: Idle: 0, Running: 0, Completed: 1 [ 56s ]
  === Results Summary for run: pilotrun tag: tag_1 ===

     Cross-section : 0.7721 +- 0.003944 pb
     Nb of events : 0

In addition, I get this message:
INFO: fail to reach target 5000 !!!

Any idea why this is happening?

Thanks in advance,
Khawla

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Olivier Mattelaer

Olivier Mattelaer (olivier-mattelaer) said :
#1

For the cluster error,
What is the content of the results.dat?
Some clusters have a very slow filesystem, and this might be the issue, since loop-induced processes rely heavily on the filesystem to synchronize the various jobs.
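
To check whether a slow filesystem is involved, one rough diagnostic is to poll for the file and see whether it shows up with a delay. The sketch below is not part of MG5aMC; the helper name `wait_for_file` and its default timeout and polling interval are hypothetical choices for illustration only.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout=60.0, poll=1.0):
    """Poll until `path` exists and is non-empty, or `timeout` seconds pass.

    Returns the number of seconds waited, or None if the file never appeared.
    (Hypothetical helper for diagnosis; MG5aMC does not provide it.)
    """
    path = Path(path)
    start = time.monotonic()
    while True:
        # A file that exists but is still empty may not be fully synced yet.
        if path.is_file() and path.stat().st_size > 0:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(poll)
```

Calling it on a worker-written file such as one of the G1_X/results.dat files right after a job completes gives a feel for how long the filesystem takes to make the file visible on the submission node.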

For the rest, loop induced is indeed not compatible with run_mode=0.

In addition I have this message :
INFO: fail to reach target 5000 !!!

Since you are in gridpack generation mode, you bypass the event generation and therefore the code complains because you did not reach the "official" event target. I would not worry about this.

Cheers,

Olivier

Khawla Jaffel (kjaffel) said :
#2

Hi Olivier,

Thanks for the fast reply.

This is the full content of G1_1/results.dat; the other G1_<jobid>/results.dat files look similar:

 0.39115E+00 0.17418E-01 0.00000E+00 516 0 1 0 0.000E+00 0.80420E-03 0.39115E+00 0.00000E+00 0.00000E+00 0
   1 0.39115E+00 0.17418E-01 0.99574E+00 0.19598E+01 0.39115E+00
 <run_statistics>
<u_return_code> 0, 0, 0, 0, 0, 0, 0, 1300, 0, 0</u_return_code>
<t_return_code> 0, 1299, 0, 1, 0, 0, 0, 0, 0, 0</t_return_code>
<h_return_code> 0, 0, 1300, 0, 0, 0, 0, 0, 0, 0</h_return_code>
 <average_time>0.0001095407542128</average_time>
 <cumulated_time>0.1885980000000000</cumulated_time>
 <max_prec>0.0000000000000534</max_prec>
 <min_prec>0.0000000000000417</min_prec>
 <n_evals>1300</n_evals>
 </run_statistics>

Cheers,
Khawla

Best Olivier Mattelaer (olivier-mattelaer) said :
#3

Hi,

This file is correct, but the code complains about
 G1/results.dat
which should be the merge of all the G1_X/results.dat files.
Since you have only a small number of jobs, it seems that the cluster fails to resubmit the second/third wave of jobs, which can only be submitted after the results of the first wave have been merged. Do you see any message saying that jobs are submitted for the second iteration?
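
As a quick way to see this state on disk, the following sketch lists which per-job G1_X/results.dat files exist and whether the merged G1/results.dat is present. It is not MG5aMC code: the function name `summarize_channel` is hypothetical, and the only assumption about the layout is the G1 and G1_X directory naming quoted in this thread.

```python
from pathlib import Path


def summarize_channel(subproc_dir, channel="G1"):
    """Report which results.dat files exist for one channel.

    `subproc_dir` is a directory such as SubProcesses/P0_gg_h3z.
    Per-job directories are assumed to be named <channel>_<jobid>,
    and the merged output to live in <channel>/results.dat.
    """
    subproc_dir = Path(subproc_dir)
    # Per-job directories that actually contain a results.dat file.
    partial = sorted(
        d.name
        for d in subproc_dir.glob(f"{channel}_*")
        if (d / "results.dat").is_file()
    )
    # The merged file is the one the error message complains about.
    merged = (subproc_dir / channel / "results.dat").is_file()
    return {"partial": partial, "merged": merged}
```

If `partial` is populated but `merged` is false, the first wave finished and the merge step (and hence the next wave) never happened, which matches the symptom described above.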

Condor clusters are quite particular in their implementation compared to other clusters; unfortunately, I no longer have such a cluster available for testing. Since loop-induced processes are quite atypical jobs because of this wave system, it is possible that the two are not compatible.

If this is the case, one solution is to set job_strategy to 0 in the run_card (the default for loop-induced is 2). You should then go back to the normal job submission on the cluster, but you will have less parallelization and much longer jobs.
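
For scripted setups such as gridpack production, the change could be applied to the run_card text with a small helper like the one below. This is a sketch under assumptions: it presumes the usual run_card line layout `value = name ! comment`, the function name `set_run_card_param` is hypothetical, and whether a job_strategy line is already present in a given run_card should be checked by hand.

```python
import re


def set_run_card_param(card_text, name, value):
    """Return `card_text` with parameter `name` set to `value`.

    Assumes run_card lines of the form `value = name ! comment`.
    Commented-out lines (starting with #) are left untouched.
    If the parameter is not found, a new line is appended.
    """
    pattern = re.compile(
        r"^([ \t]*)[^\s#!=]+([ \t]*=[ \t]*" + re.escape(name) + r"\b)",
        flags=re.MULTILINE,
    )
    # Keep the indentation and the `= name ! comment` tail, swap the value.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(2)}", card_text
    )
    if count == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f"  {value} = {name}\n"
    return new_text
```

Usage would be to read the run_card file, call `set_run_card_param(text, "job_strategy", 0)`, and write the result back before launching the run.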

Cheers,

Olivier

Khawla Jaffel (kjaffel) said :
#4

Hello Olivier,

I don't see any second iteration, but you can have a look at the log file here:
https://cp3.irmp.ucl.ac.be/~kjaffel/HToZATo2L2B_500p00_300p00_1p50_ggH_TuneCP5_13TeV_pythia8.log

Indeed, setting job_strategy to 0 is one way to work around this, thanks!
Cheers,
Khawla

Khawla Jaffel (kjaffel) said :
#5

Thanks Olivier Mattelaer, that solved my question.