Jobs failing randomly while showering Gridpack MG5 events

Asked by Krishna Kulkarni

Hi,

I am trying to generate 400 LHE files, with 10k events per file, from a gridpack. About half of the jobs successfully produce LHE files, but the other half fail with an error similar to this one:

----------------------------------------------------------------------------------------------
05:40:44 Error detected in "launch --parton --only_generation -f --name=run_01"
05:40:44 write debug file /nfs/dust/atlas/user/kulkarnk/qualification_task/hard_event_generation/gridpack_4/run/1/madevent/run_01_tag_1_debug.log
05:40:44 If you need help with this issue please contact us on https://answers.launchpad.net/madgraph5
05:40:44 aMCatNLOError : An error occurred during the collection of results.
05:40:44 Please check the .log files inside the directories which failed:
05:40:44 /nfs/dust/atlas/user/kulkarnk/qualification_task/hard_event_generation/gridpack_4/run/1/madevent/SubProcesses/P2_ucx_ttxucx/GF3/log.txt
05:40:44 
05:40:44 quit
05:40:44 INFO:
05:40:46
05:40:46 Py:MadGraphUtils INFO Copying generated events to /nfs/dust/atlas/user/kulkarnk/qualification_task/hard_event_generation/gridpack_4/run/1.
05:40:46 Py:MadGraphUtils ERROR MadSpin was run but can't find output folder Events/run_01_decayed_1.
05:40:46 Shortened traceback (most recent user call last):
05:40:46 File "/cvmfs/atlas.cern.ch/repo/sw/software/x86_64-slc6-gcc47-opt/19.2.5/MCProd/19.2.5.19.5/InstallArea/jobOptions/EvgenJobTransforms/skeleton.GENtoEVGEN.py", line 225, in <module>
05:40:46 include(jo)
05:40:46 File "../../gen/aMCatNLO_Herwigpp_EvtGen_NonAllHad_GridPack.py", line 146, in <module>
05:40:46 generate(run_card_loc='run_card.dat',param_card_loc='aMcAtNlo_param_card_loopsmnobmass.dat',njobs=jobs,proc_dir=process_dir,run_name=runName,madspin_card_loc=madspin_card_loc,grid_pack=gridpack_mode,gridpack_dir=gridpack_dir,nevents=nevents,random_seed=runArgs.randomSeed,mode=nmode,cluster_type='pbs',gridpack_compile=True,required_accuracy=0.1)
05:40:46 File "/afs/desy.de/user/k/kulkarnk/private/qualification_task/new_madgraph/InstallArea/python/MadGraphControl/MadGraphUtils.py", line 159, in generate
05:40:46 generate_from_gridpack(run_name=run_name,gridpack_dir=gridpack_dir,nevents=nevents,random_seed=random_seed,card_check=proc_dir,param_card=param_card_loc,madspin_card=madspin_card_loc,extlhapath=extlhapath,gridpack_compile=gridpack_compile)
05:40:46 File "/afs/desy.de/user/k/kulkarnk/private/qualification_task/new_madgraph/InstallArea/python/MadGraphControl/MadGraphUtils.py", line 742, in generate_from_gridpack
05:40:46 raise RuntimeError('MadSpin was run but can\'t find output folder %s.'%NLOdir)
05:40:46 RuntimeError: MadSpin was run but can't find output folder Events/run_01_decayed_1.
05:40:46 Py:Athena INFO leaving with code 8: "an unknown exception occurred"
----------------------------------------------------------------------------------------------

The only difference between the jobs is the random number seed. The failure pattern is also random and not tied to any particular seed.

Please let me know if you can help me. Thanks!

Question information

Language:
English
Status:
Needs information
For:
MadGraph5_aMC@NLO
Assignee:
marco zaro
Whiteboard:
Can you copy-paste the last lines of /nfs/dust/atlas/user/kulkarnk/qualification_task/hard_event_generation/gridpack_4/run/1/madevent/SubProcesses/P2_ucx_ttxucx/GF3/log.txt?
Marco will need this to investigate what the problem is.
Olivier
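
For anyone collecting the requested log excerpt from many failed jobs, the following is a minimal sketch of reading the last lines of a log file. It assumes Python 3; the helper name `tail` and the default of 20 lines are illustrative choices, not part of MadGraph5_aMC@NLO.

```python
from collections import deque


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings.

    Hypothetical helper for illustration only. The deque keeps at most
    n lines in memory, so large log files are not loaded in full.
    """
    with open(path, errors="replace") as handle:
        return list(deque(handle, maxlen=n))
```

Calling `print("".join(tail(log_path)))` with `log_path` set to the log.txt named in the error message prints the excerpt that can be pasted into the thread.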
Krishna Kulkarni (kulkarnk) said :
#1

Hi Olivier,

This is the output from log.txt :

-----------------------------------------------------------------------------
 ERROR :: MadFKS parameter file FKS_params.dat could not be found or is malformed. Please specify it.
Time in seconds: 0
-----------------------------------------------------------------------------

marco zaro (marco-zaro) said :
#2

Hi,
Could there be a problem with copying the FKS_params.dat file to the nodes?
I have never seen this error before.
Best,
Marco
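
One way to test the file-copy hypothesis is to check, on the worker node and before the launch step, that FKS_params.dat is present and non-empty under the unpacked gridpack. The following is a minimal sketch, assuming Python 3; the helper name `find_nonempty` is hypothetical and is not part of MadGraph5_aMC@NLO or MadGraphControl, and the directory to search must be supplied by the user.

```python
import os


def find_nonempty(root, filename="FKS_params.dat"):
    """Return sorted paths under root named `filename` that are non-empty.

    Hypothetical helper for illustration only. An empty result means the
    file is either missing or was copied as a zero-byte file.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.getsize(path) > 0:
                hits.append(path)
    return sorted(hits)
```

If `find_nonempty(run_dir)` returns an empty list for a job's run directory, the job could log that fact and stop early, which would help separate a copy or filesystem problem from a genuine generation error.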
