text file busy / MadWeight multicore running problems

Asked by Kirill Skovpen

Hello,

When running MadWeight with multicore submission, I very often get errors like the following (running in a clean environment with no leftover processes):

compile
compile P0_qq_tzq_t_qwp_wp_qq_z_emep
compile P0_qq_tzq_t_qwp_wp_qq_z_mummup
compile P0_qq_tzq_t_qwp_wp_qq_z_tamtap
MadWeight code has been compiled.
check_events
time verif event Lhco_filter 0.231580972672
250 selected events for ./SubProcesses/P0_qq_tzq_t_qwp_wp_qq_z_emep subprocess
time Lhco_filter 0.232993125916
time verif event Lhco_filter 0.189543008804
250 selected events for ./SubProcesses/P0_qq_tzq_t_qwp_wp_qq_z_mummup subprocess
time Lhco_filter 0.196044921875
time verif event Lhco_filter 1.45345807076
0 selected events for ./SubProcesses/P0_qq_tzq_t_qwp_wp_qq_z_tamtap subprocess
time Lhco_filter 1.46044492722
submit_jobs
INFO: remove job currently running
/grid_mnt/opt__sbg__cms__ui5_data1/kskovpen/ttH/CMSSW_5_3_16_patch1/src/IPHCAnalysis/NTuple/macros/TTbarHiggs/run/procTZQ/MEM/MG5_aMC_v2_2_1/tZq_2L/SubProcesses/P0_qq_tzq_t_qwp_wp_qq_z_emep/fermi/sub.MW_driver.py1.25.25.verif_0.lhco.2000.weight.1: line 14: 5824 Terminated ./MW_driver.py 1 25 25 verif_0.lhco 2000 weight 1
cp: cannot stat `/tmp/run5715/output_1_1.xml': No such file or directory
/grid_mnt/opt__sbg__cms__ui5_data1/kskovpen/ttH/CMSSW_5_3_16_patch1/src/IPHCAnalysis/NTuple/macros/TTbarHiggs/run/procTZQ/MEM/MG5_aMC_v2_2_1/tZq_2L/SubProcesses/P0_qq_tzq_t_qwp_wp_qq_z_emep/fermi/sub.MW_driver.py1.0.25.verif_0.lhco.2000.weight.0: line 14: 5839 Terminated ./MW_driver.py 1 0 25 verif_0.lhco 2000 weight 0
cp: cannot stat `/tmp/run5723/output_1_0.xml': No such file or directory

and then when I try to kill things manually I get:

Traceback (most recent call last):
  File "/grid_mnt/opt__sbg__cms__ui5_data1/kskovpen/ttH/CMSSW_5_3_16_patch1/src/IPHCAnalysis/NTuple/macros/TTbarHiggs/run/procTZQ/MEM/MG5_aMC_v2_2_1/tZq_2L/bin/internal/cluster.py", line 491, in launch
    stderr=subprocess.STDOUT)
  File "/grid_mnt/opt__sbg__cms__ui5_data1/kskovpen/ttH/CMSSW_5_3_16_patch1/src/IPHCAnalysis/NTuple/macros/TTbarHiggs/run/procTZQ/MEM/MG5_aMC_v2_2_1/tZq_2L/bin/internal/misc.py", line 531, in deco_f
    return f(arg, *args, **opt)
  File "/grid_mnt/opt__sbg__cms__ui5_data1/kskovpen/ttH/CMSSW_5_3_16_patch1/src/IPHCAnalysis/NTuple/macros/TTbarHiggs/run/procTZQ/MEM/MG5_aMC_v2_2_1/tZq_2L/bin/internal/misc.py", line 551, in Popen
    return subprocess.Popen(arg, *args, **opt)
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc472/external/python/2.6.4/lib/python2.6/subprocess.py", line 621, in __init__
    errread, errwrite)
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc472/external/python/2.6.4/lib/python2.6/subprocess.py", line 1126, in _execute_child
    raise child_exception
OSError: [Errno 26] Text file busy
^CTraceback (most recent call last):
  File "./bin/madweight.py", line 37, in <module>
    subprocess.call([sys.executable] + ['-O'] + sys.argv)
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc472/external/python/2.6.4/lib/python2.6/subprocess.py", line 470, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc472/external/python/2.6.4/lib/python2.6/subprocess.py", line 1157, in wait
stopping all operation
            in order to quit madweight please enter exit

Could you please let me know if there is something I can do to avoid such issues when running MadWeight?

Thanks,

Kirill

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO

Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi Kirill,

One possible reason is the presence of a subprocess with 0 selected events,
as the following line indicates:
0 selected events for ./SubProcesses/P0_qq_tzq_t_qwp_wp_qq_z_tamtap subprocess

That might be why you immediately see the line
INFO: remove job currently running
which indicates that the submission of the jobs hit a critical problem.
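
A quick way to spot this condition before submitting is to scan the check_events output for subprocesses that report zero selected events. The following is only a minimal sketch, assuming the log lines have exactly the format shown in the question; the function name find_empty_subprocesses is hypothetical and is not part of MadWeight.

```python
import re

# Matches lines such as:
#   "0 selected events for ./SubProcesses/P0_... subprocess"
SELECTED_RE = re.compile(r"^\s*(\d+) selected events for (\S+) subprocess\s*$")


def find_empty_subprocesses(log_text):
    """Return the subprocess paths that report 0 selected events."""
    empty = []
    for line in log_text.splitlines():
        match = SELECTED_RE.match(line)
        if match and int(match.group(1)) == 0:
            empty.append(match.group(2))
    return empty


if __name__ == "__main__":
    sample = (
        "250 selected events for ./SubProcesses/P0_qq_tzq_t_qwp_wp_qq_z_emep subprocess\n"
        "0 selected events for ./SubProcesses/P0_qq_tzq_t_qwp_wp_qq_z_tamtap subprocess\n"
    )
    for path in find_empty_subprocesses(sample):
        print("no events selected for", path)
```

Any path returned by this helper is a subprocess for which no jobs can usefully be submitted, so it is worth checking the input events or the process definition for that channel first.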

Another potential reason is that you do not have write access to /tmp (the default directory for running in multi-core mode). Could you try specifying another directory for running in input/mg5_configuration.txt?
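
To test the /tmp hypothesis, one can check whether a directory accepts new files and lets a freshly written script run from it. This is a minimal sketch, assuming a POSIX system with /bin/sh available; the helper name can_write_and_execute is hypothetical, it is not part of MadGraph5_aMC@NLO, and it does not read or modify input/mg5_configuration.txt.

```python
import os
import stat
import subprocess
import tempfile


def can_write_and_execute(directory):
    """Return True if a script can be created in `directory` and run from it."""
    try:
        fd, path = tempfile.mkstemp(suffix=".sh", dir=directory)
    except OSError:
        return False  # directory is missing or not writable
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("#!/bin/sh\nexit 0\n")
        # The file is closed before it is executed: trying to run a file that
        # some process still holds open for writing is one common cause of
        # "[Errno 26] Text file busy".
        os.chmod(path, stat.S_IRWXU)
        return subprocess.call([path]) == 0
    except OSError:
        return False  # e.g. the filesystem is mounted noexec
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("/tmp usable:", can_write_and_execute("/tmp"))
```

If this returns False for /tmp but True for another scratch area, that area is a reasonable candidate for the alternative run directory suggested above.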

Cheers,

Olivier
