Running MadGraph on condor

Asked by Sarah Malik

I'm getting an error when running Madgraph on condor.
My error is very similar to what was reported here:
https://answers.launchpad.net/madgraph5/+question/203756
so I tried one of the suggested solutions and changed the cluster.py file to:
    command = ['qsub', '-o', os.path.relpath(stdout, cwd),
               '-N', me_dir,
               '-e', os.path.relpath(stderr, cwd),
               '-V']

so that it uses relative paths. However, this does not seem to have solved the problem.
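For reference, here is a minimal sketch of what the modified line builds, assuming the stdout and stderr files live under the job's working directory cwd. The helper name build_qsub_command and the example paths are hypothetical and not part of MadGraph. As the first reply below points out, this qsub command is not used by the condor backend at all.

```python
import os


def build_qsub_command(stdout, stderr, cwd, job_name):
    """Hypothetical helper mirroring the edited cluster.py line: build a qsub
    command whose -o/-e paths are relative to the working directory cwd."""
    return ['qsub',
            '-o', os.path.relpath(stdout, cwd),
            '-N', job_name,
            '-e', os.path.relpath(stderr, cwd),
            '-V']


if __name__ == '__main__':
    # Made-up example paths; on POSIX this prints
    # ['qsub', '-o', 'G1/log.txt', '-N', 'TryZ', '-e', 'G1/err.txt', '-V']
    print(build_qsub_command('/data/TryZ/G1/log.txt', '/data/TryZ/G1/err.txt',
                             '/data/TryZ', 'TryZ'))
```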
I get the following error when running ./bin/generate_events:
[smalik@cmslpc15 TryZ]$ ./bin/generate_events
No module named madgraph.interface.extended_cmd
No module named madgraph.various.misc
************************************************************
* *
* W E L C O M E to M A D G R A P H 5 *
* M A D E V E N T *
* *
* * * *
* * * * * *
* * * * * 5 * * * * *
* * * * * *
* * * *
* *
* VERSION 5.1.4.8.2 *
* *
* The MadGraph Development Team - Please visit us at *
* https://server06.fynu.ucl.ac.be/projects/madgraph *
* *
* Type 'help' for in-line help. *
* *
************************************************************
load configuration from Cards/me5_configuration.txt
Using default text editor "vi". Set another one in ./input/mg5_configuration.txt
Using default eps viewer "gv". Set another one in ./input/mg5_configuration.txt
No valid web browser found. Please set in ./input/mg5_configuration.txt
generate_events
Which programs do you want to run?
  0 / auto : running existing card
  1 / parton : Madevent
  2 / pythia : MadEvent + Pythia.
  3 / pgs : MadEvent + Pythia + PGS.
 [0, 1, 2, 3, auto, parton, pythia, pgs][20s to answer]
0
Will run in mode parton
Do you want to edit one cards (press enter to bypass editing)?
  1 / param : param_card.dat (be carefull about parameter consistency, especially widths)
  2 / run : run_card.dat
  9 / plot : plot_card.dat
  Path to a valid card.
 [0, done, 1, param, 2, run, 9, plot, enter path][20s to answer]
0
Generating 100 events with run name run_02
survey run_02
compile directory
Using random number seed offset = 48
Running Survey
Creating Jobs
Working on SubProcesses
    P3_gg_llgqq
    P3_gq_llggq
    P3_qq_llggg
    P3_gq_llqqq
    P3_qq_llgqq
    P2_gg_llqq
    P2_gq_llgq
    P2_qq_llqq
    P2_qq_llgg
    P1_gq_llq
    P1_qq_llg
    P0_qq_ll
 Idle: 149 Running: 0 Finish: 0
 Idle: 149 Running: 0 Finish: 0
 Idle: 149 Running: 0 Finish: 0
 Idle: 137 Running: 1 Finish: 11
 Idle: 116 Running: 1 Finish: 32
 Idle: 29 Running: 8 Finish: 112
INFO: All jobs finished
Command "generate_events " interrupted in sub-command:
"generate_events" with error:
IOError : [Errno 2] No such file or directory: '/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/SubProcesses/P3_gg_llgqq/G1/results.dat'
Please report this bug on https://bugs.launchpad.net/madgraph5
More information is found in '/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/run_02_tag_1_debug.log'.
Please attach this file to your report.

My run_02_tag_1_debug.log file looks like this:
#************************************************************
#* MadGraph/MadEvent 5 *
#* *
#* * * *
#* * * * * *
#* * * * * 5 * * * * *
#* * * * * *
#* * * *
#* *
#* *
#* VERSION 5.1.4.8.2 *
#* *
#* The MadGraph Development Team - Please visit us at *
#* https://server06.fynu.ucl.ac.be/projects/madgraph *
#* *
#************************************************************
#* *
#* Command File for MadEvent *
#* *
#* run as ./bin/madevent.py filename *
#* *
#************************************************************
generate_events
Traceback (most recent call last):
  File "/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/bin/internal/extended_cmd.py", line 549, in onecmd
    return cmd.Cmd.onecmd(self, line)
  File "/uscmst1/prod/sw/cms/slc5_amd64_gcc434/external/python/2.6.4-cms14/lib/python2.6/cmd.py", line 219, in onecmd
    return func(arg)
  File "/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/bin/internal/madevent_interface.py", line 1724, in do_generate_events
    postcmd=False)
  File "/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/bin/internal/extended_cmd.py", line 587, in exec_cmd
    stop = cmd.Cmd.onecmd(current_interface, line)
  File "/uscmst1/prod/sw/cms/slc5_amd64_gcc434/external/python/2.6.4-cms14/lib/python2.6/cmd.py", line 219, in onecmd
    return func(arg)
  File "/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/bin/internal/madevent_interface.py", line 2033, in do_survey
    cross, error = sum_html.make_all_html_results(self)
  File "/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/bin/internal/sum_html.py", line 319, in make_all_html_results
    P_comb.add_results(name, pjoin(P_path,name,'results.dat'), mfactor)
  File "/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/bin/internal/sum_html.py", line 109, in add_results
    oneresult.read_results(filepath)
  File "/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/bin/internal/cluster.py", line 40, in deco_f_retry
    return f(*args, **opt)
  File "/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/bin/internal/sum_html.py", line 52, in read_results
    for line in open(filepath):
IOError: [Errno 2] No such file or directory: '/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/SubProcesses/P3_gg_llgqq/G1/results.dat'
Value of current Options:
              web_browser : None
              text_editor : None
          pythia-pgs_path : /uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/pythia-pgs
                  td_path : /uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/td
             delphes_path :
             cluster_type : condor
         madanalysis_path : /uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/MadAnalysis
            cluster_queue : None
       group_subprocesses : Auto
         fortran_compiler : None
                  nb_core : 8
      exrootanalysis_path : /uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/ExRootAnalysis
               eps_viewer : None
                  timeout : 20
   automatic_html_opening : False
             cluster_mode : 1
             pythia8_path :
ignore_six_quark_processes : False
                 run_mode : 1

I'm trying to figure out whether this is a Madgraph issue or a condor issue.

Thanks!
Sarah

Question information

Language: English
Status: Answered
For: MadGraph5_aMC@NLO
Assignee: No assignee

Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi Sarah,

Note that the qsub line is not used by the condor cluster, so this is a completely different problem.

The cluster seems fine, but your jobs seem to run extremely fast, so there might be a problem running the jobs on the nodes.
Could you look at the file
/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/SubProcesses/P3_gg_llgqq/G1/log.txt
to see if the program indeed runs and produces results?
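One quick way to check whether a channel actually produced a result is to pull the accumulated integral out of its log.txt. The sketch below assumes the log format shown in the excerpt later in this thread (an "Accumulated results: Integral = ..." line followed by "Std dev = ..."); the function names are hypothetical and are not MadGraph APIs.

```python
import re
import sys

# Assumed log format, e.g. " Accumulated results: Integral = 0.1444E-04"
INTEGRAL_RE = re.compile(r'Accumulated results:\s*Integral\s*=\s*([-+.0-9Ee]+)')
STDDEV_RE = re.compile(r'Std dev\s*=\s*([-+.0-9Ee]+)')


def parse_accumulated_results(text):
    """Return (integral, std_dev) from a channel log, or None if the
    'Accumulated results' block is missing (e.g. the job did not finish)."""
    match = INTEGRAL_RE.search(text)
    if match is None:
        return None
    integral = float(match.group(1))  # Fortran-style 0.1444E-04 parses directly
    # Only look for the standard deviation after the integral line.
    dev = STDDEV_RE.search(text, match.end())
    std_dev = float(dev.group(1)) if dev else None
    return integral, std_dev


def parse_log_file(path):
    """Read a log.txt file and parse its accumulated results."""
    with open(path) as handle:
        return parse_accumulated_results(handle.read())


if __name__ == '__main__' and len(sys.argv) > 1:
    print(parse_log_file(sys.argv[1]))
```

Pass the path of a channel's log.txt as the only argument; a None result means the integration summary never made it into the log.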

A second possibility is an excessively long filesystem latency. Do you have the file:
/uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/SubProcesses/P3_gg_llgqq/G1/results.dat?
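To answer that question for every channel at once, a small script can list the integration channels that wrote a log.txt but no results.dat. This is a sketch that assumes the SubProcesses/P*/G* layout visible in the paths above; find_missing_results is a hypothetical name, not a MadGraph tool.

```python
import glob
import os
import sys


def find_missing_results(subprocesses_dir):
    """Return channel directories (P*/G*) that contain a log.txt but no
    results.dat, i.e. jobs that ran but whose output never appeared."""
    missing = []
    pattern = os.path.join(subprocesses_dir, 'P*', 'G*')
    for channel in sorted(glob.glob(pattern)):
        if not os.path.isdir(channel):
            continue
        has_log = os.path.isfile(os.path.join(channel, 'log.txt'))
        has_results = os.path.isfile(os.path.join(channel, 'results.dat'))
        if has_log and not has_results:
            missing.append(channel)
    return missing


if __name__ == '__main__' and len(sys.argv) > 1:
    # Usage: python find_missing.py /path/to/TryZ/SubProcesses
    for path in find_missing_results(sys.argv[1]):
        print(path)
```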

Cheers,

Olivier

Sarah Malik (sarah-alam-malik) said :
#2

Hi Olivier,

Thanks for the prompt response.

There is indeed a log file in /uscms_data/d3/smalik/CMSSW_4_2_5/Madgraph/MadGraph5_v1_4_8_2/TryZ/SubProcesses/P3_gg_llgqq/G1/
It looks like the process did run. The log file starts with:
 Process in group number 3
 Warning: parameter auto_ptj_mjj not found
          setting it to default value T
 Warning: parameter ptonium not found
          setting it to default value 0.0000000000000000
 Warning: parameter etaonium not found
          setting it to default value 100.000000000000000
 Warning: parameter xmtcentral not found
          setting it to default value 0.0000000000000000
 Warning: parameter d not found
          setting it to default value 1.00000000000000000
 Warning: parameter ptllmin not found
          setting it to default value 0.0000000000000000
 Warning: parameter ptllmax not found
          setting it to default value 100000.000000000000
 Warning: parameter ptl1min not found
          setting it to default value 0.0000000000000000
 Warning: parameter ptl1max not found
          setting it to default value 100000.000000000000
 Warning: parameter ptl2min not found
          setting it to default value 0.0000000000000000
 Warning: parameter ptl2max not found

...and ends with:
       813 0 1224 0
 Accuracy: 0.009 0.100 0.046 0.036
 Finished due to accuracy 8.71003473307617286E-003 0.10000000000000001

 -------------------------------------------------------------------------------
 Accumulated results: Integral = 0.1444E-04
                        Std dev = 0.6601E-06
             Chi**2 per DoF. = 0.0363
 -------------------------------------------------------------------------------
 Found 922 events.
 Wrote 108 events.
 Correct xsec 1.44359372701633114E-005
 Event xsec 1.44359372701632572E-005
 Events wgts > 1: 5
 % Cross section > 1: 1.31972182503460248E-007 0.91419199206569957
 --------------------------------------------------------------------------------
 Results Last 3 iters: Integral = 0.1444E-04
                         Std dev = 0.6601E-06
                 Chi**2 per DoF. = 0.0408
 --------------------------------------------------------------------------------
 Status 0.10000000000000001 4 5

However, no results.dat file is produced.

Thanks for your help!
Sarah

Olivier Mattelaer (olivier-mattelaer) said :
#3

Hi Sarah,

Could you install bzr and check out the following branch on your cluster:
lp:~maddevelopers/madgraph5/CMS
(via the command bzr branch lp:~maddevelopers/madgraph5/CMS)

This version should be more stable from the filesystem point of view, which might be related to your problem,
so I think it should solve it. Let me know.
(These modifications will be included in the 1.5.0 release, which is coming in one or two weeks.)
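The traceback above shows that MadGraph already wraps read_results in a retry decorator (deco_f_retry in cluster.py). The sketch below only illustrates the general technique of retrying a file read to ride out shared-filesystem latency; it is not the code in that branch or in 1.5.0, and the names and default values are hypothetical.

```python
import functools
import time


def retry_on_ioerror(attempts=5, delay=2.0):
    """Retry the wrapped function when it raises IOError/OSError, e.g. because
    a file written on a worker node is not yet visible on the shared disk."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except (IOError, OSError):
                    if attempt == attempts:
                        raise  # give up and propagate the last error
                    time.sleep(delay)
        return wrapper
    return decorator


@retry_on_ioerror(attempts=10, delay=3.0)
def read_results(filepath):
    """Hypothetical reader: returns the raw contents of a results.dat file."""
    with open(filepath) as handle:
        return handle.read()
```

Catching both IOError and OSError keeps the sketch valid on Python 2.6 (where they are distinct) as well as on Python 3 (where IOError is an alias of OSError).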

Cheers,

Olivier
