Error when trying to create gridpack on cluster frontend node

Asked by Ponf

Hi,

I'm trying to run MadGraph (v2.5.5) on a cluster. Since I'm trying a process with a huge number of final states, generating even a few events takes a lot of time, and I need events for quite a few different mass configurations for a scan. My strategy is to generate a gridpack for each parameter point on a different node and then generate events from the gridpacks on each node.
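
For readers who want to script a similar scan, here is a minimal sketch. It is not the setup actually used in this question. It only writes one MG5_aMC command file per mass point; the helper names (`gridpack_script`, `write_scan`), the output directory `scan_cards` and the example masses are hypothetical. The `launch`, `set gridpack True` and `set param_card mass ...` lines are meant as ordinary MG5_aMC script commands, but please check them against the documentation of your MadGraph version before relying on them.

```python
from pathlib import Path


def gridpack_script(process_dir, masses):
    """Return MG5_aMC command-file text for one parameter point.

    `masses` maps PDG codes to masses in GeV (illustrative values only).
    """
    lines = [f"launch {process_dir}", "set gridpack True"]
    for pdg_code, mass in sorted(masses.items()):
        # Assumed param_card syntax: set param_card mass <PDG code> <value>
        lines.append(f"set param_card mass {pdg_code} {mass}")
    return "\n".join(lines) + "\n"


def write_scan(process_dir, points, out_dir="scan_cards"):
    """Write one command file per parameter point and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, masses in enumerate(points):
        path = out / f"point_{index:03d}.mg5"
        path.write_text(gridpack_script(process_dir, masses))
        paths.append(path)
    return paths
```

Each generated file could then be passed to `./bin/mg5_aMC` on a separate node, for example `write_scan("pp2sqsqbarjj", [{1000002: 600.0}, {1000002: 800.0}])` for two hypothetical mass points.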

My problem is that even trying to generate a gridpack on the frontend node of the cluster throws an error and I have no idea where it could come from. The relevant part of the run_01_tag_1_debug.log is:

#************************************************************
#* MadGraph5_aMC@NLO/MadEvent *
#* *
#* * * *
#* * * * * *
#* * * * * 5 * * * * *
#* * * * * *
#* * * *
#* *
#* *
#* VERSION 2.5.5 20xx-xx-xx *
#* *
#* The MadGraph5_aMC@NLO Development Team - Find us at *
#* https://server06.fynu.ucl.ac.be/projects/madgraph *
#* *
#************************************************************
#* *
#* Command File for MadEvent *
#* *
#* run as ./bin/madevent.py filename *
#* *
#************************************************************
generate_events -f
Traceback (most recent call last):
  File "/frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/bin/internal/extended_cmd.py", line 1430, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/bin/internal/extended_cmd.py", line 1384, in onecmd_orig
    return func(arg, **opt)
  File "/frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/bin/internal/madevent_interface.py", line 2054, in do_generate_events
    postcmd=False)
  File "/frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/bin/internal/extended_cmd.py", line 1457, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/bin/internal/extended_cmd.py", line 1384, in onecmd_orig
    return func(arg, **opt)
  File "/frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/bin/internal/madevent_interface.py", line 2906, in do_survey
    cross, error = sum_html.make_all_html_results(self)
  File "/frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/bin/internal/sum_html.py", line 718, in make_all_html_results
    Presults = collect_result(cmd, folder_names=folder_names, jobs=jobs)
  File "/frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/bin/internal/sum_html.py", line 690, in collect_result
    P_comb.add_results(name, pjoin(P_path,name,'results.dat'), mfactor)
  File "/frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/bin/internal/sum_html.py", line 412, in add_results
    oneresult.read_results(filepath)
  File "/frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/bin/internal/sum_html.py", line 306, in read_results
    self.xsec = data[:10]
ValueError: need more than 5 values to unpack
                              Run Options
                              -----------
               stdout_level : None

                         MadEvent Options
                         ----------------
     automatic_html_opening : False (user set)
        notification_center : True
          cluster_temp_path : None
             cluster_memory : None
               cluster_size : 100
              cluster_queue : None
                    nb_core : 64 (user set)
               cluster_time : None
                   run_mode : 2

                      Configuration Options
                      ---------------------
                text_editor : micro (user set)
         cluster_local_path : None
      cluster_status_update : (600, 30)
               pythia8_path : None (user set)
                  hwpp_path : None (user set)
            pythia-pgs_path : None (user set)
                    td_path : None (user set)
               delphes_path : None (user set)
                thepeg_path : None (user set)
               cluster_type : condor
          madanalysis5_path : None (user set)
           cluster_nb_retry : 1
                 eps_viewer : None
                web_browser : None
               syscalc_path : None (user set)
           madanalysis_path : None (user set)
                     lhapdf : /frontend_node/software/madgraph/MG5_aMC_v2_5_5/HEPTools/lhapdf6/bin/lhapdf-config (user set)
              f2py_compiler : None
                 hepmc_path : None (user set)
         cluster_retry_wait : 300
           fortran_compiler : None
                auto_update : 7 (user set)
        exrootanalysis_path : None (user set)
                    timeout : 60
               cpp_compiler : None
#************************************************************
#* MadGraph5_aMC@NLO *
#* *
#* * * *
#* * * * * *
#* * * * * 5 * * * * *
#* * * * * *
#* * * *
#* *
#* *
#* VERSION 2.5.5 2017-05-26 *
#* *
#* The MadGraph5_aMC@NLO Development Team - Find us at *
#* https://server06.fynu.ucl.ac.be/projects/madgraph *
#* *
#************************************************************
#* *
#* Command File for MadGraph5_aMC@NLO *
#* *
#* run as ./bin/mg5_aMC filename *
#* *
#************************************************************
set group_subprocesses Auto
set ignore_six_quark_processes False
set loop_optimized_output True
set loop_color_flows False
set gauge unitary
set complex_mass_scheme False
set max_npoint_for_channel 0
import model sm
define p = g u c d s u~ c~ d~ s~
define j = g u c d s u~ c~ d~ s~
define l+ = e+ mu+
define l- = e- mu-
define vl = ve vm vt
define vl~ = ve~ vm~ vt~
import model MSSM_SLHA2
define sq = ul ur dl dr cl cr sl sr
define sq~ = ul~ ur~ dl~ dr~ cl~ cr~ sl~ sr~
generate p p > sq sq~ j j
output pp2sqsqbarjj

The original error message said that the problem was encountered in one of the subprocesses:
...
INFO: Idle: 0, Running: 0, Completed: 9212 [ 5h 3m ]
Error when reading /frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/SubProcesses/P1_gg_ururxgg/G194/results.dat
Command "generate_events -f" interrupted with error:
ValueError : need more than 5 values to unpack
Please report this bug on https://bugs.launchpad.net/mg5amcnlo
More information is found in '/frontend_node/work/eventfiles/madgraph/pp2sqsqbarjj/run_01_tag_1_debug.log'.
Please attach this file to your report.
quit
...

So I looked at the log file in the subprocess (log.txt) and found this:

....
Iteration 7 Mean: 0.2134E+06 Abs mean: 0.2134E+06 Fluctuation: 0.867E+05 0.104E+11 9.6%
  7 0.2134E+06 0.2134E+06 +- 0.8671E+05 145.40
 Relative summed weights:
  0.0000E+00 0.0000E+00 0.5785E+00 0.0000E+00
  0.0000E+00 0.0000E+00 0.4215E+00 0.0000E+00
 Relative number of events:
  0.0000E+00 0.0000E+00 0.5029E+00 0.0000E+00
  0.0000E+00 0.0000E+00 0.4971E+00 0.0000E+00
 Events:
           0 0 64341 0
           0 0 63605 0
 Accuracy: 1.174 0.010 0.116 102.774
 Found 3480 events.
 Wrote 3 events.
 Actual xsec 89672.488534070755
 Correct abs xsec 89672.488534103759
 Event xsec 89672.488534103759
 Events wgts > 1: 1
 % Cross section > 1: 3.5468016940285452E-003 3.9552841144579643E-006
 Iteration 8 Mean: 0.1264E+07 Abs mean: 0.1264E+07 Fluctuation: 0.643E+06 0.132E+12 9.7%
  8 0.1264E+07 0.1264E+07 +- 0.6433E+06 257.48
Thanks for using LHAPDF 6.1.6. Please make sure to cite the paper:
  Eur.Phys.J. C75 (2015) 3, 132 (http://arxiv.org/abs/1412.7420)

Program received signal SIGBUS: Access to an undefined portion of a memory object.

Backtrace for this error:
#0 0x2B9431B57B97
#1 0x2B9431B56D90
#2 0x2B94325E827F

....

But I don't really understand what that means. I hope someone can help me with this!
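
For anyone hitting the same ValueError: the traceback shows `read_results` unpacking `data[:10]` from `results.dat`, so a file with only 5 values on its first line is consistent with the Fortran job having died (here with SIGBUS) before it finished writing its results. The following is a minimal sketch for locating such channels. It assumes that the values are whitespace-separated, that at least 10 are expected on the first line, and that `log.txt` sits next to `results.dat`; the function names are hypothetical.

```python
from pathlib import Path

# The traceback slices data[:10], so at least 10 values are assumed to be expected.
MIN_FIELDS = 10


def count_fields(results_file):
    """Number of whitespace-separated fields on the first line (0 if empty)."""
    with open(results_file) as handle:
        first_line = handle.readline()
    return len(first_line.split())


def find_broken_channels(subprocesses_dir, min_fields=MIN_FIELDS):
    """Return (channel_dir, n_fields, crashed) for incomplete results.dat files."""
    broken = []
    for results_file in sorted(Path(subprocesses_dir).glob("P*/G*/results.dat")):
        n_fields = count_fields(results_file)
        if n_fields >= min_fields:
            continue
        # Look for a Fortran runtime signal message in the channel's log.
        log_file = results_file.with_name("log.txt")
        crashed = (
            log_file.exists()
            and "Program received signal" in log_file.read_text(errors="replace")
        )
        broken.append((results_file.parent, n_fields, crashed))
    return broken
```

Calling `find_broken_channels("pp2sqsqbarjj/SubProcesses")` would list every channel directory whose `results.dat` looks truncated, together with a flag telling whether its log mentions a runtime signal.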

Thanks in advance!

Question information

Language:
English
Status:
Solved
For:
MadGraph5_aMC@NLO
Assignee:
No assignee
Solved by:
Ponf

Olivier Mattelaer (olivier-mattelaer) said :
#1

I would suggest trying 2.6.5 instead. Since 2.5.5 is quite old, it is likely that this problem has already been resolved.

Cheers,

Olivier

PS: For a mass scan, if you have some "close" mass points, you might be interested in the reweighting method.

Ponf (ponf) said :
#2

I did that and it fixed the problem, but now I get a new one:

Error detected in "generate_events -f"
write debug file /w0/tmp/lsf_fp689533.47222962.1201/pp2sqsqbarjj_1201/run_01_tag_1_debug.log
If you need help with this issue please contact us on https://answers.launchpad.net/mg5amcnlo
MadGraph5Error : A compilation Error occurs when trying to compile /w0/tmp/lsf_fp689533.47222962.1201/pp2sqsqbarjj_1201/Source.
        The compilation fails with the following output message:
            ifort -O -w -fbounds-check -fPIC -extend-source -c -o combine_events.o combine_events.f
            ifort -O -w -fbounds-check -fPIC -extend-source -c -o alfas_functions.o alfas_functions.f
            ifort -o ../bin/internal/combine_events combine_events.o rw_events.o ranmar.o kin_functions.o open_file.o rw_routines.o alfas_functions.o setrun.o -L../lib/ -lmodel -lpdf -I/rwthfs/rz/cluster/home/fp689533/software/madgraph/MG5_aMC_v2_6_5/HEPTools/lhapdf6/include -I/usr/include -L/rwthfs/rz/cluster/home/fp689533/software/madgraph/MG5_aMC_v2_6_5/HEPTools/lhapdf6//lib -lLHAPDF -lLHAPDF -lstdc++ -lbias
            ld: Warning: size of symbol `bias_' changed from 12 in combine_events.o to 16 in ../lib//libbias.a(dummy.o)
            ../lib//libLHAPDF.a(AlphaS_ODE.o): In function `std::vector<std::pair<int, double>, std::allocator<std::pair<int, double> > >::_M_range_check(unsigned long) const':
            AlphaS_ODE.cc:(.text._ZNKSt6vectorISt4pairIidESaIS1_EE14_M_range_checkEm[_ZNKSt6vectorISt4pairIidESaIS1_EE14_M_range_checkEm]+0x48): undefined reference to `std::__throw_out_of_range_fmt(char const*, ...)'
            make: *** [../bin/internal/combine_events] Error 1

Do you know where this could come from?

Olivier Mattelaer (olivier-mattelaer) said :
#3

This seems to be an LHAPDF error.
Did you use the same compiler to compile that code?
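
An undefined reference to a libstdc++ symbol such as `std::__throw_out_of_range_fmt` inside `libLHAPDF.a` is consistent with LHAPDF having been built with a different C++ toolchain than the one used at link time. A minimal sketch for checking which compilers the current environment actually picks up is given below; the function name `compiler_report` and the default list of compiler names are assumptions for illustration, so adjust them to your site.

```python
import shutil
import subprocess


def compiler_report(names=("gfortran", "g++", "ifort", "icpc")):
    """Map each compiler name to (path, first line of --version) or None."""
    report = {}
    for name in names:
        path = shutil.which(name)
        if path is None:
            # Not found on PATH in the current environment.
            report[name] = None
            continue
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True
        )
        output = (result.stdout or result.stderr).strip()
        first_line = output.splitlines()[0] if output else ""
        report[name] = (path, first_line)
    return report
```

Running `compiler_report()` once in the shell where LHAPDF was installed and once inside the batch job makes it easy to spot a startup script that switches compilers between the two.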

Cheers
Olivier

PS: I am currently on holiday, so do not expect a fast answer for the moment.

Ponf (ponf) said :
#4

Hey,

Oh, thanks for taking the time during your holiday!

But you were right. I changed some startup scripts and accidentally used the wrong compiler. It works just fine now!

Cheers