Cannot reconstruct saved job status in .../job_status.pkl

Asked by Matthieu Marinangeli

Dear Experts,

I'm experiencing some problems running aMC@NLO on macOS. At some point I get an error message saying

########

Error detected in "launch --parton --nocompile --only_generation --force --name run"
write debug file /Users/matthieumarinangeli/pythia8226/examples/amcatnlorun/run_tag_1_debug.log
If you need help with this issue please contact us on https://answers.launchpad.net/mg5amcnlo
aMCatNLOError : Cannot reconstruct saved job status in /Users/matthieumarinangeli/pythia8226/examples/amcatnlorun/SubProcesses/job_status.pkl

########

the content of run_tag_1_debug.log is

########

launch --parton --nocompile --only_generation --force --name run
Traceback (most recent call last):
  File "/Users/matthieumarinangeli/pythia8226/examples/amcatnlorun/bin/internal/extended_cmd.py", line 1430, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/Users/matthieumarinangeli/pythia8226/examples/amcatnlorun/bin/internal/extended_cmd.py", line 1384, in onecmd_orig
    return func(arg, **opt)
  File "/Users/matthieumarinangeli/pythia8226/examples/amcatnlorun/bin/internal/amcatnlo_run_interface.py", line 1227, in do_launch
    evt_file = self.run(mode, options)
  File "/Users/matthieumarinangeli/pythia8226/examples/amcatnlorun/bin/internal/amcatnlo_run_interface.py", line 1439, in run
    req_acc,mode_dict[mode],1,mode,fixed_order=False)
  File "/Users/matthieumarinangeli/pythia8226/examples/amcatnlorun/bin/internal/amcatnlo_run_interface.py", line 1545, in create_jobs_to_run
    pjoin(self.me_dir,'SubProcesses','job_status.pkl'))
aMCatNLOError: Cannot reconstruct saved job status in /Users/matthieumarinangeli/pythia8226/examples/amcatnlorun/SubProcesses/job_status.pkl
Value of current Options:
              text_editor : None
              web_browser : None
        cluster_temp_path : None
                  timeout : 60
       cluster_local_path : None
            cluster_queue : None
         madanalysis_path : None
                   lhapdf : lhapdf-config
             cluster_size : 100
           cluster_memory : None
    cluster_status_update : (600, 30)
             cluster_time : None
            f2py_compiler : None
               hepmc_path : None
             pythia8_path : None
                hwpp_path : None
   automatic_html_opening : False
       cluster_retry_wait : 300
             stdout_level : None
          pythia-pgs_path : None
                 mg5_path : None
                  td_path : None
             delphes_path : None
              thepeg_path : None
             cluster_type : condor
        madanalysis5_path : None
      exrootanalysis_path : None
         fortran_compiler : None
                  nb_core : 4
              auto_update : 0
         cluster_nb_retry : 1
               eps_viewer : None
             syscalc_path : None
             cpp_compiler : None
      notification_center : True
                 run_mode : 2

Any ideas how to fix this?

Matthieu
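The traceback shows the error being raised in create_jobs_to_run, apparently while it tries to restore the saved job status from SubProcesses/job_status.pkl for an --only_generation run. A quick first check is whether that file exists, is non-empty, and unpickles at all. The sketch below assumes the file is an ordinary Python pickle; `inspect_job_status` is a hypothetical helper, not part of MG5_aMC.

```python
import os
import pickle
import sys


def inspect_job_status(path):
    """Classify a saved job-status file without modifying it.

    Returns "missing", "empty", "unreadable: <ExceptionName>" or "ok: <type>".
    Hypothetical helper: assumes the file is an ordinary pickle.
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    try:
        with open(path, "rb") as f:
            obj = pickle.load(f)
    except Exception as exc:  # truncated/corrupt data, or classes not importable here
        return "unreadable: %s" % type(exc).__name__
    return "ok: %s" % type(obj).__name__


if __name__ == "__main__":
    # Usage: python inspect_job_status.py <process_dir>/SubProcesses/job_status.pkl
    for p in sys.argv[1:]:
        print(p, "->", inspect_job_status(p))
```

If the pickle references MG5_aMC classes, loading it outside MG5_aMC can also fail with an import error, so "unreadable" on its own does not prove the file is corrupt.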

Question information

Language:
English
Status:
Solved
For:
MadGraph5_aMC@NLO
Assignee:
marco zaro
Solved by:
Matthieu Marinangeli
marco zaro (marco-zaro) said :
#1

Hi Matthieu,
What process are you looking at?
What is the sequence of commands that gives this error?

Cheers,
Marco

Matthieu Marinangeli (marinang) said :
#2

I'm trying to run the example script main34.py from the Pythia8 package, which uses MadGraph 2.3.3. The process generated is
p p > mu+ mu- [QCD].

But I've actually stopped trying to use my Mac, because even p p > mu+ mu- doesn't work there (a problem loading dynamic libraries, if I remember correctly), so I'm using my cluster instead. See https://answers.launchpad.net/mg5amcnlo/+question/649293

marco zaro (marco-zaro) said :
#3

Hi Matt,
I don't understand. Shall we close this question and move to
https://answers.launchpad.net/mg5amcnlo/+question/649293 ?
I was not aware that Pythia ships scripts that involve mg5_aMC...
Anyway, you may want to try the 'tutorial' and 'tutorial NLO' commands of
MG5_aMC, which are also helpful for checking the system configuration.

Cheers,

Marco

Marco Zaro

On Wed, Jul 19, 2017 at 11:27 AM, Matthieu Marinangeli <
<email address hidden>> wrote:

> Question #649181 on MadGraph5_aMC@NLO changed:
> https://answers.launchpad.net/mg5amcnlo/+question/649181
>
> Matthieu Marinangeli posted a new comment:
> I'm trying to run the example script main34.py of the Pythia8 package
> which use madgraph 2.3.3. The process generated is
> p p > mu+ mu- [QCD].
>
> But actually I stopped trying to use my Mac, because even for p p > mu+
> mu- it doesn't work (problem when loading dynamic libraries if remember)
> for this and I'm using my cluster. See
> https://answers.launchpad.net/mg5amcnlo/+question/649293
>

Matthieu Marinangeli (marinang) said :
#4

Hi Marco,

Yes, I will focus on this "working" framework. And yes, I did the tutorial.

Josh McFayden (mcfayden) said :
#5

I am seeing the same problem today. Somehow I did not see it before, but it is reproducible with a simple setup on lxplus:
export PATH=/afs/cern.ch/sw/lcg/external/gcc/4.7.0/x86_64-slc6-gcc47-opt/bin:${PATH}
export LD_LIBRARY_PATH=/afs/cern.ch/sw/lcg/external/gcc/4.7.0/x86_64-slc6-gcc47-opt/lib64:${LD_LIBRARY_PATH}
tar xvzf /afs/cern.ch/user/m/mcfayden/public/MG5_aMC_v2.5.5.tar.gz
mkdir madspin_nlo_gridpack/
cd madspin_nlo_gridpack/
cp /afs/cern.ch/user/m/mcfayden/public/proc_card_mg5.dat .
../MG5_aMC_v2_5_5/bin/mg5_aMC proc_card_mg5.dat
sed -i 's/10000 = nevents/0 = nevents/g' test/Cards/run_card.dat
sed -i 's/-1.0 = req_acc/0.01 = req_acc/g' test/Cards/run_card.dat
./test/bin/generate_events -f -p
sed -i 's/0 = nevents/10000 = nevents/g' test/Cards/run_card.dat
./test/bin/generate_events -f -p --nocompile --only_generation
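The two-pass workflow above works by rewriting run_card lines of the form `value = name ! comment` with `sed`: a grid-only pass first, then an event-generation pass. The same edit can be sketched in Python. `set_run_card_value` is a hypothetical helper that assumes one parameter per line in that layout. One design difference is deliberate: a `sed` substitution such as `s/10000 = nevents/0 = nevents/` silently does nothing if the current value is not exactly 10000, whereas this helper matches any current value and raises if the parameter is absent.

```python
import re


def set_run_card_value(card_text, name, value):
    """Return card_text with parameter `name` set to `value`.

    Hypothetical helper. Assumes the 'value = name ! comment' layout that the
    sed commands rely on, with one parameter per line.
    """
    pattern = re.compile(
        r"^([ \t]*)\S+([ \t]*=[ \t]*%s\b)" % re.escape(name), re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(value) + m.group(2), card_text)
    if count == 0:
        # Unlike sed, fail loudly instead of leaving the card unchanged.
        raise ValueError("parameter %r not found in run card" % name)
    return new_text


if __name__ == "__main__":
    # Sample lines in run_card layout (illustrative, not a complete card).
    card = (" 10000 = nevents ! number of unweighted events\n"
            " -1.0 = req_acc ! required accuracy\n")
    # Gridpack pass: 1 event instead of 0, and a fixed accuracy.
    card = set_run_card_value(card, "nevents", 1)
    card = set_run_card_value(card, "req_acc", 0.01)
    print(card, end="")
```

In practice you would read test/Cards/run_card.dat, apply the edits, and write it back before each generate_events call.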

Olivier Mattelaer (olivier-mattelaer) said :
#6

Hi Josh,

It looks like Marco is still not fully operational since coming back from holiday.

I see two solutions (both work for me):
1) request 1 event instead of 0 (the actual number does not matter) when generating the gridpack
2) apply the following patch:
=== modified file 'madgraph/interface/amcatnlo_run_interface.py'
--- madgraph/interface/amcatnlo_run_interface.py 2017-07-29 08:02:08 +0000
+++ madgraph/interface/amcatnlo_run_interface.py 2017-08-05 18:32:53 +0000
@@ -1457,11 +1458,12 @@
                 self.update_status(status, level='parton')
                 self.run_all_jobs(jobs_to_run,mint_step,fixed_order=False)
                 self.collect_log_files(jobs_to_run,mint_step)
+                jobs_to_run,jobs_to_collect=self.collect_the_results(options,req_acc,jobs_to_run, \
+                                  jobs_to_collect,mint_step,mode,mode_dict[mode],fixed_order=False)
                 if mint_step+1==2 and nevents==0:
                     self.print_summary(options,2,mode)
                     return
-                jobs_to_run,jobs_to_collect=self.collect_the_results(options,req_acc,jobs_to_run, \
-                                  jobs_to_collect,mint_step,mode,mode_dict[mode],fixed_order=False)
+
             # Sanity check on the event files. If error the jobs are resubmitted
             self.check_event_files(jobs_to_collect)
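Why moving collect_the_results ahead of the early return matters can be illustrated with a toy model of that loop. It assumes, as the patch suggests, that collect_the_results is what records the job status that a later --only_generation run reloads; the real function does much more, and `collect_results`, `run_grid_steps`, `reload_status` and `LAST_GRID_STEP` are hypothetical stand-ins. In the toy, nevents=0 with the old ordering returns after the last grid step before that step's status is recorded, so the reload fails; nevents=1, or the patched ordering, succeeds.

```python
import os
import pickle
import tempfile

LAST_GRID_STEP = 1  # toy assumption: MINT steps 0 and 1 set up the grids


def collect_results(state_path, mint_step):
    # Toy stand-in for collect_the_results: record which step was collected.
    with open(state_path, "wb") as f:
        pickle.dump({"mint_step": mint_step}, f)


def run_grid_steps(state_path, nevents, patched):
    # Toy version of the loop in the patch; `patched` selects the new ordering.
    for mint_step in range(LAST_GRID_STEP + 1):
        if patched:
            collect_results(state_path, mint_step)
        if mint_step + 1 == 2 and nevents == 0:
            return  # grids only: stop before event generation
        if not patched:
            collect_results(state_path, mint_step)


def reload_status(state_path):
    # Toy version of what --only_generation needs: the status after the last grid step.
    try:
        with open(state_path, "rb") as f:
            status = pickle.load(f)
    except (OSError, pickle.UnpicklingError, EOFError):
        status = None
    if not isinstance(status, dict) or status.get("mint_step") != LAST_GRID_STEP:
        raise RuntimeError("Cannot reconstruct saved job status in %s" % state_path)
    return status


if __name__ == "__main__":
    for nevents, patched in [(0, False), (1, False), (0, True)]:
        path = os.path.join(tempfile.mkdtemp(), "job_status.pkl")
        run_grid_steps(path, nevents, patched)
        try:
            reload_status(path)
            print("nevents=%d patched=%s: ok" % (nevents, patched))
        except RuntimeError as exc:
            print("nevents=%d patched=%s: %s" % (nevents, patched, exc))
```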

The lines above have not been touched since 2015 (version 2.3.3), so did you change the way you generate the gridpack?
Have you only recently started asking for 0 events at that stage?

Cheers,

Olivier

Josh McFayden (mcfayden) said :
#7

Hi Olivier,

Thanks a lot. I can confirm that setting nevents=1 solves the problem.

We always set nevents=0 for ATLAS samples, but we had been using 2.3.3 as the default version for a long time, so that is what most samples were produced with. Only more recently (or for special cases) have we moved to 2.4.X and 2.5.X.

Best,

Josh

Olivier Mattelaer (olivier-mattelaer) said :
#8

OK, perfect, that makes sense then.

I have pushed the patch, so it will be possible to run with 0 events again in the next version
(which should be released next week).

Cheers,

Olivier