NLO run randomly gets stuck while loading LHAPDF
Dear authors,
We are experiencing issues when running NLO processes with the Pythia8 shower, e.g. p p > e+ e- [QCD], on a MOAB cluster using gcc 9.2. MadGraph sometimes gets stuck, seemingly at random, after printing the line
INFO: Using LHAPDF v6.3.0 interface for PDFs
Out of 20 runs, 7 finished fine. The remaining 13 got stuck, and I cancelled them after waiting 30 min with no progress. We found no pattern in which runs hang; it appears to be completely random.
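To quantify how often the hang occurs, a small watchdog that repeats the run and kills it after a fixed timeout may help. The following is a minimal Python sketch, not part of MadGraph: `run_with_watchdog` and `repeat_runs` are hypothetical helper names, and the command passed in is a placeholder for however you normally start the run (the log below shows `launch -f` being issued). It assumes a POSIX system, because on timeout it kills the whole process group so that helper processes spawned by the run do not linger.

```python
import os
import signal
import subprocess
from collections import Counter
from typing import Optional, Sequence


def run_with_watchdog(cmd: Sequence[str], timeout_s: float,
                      cwd: Optional[str] = None) -> str:
    """Run `cmd` once and classify the outcome as 'finished', 'failed' or 'hung'."""
    # start_new_session=True puts the child in its own process group (pgid == pid),
    # so everything it spawns can be killed together if it hangs.
    proc = subprocess.Popen(cmd, cwd=cwd, start_new_session=True,
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited just as the timeout fired
        proc.wait()
        return "hung"
    return "finished" if returncode == 0 else "failed"


def repeat_runs(cmd: Sequence[str], n: int, timeout_s: float,
                cwd: Optional[str] = None) -> Counter:
    """Run `cmd` `n` times, one after another, and count the outcomes."""
    return Counter(run_with_watchdog(cmd, timeout_s, cwd) for _ in range(n))
```

For example, `repeat_runs(your_command, n=20, timeout_s=1800)` mirrors the 20 runs and 30 min wait described above. Output is discarded here for brevity; redirecting it to a per-run file instead would keep the logs of the hung runs for comparison.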
The problem is that we have no idea what is going wrong: we get no error message and did not find any hints in the log files. For the failed runs, no MCatNLO/
More information on the issue:
- The same thing happens with other showers (tested with Pythia6Q and Herwig6).
- On another cluster with PGS architecture, everything works fine when following exactly the same steps.
We would be grateful for any ideas about what could cause this issue. Are there any log files we are missing, or other ways to get more information out of MadGraph? What is MadGraph supposed to be doing at the point where it hangs?
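One generic way to see what a stuck run is doing, independent of MadGraph's own logs, is to list the processes the hung run has spawned together with their command lines; a child that never returns would narrow down where the run is waiting. The sketch below is a hypothetical Linux-only helper (`descendants`) that reads `/proc` directly. It is not a MadGraph feature, and the PID to pass in is assumed to be that of the hung MadGraph Python process (found, for example, with `ps` on the node where it runs).

```python
import os
from typing import Dict, List, Tuple


def _parent_pid(pid: int) -> int:
    """Parent PID of `pid`, parsed from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as fh:
        stat = fh.read()
    # Field 2 (the command name) is in parentheses and may contain spaces or ')',
    # so split after the LAST ')': the next fields are the state, then the PPID.
    fields = stat[stat.rfind(")") + 1:].split()
    return int(fields[1])


def _cmdline(pid: int) -> str:
    """Command line of `pid`; arguments in /proc/<pid>/cmdline are NUL-separated."""
    with open(f"/proc/{pid}/cmdline", "rb") as fh:
        raw = fh.read()
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()


def descendants(root: int) -> List[Tuple[int, str]]:
    """Return (pid, command line) for every live descendant of process `root`."""
    # Build a parent -> children map from every numeric entry in /proc.
    children: Dict[int, List[int]] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            children.setdefault(_parent_pid(int(entry)), []).append(int(entry))
        except (OSError, ValueError, IndexError):
            continue  # process vanished or is unreadable; skip it
    found: List[Tuple[int, str]] = []
    stack = list(children.get(root, []))
    while stack:  # depth-first walk over the process tree below `root`
        pid = stack.pop()
        try:
            found.append((pid, _cmdline(pid)))
        except OSError:
            continue  # process exited while we were walking the tree
        stack.extend(children.get(pid, []))
    return found
```

For example, `for pid, cmd in descendants(12345): print(pid, cmd)`, with 12345 replaced by the real PID. Running it once on a hung run and once on a healthy run at the same stage would show whether an extra process is left waiting.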
The full log is:
INFO: *******
* W E L C O M E to M A D G R A P H 5 *
* a M C @ N L O *
* VERSION 5.3.5.1 *
* The MadGraph5_aMC@NLO Development Team - Find us at *
* http://
* Type 'help' for in-line help. *
*******
INFO: load configuration from /work/ws/
INFO: load configuration from /work/ws/
INFO: load configuration from /work/ws/
No valid eps viewer found. Please set in ./input/
No valid web browser found. Please set in ./input/
launch -f
INFO: will run in mode: aMC@NLO
INFO: Starting run
Not able to open file /work/ws/
INFO: Compiling the code
INFO: Using built-in libraries for PDFs
INFO: Compiling source...
INFO: ...done, continuing with P* directories
INFO: Compiling directories...
INFO: Compiling on 40 cores
INFO: Compiling P0_uux_emep...
INFO: Compiling P0_ddx_emep...
INFO: Compiling P0_uxu_emep...
INFO: Compiling P0_dxd_emep...
INFO: P0_uux_emep done.
INFO: P0_ddx_emep done.
INFO: P0_uxu_emep done.
INFO: P0_dxd_emep done.
INFO: Checking test output:
INFO: P0_uux_emep
INFO: Result for test_ME:
INFO: Passed.
INFO: Result for test_MC:
INFO: Passed.
INFO: Result for check_poles:
INFO: Poles successfully cancel for 20 points over 20 (tolerance=1.0e-05)
INFO: P0_ddx_emep
INFO: Result for test_ME:
INFO: Passed.
INFO: Result for test_MC:
INFO: Passed.
INFO: Result for check_poles:
INFO: Poles successfully cancel for 20 points over 20 (tolerance=1.0e-05)
INFO: P0_uxu_emep
INFO: Result for test_ME:
INFO: Passed.
INFO: Result for test_MC:
INFO: Passed.
INFO: Result for check_poles:
INFO: Poles successfully cancel for 20 points over 20 (tolerance=1.0e-05)
INFO: P0_dxd_emep
INFO: Result for test_ME:
INFO: Passed.
INFO: Result for test_MC:
INFO: Passed.
INFO: Result for check_poles:
INFO: Poles successfully cancel for 20 points over 20 (tolerance=1.0e-05)
INFO: Starting run
INFO: Using 40 cores
INFO: Cleaning previous results
INFO: Doing NLO matched to parton shower
INFO: Setting up grids
INFO: Idle: 0, Running: 8, Completed: 0 [ current time: 10h47 ]
INFO: Idle: 0, Running: 7, Completed: 1 [ 2.7s ]
INFO: Idle: 0, Running: 6, Completed: 2 [ 3s ]
INFO: Idle: 0, Running: 5, Completed: 3 [ 3s ]
INFO: Idle: 0, Running: 4, Completed: 4 [ 3s ]
INFO: Idle: 0, Running: 3, Completed: 5 [ 5.4s ]
INFO: Idle: 0, Running: 2, Completed: 6 [ 5.5s ]
INFO: Idle: 0, Running: 1, Completed: 7 [ 5.5s ]
INFO: Idle: 0, Running: 0, Completed: 8 [ 5.5s ]
sum of cpu time of last step: 0 second
INFO: Determining the number of unweighted events per channel
Intermediate results:
Random seed: 34
Total cross section: 2.092e+03 +- 8.3e+00 pb
Total abs(cross section): 2.410e+03 +- 8.5e+00 pb
INFO: Computing upper envelope
INFO: Idle: 0, Running: 8, Completed: 0 [ current time: 10h47 ]
INFO: Idle: 0, Running: 7, Completed: 1 [ 6.3s ]
INFO: Idle: 0, Running: 6, Completed: 2 [ 6.4s ]
INFO: Idle: 0, Running: 5, Completed: 3 [ 6.5s ]
INFO: Idle: 0, Running: 4, Completed: 4 [ 6.6s ]
INFO: Idle: 0, Running: 3, Completed: 5 [ 12.1s ]
INFO: Idle: 0, Running: 2, Completed: 6 [ 12.2s ]
INFO: Idle: 0, Running: 1, Completed: 7 [ 12.2s ]
INFO: Idle: 0, Running: 0, Completed: 8 [ 12.4s ]
sum of cpu time of last step: 0 second
INFO: Updating the number of unweighted events per channel
Intermediate results:
Random seed: 34
Total cross section: 2.095e+03 +- 5.0e+00 pb
Total abs(cross section): 2.434e+03 +- 5.1e+00 pb
INFO: Generating events
INFO: Idle: 0, Running: 6, Completed: 2 [ current time: 10h47 ]
INFO: Idle: 0, Running: 5, Completed: 3 [ 1s ]
INFO: Idle: 0, Running: 4, Completed: 4 [ 1.1s ]
INFO: Idle: 0, Running: 3, Completed: 5 [ 3.3s ]
INFO: Idle: 0, Running: 2, Completed: 6 [ 3.6s ]
INFO: Idle: 0, Running: 1, Completed: 7 [ 3.8s ]
INFO: Idle: 0, Running: 0, Completed: 8 [ 3.9s ]
sum of cpu time of last step: 0 second
INFO: Doing reweight
INFO: Idle: 0, Running: 4, Completed: 4 [ current time: 10h47 ]
INFO: Idle: 0, Running: 3, Completed: 5 [ 0.12s ]
INFO: Idle: 0, Running: 2, Completed: 6 [ 0.16s ]
INFO: Idle: 0, Running: 1, Completed: 7 [ 0.22s ]
INFO: Idle: 0, Running: 0, Completed: 8 [ 0.24s ]
INFO: Collecting events
INFO:
----
Summary:
Process p p > e+ e- [QCD]
Run at p-p collider (6500.0 + 6500.0 GeV)
Number of events generated: 10000
Total cross section: 2.095e+03 +- 5.0e+00 pb
----
Scale variation (computed from LHE events):
----
INFO: The /work/ws/
INFO: Events generated
reweight -from_cards
decay_events -from_cards
INFO: Preparing MCatNLO run
INFO: Using LHAPDF v6.3.0 interface for PDFs
### At this point most of the runs get stuck. For the successful runs, the output continues:
INFO: Compiling MCatNLO for PYTHIA8...
INFO: ... done
INFO: Showering events...
INFO: (Running in /work/ws/
INFO: Idle: 1, Running: 0, Completed: 0 [ current time: 10h48 ]
INFO: Idle: 0, Running: 0, Completed: 1 [ 1m 31s ]
INFO: The file /work/ws/
It contains showered and hadronized events in the HEPMC format obtained by showering the parton-level event file /work/ws/
INFO: Run complete
INFO:
quit
INFO:
The files in the Events/run_0N directory are (for a successful run):
alllogs_0.html
alllogs_1.html
alllogs_2.html
events.lhe.gz
events_
res_0.txt
res_1.txt
res_2.txt
run_02_
RunMaterial.tar.gz
summary.txt
Failed runs have the same files with the same contents, but no events_
Best,
Jonas Spinner
Question information
- Language: English
- Status: Answered
- Assignee: No assignee