MadGraph on a Condor cluster and Pythia8

Asked by Daniel Egana-Ugrinovic on 2017-08-28

Hi,

I am trying to run MadGraph on a Condor cluster (CondorVersion: 8.6.5 Aug 01 2017 BuildID: 411560), using MG5_aMC_v2_6_0 and Pythia8. To do so, I set the default running mode to 1 in the me5_configuration.txt file and launch the run with generate_events --cluster.

Everything seems to work fine until Pythia8 starts. At that point, some of the Pythia jobs on the cluster are put on hold and the simulation fails. I checked why the cluster is holding the jobs: for three of them, the corresponding nodes cannot find the file run_PY8.sh:

HoldReason = "Error from <email address hidden>: Failed to execute '/export/scratch2/danielegana/MG5_aMC_v2_6_0/bin/alignedLHC3j/Events/run_yL_0.5_DM_2_MFc_70_Mp_70_Mphi_110/PY8_parallelization/run_PY8.sh' with arguments 47: (errno=2: 'No such file or directory')"

Not all jobs get this error; some of the Pythia jobs seem to run successfully. In the PY8_parallelization folder there are 100 "split" subfolders (split_0 to split_99). Most of them have healthy events.hepmc files, but three have defective events.hepmc files that contain no information at all. So the number of "split" subfolders without a good HepMC file matches the number of jobs that the cluster puts on hold because it could not find run_PY8.sh. Do you have any idea why this could be happening? Let me also add that when running MadGraph in single-machine mode, everything works fine.
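
For reference, the check on the "split" subfolders can be sketched as a small Python script. This is only a minimal sketch under assumptions: it treats a "defective" events.hepmc as one that is missing or has zero size, and the function name find_bad_splits is hypothetical (it is not part of MadGraph). It only lists the affected folders; it does not explain why run_PY8.sh was not found.

```python
import re
from pathlib import Path


def find_bad_splits(py8_dir, filename="events.hepmc"):
    """Return the names of split_<N> subfolders whose HepMC file is missing or empty.

    Assumption: "defective" means the file does not exist or has zero bytes.
    """
    bad = []
    for split in Path(py8_dir).iterdir():
        match = re.fullmatch(r"split_(\d+)", split.name)
        if match is None or not split.is_dir():
            continue  # ignore anything that is not a split_<N> folder
        hepmc = split / filename
        if not hepmc.is_file() or hepmc.stat().st_size == 0:
            bad.append((int(match.group(1)), split.name))
    # Sort numerically so that split_10 comes after split_9
    return [name for _, name in sorted(bad)]


if __name__ == "__main__":
    import sys

    # Usage: python check_splits.py /path/to/PY8_parallelization
    for name in find_bad_splits(sys.argv[1]):
        print(name)
```

The folder indices printed this way can then be compared by hand with the jobs that Condor reports as held.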

On a separate question, is there a way to control how many jobs the task is split into? My system administrator pointed out that too many short jobs running the script survey.sh are being sent to the cluster. Since the jobs are short, they use little CPU time and spend most of their time waiting in line to transfer data back to my home folder.

Thanks a lot!

Daniel

Question information

Language: English
Status: Open
For: MadGraph5_aMC@NLO
Assignee: Valentin Hirschi
Last query: 2017-08-28
