condor and mg5_amcnlo

Asked by lorenzo marafatto

Hi all
I have a command like this
./bin/mg5_aMC script.txt
which works and produces the run directory

I'd like to know how to submit this using htcondor
many thanks

Question information

Language: English
Status: Open
For: MadGraph5_aMC@NLO
Assignee: marco zaro
Revision history for this message
Olivier Mattelaer (olivier-mattelaer) said :
#1

We do have internal support for clusters, including condor.
In the following FAQ, you will learn how to edit the handling of condor if your sysadmin has configured the cluster in a way that is not compatible with the default.

The way it works is that a job controller is launched on the machine where you run the executable, and that controller then submits jobs to the cluster.

A second solution is to use the normal method of condor submission (i.e. writing a submission script by hand), requesting one full node, and asking MG5aMC to run in multi-core mode. In that case you have full flexibility, since you write the submission script manually (and can therefore follow the instructions of your sysadmin).

You can also combine the two methods:
- set up madgraph in cluster mode (so the main script will ask condor to run various tasks on the cluster)
- submit the controller on the cluster via a submission script (in that case, you have to request a single core for that computation and set nb_core=1, to avoid the compilation being done on multiple cores if the compilation of the code is done within your script)

So the important values that you have to set in the file
input/mg5_configuration.txt
are run_mode (0 = single core, 1 = cluster, 2 = multicore),
nb_core (keep it at 0 to use the full node capacity; set it to one if you submit the main script on a single thread),
and obviously all the cluster_xxx parameters, which are cluster specific:
# cluster_type = condor
# cluster_queue = madgraph
# cluster_size = 150
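As a concrete illustration, a minimal sketch of those lines uncommented for strategy #1 (cluster mode) could look like this; the queue name and cluster size here are placeholders, not recommendations, and must match whatever your sysadmin has actually configured:

```
run_mode = 1
cluster_type = condor
cluster_queue = madgraph   # placeholder: use the queue your site provides, if any
cluster_size = 150         # placeholder: rough number of jobs the cluster can absorb
```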

Here is the FAQ:
FAQ #2249: “How to add a cluster support / edit the way jobs are submitted on a supported cluster”.

Revision history for this message
lorenzo marafatto (lmaraf) said :
#2

Hi Olivier

I tried the solution,
but the job goes idle and then exits without ever running.

these are my script and condor files

1) mg5_script.txt

launch eewpwm_qed_new -f
launch eewpwm_qed_new_2 -f
launch eewpwm_qed_new_3 -f
launch eewpwm_qed_new_LL -f
launch eewpwm_qed_new_LL_2 -f
launch eewpwm_qed_new_LL_3 -f

2) execute.sh

#!/bin/bash
./bin/mg5_aMC mg5_script.txt

3) submit.sub (for condor)

executable  = execute.sh
transfer_input_files = mg5_script.txt
universe    = vanilla
output      = output.out
error       = error.err
log         = log.log
should_transfer_files = YES
queue

have you any more suggestions?
many thanks again
Lorenzo Marafatto


Revision history for this message
Olivier Mattelaer (olivier-mattelaer) said :
#3

Hi,

Sorry, I do not have a condor cluster anymore. On top of that, it can depend on the cluster configuration, so even if your condor script is fine for a given cluster, it might not be for another. My suggestion here is to check the user documentation of your cluster and find the reason why your job is not accepted and/or is cancelled.

Cheers,

Olivier

Revision history for this message
lorenzo marafatto (lmaraf) said :
#4

Hi Olivier

I have these files

"mg5_script.txt"

launch eewpwm_qed_new -f
launch eewpwm_qed_new_2 -f
launch eewpwm_qed_new_3 -f
launch eewpwm_qed_new_LL -f
launch eewpwm_qed_new_LL_2 -f
launch eewpwm_qed_new_LL_3 -f

"execute.sh"

#!/bin/bash
#Set python version > 3.6
source /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/setup.sh
#execute madgraph-lepcoll
/afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll/bin/mg5_aMC /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll/mg5_script.txt

"submit.sub"

executable  = execute.sh
universe    = vanilla
output      = output.out
error       = error.err
log         = log.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue

the process goes into the held state but never runs.
The CERN help desk told me that
"The issue may be due to the use of a symlink."

what could I do?
many thanks again
Lorenzo Marafatto


Revision history for this message
Olivier Mattelaer (olivier-mattelaer) said :
#5

Which file is a symlink?

Independently, which of the three strategies above are you trying to use?

Looks like you are trying to use strategy #2, correct?
If yes, how did you set up the cluster information within input/mg5_configuration.txt?
In that case, you should first check that the MG5aMC internal condor submission script is compatible with your cluster,
so you should be able to run

source /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/setup.sh
#execute madgraph-lepcoll
/afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll/bin/mg5_aMC /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll/mg5_script.txt
in the terminal, and check in another terminal that the heavy computation is indeed submitted to the cluster
(so, in essence, checking my #1 option first).

If it is #3, how did you specify the number of cores in your input/mg5_configuration.txt? I also do not see an equivalent setting for the number of threads to use in your condor script. Is this missing? Or are you using a single thread? Or is there something else that I am missing for that option?

In all cases, how do you specify the queue to which you submit your job?
It seems to be missing from your submission script, no? (Maybe your cluster has a default value for it; I do not know if that is possible.)
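For reference, a hedged sketch of what a strategy #3 submission file could look like, addressing the two points above (number of cores and queue). request_cpus and request_memory are standard HTCondor submit commands; the +JobFlavour line is specific to CERN's HTCondor pools, and other sites typically select a queue with a different attribute, so check your own cluster's documentation:

```
executable      = execute.sh
universe        = vanilla
output          = output.out
error           = error.err
log             = log.log
request_cpus    = 8            # should match nb_core in input/mg5_configuration.txt
request_memory  = 8 GB         # illustrative; size it to your runs
+JobFlavour     = "workday"    # CERN-specific job length; other sites use their own attribute
queue
```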

Cheers,

Olivier

Revision history for this message
lorenzo marafatto (lmaraf) said :
#6

Hi
this is mg5_configuration.txt

################################################################################
#
# Copyright (c) 2009 The MadGraph5_aMC@NLO Development team and Contributors
#
# This file is a part of the MadGraph5_aMC@NLO project, an application which
# automatically generates Feynman diagrams and matrix elements for arbitrary
# high-energy processes in the Standard Model and beyond.
#
# It is subject to the MadGraph5_aMC@NLO license which should accompany this
# distribution.
#
# For more information, visit madgraph.phys.ucl.ac.be and amcatnlo.web.cern.ch
#
################################################################################
#
# This File contains some configuration variable for MadGraph/MadEvent
#
# Line starting by #! are comment and should remain commented
# Line starting with # should be uncommented if you want to modify the default
#    value.
# Current value for all options can seen by typing "display options"
#    after either ./bin/mg5_aMC or ./bin/madevent
#
# You can place this files in ~/.mg5/mg5_configuration.txt if you have more than
#    one version of MG5.
#
################################################################################

#! Allow/Refuse syntax that changed meaning in version 3.1 of the code
#! (Compare to 3.0, 3.1 is back to the meaning of 2.x branch)
#!
# acknowledged_v3.1_syntax = False

#! Prefered Fortran Compiler
#! If None: try to find g77 or gfortran on the system
#!
# fortran_compiler = None
# f2py_compiler_py2 = None
# f2py_compiler_py3 = None

#! Prefered C++ Compiler
#! If None: try to find g++ or clang on the system
#!
# cpp_compiler = None

#! Prefered Text Editor
#!  Default: use the shell default Editor
#!           or try to find one available on the system
#!  Be careful: Only shell based editor are allowed
# text_editor = None

#! Prefered WebBrower
#! If None: try to find one available on the system
# web_browser = None

#! Prefered PS viewer
#!  If None: try to find one available on the system
# eps_viewer = None

#! Time allowed to answer question (if no answer takes default value)
#!  0: No time limit
# timeout = 60

#! Pythia8 path.
#!  Defines the path to the pythia8 installation directory (i.e. the
#!  on containing the lib, bin and include directories) .
#!  If using a relative path, that starts from the mg5 directory
# pythia8_path = ./HEPTools/pythia8

#! MG5aMC_PY8_interface path
#!  Defines the path of the C++ driver file that is used by MG5_aMC to
#!  steer the Pythia8 shower.
#!  Can be installed directly from within MG5_aMC with the following command:
#!     MG5_aMC> install mg5amc_py8_interface
# mg5amc_py8_interface_path = ./HEPTools/MG5aMC_PY8_interface

#! Herwig++/Herwig7 paths
#!  specify here the paths also to HepMC ant ThePEG
#!  define the path to the herwig++, thepeg and hepmc directories.
#!  paths can be absolute or relative from mg5 directory
#!  WARNING: if Herwig7 has been installed with the bootstrap script,
#!  then please set thepeg_path and hepmc_path to the same value as
#!  hwpp_path
# hwpp_path =
# thepeg_path =
# hepmc_path =

#! Control when MG5 checks if he is up-to-date.
#! Enter the number of day between two check (0 means never)
#! A question is always asked before any update
# auto_update = 7

################################################################################
#  INFO FOR MADEVENT / aMC@NLO
################################################################################
# If this file is in a MADEVENT Template. 'main directory' is the directory
# containing the SubProcesses directory. Otherwise this is the MadGraph5_aMC@NLO main
# directory (containing the directories madgraph and Template)

#! Allow/Forbid the automatic opening of the web browser  (on the status page)
#!  when launching MadEvent [True/False]
# automatic_html_opening = True
#! allow notification of finished job in the notification center (Mac Only)
# notification_center = True

#! Default Running mode
#!  0: single machine/ 1: cluster / 2: multicore
# run_mode = 1

#! Cluster Type [pbs|sge|condor|lsf|ge|slurm|htcaas|htcaas2] Use for cluster run only
#!  And cluster queue (or partition for slurm)
#!  And size of the cluster (some part of the code can adapt splitting accordingly)
# cluster_type = condor
# cluster_queue = madgraph
# cluster_size = 150

#! Path to a node directory to avoid direct writing on the central disk
#!  Note that condor clusters avoid direct writing by default (therefore this
#!  options does not affect condor clusters)
# cluster_temp_path = None

#! path to a node directory where local file can be found (typically pdf)
#! to avoid to send them to the node (if cluster_temp_path is on True or condor)
# cluster_local_path =  None # example: /cvmfs/cp3.uclouvain.be/madgraph/

#! Cluster waiting time for status update
#!  First number is when the number of waiting job is higher than the number
#!  of running one (time in second). The second number is in the second case.
# cluster_status_update = 600 30

#! How to deal with failed submission (can occurs on cluster mode)
#!  0: crash, -1: print error, hangs the program up to manual instructions, N(>0) retry up to N times.
# cluster_nb_retry = 1

#! How much time to wait for the output file before resubmission/crash (filesystem can be very slow)
# cluster_retry_wait = 300

#! Nb_core to use (None = all) This is use only for multicore run
#!  This correspond also to the number core used for code compilation for cluster mode
# nb_core = None

#! Pythia-PGS Package
#!  relative path start from main directory
# pythia-pgs_path = ./pythia-pgs

#! Delphes Package
#!  relative path start from main directory
# delphes_path = ./Delphes

#! MadAnalysis4 fortran-based package [for basic analysis]
#!  relative path start from main directory
# madanalysis_path = ./MadAnalysis

#! MadAnalysis5 python-based Package [For advanced analysis]
#!  relative path start from main directory
# madanalysis5_path = ./HEPTools/madanalysis5/madanalysis5

#! ExRootAnalysis Package
#!  relative path start from main directory
# exrootanalysis_path = ./ExRootAnalysis

#! TOPDRAWER PATH
#!  Path to the directory containing td executables
#!  relative path start from main directory
# td_path = ./td

#! lhapdf-config --can be specify differently depending of your python version
#!  If None: try to find one available on the system
# lhapdf_py2 = lhapdf-config
# lhapdf_py3 = lhapdf-config

#! fastjet-config
#!  If None: try to find one available on the system
# fastjet = fastjet-config

#! eMELA-config
#!  If None: try to find one available on the system
# eMELA = eMELA-config

#! MCatNLO-utilities
#!  relative path starting from main directory
# MCatNLO-utilities_path = ./MCatNLO-utilities

#! Set what OLP to use for the loop ME generation
# OLP = MadLoop

#! Set the PJFRy++ directory containing pjfry's library
#! if auto: try to find it automatically on the system (default)
#! if '' or None: disabling pjfry
#! if pjfry=/PATH/TO/pjfry/lib: use that specific installation path for PJFry++
# pjfry = auto

#! Set the Golem95 directory containing golem's library
#! It only supports version higher than 1.3.0
#! if auto: try to find it automatically on the system (default)
#! if '' or None: disabling Golem95
#! if golem=/PATH/TO/golem/lib: use that speficif installation path for Golem95
golem = None #

#! Set the samurai directory containing samurai's library
#! It only supports version higher than 2.0.0
#! if auto: try to find it automatically on the system (default)
#! if '' or None: disabling samurai
#! if samurai=/PATH/TO/samurai/lib: use that specific installation path for samurai
# samurai = None

#! Set the Ninja directory containing ninja's library
#! if '' or None: disabling ninja
#! if ninja=/PATH/TO/ninja/lib: use that specific installation path for ninja
ninja = /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll/HEPTools/lib #

#! Set the COLLIER directory containing COLLIER's library
#! if '' or None: disabling COLLIER
#! if ninja=/PATH/TO/ninja/lib: use that specific installation path for COLLIER
# Note that it is necessary that you have generated a static library for COLLIER
collier = /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll/HEPTools/lib #

#! Set how MadLoop dependencies (such as CutTools) should be handled
#!  > external : ML5 places a link to the MG5_aMC-wide libraries
#!  > internal : ML5 copies all dependencies in the output so that it is independent
#!  > environment_paths : ML5 searches for the dependencies in your environment path
# output_dependencies = external

#! SysCalc PATH
#! Path to the directory containing syscalc executables
#! relative path start from main directory
# syscalc_path = ./SysCalc

#! Absolute paths to the config script in the bin directory of PineAPPL
#! (to generate PDF-independent fast-interpolation grids).
# pineappl = pineappl

auto_convert_model = True #

# MG5 MAIN DIRECTORY
mg5_path = /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll

many thanks
Lorenzo


Revision history for this message
lorenzo marafatto (lmaraf) said :
#7

Hi again
Does the code mg5_coll create any symlink when it runs?
thanks once more...

Il martedì 24 gennaio 2023 12:40:43 CET, lorenzo marafatto <email address hidden> ha scritto:

Your question #704470 on MadGraph5_aMC@NLO changed:
https://answers.launchpad.net/mg5amcnlo/+question/704470

    Status: Answered => Open

You are still having a problem:
Hi
this is mg5_configuration.txt

################################################################################
#
# Copyright (c) 2009 The MadGraph5_aMC@NLO Development team and Contributors
#
# This file is a part of the MadGraph5_aMC@NLO project, an application which
# automatically generates Feynman diagrams and matrix elements for arbitrary
# high-energy processes in the Standard Model and beyond.
#
# It is subject to the MadGraph5_aMC@NLO license which should accompany this
# distribution.
#
# For more information, visit madgraph.phys.ucl.ac.be and amcatnlo.web.cern.ch
#
################################################################################
#
# This File contains some configuration variable for MadGraph/MadEvent
#
# Line starting by #! are comment and should remain commented
# Line starting with # should be uncommented if you want to modify the default
#    value.
# Current value for all options can seen by typing "display options"
#    after either ./bin/mg5_aMC or ./bin/madevent
#
# You can place this files in ~/.mg5/mg5_configuration.txt if you have more than
#    one version of MG5.
#
################################################################################

#! Allow/Refuse syntax that changed meaning in version 3.1 of the code
#! (Compare to 3.0, 3.1 is back to the meaning of 2.x branch)
#!
# acknowledged_v3.1_syntax = False

#! Prefered Fortran Compiler
#! If None: try to find g77 or gfortran on the system
#!
# fortran_compiler = None
# f2py_compiler_py2 = None
# f2py_compiler_py3 = None

#! Prefered C++ Compiler
#! If None: try to find g++ or clang on the system
#!
# cpp_compiler = None

#! Prefered Text Editor
#!  Default: use the shell default Editor
#!           or try to find one available on the system
#!  Be careful: Only shell based editor are allowed
# text_editor = None

#! Prefered WebBrower
#! If None: try to find one available on the system
# web_browser = None

#! Prefered PS viewer
#!  If None: try to find one available on the system
# eps_viewer = None

#! Time allowed to answer question (if no answer takes default value)
#!  0: No time limit
# timeout = 60

#! Pythia8 path.
#!  Defines the path to the pythia8 installation directory (i.e. the
#!  on containing the lib, bin and include directories) .
#!  If using a relative path, that starts from the mg5 directory
# pythia8_path = ./HEPTools/pythia8

#! MG5aMC_PY8_interface path
#!  Defines the path of the C++ driver file that is used by MG5_aMC to
#!  steer the Pythia8 shower.
#!  Can be installed directly from within MG5_aMC with the following command:
#!     MG5_aMC> install mg5amc_py8_interface
# mg5amc_py8_interface_path = ./HEPTools/MG5aMC_PY8_interface

#! Herwig++/Herwig7 paths
#!  specify here the paths also to HepMC ant ThePEG
#!  define the path to the herwig++, thepeg and hepmc directories.
#!  paths can be absolute or relative from mg5 directory
#!  WARNING: if Herwig7 has been installed with the bootstrap script,
#!  then please set thepeg_path and hepmc_path to the same value as
#!  hwpp_path
# hwpp_path =
# thepeg_path =
# hepmc_path =

#! Control when MG5 checks if he is up-to-date.
#! Enter the number of day between two check (0 means never)
#! A question is always asked before any update
# auto_update = 7

################################################################################
#  INFO FOR MADEVENT / aMC@NLO
################################################################################
# If this file is in a MADEVENT Template. 'main directory' is the directory
# containing the SubProcesses directory. Otherwise this is the MadGraph5_aMC@NLO
 main
# directory (containing the directories madgraph and Template)

#! Allow/Forbid the automatic opening of the web browser  (on the status page)
#!  when launching MadEvent [True/False]
# automatic_html_opening = True
#! allow notification of finished job in the notification center (Mac Only)
# notification_center = True

#! Default Running mode
#!  0: single machine/ 1: cluster / 2: multicore
# run_mode = 1

#! Cluster Type [pbs|sge|condor|lsf|ge|slurm|htcaas|htcaas2] Use for cluster run
 only
#!  And cluster queue (or partition for slurm)
#!  And size of the cluster (some part of the code can adapt splitting according
ly)
# cluster_type = condor
# cluster_queue = madgraph
# cluster_size = 150

#! Path to a node directory to avoid direct writing on the central disk
#!  Note that condor clusters avoid direct writing by default (therefore this
#!  options does not affect condor clusters)
# cluster_temp_path = None

#! path to a node directory where local file can be found (typically pdf)
#! to avoid to send them to the node (if cluster_temp_path is on True or condor)
# cluster_local_path =  None # example: /cvmfs/cp3.uclouvain.be/madgraph/

#! Cluster waiting time for status update
#!  First number is when the number of waiting job is higher than the number
#!  of running one (time in second). The second number is in the second case.
# cluster_status_update = 600 30

#! How to deal with failed submission (can occurs on cluster mode)
#!  0: crash, -1: print error, hangs the program up to manual instructions, N(>0
) retry up to N times.
# cluster_nb_retry = 1

#! How much time to wait for the output file before resubmission/crash (filesyst
em can be very slow)
# cluster_retry_wait = 300

#! Nb_core to use (None = all) This is use only for multicore run
#!  This correspond also to the number core used for code compilation for cluste
r mode
# nb_core = None

#! Pythia-PGS Package
#!  relative path start from main directory
# pythia-pgs_path = ./pythia-pgs

#! Delphes Package
#!  relative path start from main directory
# delphes_path = ./Delphes

#! MadAnalysis4 fortran-based package [for basic analysis]
#!  relative path start from main directory
# madanalysis_path = ./MadAnalysis

#! MadAnalysis5 python-based Package [For advanced analysis]
#!  relative path start from main directory
# madanalysis5_path = ./HEPTools/madanalysis5/madanalysis5

#! ExRootAnalysis Package
#!  relative path start from main directory
# exrootanalysis_path = ./ExRootAnalysis

#! TOPDRAWER PATH
#!  Path to the directory containing td executables
#!  relative path start from main directory
# td_path = ./td

#! lhapdf-config --can be specify differently depending of your python version
#!  If None: try to find one available on the system
# lhapdf_py2 = lhapdf-config
# lhapdf_py3 = lhapdf-config

#! fastjet-config
#!  If None: try to find one available on the system
# fastjet = fastjet-config

#! eMELA-config
#!  If None: try to find one available on the system
# eMELA = eMELA-config

#! MCatNLO-utilities
#!  relative path starting from main directory
# MCatNLO-utilities_path = ./MCatNLO-utilities

#! Set what OLP to use for the loop ME generation
# OLP = MadLoop

#! Set the PJFRy++ directory containing pjfry's library
#! if auto: try to find it automatically on the system (default)
#! if '' or None: disabling pjfry
#! if pjfry=/PATH/TO/pjfry/lib: use that specific installation path for PJFry++
# pjfry = auto

#! Set the Golem95 directory containing golem's library
#! It only supports version higher than 1.3.0
#! if auto: try to find it automatically on the system (default)
#! if '' or None: disabling Golem95
#! if golem=/PATH/TO/golem/lib: use that speficif installation path for Golem95
golem = None #

#! Set the samurai directory containing samurai's library
#! It only supports version higher than 2.0.0
#! if auto: try to find it automatically on the system (default)
#! if '' or None: disabling samurai
#! if samurai=/PATH/TO/samurai/lib: use that specific installation path for samu
rai
# samurai = None

#! Set the Ninja directory containing ninja's library
#! if '' or None: disabling ninja
#! if ninja=/PATH/TO/ninja/lib: use that specific installation path for ninja
ninja = /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll/HEPTools/lib #

#! Set the COLLIER directory containing COLLIER's library
#! if '' or None: disabling COLLIER
#! if ninja=/PATH/TO/ninja/lib: use that specific installation path for COLLIER
# Note that it is necessary that you have generated a static library for COLLIER
collier = /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll/HEPTools/lib #

#! Set how MadLoop dependencies (such as CutTools) should be handled
#!  > external : ML5 places a link to the MG5_aMC-wide libraries
#!  > internal : ML5 copies all dependencies in the output so that it is indepen
dent
#!  > environment_paths : ML5 searches for the dependencies in your environment
path
# output_dependencies = external

#! SysCalc PATH
#! Path to the directory containing syscalc executables
#! relative path start from main directory
# syscalc_path = ./SysCalc

#! Absolute paths to the config script in the bin directory of PineAPPL
#! (to generate PDF-independent fast-interpolation grids).
# pineappl = pineappl

auto_convert_model = True #

# MG5 MAIN DIRECTORY
mg5_path = /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll

# MG5 MAIN DIRECTORY
mg5_path = /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll

# MG5 MAIN DIRECTORY
mg5_path = /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll

# MG5 MAIN DIRECTORY
mg5_path = /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll

# MG5 MAIN DIRECTORY
mg5_path = /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll

# MG5 MAIN DIRECTORY
mg5_path = /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll

many thanks
Lorenzo

Il martedì 24 gennaio 2023 11:40:49 CET, Olivier Mattelaer <email address hidden> ha scritto:

Your question #704470 on MadGraph5_aMC@NLO changed:
https://answers.launchpad.net/mg5amcnlo/+question/704470

    Status: Open => Answered

Olivier Mattelaer proposed the following answer:
Which file is a symlink?

Independently, which of the three strategies above are you trying to
use?

Looks like you are trying to use the strategy #2, correct?
If yes, how did you setup the cluster information within input/mg5_configuration.txt ?
In that case, you should first check that MG5aMC internal condor submission script is compatible with your cluster.
so you should be able to run as

source /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/setup.sh
#execute madgrpah-lepcoll
/afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll/bin/mg5_aMC /afs/cern.ch/user/l/lmarafat/mg5amcnlo-3.0.1-lepcoll/mg5_script.txt
in the terminal and check in another terminal that the heavy computation is indeed submitted to the cluster.
(so in essence checking my #1 option first)

If it is #3, how did you specify the number of core in your
input/mg5_configuration.txt? and I do not see an equivalent setup for
the number of thread to use in your condor script. Is this missing? or
are you using a single thread? or something else that I miss for that
option?

In all cases, how do you specify the queue on which you sumit your job?
It seems missing in your submission script no? (maybe your cluster has a default value for it, do not know if this is possible).

Cheers,

Olivier

--
If this answers your question, please go to the following page to let us
know that it is solved:
https://answers.launchpad.net/mg5amcnlo/+question/704470/+confirm?answer_id=4

If you still need help, you can reply to this email or go to the
following page to enter your feedback:
https://answers.launchpad.net/mg5amcnlo/+question/704470

You received this question notification because you asked the question.


Revision history for this message
Olivier Mattelaer (olivier-mattelaer) said :
#8

Hi,

But this does not correspond to any of the strategies I mentioned above.
In your configuration file, you did not set run_mode to run in cluster mode, so you will run in multi-core mode (my strategy #3 above).

You have not defined nb_core, so the default is the number of threads available on the machine on which you land (which is bad, since your condor script does not seem to reserve a full node).

So your job might be killed simply because you use too many resources on the node and impact other users (unless your sys-admin is using cgroups to prevent that). Your job can also be killed because it uses too much RAM; what is the default RAM allowed for a job on your cluster/queue/...? If you land on a node with 128 processors, you might try to run the executable 128 times, which can quickly blow past the RAM limit, since that limit is likely set for a single job (so I would expect between 1 and 4 GB of RAM).

>Does the code mg5_coll create any symlink when it runs? thanks once more...

What is mg5_coll? We do not have such an executable.
From the command above, we do not create any additional symlink.
It looks like you are using a development version of the code, which might not be compatible with a condor cluster.
(That cluster type is quite tricky to handle in general, and that specific PDF might not be compatible with such a cluster.)
Can you try first with an official version of the code?
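[Editor's note: concretely, if multi-core mode on condor is what you want, a sketch of matching settings would pin nb_core to what the condor job actually reserves; the value 8 is purely illustrative.]

```
# input/mg5_configuration.txt -- multi-core mode, illustrative values
run_mode = 2   # multicore
nb_core  = 8   # must match request_cpus in the condor submit description
```

The corresponding submit file would then carry request_cpus = 8, so MG5aMC does not try to use every thread on the node.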

Cheers,

Olivier

Revision history for this message
lorenzo marafatto (lmaraf) said :
#9

Unfortunately I can't use the official version, because I need to use the "coll" development version...
many thanks
Lorenzo


Revision history for this message
Olivier Mattelaer (olivier-mattelaer) said :
#10

I understand that this is not the physics that you are looking for,
but I would say that it makes sense to first check that
p p > t t~ does run on your cluster before starting to use a new type of PDF which might not have been implemented to support
a condor cluster.

I will assign Marco to this thread to check whether the path handling of condor and the eMELA PDF are compatible.
(i.e. for Marco: condor clusters transfer all dependency files to the local filesystem of the node.
For most PDFs, you therefore need to tell the cluster to transfer the grid as well, as input_files.)
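[Editor's note: for illustration, the input-file transfer mentioned above would look roughly like this in a condor submit description; the grid file name is made up.]

```
# fragment of a condor submit description -- grid file name is hypothetical
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = electron_pdf_grid.dat  # the PDF grid the job reads
```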

Cheers,

Olivier

Revision history for this message
lorenzo marafatto (lmaraf) said :
#11

I think I fixed the problem by setting the path just like on the local machine, using the remote installation directories.
Now I am executing the jobs... let's see the results.
many thanks
Lorenzo

On Tuesday, 24 January 2023 at 14:05:47 CET, Olivier Mattelaer <email address hidden> wrote:

Your question #704470 on MadGraph5_aMC@NLO changed:
https://answers.launchpad.net/mg5amcnlo/+question/704470

    Assignee: None => marco zaro
