"Invalid instruction" when passing code to SLURM cluster

Asked by matteo maltoni

Hi!

I'm trying to submit a Python 3 script to the lemaitre3 cluster. It computes matrix elements for four-jet events, but the job fails with an "invalid instruction" error. The script, which I paste at the end of the post, runs normally both on my laptop and on the front-end of the cluster.

Can you give me any hints about what's wrong?

Here's my code:

import sys
sys.path.append ('/home/users/m/m/mmaltoni/MG5_aMC_v2_9_2')
sys.path.append ('/home/users/m/m/mmaltoni/MG5_aMC_v2_9_2/HEPTools/lhapdf6_py3/bin')
sys.path.append ('/home/users/m/m/mmaltoni/MG5_aMC_v2_9_2/Standalone_4j/SubProcesses')
import allmatrix2py
import madgraph.various.lhe_parser as lhe_parser
import madgraph.various.misc as misc
import myfunctions as myf
from itertools import permutations
import time
import numpy as np

allmatrix2py.initialise ('/home/users/m/m/mmaltoni/MG5_aMC_v2_9_2/bin/PROC_TopEffTh_0/Cards/param_card.dat')
lhe = lhe_parser.EventFile ("/home/users/m/m/mmaltoni/MG5_aMC_v2_9_2/bin/PROC_TopEffTh_0/Events/run_02/unweighted_events.lhe.gz")
lhapdf = misc.import_python_lhapdf ("/home/users/m/m/mmaltoni/MG5_aMC_v2_9_2/HEPTools/lhapdf6_py3/bin/lhapdf-config")

# Getting a PDF member object
pdf = lhapdf.mkPDF("NNPDF23_lo_as_0130_qed", 0)
ELHC = 6.5e3
FactScale = 5e2

sphercut = 0.36

#-------main

pdg_list = []  # placeholder: fill in the PDG-id lists of all the possible subprocesses

me_pos = 0
me_neg = 0

sigma_meas = 0.
sigma_meas_sph = 0.

nev = 0
start_t = time.time ()

for event in lhe:
    nev += 1
    if nev%1000. == 0:
        elapsed = (time.time () - start_t)/60
        minu, sec = myf.time_conversion (elapsed)
        print (nev, minu, 'min', sec, 'sec')
    weight = myf.math.fabs (event.wgt)
    p = []
    p_final = []
    pdgl = []
    #pT = []

    for particle in event:
        mom = [particle.E, particle.px, particle.py, particle.pz]
        p.append (mom)
        pdgl.append (particle.pdg)
        if particle.status == 1:
            p_final.append (mom)

    spher = myf.transv_spher (p_final)

    P = myf.invert_momenta (p)

    #calculating the weighted matrix element
    ans_int = 0.

    for pdgl in pdg_list:
        for pdgl_in in permutations (pdgl[:2]):
            pdfw = pdf.xfxQ(pdgl_in[0], p[0][0]/ELHC, FactScale)*pdf.xfxQ(pdgl_in[1], p[1][0]/ELHC, FactScale)
            ans = 0
            for pdgl_out in permutations (pdgl[2:]):
                pdgl_f = pdgl_in + pdgl_out
                ans += allmatrix2py.smatrixhel (pdgl_f, 1, P, .13, 0., -1)
            ans_int += (ans * pdfw) / 6.

    #Counting events
    sigma_meas += event.wgt * np.sign (ans_int)
    sigma_meas_sph += event.wgt * np.sign (spher - sphercut)

print ('sigma meas:', sigma_meas)
print ('sigma spher:', sigma_meas_sph)

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Olivier Mattelaer

Olivier Mattelaer (olivier-mattelaer) said :
#1

Where did you compile it? And on which partition did you execute it?

"Invalid instruction" suggests that your code uses instructions that the CPU it runs on does not support.
Lemaitre3 has two partitions with two different types of CPU.
The debug partition has older CPUs which do not support AVX-512 instructions, so if you want to run on that partition you have to reduce the compilation flags or compile directly on those nodes.
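
One way to check which nodes can run a given build, as a minimal sketch: on Linux, the `flags` line of `/proc/cpuinfo` lists the instruction-set extensions the CPU supports, and `avx512f` is the AVX-512 foundation flag. The helper names below (`parse_cpu_flags`, `missing_flags`) are illustrative and not part of MadGraph or SLURM, and which flags a particular build actually needs is an assumption to adapt.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 Linux kernels list the instruction-set extensions under "flags".
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(required, cpuinfo_text):
    """Return the required flags that the CPU does not advertise, sorted."""
    return sorted(set(required) - parse_cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        # "avx512f" is the AVX-512 foundation flag; adjust the list as needed.
        missing = missing_flags(["avx512f"], cpuinfo.read_text())
        print("missing CPU flags:", missing or "none")
    else:
        print("/proc/cpuinfo not available on this system")
```

Running this inside a job on each partition shows whether the node that actually executes the script supports the instructions the compiled modules were built with.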

Cheers,

Olivier

> On 25 Feb 2021, at 18:01, matteo maltoni <email address hidden> wrote:
>
> New question #695742 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/695742

matteo maltoni (matteo-maltoni) said :
#2

Hi Olivier,

Thank you for your answer.
I executed the script in my $HOME directory on the cluster, allowing both partitions; you can find my script below (sbatch script_name.sh). This is the error I get:

Task ID: 1
/var/spool/slurmd/job69700787/slurm_script: line 20: 11911 Istruzione non consentita python3 $HOME/MG5_aMC_v2_9_2/Standalone_4j/SubProcesses/matrix.py

How can I reduce the compilation flags (for lhapdf, for instance)?

Cheers,

Matteo

P.S. Here's my script:

#!/bin/bash
# Submission script for Lemaitre3
#SBATCH --job-name=fourjet
#SBATCH --array=1-10
#SBATCH --time=02:00:00 # hh:mm:ss
#
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2625 # megabytes
#SBATCH --partition=batch,debug
#
#SBATCH --<email address hidden>
#SBATCH --mail-type=END,FAIL
#
#SBATCH --comment=fourjet
#SBATCH --output=fourjet.out

echo "Task ID: $SLURM_ARRAY_TASK_ID"

module load Python/3.7.4-GCCcore-8.3.0
python3 $HOME/MG5_aMC_v2_9_2/Standalone_4j/SubProcesses/matrix.py

Best Olivier Mattelaer (olivier-mattelaer) said :
#3

The issue is that I'm not sure how "Istruzione non consentita" translates into English, so I cannot be sure that my guess is correct.

One fast way to solve your problem (if I'm correct) is to ask for the batch partition only.
Another is to recompile everything on the debug partition, since code compiled on the debug partition will also work on the batch one.
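
The first option can be sketched as follows. Options given on the `sbatch` command line take precedence over the `#SBATCH` directives inside the script, so the partition can be restricted without editing the file. The function name `sbatch_command` is a placeholder for illustration; `script_name.sh` stands for the submission script.

```python
def sbatch_command(script, partition="batch"):
    """Build an sbatch call that restricts the job to a single partition."""
    # A command-line --partition overrides "#SBATCH --partition=..." in the script.
    return ["sbatch", f"--partition={partition}", script]


if __name__ == "__main__":
    cmd = sbatch_command("script_name.sh")
    # Print the command; on the cluster it could be passed to subprocess.run(cmd).
    print(" ".join(cmd))
```

Editing the directive in the script itself to `#SBATCH --partition=batch` has the same effect.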

Cheers,

Olivier

> On 25 Feb 2021, at 19:25, matteo maltoni <email address hidden> wrote:
>
> Istruzione non consentita

matteo maltoni (matteo-maltoni) said :
#4

I apologise for the Italian, I totally forgot to translate it. Online forums suggest a translation like "Invalid instruction" or "Illegal instruction".
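
The wording of this message depends on the system locale, but the underlying signal does not: an illegal instruction is reported as `SIGILL`. As a small sketch (POSIX only; the function name `describe_exit` is illustrative), Python's `subprocess` module reports a child killed by a signal as a negative return code, which can be mapped to a locale-independent name:

```python
import signal


def describe_exit(returncode):
    """Describe a subprocess return code in a locale-independent way."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe_exit(-signal.SIGILL))
    print(describe_exit(0))
```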

Anyway, asking for the batch partition only solved my problem: the job ran on the cluster with no issues.

Cheers,

Matteo

matteo maltoni (matteo-maltoni) said :
#5

Thanks Olivier Mattelaer, that solved my question.