"Broken pipe" when running MadGraph on cluster

Asked by matteo maltoni

Hi,

I need to run a MadGraph process (p p > j j j j NP==2, using the TopEffTh model) on the lemaitre3 cluster, through a script that I wrote.
The process gets stuck at:

INFO: Creating files in directory P1_gg_gggg
INFO: Computing Color-Flow optimization [33360 term]

and, after a while, my laptop disconnects from the cluster with the message:

client_loop: send disconnect: Broken pipe

This does not happen when I do the same on my laptop.
Do you have any idea about where the problem could be?

Below is the script I use, which I run with the command ./mg5_aMC nameofthescript.txt

import model TopEffTh

generate p p > j j j j QED=0 NP==2
output

launch
set Lambda 5e+03
set RC3phiq 0e+00
set IC3phiq 0e+00
set RCtW 0e+00
set ICtW 0e+00
set RCtG 0e+00
set ICtG 0e+00
set CG 1.1
set CphiG 0e+00
set C13qq 0e+00
set C81qq 0e+00
set C83qq 0e+00
set C8ut 0e+00
set C8dt 0e+00
set C1qu 0e+00
set C1qd 0e+00
set C1qt 0e+00
set nevents 100000
set fixed_ren_scale True
set fixed_fac_scale True
set scale 150
set dsqrt_q2fact1 150
set dsqrt_q2fact2 150
set ptj 50
set drjj 4e-01

launch
set Lambda 5e+03
set RC3phiq 0e+00
set IC3phiq 0e+00
set RCtW 0e+00
set ICtW 0e+00
set RCtG 0e+00
set ICtG 0e+00
set CG 1.1
set CphiG 0e+00
set C13qq 0e+00
set C81qq 0e+00
set C83qq 0e+00
set C8ut 0e+00
set C8dt 0e+00
set C1qu 0e+00
set C1qd 0e+00
set C1qt 0e+00
set nevents 100000
set fixed_ren_scale True
set fixed_fac_scale True
set scale 500.0
set dsqrt_q2fact1 500.0
set dsqrt_q2fact2 500.0
set ptj 200
set drjj 4e-01

launch
set Lambda 5e+03
set RC3phiq 0e+00
set IC3phiq 0e+00
set RCtW 0e+00
set ICtW 0e+00
set RCtG 0e+00
set ICtG 0e+00
set CG 1.1
set CphiG 0e+00
set C13qq 0e+00
set C81qq 0e+00
set C83qq 0e+00
set C8ut 0e+00
set C8dt 0e+00
set C1qu 0e+00
set C1qd 0e+00
set C1qt 0e+00
set nevents 100000
set fixed_ren_scale True
set fixed_fac_scale True
set scale 1000.0
set dsqrt_q2fact1 1000.0
set dsqrt_q2fact2 1000.0
set ptj 500
set drjj 4e-01

launch
set Lambda 5e+03
set RC3phiq 0e+00
set IC3phiq 0e+00
set RCtW 0e+00
set ICtW 0e+00
set RCtG 0e+00
set ICtG 0e+00
set CG 1.1
set CphiG 0e+00
set C13qq 0e+00
set C81qq 0e+00
set C83qq 0e+00
set C8ut 0e+00
set C8dt 0e+00
set C1qu 0e+00
set C1qd 0e+00
set C1qt 0e+00
set nevents 100000
set fixed_ren_scale True
set fixed_fac_scale True
set scale 1500.0
set dsqrt_q2fact1 1500.0
set dsqrt_q2fact2 1500.0
set ptj 750
set drjj 4e-01

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Olivier Mattelaer

Olivier Mattelaer (olivier-mattelaer) said :
#1

The cluster is killing your connection.
The easiest fix is to run inside a screen session, so that the job survives a disconnection and you can reconnect to the shell later:
https://linuxize.com/post/how-to-use-linux-screen/
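
As a minimal sketch of the screen suggestion, the snippet below builds the command that would start the MG5 run in a detached, named screen session on the cluster. The session name `mg5run` and the helper `screen_command` are illustrative assumptions; `-d -m` (start detached) and `-S` (name the session) are standard screen options, and the executable and script name are the ones from the question.

```python
import shlex

def screen_command(session, script, mg5_exe="./mg5_aMC"):
    """Return the argv list that starts MG5 inside a detached screen session."""
    # -dm: create the session detached; -S: name it so it can be reattached.
    return ["screen", "-dmS", session, mg5_exe, script]

# "mg5run" is a hypothetical session name; pick any label you like.
cmd = screen_command("mg5run", "nameofthescript.txt")
print(shlex.join(cmd))
print("reattach later with: screen -r mg5run")
```

Running the printed command on the login node and later typing `screen -r mg5run` should reattach to the run. By default a session started this way closes when the command exits, so the terminal output is only visible while the job is still running.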

Cheers,

Olivier

> On 18 Feb 2021, at 17:20, matteo maltoni <email address hidden> wrote:
>
> New question #695627 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/695627

matteo maltoni (matteo-maltoni) said :
#2

Dear Olivier,

Thank you for the suggestion; the connection now remains stable.

The computation, however, has been stuck for more than 8 hours at:

INFO: Creating files in directory P1_gg_gggg
INFO: Computing Color-Flow optimization [33360 term]

and no LHE file has been produced.
Do you think this is normal?

matteo maltoni (matteo-maltoni) said :
#3

Maybe this can be helpful: when I run the same script on my laptop, MG5 doesn't print the line

INFO: Computing Color-Flow optimization [33360 term]

but just

INFO: Creating files in directory P1_gg_gggg
INFO: Generating Feynman diagrams for Process: g g > g g g g QED=0 NP==2 @1
INFO: Finding symmetric diagrams for subprocess group gg_gggg
INFO: Creating files in directory P1_gg_ggqq

Best answer: Olivier Mattelaer (olivier-mattelaer) said :
#4

Hi,

On my laptop this step takes a bit more than 8 minutes:
INFO: Computing Color-Flow optimization [33360 term]
INFO: Color-Flow passed to 120 term in 493s. Introduce 2544 contraction
and it speeds up that part of the computation in the Fortran code by a factor of ~10 (33360/(2544+120)).
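
For reference, the ratio quoted above can be checked directly. This is only the arithmetic from the two log lines, with variable names chosen here for illustration:

```python
# Numbers taken from the two INFO lines above.
terms_before = 33360   # colour-flow terms before the optimization
terms_after = 120      # terms left after the optimization
contractions = 2544    # intermediate contractions introduced

speedup = terms_before / (terms_after + contractions)
print(f"estimated speed-up: {speedup:.1f}")  # estimated speed-up: 12.5
```

So the "~10" is an order-of-magnitude figure; the term count drops by a factor of about 12.5.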

You can deactivate that optimization with
output --jamp_optim=False

This is a new optimization, so I guess that the version on your laptop is older.
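
To show where that option would go, here is a hedged sketch that regenerates the four-launch script from the question, with the plain `output` line optionally replaced by `output --jamp_optim=False`. The helper names (`launch_block`, `build_script`) are made up for this example; all parameter values are copied from the question.

```python
# Settings shared by all four launches, in the order used in the question.
COMMON = [
    ("Lambda", "5e+03"), ("RC3phiq", "0e+00"), ("IC3phiq", "0e+00"),
    ("RCtW", "0e+00"), ("ICtW", "0e+00"), ("RCtG", "0e+00"), ("ICtG", "0e+00"),
    ("CG", "1.1"), ("CphiG", "0e+00"), ("C13qq", "0e+00"), ("C81qq", "0e+00"),
    ("C83qq", "0e+00"), ("C8ut", "0e+00"), ("C8dt", "0e+00"),
    ("C1qu", "0e+00"), ("C1qd", "0e+00"), ("C1qt", "0e+00"),
    ("nevents", "100000"), ("fixed_ren_scale", "True"), ("fixed_fac_scale", "True"),
]
# (scale, ptj) pairs of the four launches in the question.
RUNS = [("150", "50"), ("500.0", "200"), ("1000.0", "500"), ("1500.0", "750")]

def launch_block(scale, ptj):
    """Return one 'launch' block with the run-specific scale and jet pT cut."""
    settings = COMMON + [
        ("scale", scale), ("dsqrt_q2fact1", scale), ("dsqrt_q2fact2", scale),
        ("ptj", ptj), ("drjj", "4e-01"),
    ]
    return "\n".join(["launch"] + [f"set {name} {value}" for name, value in settings])

def build_script(jamp_optim=True):
    """Return the full MG5 script; jamp_optim=False adds the flag to 'output'."""
    output = "output" if jamp_optim else "output --jamp_optim=False"
    header = "\n".join(["import model TopEffTh", "",
                        "generate p p > j j j j QED=0 NP==2", output])
    blocks = "\n\n".join(launch_block(scale, ptj) for scale, ptj in RUNS)
    return header + "\n\n" + blocks + "\n"

script = build_script(jamp_optim=False)
print(script)
```

The printed text can be saved to a file and passed to ./mg5_aMC exactly like the original script.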

Cheers,

Olivier

Olivier Mattelaer (olivier-mattelaer) said :
#5

That said, with the following patch this optimization takes only 45 s to run.

=== modified file 'madgraph/iolibs/export_v4.py'
--- madgraph/iolibs/export_v4.py 2021-02-15 22:18:16 +0000
+++ madgraph/iolibs/export_v4.py 2021-02-19 19:57:09 +0000
@@ -1453,6 +1453,8 @@
         #misc.sprint(len(all_element))

         self.myjamp_count = 0
+        for key in all_element:
+            all_element[key] = complex(all_element[key])
         new_mat, defs = self.optimise_jamp(all_element)
         if start_time:
             logger.info("Color-Flow passed to %s term in %ss. Introduce %i contraction", len(new_mat), int(time.time()-start_time), len(defs))
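
The two added lines simply convert every value of `all_element` to a built-in complex number before the optimization runs. The thread does not say what type those values have or why the conversion helps; the stand-alone sketch below assumes exact rationals (`fractions.Fraction`) and made-up keys purely to illustrate the loop.

```python
from fractions import Fraction

# Hypothetical stand-in for all_element: keys and Fraction values are assumptions.
all_element = {
    ("jamp1", "amp1"): Fraction(1, 3),
    ("jamp1", "amp2"): Fraction(-2, 9),
    ("jamp2", "amp1"): Fraction(1, 3),
}

# Same loop as in the patch: replace every value by a built-in complex number.
# Rebinding the values of existing keys while iterating over the dict is safe,
# because the set of keys does not change.
for key in all_element:
    all_element[key] = complex(all_element[key])

print(all_element[("jamp1", "amp1")])
```

After the loop, all later arithmetic and comparisons on these coefficients use plain complex numbers rather than the original objects.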

Cheers,

Olivier

matteo maltoni (matteo-maltoni) said :
#6

Thanks Olivier Mattelaer, that solved my question.