MadGraph5_aMC@NLO

Infinite loop in compilation at -O3 in gfortran

Bug #1998203 reported by Zachary Marshall on 2022-11-29

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	MadGraph5_aMC@NLO	Fix Released	Undecided	Unassigned

Bug Description

Hi authors,

Running:

import model HC_NLO_X0_UFO-4Fnoyb
generate p p > x0 t b~ j $$ w+ w- [QCD] @0
add process p p > x0 t~ b j $$ w+ w- [QCD] @1

generates Fortran code like that in the attached tarball in the latest MG5_aMC version. When this code is compiled with -O3 in gfortran 11, we encounter what appears to be an infinite loop in compilation (at least, compiling this file takes more than a day). -O2 works just fine, and older versions of gfortran seem to work fine.

We'd be very happy to have your advice on how to proceed here.

Thanks to Nello Bruscino for the original report and Jan Kretzschmar for the code extraction.

Thanks,
Zach

Zachary Marshall (zach-marshall) wrote on 2022-11-29:

polynomial.tar.gz (12.0 KiB, application/x-tar)

Zachary Marshall (zach-marshall) wrote on 2022-11-29:

One of our super-experts (Scott Snyder) sent the following along:

The problem comes from UPDATE_WL_6_0 and similar, where we have 210 lines
inside a triply-nested loop like:

          OUT(J,0,I)=OUT(J,0,I)+A(K,0,I)*B(J,0,K)
          OUT(J,1,I)=OUT(J,1,I)+A(K,1,I)*B(J,0,K)

This seems to trigger ~ O(N^3) behavior in the loop
induction variable / strength reduction optimization.

I can avoid the bad behavior if I break up the inner loop by adding

        enddo
        DO K=1,IN_SIZE

about every 50 lines.
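A rough way to see why this helps: if the optimiser's cost really grows like N^3 in the number N of statements in one loop body (the hypothesis in the message above, not a measurement), then cutting the 210 statements of UPDATE_WL_6_0 into bodies of at most 50 shrinks that term by roughly a factor of 18. The sketch below only does that arithmetic; the helper names chunk_sizes and cubic_cost are hypothetical.

```python
def chunk_sizes(total, max_per_loop):
    """Sizes of the loop bodies when `total` statements are split into
    consecutive loops of at most `max_per_loop` statements."""
    full, rest = divmod(total, max_per_loop)
    return [max_per_loop] * full + ([rest] if rest else [])


def cubic_cost(sizes):
    """Relative optimiser cost, ASSUMING it scales as N**3 per loop body."""
    return sum(n ** 3 for n in sizes)


single = cubic_cost([210])                 # one body with 210 statements
split = cubic_cost(chunk_sizes(210, 50))   # bodies of 50, 50, 50, 50 and 10
print(single, split, round(single / split, 1))  # prints: 9261000 501000 18.5
```

Real compile times also depend on constant factors and other passes, so this is only an order-of-magnitude argument.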

Perhaps this could be suggested to the authors; it looks like it should
be an easy change in write_expanded_wl_updater() in q_polynomial.py.

I reproduced this with gcc12. However, I'm just starting today to look
at gcc13. If the issue's still there, I'll look into reporting
it to the gcc folks.
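The change suggested above can be sketched as a small Python helper of the kind write_expanded_wl_updater() could call. This is an illustrative assumption, not MadGraph code: the name wrap_in_split_loops, its parameters and the six-space Fortran indentation are hypothetical. Splitting is legitimate because each statement is an independent accumulation OUT(J,c,I)=OUT(J,c,I)+... over K, so running the statements in several consecutive DO K loops can change at most the floating-point summation order. This sketch counts emitted statements; Olivier's patch further down counts coefficients instead.

```python
def wrap_in_split_loops(statements, max_per_loop=50,
                        header="      DO K=1,IN_SIZE", footer="      ENDDO"):
    """Emit `statements` inside consecutive DO K loops holding at most
    `max_per_loop` statements each, instead of one very long loop body."""
    if max_per_loop < 1:
        raise ValueError("max_per_loop must be at least 1")
    lines = []
    for start in range(0, len(statements), max_per_loop):
        lines.append(header)
        lines.extend(statements[start:start + max_per_loop])
        lines.append(footer)
    return lines


if __name__ == "__main__":
    # 210 illustrative statements, the loop-body size quoted for UPDATE_WL_6_0
    stmts = ["      OUT(J,%d,I)=OUT(J,%d,I)+A(K,%d,I)*B(J,0,K)" % (c, c, c)
             for c in range(210)]
    out = wrap_in_split_loops(stmts)
    n_loops = sum(1 for line in out if line.strip().startswith("DO K"))
    print(n_loops)  # prints: 5
```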

Olivier Mattelaer (olivier-mattelaer) wrote on 2022-11-29: Re: [Bug 1998203] Infinite loop in compilation at -O3 in gfortran

Hi Zach,

My experience with compilation flags is that, so far, -O3 does not bring much,
except for the computation of the color matrix, which is the only place where the code can be auto-vectorised.
Additionally, another speed-up can be obtained by using fast-math within the Source/DHELAS directory.
fast-math cannot be used in other directories since it decreases the precision of special functions
(which, in general, are not used within that directory, although this depends on the UFO model).

Compiling those loop-update files with -O3 is largely irrelevant in terms of speed, and if it is time-consuming it should be avoided.

Cheers,

Olivier

Zachary Marshall (zach-marshall) wrote on 2022-11-29:

Thanks Olivier! I do see this in the makefile:

# For the compilation of the MadLoop file polynomial.f it makes a big difference to use -O3 and
# to turn off the bounds check. These can however be modified here if really necessary.
POLYNOMIAL_OPTIMIZATION = -O3
POLYNOMIAL_BOUNDS_CHECK =

Does your comment imply this is something you want to change or revisit in MadGraph directly? Or is this something you're expecting us to change locally while we're experiencing this issue?

Best,
Zach
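For a local workaround, the POLYNOMIAL_OPTIMIZATION value quoted above can be edited by hand, or with a few lines of Python like the sketch below. The function name is hypothetical, and the thread does not say which makefile path holds these variables, so the caller is assumed to read the file, pass its text in, and write the result back.

```python
import re

# Matches the assignment line quoted from the makefile, e.g. "POLYNOMIAL_OPTIMIZATION = -O3"
_OPT_LINE = re.compile(r"^(POLYNOMIAL_OPTIMIZATION[ \t]*=).*$", re.MULTILINE)


def set_polynomial_optimization(makefile_text, level="-O2"):
    """Return a copy of `makefile_text` with POLYNOMIAL_OPTIMIZATION set to `level`."""
    # A function replacement avoids any escaping issues in the replacement string
    new_text, n = _OPT_LINE.subn(lambda m: m.group(1) + " " + level, makefile_text)
    if n == 0:
        raise ValueError("no POLYNOMIAL_OPTIMIZATION assignment found")
    return new_text


if __name__ == "__main__":
    snippet = (
        "POLYNOMIAL_OPTIMIZATION = -O3\n"
        "POLYNOMIAL_BOUNDS_CHECK =\n"
    )
    print(set_polynomial_optimization(snippet))
```

The bounds-check line is left untouched, since only the optimisation level is the problem here.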

Zachary Marshall (zach-marshall) wrote on 2022-12-07:

Hi,

Just checking to see if there's news on this?

Thanks,
Zach

Olivier Mattelaer (olivier-mattelaer) wrote on 2022-12-12:

I'm not sure what the right strategy is here.
It seems to be a problem for one compiler and one process.
The comments in the makefile suggest that I misremembered the benefit of -O3 in terms of performance.

So yes, modifying it by hand for that specific process seems to be the way to go.
If you hear about more issues of this type, then we should think more about a solution,
but likely nothing that I would do for the LTS version: maybe an additional entry in the run_card to control such flags, and maybe a timer around that compilation.
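The idea of a timer around that compilation could look roughly like the sketch below: try -O3 first and fall back to -O2 if the build runs past a time limit or fails. Everything here is an assumption for illustration: compile_with_fallback and build_cmd are hypothetical names, the 600 s default is arbitrary, and MadGraph's real build goes through make rather than a single command chosen this way.

```python
import subprocess
import sys


def compile_with_fallback(build_cmd, levels=("-O3", "-O2"), timeout=600):
    """Run build_cmd(level) for each optimisation level in turn and return the
    first level whose build finishes within `timeout` seconds with exit code 0."""
    for level in levels:
        try:
            # On timeout, subprocess.run kills the direct child (not its own
            # children, e.g. compilers started by make) and raises TimeoutExpired
            result = subprocess.run(build_cmd(level), timeout=timeout)
        except subprocess.TimeoutExpired:
            print("build with %s exceeded %s s, trying the next level" % (level, timeout),
                  file=sys.stderr)
            continue
        if result.returncode == 0:
            return level
    raise RuntimeError("build failed or timed out at every optimisation level")


if __name__ == "__main__":
    # Harmless stand-in for a real compiler call: the "-O3" run sleeps past the limit
    def demo_cmd(level):
        code = "import time; time.sleep(5)" if level == "-O3" else "pass"
        return [sys.executable, "-c", code]

    print(compile_with_fallback(demo_cmd, timeout=1))  # prints: -O2
```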

Cheers,

Olivier

Zachary Marshall (zach-marshall) wrote on 2023-01-08:

Hi Olivier,

Sorry for the delay in replying here. My understanding is that Scott's patch above in #2 is safe, shouldn't cause any change in physics or performance penalty, and will get us through this compilation issue. If that can be added to the development version (I agree, not for the LTS version), that would be great.

Thanks,
Zach

Olivier Mattelaer (olivier-mattelaer) wrote on 2023-01-11:

Can you test if the following patch would handle the situation?

diff --git a/madgraph/various/q_polynomial.py b/madgraph/various/q_polynomial.py
index c5f7efb49..df103d73f 100755
--- a/madgraph/various/q_polynomial.py
+++ b/madgraph/various/q_polynomial.py
@@ -590,7 +590,7 @@ C ARGUMENTS
         lines.append(" DO K=0,%d"%(get_number_of_coefs_for_rank(r_2+r_1)-1))
         lines.append(" OUT(J,K,I)=%s"%self.czero)
         lines.append(" ENDDO")
-        lines.append(" DO K=1,IN_SIZE")
+        #lines.append(" DO K=1,IN_SIZE")

         # Now we write the lines defining the coefs of OUT(j,*,i) from those
         # of A(k,*,i) and B(j,*,k)
@@ -609,16 +609,26 @@ C ARGUMENTS
                 except KeyError:
                     coef_expressions[new_position]=[new_term,]
         keys = sorted(list(coef_expressions.keys()))
+        max_innerloop = 50
+        line_in_inner = 0
         for coef in keys:
+            if line_in_inner == 0:
+                lines.append(" DO K=1,IN_SIZE")
             value = coef_expressions[coef]
             split=0
             while split<len(value):
                 lines.append("OUT(J,%d,I)=OUT(J,%d,I)+"%(coef,coef)+\
                              '+'.join(value[split:split+self.line_split]))
                 split=split+self.line_split
+            line_in_inner +=1
+            if line_in_inner == max_innerloop:
+                line_in_inner = 0
+                lines.append(" ENDDO")
+

         # And now we simply close the enddo.
-        lines.append(" ENDDO")
+        if line_in_inner:
+            lines.append(" ENDDO")
         lines.append(" ENDDO")
         lines.append("ENDDO")
         lines.append("END")

I tested that tt~ at NLO still works, but I need you to test it in your case.
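When testing a patch like this, one cheap sanity check on the generated Fortran is that every DO opened by the chunking logic is closed by a matching ENDDO. A minimal sketch, assuming block DO/ENDDO loops as in the UPDATE_WL routines (labelled DO loops and continuation lines are not handled); check_do_balance is a hypothetical helper, not part of MadGraph.

```python
import re

# Block DO loops such as "      DO K=1,IN_SIZE" (labelled "DO 10 K=..." loops are not matched)
_DO = re.compile(r"^\s*DO\s+\w+\s*=", re.IGNORECASE)
# Loop ends written as ENDDO or END DO; a bare END does not match
_ENDDO = re.compile(r"^\s*END\s*DO\b", re.IGNORECASE)


def check_do_balance(fortran_lines):
    """Raise ValueError if DO/ENDDO are unbalanced; otherwise return the maximum nesting depth."""
    depth = max_depth = 0
    for lineno, line in enumerate(fortran_lines, start=1):
        if _DO.match(line):
            depth += 1
            max_depth = max(max_depth, depth)
        elif _ENDDO.match(line):
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ENDDO at line %d" % lineno)
    if depth:
        raise ValueError("%d DO loop(s) never closed" % depth)
    return max_depth
```

Applied to the lines of a generated UPDATE_WL routine, a correctly split routine should report a depth of 3 (the I, J and K loops).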

Thanks,

Olivier

Olivier Mattelaer (olivier-mattelaer) wrote on 2023-01-12:

So I have put the patch in a branch:
https://github.com/mg5amcnlo/mg5amcnlo/commit/a3a69d8e5d4434349512e6dcf3f7dc22f9ce6c43
But I will wait for your tests before pushing this.

Cheers and thanks,

Olivier

Changed in mg5amcnlo:
status:	New → Fix Committed

Olivier Mattelaer (olivier-mattelaer) wrote on 2023-05-17:

#10

Any news from your checks?
Can you confirm whether I can merge this into the upcoming 3.5.1? (We missed 3.5.0.)

Olivier

Changed in mg5amcnlo:
status:	Fix Committed → Fix Released
status:	Fix Released → Fix Committed

Olivier Mattelaer (olivier-mattelaer) on 2024-04-06

Changed in mg5amcnlo:
status:	Fix Committed → Fix Released
