Will VDIV cause a subsequent BX LR to stall?

Asked by Dan Lewis on 2020-05-27

The VDIV and VSQRT instructions take a long time (14 clock cycles) to execute. Any instruction that needs their result must stall until they complete their execute phase. However, integer operations (e.g., ADD R0,R0,R1) can proceed, overlapping their execution with that of the VDIV or VSQRT. But I'm wondering if that also applies to the BX LR in the following code:

// void foo(float x, float *p);
// S0 = x, R0 = p
foo:    VSQRT.F32  S0,S0      // S0 = sqrt(x)
        VSTR       S0,[R0]    // *p = S0
        BX         LR         // return to caller

If the BX LR doesn't wait for the VSQRT to complete, then what happens if we access the content of *p after returning from the call? I.e., this situation:

         float a, b ;
         foo(a, &b) ;
         ... = b + .... // code that references the content of "b".


Question information

Language: English
Project: GNU Arm Embedded Toolchain
Assignee: none
john (jkovach) said : #1


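Common sense suggests that VSTR S0,[R0] has to wait for VSQRT.F32 S0,S0 to finish, because it reads the register that the VSQRT writes. If it didn't wait, the contents of S0 would not be valid when they are saved to memory. And since BX LR comes after the VSTR in program order, a later read of *p by the caller should see the stored result.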
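
For what it's worth, here is a minimal C sketch of the same pattern. It uses a division rather than a square root, so it corresponds to the VDIV case in the title and needs no math library. The names foo_div and use_after_call are made up for illustration. On a hard-float Cortex-M4F build the division would presumably compile to VDIV.F32 followed by VSTR, but that is an assumption about the compiler output. The sketch only checks the value the caller sees after the return; it says nothing about how many cycles the core stalls.

```c
/* Hypothetical stand-in for the routine in the question, using a
   division instead of a square root. */
void foo_div(float x, float y, float *p)
{
    /* The store consumes the quotient, so it cannot complete
       before the division has produced its result. */
    *p = x / y;
}

/* Caller that reads the stored value right after the call returns,
   like the "b + ..." expression in the question. */
float use_after_call(float a, float d)
{
    float b;
    foo_div(a, d, &b);
    return b + 1.0f;
}
```

If the return could overtake the store, use_after_call would read an uninitialized b. Single-threaded program order rules that out, whatever the latency of the divide.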

Launchpad Janitor (janitor) said : #2

This question was expired because it remained in the 'Open' state without activity for the last 15 days.