variable assigned to register

Asked by jdobry

Hello,

We are using code like this:

__attribute__((always_inline))
inline void svcSendMessage (U32 a, U32 b)
{
  register U32 r0 __asm__("r0") = a;
  register U32 r1 __asm__("r1") = b;
  __asm__ __volatile__ ("svc %0"
      : /* output */
      : "i" (12345), "r" (r0), "r" (r1) /* input(s) */
      : /* list of clobbered registers */);
}

It is a C wrapper for an SVC system call that takes two parameters in r0 and r1. But there is a problem: it works fine with 4.9 2014q3, but not with 4.8 2014q3 (optimization -O2 is OK, but -O1 and -Og are sometimes broken).

Because it is an SVC wrapper, we cannot let the compiler choose the registers as it usually does for inline asm.

Where is the problem? How can we solve it?

PS: I know it is possible to write the wrapper in assembly as a normal function, but that takes longer (2-16 ticks for the function call and return, depending on the branch predictor). I really want an inline version for C.

Jiri

Question information

Language: English
Status: Solved
For: GNU Arm Embedded Toolchain
Assignee: No assignee
Solved by: jdobry

jdobry (jdobry) said :
#1

I must update my post; I used a bad example. Here are the correct examples:

======================================================================================================
//This code is broken on 4.8 2014q3 (-O1 or -Og) and correct on 4.9 2014q4
__attribute__((always_inline))
inline void svcSendMessage (U32 a, U32 *b)
{
  register U32 r0 __asm__("r0") = a;
  register U32 *r1 __asm__("r1") = b;
  __asm__ __volatile__ ("svc %0"
      : /* output */
      : "i" (12345), "r" (r0), "r" (r1), "m" (*b) /* input(s) */
      : /* list of clobbered registers */);
}

======================================================================================================
//This code is correct on 4.8 2014q3 (-O1 or -Og) and correct on 4.9 2014q4
__attribute__((always_inline))
inline void svcSendMessage (U32 a, U32 *b)
{
  register U32 r0 __asm__("r0") = a;
  volatile register U32 *r1 __asm__("r1") = b;
  __asm__ __volatile__ ("svc %0"
      : /* output */
      : "i" (12345), "r" (r0), "r" (r1), "m" (*r1) /* input(s) */
      : /* list of clobbered registers */);
}
//The problem is that this looks like black magic. Note the strange use of volatile and the "m" asm input: it works for (*r1) but not for the "same" *b.

======================================================================================================
//This code is correct on 4.8 2014q3 (-O1 or -Og) but not on 4.9 2014q4 (see https://answers.launchpad.net/gcc-arm-embedded/+question/260787)
__attribute__((always_inline))
inline void svcSendMessage (U32 a, U32 *b)
{
  register U32 r0 __asm__("r0") = a;
  register U32 *r1 __asm__("r1") = b;
  __asm__ __volatile__ ("svc %0"
      : /* output */
      : "i" (12345), "r" (r0), "r" (r1) /* input(s) */
      : /* list of clobbered registers */);
}
//It does not use "m".

jdobry (jdobry) said :
#2

Here is an example that demonstrates the problem. It can be reproduced with:
  - 4.8 2014q3 at optimization levels -O1 and -Og
  - 4.9 2014q4 and 4.9 2015q1 at optimization level -O1 (only, not with -Os)

File test.h
---------------------------------------------------------------------------------
inline void svcTest (int a, int *b)
{
  register int p1 __asm__("r0") = a;
  register int *p2 __asm__("r1") = b;
  __asm__ __volatile__ ("svc %0"
      : /* output */
      : "i" (12), "r" (p1), "r" (p2), "m" (*p2)/* input(s) */
      : /* list of clobbered registers */);
}

---------------------------------------------------------------------------------
file test.c
---------------------------------------------------------------------------------
#include <stddef.h>

#include "test.h"

void fooBar (int x, const int *y)
{
  do
  {
    svcTest(33, NULL);
    x--;
  }
  while (x>0);
}

---------------------------------------------------------------------------------
Compile with 4.9 2015q1 (broken example)
arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -g -Wa,-a,-ad -O1 test.c -c -Wall
And here is the broken result: http://pastebin.com/2JPXiRiu (see output line 62). It ignores the assignment of variable p2 to r1 and uses register r2 instead.

Compile with 4.9 2015q1 (OK example)
arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -g -Wa,-a,-ad -O2 test.c -c -Wall
And here is the OK result: http://pastebin.com/2JPXiRiu (see output line 66)

jdobry (jdobry) said :
#3

Fix:
It can be reproduced with:
  - 4.8 2014q3 at optimization levels -O1 and -Og
  - 4.9 2014q4 and 4.9 2015q1 at optimization level -O1 (only, not with -Og)

jdobry (jdobry) said :
#4

One more fix: I copied the wrong URL for the OK compilation. Here is the correct link:
http://pastebin.com/1kFgw5GG (see line 66)

Thomas Preud'homme (thomas-preudhomme) said :
#5

Hi Jdobry,

I cannot reproduce this behavior with our 4.8 2014-q3 release. I suppose it depends on the function into which this is inlined. Can you give us a full testcase so that we can reproduce it? I tried compiling just the first example you gave in your comment, and also compiling it with a caller that passes the arguments in reversed order (to see whether GCC correctly exchanges them when inlining). It worked flawlessly.

Best regards.

jdobry (jdobry) said :
#6

OK, here is an easy-to-use testcase that reproduces the problem:

---------------------------------------------------------------------------------
File test.h
---------------------------------------------------------------------------------
inline void svcTest (int a, int *b)
{
  register int p1 __asm__("r0") = a;
  register int *p2 __asm__("r1") = b;
  __asm__ __volatile__ ("svc %0"
      : /* output */
      : "i" (12), "r" (p1), "r" (p2), "m" (*p2)/* input(s) */
      : /* list of clobbered registers */);
}

---------------------------------------------------------------------------------
file test.c
---------------------------------------------------------------------------------
#include <stddef.h>

#include "test.h"

void fooBar (int x, const int *y)
{
  do
  {
    svcTest(33, NULL);
    x--;
  }
  while (x>0);
}

---------------------------------------------------------------------------------
Compilation (tested with the downloaded Windows versions of 4.9 2015q1 and 4.8 2014q3):

arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -g -Wa,-a,-ad -O1 test.c -c -Wall
produces code like this:

   movs r1, #33
   movs r2, #0
   mov r0, r1
   svc #12

SVC is called with r0=33 (correct), but with an incorrect r1=33 and a useless r2=0.

Thomas Preud'homme (thomas-preudhomme) said :
#7

Indeed, I can reproduce the problem. It looked very similar to [1], so I tried the patch that my colleague Hale Wang has been working on [2], and it solves the problem! However, the GCC maintainers have deemed the patch too intrusive to be accepted in the current development version (GCC development is in a stabilization phase before release), so it will only go into the next development version (GCC 6). Given that several users have reported the problem, we might consider doing a backport for our next toolchain release.

[1] https://bugs.launchpad.net/gcc-arm-embedded/+bug/1411655
[2] https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01593.html

jdobry (jdobry) said :
#8

Hi Thomas,

Many thanks for the links. I was not able to find them on the GCC mailing list, mainly because I was not able to identify and reproduce exactly where the problem is until 2015-03-23.

It is EXACTLY what I needed to know.
I followed the links and found the solution for the next development version here:
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01593.html

This allowed me to make a "backport" (same code; only the line numbers differ) for 4.9 2015q1, and here is the result:
http://pastebin.com/sfs8hvrb

After rebuilding the 4.9 2015q1 toolchain, I have a working compiler. The output is slightly suboptimal, but it works.

Here is the correct result for -O1:
    movs r4, #33
    movs r2, #0
    mov r0, r4
    mov r1, r2
    svc #12

Here is the correct result for -O2 and -O3:
    movs r2, #0 ; the compiler sets this but does not use it anywhere
    movs r0, #33
    movs r1, #0
    svc #12

Here is the correct result for -Og (optimal, better than -O2 and -O3, which is a surprise):
    movs r0, #33
    movs r1, #0
    svc #12

Again, many thanks!
Jiri

Thomas Preud'homme (thomas-preudhomme) said :
#9

Hi Jdobry,

As for the extra movs r2, #0, I believe it comes from the "m" (*b)/* input(s) */ part. If I read the GCC documentation [1] correctly, this should be done using clobbers:

"The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters)."

[1] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers

Try listing memory in the clobbers and removing that last input to see what GCC does.

Best regards.
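
The suggestion in this reply could look roughly like the sketch below, which is based on the svcTest testcase from this thread: the "m" (*p2) input is dropped and "memory" is listed as a clobber instead. The #else branch, with its last_r0 and last_r1 variables, is a hypothetical host-side stand-in that is not part of the original code; it only exists so the file also builds on a non-Arm machine. The int return value of fooBar is likewise added for illustration.

```c
#include <stddef.h>

#if !defined(__arm__)
/* Hypothetical host-side stand-ins (not in the original code): they record
   the arguments so the sketch can be built and checked off-target. */
static int last_r0;
static int *last_r1;
#endif

static inline __attribute__((always_inline)) void svcTest(int a, int *b)
{
#if defined(__arm__)
  register int p1 __asm__("r0") = a;
  register int *p2 __asm__("r1") = b;
  __asm__ __volatile__ ("svc %0"
      : /* output */
      : "i" (12), "r" (p1), "r" (p2) /* input(s), without "m" (*p2) */
      : "memory" /* the handler may access memory not listed as an operand */);
#else
  last_r0 = a;
  last_r1 = b;
#endif
}

/* Caller shaped like fooBar from the testcase; returns the number of calls. */
int fooBar(int x)
{
  int calls = 0;
  do
  {
    svcTest(33, NULL);
    x--;
    calls++;
  }
  while (x > 0);
  return calls;
}
```

Building the Arm branch with the same command line as in the testcase (for example with -O1) and reading the listing is the way to check which registers are loaded before the svc instruction.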

jdobry (jdobry) said :
#10

Hi Thomas

You are right. With the "memory" clobber it produces the optimal result for this example. But it creates a strong memory barrier, and this affects optimization of real, more complex code.

Have a nice day,
Jiri

PS: Thanks once more for the right links.
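
The trade-off mentioned in this last reply can be illustrated with a small sketch. It contrasts an "m" input operand, which declares only one object as read, with the "memory" clobber, which makes the compiler assume that any memory may be read or written. The empty asm template is an assumption made here so that the code builds with GCC on any target; in the real wrapper it would be the svc instruction. The names touch_one_int, touch_all_memory, shared_counter and demo are hypothetical. Note that the "m" operand form is the one the testcase in this thread used, and according to the thread the unpatched 4.8 and 4.9 releases miscompiled it at some optimization levels.

```c
/* The empty asm template stands in for the real "svc" instruction so that
   this builds with GCC on any target; only the constraints matter here. */

int shared_counter; /* hypothetical global that the handler never touches */

/* Narrow dependency: only the int at p is declared as read by the asm. */
static inline void touch_one_int(const int *p)
{
  __asm__ __volatile__ ("" : : "r" (p), "m" (*p));
}

/* Full barrier: the compiler must assume any memory is read or written. */
static inline void touch_all_memory(const int *p)
{
  __asm__ __volatile__ ("" : : "r" (p) : "memory");
}

int demo(int *msg)
{
  shared_counter = 10;
  *msg = 2;
  touch_one_int(msg);     /* *msg must be up to date in memory here */
  shared_counter += 5;
  touch_all_memory(msg);  /* all pending stores must be written out here */
  return *msg + shared_counter;
}
```

Comparing the assembly that gcc -O2 -S produces around the two helpers is one way to see how much the compiler keeps in registers across each asm statement.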