4.9 version and inline asm optimizer problem?

Asked by jdobry

Hello,

I found a problem with the optimizer when I try to use inline asm. Here is an example of the code (in two files!)

===================================================
File test.h:
===================================================
__attribute__((always_inline)) inline void svcTest (int *var)
{
  register int *input1 __asm__("r0") = var;
  __asm__ __volatile__ ("svc 123" : : "r" (input1));
}

===================================================
And file test.c:
===================================================
#include "test.h"

void fooBar(void)
{
  int var;
  var = 1234;
  svcTest(&var);
}

=================================================================
And here is the problem. This is the CORRECT result from 4.8 2014q3 and earlier (instructions only):
=================================================================
fooBar:
        sub sp, sp, #8
        movw r3, #1234
        add r0, sp, #8
        str r3, [r0, #-4]!
        svc 123
        add sp, sp, #8
        bx lr

=================================================================
This is the BROKEN result from 4.9 2014q4 (instructions only):
=================================================================
fooBar:
        sub sp, sp, #8
        add r0, sp, #4
        svc 123
        add sp, sp, #8
        bx lr

=================================================================
Where is "var = 1234;"? This is the bug. Or is it my fault?

Code is compiled by: "arm-none-eabi-gcc.exe -mcpu=cortex-r4 -mthumb -mthumb-interwork -Wall -std=gnu99 -Wa,-a,-ad -Og -c test.c"

I tried -O1, -O2, -Os and -Og, and all of them have this problem. Only -O0 does not, but the result is too slow and uses too much RAM. Compiling for cortex-m4 produces the same results.

Have a nice day,
Jiri

Question information

Language:
English
Status:
Solved
For:
GNU Arm Embedded Toolchain
Assignee:
No assignee
Solved by:
jdobry
Hale Wang (hale-wang) said :
#1

The behavior of gcc 4.9 is correct (better than gcc 4.8).

This is because only the address of var is used in the inlined svcTest function. No expression uses the value of var, so from the compiler's point of view the "movw r3, #1234" and "str r3, [r0, #-4]!" instructions are redundant.

If you want the constant 1234 stored on the stack and its address passed in r0, you can change your test.h as follows (suggested by Terry):

File test.h:
===================================================
__attribute__((always_inline)) inline void svcTest (int var)
{
  volatile int a = var;
  register int *input1 __asm__("r0") = (int *) &a;
  __asm__ __volatile__ ("svc 123" : : "r" (input1));
}
===================================================

This example uses volatile to keep the compiler from removing the store of the otherwise unused constant 1234.

And the assembly code is generated as you wish:

=================================================================
fooBar:
        sub sp, sp, #8
        add r0, sp, #8
        movw r3, #1234
        str r3, [r0, #-4]!

        svc 123

        add sp, sp, #8
        bx lr

=================================================================

jdobry (jdobry) said :
#2

Sorry, this is not a solution!
You changed the parameter of svcTest from pass-by-pointer to pass-by-value. That is not possible, because in the real code it is not a pointer to an integer but a pointer to a complex structure.

Here is another (incomplete) example of this problem:

struct abcd {
  int a;
  int b;
  int c;
  void (*d)(void);
};

void funcD (void)
{
  ....
}

void fooBar(void)
{
  struct abcd abcd;
  abcd.a = 123;
  abcd.b = 123;
  abcd.c = 123;
  abcd.d = funcD;
  svcTest(&abcd);
  abcd.a = 456; // !!! this line is lost on 4.9 !!!!
  svcTest(&abcd);
}

jdobry (jdobry) said :
#3

I found a workaround for the moment. You need to add the "memory" clobber to the asm statement, like this:
  __asm__ __volatile__ ("svc 123" : : "r" (input1) : "memory");

But it is only a workaround, because this clobber tells the compiler that the asm can change memory, while this call only reads memory without changing it. In other words, the code will not be fully optimized. Example:

test.h:
struct ab {
  int a;
  int b;
};

inline void svcTest (const struct ab * const var)
{
  register const struct ab *input1 __asm__("r0") = var;
  __asm__ __volatile__ ("svc 123" : : "r" (input1) : "memory");
}

test.c:
#include "test.h"

void foo(void)
{
  struct ab ab;
  ab.a = 1234;
  ab.b = 1234;
  svcTest(&ab);
  bar(ab.b); // this line will re-read the value from memory, although we already have it in a register!
}
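
The effect of the "memory" clobber described above can be tried on a PC with a sketch like the following. It is only an illustration under some assumptions: USE_REAL_SVC, SVC_TEMPLATE, svcTestClobber and fooClobber are made-up names, and when USE_REAL_SVC is not defined the asm template is empty so that the code builds and runs on a non-ARM host.

```c
/* Sketch only: USE_REAL_SVC and SVC_TEMPLATE are hypothetical names.
   Define USE_REAL_SVC when building for the real Cortex target. */
#ifdef USE_REAL_SVC
#define SVC_TEMPLATE "svc 123"
#else
#define SVC_TEMPLATE ""   /* empty template: lets the sketch run on a host PC */
#endif

struct ab {
  int a;
  int b;
};

/* The "memory" clobber tells the compiler that the asm may read or write
   any memory, so earlier stores must be completed before the asm and
   values cached in registers must be reloaded after it. */
static inline void svcTestClobber (const struct ab *var)
{
#ifdef USE_REAL_SVC
  register const struct ab *input1 __asm__("r0") = var;
#else
  const struct ab *input1 = var;
#endif
  __asm__ __volatile__ (SVC_TEMPLATE : : "r" (input1) : "memory");
}

int fooClobber (void)
{
  struct ab ab;
  ab.a = 1234;          /* these stores can no longer be dropped */
  ab.b = 4321;
  svcTestClobber(&ab);
  return ab.a + ab.b;   /* reloaded from memory after the asm */
}
```

Comparing the -O2 assembly output of fooClobber with and without the "memory" clobber should show the extra loads after the asm statement.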

Hale Wang (hale-wang) said :
#4

The key point here is that only the address is used, not the value. In that case the compiler always treats the value stored at this address as redundant.

In other words, as far as the compiler can see, whatever value is stored at this address does not affect the result of the procedure.

Hale Wang (hale-wang) said :
#5

You can try the following statement, which tells the compiler that the asm needs to read the value in memory whose address is stored in r0.

inline void svcTest (const struct abc * const var)
{
  register const struct abc *input1 __asm__("r0") = var;
  __asm__ __volatile__ ("svc 123" : : "r" (input1) : "memory" (*value));
}

Hale Wang (hale-wang) said :
#6

Sorry for the typo. 'value' should be 'var'.

Hale Wang (hale-wang) said :
#7

Sorry for the mistake. The statement should be (no clobbers):

__asm__ __volatile__ ("svc 123" : : "r" (input1) , "m" (*var));

jdobry (jdobry) said :
#8

Eureka!
Thanks, Hale, for pointing me in the right direction. I think I have a final and bulletproof solution. See this example:

test.h
struct ab {
  int a;
  int b;
};

inline void svcTest (const struct ab * const var)
{
  register const struct ab *input1 __asm__("r0") = var;
  __asm__ __volatile__ ("svc 123" : : "r" (input1), "m" (*var));
}

test.c
#include "test.h"

void foo(void)
{
  struct ab ab;
  ab.a = 1234;
  ab.b = 4321;
  svcTest(&ab);
  ab.b = ab.a;
  svcTest(&ab);
}

And here is result:

sub sp, sp, #8
movw r3, #1234
str r3, [sp]
movw r2, #4321
str r2, [sp, #4]
mov r0, sp
svc 123
str r3, [sp, #4] // reuses the register: the compiler knows that the memory is read but not modified
svc 123
add sp, sp, #8
bx lr
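
The "m" input operand above only says that the asm reads the structure. If the SVC handler also wrote into it (an assumption, this thread does not describe such a handler), a read-write memory operand "+m" would be the matching constraint. A minimal sketch, with made-up names USE_REAL_SVC, SVC_TEMPLATE, svcTestRW and fooRW, and with an empty asm template when USE_REAL_SVC is not defined so that it can be built and run on a PC:

```c
/* Sketch only: USE_REAL_SVC and SVC_TEMPLATE are hypothetical names.
   Define USE_REAL_SVC when building for the real Cortex target. */
#ifdef USE_REAL_SVC
#define SVC_TEMPLATE "svc 123"
#else
#define SVC_TEMPLATE ""   /* empty template: lets the sketch run on a host PC */
#endif

struct ab {
  int a;
  int b;
};

static inline void svcTestRW (struct ab *var)
{
#ifdef USE_REAL_SVC
  register struct ab *input1 __asm__("r0") = var;
#else
  struct ab *input1 = var;
#endif
  /* "+m" (*var): the asm both reads and writes the pointed-to structure,
     so the compiler stores it before the asm and reloads it afterwards.
     Other memory is not affected, unlike with the "memory" clobber. */
  __asm__ __volatile__ (SVC_TEMPLATE : "+m" (*var) : "r" (input1));
}

int fooRW (void)
{
  struct ab ab;
  ab.a = 1234;
  ab.b = 4321;
  svcTestRW(&ab);
  return ab.b;   /* reloaded, because the asm may have changed *var */
}
```

With the empty template nothing modifies the structure, so on a PC fooRW simply returns the value stored before the call. The difference is only visible in the generated code.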