ARM Thumb code size increase with 4.9

Asked by Mahavir

I observed a code size increase with the 4.9 toolchain compared with the 4.8.x versions for Thumb-2 with -Os.

E.g. a test program that sets a bit-field in a struct:

Code generated with 4.8.x toolchain
-----------------------------------------
        ldr r3, [pc, #8]
        ldr r2, [r3, #16]
        bfi r2, r0, #1, #3
        str r2, [r3, #16]
        bx lr
       .word <addr>

Code generated with 4.9 toolchain
---------------------------------------

     ldr r2, [pc, #16]
     and.w r0, r0, #7
     ldr r3, [r2, #16]
     bic.w r3, r3, #14
     orr.w r0, r3, r0, lsl #1
     str r0, [r2, #16]
     bx lr
     .word <addr>

Flags used are
----------------
-mcpu=cortex-m3 -mthumb -Os

Has anyone else observed the same? The overall increase seems substantial.

Similar Reference
-------------------
http://comments.gmane.org/gmane.comp.gcc.bugs/417969

Any help/pointers would be highly appreciated.

Thanks...Mahavir

Question information

Language: English
Status: Solved
For: GNU Arm Embedded Toolchain
Assignee: No assignee
Terry Guo (terry.guo) said :
#1

Would you please provide a more complete test case for us to reproduce? For the simple case below, I think 4.9 generates the expected code:

terguo01@terry-pc01:tmp$ cat x.c
typedef struct st
{
  unsigned short a;
  unsigned int b : 24;
  unsigned int c : 4 ;
  unsigned int d : 3 ;
  unsigned int f : 1 ;
  unsigned int g;
} P;

P p;

void
foo ()
{
  p.d = 4;
}

arm-none-eabi-gcc -mthumb -mcpu=cortex-m3 -Os x.c -S

foo:
 @ args = 0, pretend = 0, frame = 0
 @ frame_needed = 0, uses_anonymous_args = 0
 @ link register save eliminated.
 ldr r3, .L2
 movs r1, #4
 ldrb r2, [r3, #7] @ zero_extendqisi2
 bfi r2, r1, #4, #3
 strb r2, [r3, #7]
 bx lr
.L3:
 .align 2
.L2:
 .word p
 .size foo, .-foo
 .comm p,12,4
 .ident "GCC: (GNU Tools for ARM Embedded Processors) 4.9.3 20141119 (release) [ARM/embedded-4_9-branch revision 218278]"

Mahavir (mahavirpj) said :
#2

Hello Terry,

Thanks for the reply.

Digging further, the issue seems to be with `volatile` access.

Please see the following example:
----------------------------------

$ cat y.c
#include <stdint.h>

struct tmp {
 uint32_t dummy;
 union {
  struct {
   uint32_t xyz : 1;
   uint32_t mode: 3;
   uint32_t res : 28;
  } bf;
  uint32_t wordval;
 } reg;
};

void set_mode(int mode)
{
 volatile struct tmp *t = (struct tmp *) 0x1000;
 t->reg.bf.mode = mode;
}

$ arm-none-eabi-gcc -c -mthumb -mcpu=cortex-m3 -Os y.c
$ arm-none-eabi-gcc -v
gcc version 4.9.3 20141119
$ arm-none-eabi-size y.o
   text    data     bss     dec     hex filename
     22       0       0      22      16 y.o

$ arm-none-eabi-gcc -c -mthumb -mcpu=cortex-m3 -Os y.c
$ arm-none-eabi-gcc -v
gcc version 4.8.4 20140725
$ arm-none-eabi-size y.o
   text    data     bss     dec     hex filename
     14       0       0      14       e y.o

The same is true if your test snippet is changed to use volatile access to the struct bit-fields.

Thanks...Mahavir

Terry Guo (terry.guo) said :
#3

This is the effect of the option -fstrict-volatile-bitfields. You can turn it off with -fno-strict-volatile-bitfields, but we recommend keeping the option on. See the detailed explanation at: https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/Code-Gen-Options.html#index-fstrict-volatile-bitfields-2633.
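As an illustration only (not code from this thread's toolchain), the access pattern the option asks for can be written out by hand: one load and one store of the field's declared 32-bit type. The sketch below reuses the struct from y.c and goes through its `wordval` member. The helper name `set_mode_word` and the macros are hypothetical, the pointer parameter replaces the fixed address 0x1000 so the code can run on a host, and the position of `mode` at bits 1..3 is an assumption based on the mask 14 and shift 1 seen in the 4.9 listing (bit-field layout is implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as y.c in this thread. */
struct tmp {
    uint32_t dummy;
    union {
        struct {
            uint32_t xyz  : 1;
            uint32_t mode : 3;
            uint32_t res  : 28;
        } bf;
        uint32_t wordval;
    } reg;
};

/* Sanity check: the bit-field struct fits in a single 32-bit container. */
static_assert(sizeof(struct tmp) == 2 * sizeof(uint32_t),
              "unexpected struct layout");

/* Assumption: 'mode' sits at bits 1..3 of the word (mask 14, shift 1). */
#define MODE_SHIFT 1u
#define MODE_MASK  (UINT32_C(7) << MODE_SHIFT)

/* Hypothetical helper: updates 'mode' with exactly one 32-bit read and
 * one 32-bit write of the volatile register word. */
static void set_mode_word(volatile struct tmp *t, uint32_t mode)
{
    uint32_t word = t->reg.wordval;            /* single 32-bit load  */
    word &= ~MODE_MASK;                        /* clear bits 1..3     */
    word |= (mode << MODE_SHIFT) & MODE_MASK;  /* insert the new mode */
    t->reg.wordval = word;                     /* single 32-bit store */
}
```

Whether a given GCC version turns the three register operations into a single bfi is up to the optimizer; the sketch only makes the memory accesses explicit.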

Andreas Fritiofson (andreas-fritiofson) said :
#4

I would expect the -fstrict-volatile-bitfields option to only affect how bitfields are loaded and stored. In the example, the exact same load and store instructions are generated. The only difference is how the intermediate register operations are optimized.

With the standard 4.9 options, the bit field is inserted with an "and, bic, orr" sequence while the 4.8 default generates the equivalent single "bfi" instruction.

Why is -fstrict-volatile-bitfields generating worse code for register operations? Surely it doesn't have to do that?
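To make the equivalence concrete, here is a small host-side C model of the two register sequences from the listings in the question. It is a sketch under stated assumptions: the function names `bfi` and `and_bic_orr` are made up for illustration, and `bfi` models only the documented effect of the instruction (copy the low `width` bits of the source into the destination starting at bit `lsb`), not its encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "bfi rd, rn, #lsb, #width": replace 'width' bits
 * of dst, starting at bit 'lsb', with the low 'width' bits of src. */
static uint32_t bfi(uint32_t dst, uint32_t src, unsigned lsb, unsigned width)
{
    uint32_t field = (width < 32) ? ((UINT32_C(1) << width) - 1u) : UINT32_MAX;
    uint32_t mask = field << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* Hypothetical model of the 4.9 sequence:
 *   and.w r0, r0, #7
 *   bic.w r3, r3, #14
 *   orr.w r0, r3, r0, lsl #1 */
static uint32_t and_bic_orr(uint32_t word, uint32_t value)
{
    value &= UINT32_C(7);        /* and.w: keep the low 3 bits of the value */
    word &= ~UINT32_C(14);       /* bic.w: clear bits 1..3 of the word      */
    return word | (value << 1);  /* orr.w with lsl #1: merge the field      */
}

/* Compares both models over every 3-bit value and a few wider inputs. */
static void check_equivalence(void)
{
    const uint32_t words[] = { 0u, UINT32_MAX, UINT32_C(0xA5A5A5A5) };
    for (unsigned w = 0; w < sizeof words / sizeof words[0]; ++w) {
        for (uint32_t v = 0; v < 64; ++v) {
            assert(bfi(words[w], v, 1, 3) == and_bic_orr(words[w], v));
        }
    }
}
```

Both functions compute the same result for every input checked, which is why the single "bfi r2, r0, #1, #3" from 4.8 can stand in for the three-instruction sequence.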

Mahavir (mahavirpj) said :
#5

Hi Terry,

Andreas's comment looks valid. So, is this an issue?

Thanks.

Terry Guo (terry.guo) said :
#6

Sorry for the delay, and thanks for the comments. Now I get your point and share the feeling that there is something we can improve. I will dig into it further.

Joey Ye (jinyun-ye) said :
#7

Reported as GCC bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65067; please follow the discussion and status there.

Terry Guo (terry.guo) said :
#8

The issue is now fixed in 5.0 by https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00264.html, but the fix won't be backported to 4.9 because it is not a bug fix.

The 5.0 2015Q4 major release will include this fix.

Terry Guo (terry.guo) said :
#9