GNU Arm Embedded Toolchain

Bug: incorrect 'mul' used for "invariant division with multiplication" optimisation

Asked by John Adriaan on 2019-09-14

SYNOPSIS

gcc is attempting to use the "invariant division using multiplication" optimisation to avoid a 'div' instruction. The idea is that instead of "divide by 3", you can "multiply by 0xaaaaaaab, keep the full 64-bit product, then shift right 33 places". However, the generated code is wrong in two respects.

THE PROJECT

To demonstrate the bug, I made a Mickey Mouse project:

/* heap.h */
struct Node {
    unsigned canary;
    unsigned size;
    unsigned data;
}; // Node
extern struct Node heap[];
extern struct Node heap_end;

/* heap.c */
#include "heap.h"
struct Node heap[1024];
struct Node heap_end;

/* main.c */
#include "heap.h"
int main() {
    int size = &heap_end - heap;
    return size;
} // main()

THE DESCRIPTION

The compiler sees that the two pointers-to-Node are subtracted, so it takes the difference between the addresses then divides by 12. Only, it actually divides by 4 with a shift, then (attempts to) divide by 3 using the above optimisation.

The two problems are:
1) It uses "mul.w", which is a "32-bit by 32-bit multiply, keeping the lower 32-bit result". That immediately throws away the desired top half! It should use "umull" instead, which is 32x32=>64.
2) There is no attempt to do the 33-bit shift right. If "umull" had been used, then the upper 32 bits of the result could have been shifted right once and used as the quotient.

If I change the size of Node to 8 or 16, then no explicit division is required (a shift will do) and the correct code is produced. It's a Node size of 12 that's the problem.

THE HOST

Windows 10 64-bit, latest update.

THE COMPILERS

I used the one that came with STM32CubeIDE, and then updated to the latest. Both produced identical incorrect code.

* arm-none-eabi-gcc (GNU Tools for STM32 7-2018-q2-update.20190328-1800) 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907]
* arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 8-2019-q3-update) 8.3.1 20190703 (release) [gcc-8-branch revision 273027]

THE COMPILE LINE

Note that I get identical results whether I compile with -O0 or -O3.

arm-none-eabi-gcc "../project/main.c" -mcpu=cortex-m4 -std=gnu11 -g3 -c -O0 -ffunction-sections -fdata-sections -Wall -Wextra -Wswitch-default -Wswitch-enum -Wconversion -fno-tree-loop-distribute-patterns --version -fstrict-volatile-bitfields -fno-strict-aliasing -fwrapv -fstack-usage -MMD -MP -MF"Project/main.d" -MT"Project/main.o" --specs=nano.specs -mfpu=fpv4-sp-d16 -mfloat-abi=hard -mthumb -o "project/main.o"

Ditto for heap.c

THE ASSEMBLER LISTING

int main() {
 8000000:   b480        push    {r7}
 8000002:   b083        sub     sp, #12
 8000004:   af00        add     r7, sp, #0
    int size = &heap_end - heap;
 8000006:   4a07        ldr     r2, [pc, #28]   ; (8000024 <main+0x24>)
 8000008:   4b07        ldr     r3, [pc, #28]   ; (8000028 <main+0x28>)
 800000a:   1ad3        subs    r3, r2, r3
 800000c:   109b        asrs    r3, r3, #2
 800000e:   4a07        ldr     r2, [pc, #28]   ; (800002c <main+0x2c>)
 8000010:   fb02 f303   mul.w   r3, r2, r3
 8000014:   607b        str     r3, [r7, #4]
    return size;
 8000016:   687b        ldr     r3, [r7, #4]
} // main()
 8000018:   4618        mov     r0, r3
 800001a:   370c        adds    r7, #12
 800001c:   46bd        mov     sp, r7
 800001e:   f85d 7b04   ldr.w   r7, [sp], #4
 8000022:   4770        bx      lr
 8000024:   20003000    .word   0x20003000
 8000028:   20000000    .word   0x20000000
 800002c:   aaaaaaab    .word   0xaaaaaaab

Question information

Language: English
Status: Solved
For: GNU Arm Embedded Toolchain
Assignee: No assignee
Solved by: john
Solved: 2019-09-16
Last query: 2019-09-16
Last reply: 2019-09-16
