Address access optimizations

Asked by Michael Steinberg

Hello there,

I was playing around with some templates to configure the pins of an STM32F4. I uploaded the offending code to http://pastebin.com/Yaj9Qme2 .
It is some bad template "magic", granted, but I like playing around. Rest assured, I have already been attacked by multiple parties for the mere existence of this source code!
I looked at the assembly output with -Os and -O3. Two observations:
1: -Os actually produces more code than -O3, apparently because it does not catch the static nature of the code. Is that because the compiler cannot know in how many compilation units the code might be referenced, so it assumes multiple uses and concludes that runtime calls will produce smaller code?

2: With -O3, the template soup is reduced to six stores, which is obviously very good. However, the compiler materializes six absolute addresses for the stores instead of loading one or two base addresses and using relative (base plus offset) stores from there. Is the address-optimization pass running too early to catch the template-generated code? Can I do something about it?
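
To make the second observation concrete, here is a minimal sketch of the store pattern in question. The `GpioRegs` layout, the `configure_port` and `demo_on_mock` names, and the register values are illustrative assumptions and are not taken from the pastebin code; the layout is modelled on an STM32F4 GPIO port. Writing all six registers through one reference gives the compiler a single base address to work from, though whether it actually emits base-plus-offset stores still depends on the GCC version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical register block, modelled on an STM32F4 GPIO port.
struct GpioRegs {
    volatile std::uint32_t MODER;    // offset 0x00
    volatile std::uint32_t OTYPER;   // offset 0x04
    volatile std::uint32_t OSPEEDR;  // offset 0x08
    volatile std::uint32_t PUPDR;    // offset 0x0C
    volatile std::uint32_t IDR;      // offset 0x10
    volatile std::uint32_t ODR;      // offset 0x14
    volatile std::uint32_t BSRR;     // offset 0x18
    volatile std::uint32_t LCKR;     // offset 0x1C
    volatile std::uint32_t AFR[2];   // offsets 0x20 and 0x24
};

static_assert(offsetof(GpioRegs, AFR) == 0x20, "unexpected register layout");

// Six stores through one base reference. The values are arbitrary examples.
// On a target this might be called as (address is an assumption here):
//   configure_port(*reinterpret_cast<GpioRegs*>(0x40020000u));
inline void configure_port(GpioRegs& port) {
    port.MODER   = 0x00000001u;
    port.OTYPER  = 0x00000000u;
    port.OSPEEDR = 0x00000003u;
    port.PUPDR   = 0x00000000u;
    port.AFR[0]  = 0x00000000u;
    port.AFR[1]  = 0x00000007u;
}

// Host-side check on a mock register block, so no hardware is needed.
inline bool demo_on_mock() {
    GpioRegs mock{};
    configure_port(mock);
    assert(mock.MODER == 0x00000001u);
    assert(mock.OSPEEDR == 0x00000003u);
    assert(mock.AFR[1] == 0x00000007u);
    return true;
}
```

Compiling `configure_port` for the target with -O3 and inspecting the assembly shows whether the stores share one base register or each load their own literal address.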

The code is rather lengthy but has no external dependency, so it should compile right away.

Best regards,
Michael

Question information

Language: English
Status: Solved
For: GNU Arm Embedded Toolchain
Assignee: No assignee
Solved by: Zhenqiang Chen

Terry Guo (terry.guo) said :
#1

I changed "void playground_main()" to "int main()" so that I could build your code into a complete binary. Here are the code sizes of the final binaries:

O2 text: 6728
O3 text: 6728
Os text: 6892

By the way, they all have the same data size.

Indeed, the code size does not become smaller with -Os. Building with a recent GCC 5.0 gives worse results:

O2 text: 6916
O3 text: 6916
Os text: 7076

Thanks for reporting. I will look into this.

Zhenqiang Chen (zhenqiang-chen) said :
#2

Thanks for the report.

1. The size difference between -Os and -O3 is due to a difference in inlining policy in the early inlining pass.

2. For -O3, the compiler should use base-plus-offset addressing here. I will investigate this.
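
Regarding point 1, one experiment that is not suggested in this thread is to force inlining of the small template helpers, so the result depends less on the early inliner's policy under -Os. The sketch below is hypothetical: `FORCE_INLINE`, `PinMode`, `PortConfig`, and `apply_modes` are made-up names, and `__attribute__((always_inline))` is a GCC/Clang extension. Whether this actually shrinks the -Os output for the pastebin code is untested.

```cpp
#include <cstdint>

// always_inline is a GCC/Clang extension; fall back to plain inline elsewhere.
#if defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// Hypothetical compile-time description of one pin's 2-bit mode field.
template <unsigned Pin, unsigned Mode>
struct PinMode {
    static constexpr std::uint32_t mask  = 0x3u << (Pin * 2);
    static constexpr std::uint32_t value = (Mode & 0x3u) << (Pin * 2);
};

// Folds a list of PinMode types into one mask/value pair at compile time.
template <typename... Pins>
struct PortConfig;

template <>
struct PortConfig<> {
    static constexpr std::uint32_t mask  = 0u;
    static constexpr std::uint32_t value = 0u;
};

template <typename First, typename... Rest>
struct PortConfig<First, Rest...> {
    static constexpr std::uint32_t mask  = First::mask | PortConfig<Rest...>::mask;
    static constexpr std::uint32_t value = First::value | PortConfig<Rest...>::value;
};

// Returns the old register value with the listed pins' mode fields replaced.
// Forced inline so the constants fold into the caller even under -Os.
template <typename... Pins>
FORCE_INLINE std::uint32_t apply_modes(std::uint32_t moder) {
    using Cfg = PortConfig<Pins...>;
    return (moder & ~Cfg::mask) | Cfg::value;
}
```

For example, `apply_modes<PinMode<0, 1>, PinMode<5, 2>>(old_value)` replaces the mode fields of pins 0 and 5 and leaves the other bits unchanged.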

Zhenqiang Chen (zhenqiang-chen) said (best answer):
#3

For -O3/-O2, please try option -fno-schedule-insns.

Michael Steinberg (decimad) said :
#4

-fno-schedule-insns does the trick! I'm checking for side effects now.

Michael Steinberg (decimad) said :
#5

Thanks Zhenqiang Chen, that solved my question.