Bad BLX instruction on Cortex M4 with libm
I'm on a SAM4L Cortex M4 (armv7-m) using powf() from libm, and I'm hitting a bad instruction.
The partial disassembly is:
...
136de: e9cd 0106 strd r0, r1, [sp, #24]
136e2: d133 bne.n 1374c <powf+0x214>
136e4: f001 ed3c blx 15160 <____errno_veneer>
136e8: 2321 movs r3, #33 ; 0x21
...
And the disassembly of the code at ____errno_veneer:
00015160 <____errno_veneer>:
15160: e51ff004 ldr pc, [pc, #-4] ; 15164 <____errno_
15164: 20001c20 .word 0x20001c20
Running a BLX to a label seems to be the issue here: it causes a fault due to an unreadable instruction.
The real question, which looks to be the same fault for the question here (https:/
I'd appreciate any insight you have.
Thanks
Branden
Question information
- Language: English
- Status: Answered
- Assignee: No assignee
#1
Hi Branden,
Can you post the command line used to compile and link and add a '-v' to it and post the output here?
Cheers,
Andre
#2
Hi Andre,
The command line used is:
```
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -T/home/
```
The output after adding `-v` is:
```
Using built-in specs.
COLLECT_
COLLECT_
Target: arm-none-eabi
Configured with: /build/
Thread model: single
gcc version 5.3.0 (Arch Repository)
COMPILER_
LIBRARY_
COLLECT_
/usr/lib/
```
Cheers,
Amit (Branden and I are collaborating on the same codebase)
#3
OK, so I am not familiar with Arch Linux's distribution of the arm-none-eabi toolchain, and I can't reproduce this with our 2015Q1 release. Could you try downloading our Linux tarball and using that instead?
Also, out of interest, could you do the following? It might help in finding out where the distribution you are using is going wrong:
Locate the powf being used by adding '-Wl,-Map=map' to your command line. If you look in 'map' for '(powf)', the line above it will show which libm.a was used and which object file in it contained powf.
I suspect it will have been: '/usr/lib/
If that's the case, then do the following:
$cd /tmp/something
$arm-none-eabi-ar x /usr/lib/
$arm-none-
Then inspect dump.D to make sure powf calls __errno using bl and not blx. If it uses blx, then I'm guessing the multilib was not correctly configured or built in this distribution, or a different version might have been used.
There is also the possibility that the linker messed up the relocation. Could you also post the output of:
$arm-none-eabi-ld --version
#4
Thanks for the quick response.
Yes, sorry, I posted the output from my box (where I just used the Arch Linux packages). Other collaborators (including Branden, I think?) are using the binaries from here. Arch builds arm-none-eabi more or less from source unmodified, but I'll try with the binaries you provide ASAP for consistency.
Meanwhile, I followed your instructions:
* The map file shows that it's getting `powf` from /usr/arm-
* I dumped the contents of lib_a-wf_pow.o as you suggested and all calls to __errno are done with a BL, _not_ a BLX instruction.
* My arm-none-eabi-ld --version: GNU ld (GNU Binutils) 2.25.1
Amit
#5
Alright, I'm here on Ubuntu running arm-none-eabi-gcc version 4.9.3 20150529 and arm-none-eabi-ld 2.24.0.20150921. The command line flags are the same as Amit posted above.
```
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -T/home/
Using built-in specs.
COLLECT_
COLLECT_
Target: arm-none-eabi
Configured with: /home/build/
Thread model: single
gcc version 4.9.3 20150529 (release) [ARM/embedded-
COMPILER_
LIBRARY_
COLLECT_
/home/
```
Creating the Map file and searching through it I find powf was included from:
`/home/
I also confirm that a dump of lib_a-wf_pow.o shows `bl 0 <__errno>` rather than the blx to errno_veneer as above.
#6
So yeah, it seems the multilib was configured correctly. The GNU ld version I have here, however, is slightly newer: 2.25.90.
Let me know what the results were with the tarball.
Cheers,
Andre
#7
Alright, I updated to the tarball from the homepage: arm-none-eabi-gcc 5.2.1 20151202 and arm-none-eabi-ld 2.25.90.20151217. The problem is still the same.
```
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -T/home/
Using built-in specs.
COLLECT_
COLLECT_
Target: arm-none-eabi
Configured with: /home/build/
Thread model: single
gcc version 5.2.1 20151202 (release) [ARM/embedded-
COMPILER_
LIBRARY_
COLLECT_
/home/
```
Searching the map file shows I am now taking powf from:
`/home/
#8
Quick update...
After playing around some with a bare-bones reproduction of the bug, it seems to be related to the fact that we declare `__errno` as `__attribute__ ((weak)) int __errno`.
1. When I get rid of the weak attribute, the compiler correctly outputs BL instructions.
2. That definition for __errno is incorrect anyway, since it's meant to be a function.
3. When redoing it as a function, the weak attribute has no negative effect.
I'm still not sure why defining __errno as an integer (instead of a function) resulted in incorrect instructions being emitted.
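For reference, here is a minimal sketch of the function form from point 2, assuming newlib's convention that `errno` expands to `(*__errno())`. The variable name `errno_storage` is hypothetical, and using a single global as the backing store is an assumption that may not suit every system.

```c
/* Hypothetical backing store for errno; the name is illustrative only. */
static int errno_storage;

/*
 * newlib expects __errno to be a function that returns a pointer to the
 * current errno value. Keeping it weak lets another definition override it.
 */
__attribute__((weak)) int *__errno(void)
{
    return &errno_storage;
}
```

Because the symbol is now a function compiled as Thumb code, calls to it from powf should resolve as ordinary Thumb-to-Thumb calls.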
Amit
#9
Andre,
So our code seems more or less fixed right now (more or less because we're not 100% sure what the body of the __errno function should be, but that's not directly related to this question).
It seems like the actual behavior is still a bug of some sort, at least in the sense that output is an odd "error message" for _our_ bug :)
I don't know if it's helpful, but this is my guess about what might be going on:
We defined __errno as an int, so `&__errno` gets compiled to a pointer with an even memory address. So, when ____errno_veneer is linked, the toolchain (linker? compiler?) thinks the __errno "function" is an ARM (not Thumb) function, because Thumb functions are referenced by odd pointers. That assumption gets propagated up and we end up with a BLX instruction.
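To illustrate the odd/even pointer convention in this guess, here is a small sketch. The helper name `targets_thumb` is hypothetical; the address is the literal word from the veneer in the original question.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * For an interworking branch on ARM, bit 0 of the target address selects
 * the instruction set: 1 means Thumb, 0 means ARM.
 * Cortex-M cores can only execute Thumb code.
 */
static bool targets_thumb(uint32_t branch_target)
{
    return (branch_target & 1u) != 0;
}
```

With this helper, `targets_thumb(0x20001c20u)` is false: the word loaded by the veneer is even, so it would be treated as an ARM-mode target.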
FWIW, the actual body of ____errno_veneer also contained a non-Thumb instruction (some sort of 32-bit LDR instruction).
Amit
#10
Hi Amit,
This sounds like a bug we have seen before in a previous version of Binutils. I believe this was fixed in our 5-2015-q4-major release. Could you try to reproduce it using this latest release and see whether it still shows that behavior with the sources that were triggering the blx instruction?
Cheers,
Andre
#11
Andre
I tried 5-2015-q4-major and posted the results in response #7 above. The error still exists. Was there some other information you want me to get with that version?
Thanks
Branden
#12
Hi Amit,
Looking at ld's code, the scenario you describe could indeed happen. When the linker processes a relocation for a call from Thumb code to ARM code (based on the value of the least significant bit), it changes the bl into a blx. Thumb-only targets are not considered when doing this, because a call to ARM mode would fail anyway.
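As a rough illustration only, the decision described above can be modelled like this. All names are hypothetical and this is not taken from ld's source; it is a simplified sketch of the behavior as described.

```c
#include <stdbool.h>

/* Hypothetical names; a simplified model, not ld's implementation. */
enum call_insn { CALL_BL, CALL_BLX };

static enum call_insn pick_call_insn(bool caller_is_thumb, bool callee_is_thumb)
{
    /* Same instruction set on both sides: a plain BL is enough. */
    if (caller_is_thumb == callee_is_thumb)
        return CALL_BL;

    /*
     * A mode change is needed, so BL becomes BLX. Note that nothing here
     * checks whether the target CPU can execute ARM code at all.
     */
    return CALL_BLX;
}
```

In this model, a Thumb caller and a symbol that looks like ARM code yield `CALL_BLX`, which matches the instruction seen in the powf disassembly.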
Regarding the wrong instruction in __errno_veneer, please open a new question if this comes from a library provided by this toolchain and we will look into it.
Best regards.