float to double conversion in sqrtf

Asked by benjamin

Hello,

While trying to optimize my code for a Cortex-M4 MCU to use only float variables and take advantage of the FPU, I noticed that float to/from double conversion functions and double arithmetic functions were still being used. So I investigated a little with objdump and found that sqrtf calls __aeabi_f2d somewhere in its code.

Is there a reason to use doubles inside a function that takes and returns a float? If it isn't necessary, it costs time and space. And why not use the VSQRT.F32 instruction on Cortex-M4 to calculate the square root? In my program, the FPU is used for all the common operations (add, multiply, etc.) except the square root, so I had to implement a C/ASM function to do it with the FPU.

Question information

Language: English
Status: Solved
For: GNU Arm Embedded Toolchain
Assignee: No assignee
Solved by: Joey Ye

Terry Guo (terry.guo) said :
#1

I can reproduce what you reported. sqrt takes a double argument, which is why the float/double conversion functions appear. sqrtf takes a float argument, but no FPU instruction is generated even when sqrtf is used. I am looking into this.

Terry Guo (terry.guo) said :
#3

I was a little bit wrong about "But there is no fpu instruction even when use sqrtf". There is indeed an fsqrts instruction when you call sqrtf rather than sqrt. You may also notice that the code still contains a call to the library function sqrtf after the fsqrts instruction. This is to ensure the final code is IEEE compatible. To be IEEE compatible, one should check the FPU status register after fsqrts executes and set errno if there was a problem. That's why we call the lib function sqrtf.
If everything is OK, the lib function won't be called. To avoid generating that call, you need an option like "-fno-math-errno" or "-ffast-math".

Some conclusions are:

1). The lib function sqrt takes a double argument and returns a double result.
2). The lib function sqrtf takes a float argument and returns a float result.
3). Those lib functions strictly follow IEEE and behave consistently across platforms.
4). If a math error occurs during the operation, errno should be set to be IEEE compatible. The lib function does this, while fsqrts does not.

benjamin (blackswords) said :
#4

Thank you for investigating this.

Adding a call to sqrtf in my code increases the program size by 2520 bytes (even with -ffast-math). It's huge when you know that the FPU can do this operation by its own.

In fact, calling sqrtf pulls in at least these functions (I didn't check the whole arm-none-eabi-nm output):
__aeabi_d2f
__truncdfsf2
__aeabi_ddiv
__divdf3
__aeabi_dmul
__muldf3
__adddf3
__aeabi_dadd
__aeabi_dsub
__subdf3

What I don't understand is why doubles are used inside sqrtf, which deals with floats.

The C code below uses only 54 bytes of flash memory with no call to other functions and no float/double conversions :

float vsqrtf(float op1) {
    if (op1 <= 0.f)
        return 0.f;

    float result;
    __ASM volatile ("vsqrt.f32 %0, %1" : "=w" (result) : "w" (op1));
    return result;
}

I don't know how to manage errno but I don't think it will use much more space.
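
The open point about errno can be sketched as follows. This is only a minimal illustration, assuming the usual C library convention that a negative argument is a domain error reported as EDOM with a NaN result. The names raw_sqrtf and vsqrtf_errno are hypothetical, the "t" constraint is GCC's single-precision VFP register constraint, and the non-ARM branch is a stand-in so the sketch also builds on a host.

```c
#include <errno.h>
#include <math.h>   /* NAN */

/* Raw square root with no error reporting (hypothetical helper). */
static float raw_sqrtf(float x)
{
#if defined(__arm__) && defined(__ARM_FP)
    /* Hardware path: a single VSQRT.F32, as in the function above. */
    float r;
    __asm__ volatile ("vsqrt.f32 %0, %1" : "=t" (r) : "t" (x));
    return r;
#else
    /* Host stand-in (assumption): Newton iteration, adequate only for
       moderate, positive, finite inputs. */
    float r = (x > 1.0f) ? x : 1.0f;
    for (int i = 0; i < 64; ++i)
        r = 0.5f * (r + x / r);
    return r;
#endif
}

/* Square root that reports a negative argument through errno. */
float vsqrtf_errno(float x)
{
    if (x < 0.0f) {
        errno = EDOM;   /* domain error: no real square root exists */
        return NAN;
    }
    if (x == 0.0f)
        return x;       /* also keeps the host stand-in away from zero */
    return raw_sqrtf(x);
}
```

The check is one float comparison and one store, so on the target it should add only a handful of instructions and no double-precision helpers.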

Terry Guo (terry.guo) said :
#5

I am kind of confused. If you compile the small test case below:

terguo01@terry-pc01:tmp$ /work/terguo01/tools/gcc-arm-none-eabi-4_8-2014q1/bin/arm-none-eabi-gcc -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O2 -S y.c -ffast-math
terguo01@terry-pc01:tmp$ vi y.s
terguo01@terry-pc01:tmp$ cat y.c
#include<math.h>

float
mysqrt (float f)
{
  return sqrtf(f);
}

The generated assembly code will be:
mysqrt:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        fsqrts s0, s0
        bx lr
        .size mysqrt, .-mysqrt

The fsqrts instruction is used and no extra library functions are needed. What options are you using?

benjamin (blackswords) said :
#6

This is the list of options I use for GCC (nearly the same for G++):

-O0 -g3 -Wall -ffunction-sections -fdata-sections -Wall -std=gnu99 -Wa,-adhlns="$@.lst" -c -fmessage-length=0 -mcpu=cortex-m4 -mthumb -mfloat-abi=softfp -mfpu=fpv4-sp-d16 -g -ggdb

I don't use optimizations because weird behavior sometimes appears when I do, and I don't know why (I'm really not an expert in compilers or assembly).

benjamin (blackswords) said :
#7

I just did a quick test and, in fact, optimizations need to be turned on for the FPU instruction to be used. The exact same code compiled without optimizations calls the sqrtf function in the library.

benjamin (blackswords) said :
#8

I noticed the same behavior (float/double conversions) in the expf function. A call to expf adds these functions (and maybe some others):

__aeabi_d2f
__truncdfsf2
__adddf3
__aeabi_dadd
__aeabi_dsub
__subdf3

Since the exponential can't be calculated by the FPU directly, the library function has to be called, and I am okay with that. But why use doubles inside single-precision functions?

On a Cortex-M4 it may not be a big deal considering the memory available, but on a smaller MCU (a Cortex-M0, for example), all these unnecessary functions will consume a lot of useful memory.

Is there a good reason to use doubles inside sqrtf and expf? If there is, I would like to know it.

Best Joey Ye (jinyun-ye) said :
#9

Benjamin,

The reason sqrtf uses f2d is, as Terry explained, to handle exceptions. The implementation in newlib converts the single-precision input to double and passes it to a common utility function shared by the single and double versions. For more detail, please check the newlib source code at: https://sourceware.org/cgi-bin/cvsweb.cgi/~checkout~/src/newlib/libm/math/wf_sqrt.c?rev=1.2&content-type=text/plain&cvsroot=src

I understand that pulling in all of this increases code size dramatically. I also noticed that you stay with -O0, probably to keep debuggability. My recommendation is to use -Og, which not only uses the vsqrt.f32 instruction but also keeps the program as debuggable as possible.

Hope this helps.

- Joey
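
The wrapper pattern described in this reply can be illustrated with a simplified sketch. It is not the newlib source: struct math_error, report_domain_error and sqrtf_error_path are hypothetical names, and the only assumption carried over from the reply is that the float wrapper hands its argument to a shared, double-based error routine. It shows where the float/double conversions come from even though the caller only ever sees floats.

```c
#include <errno.h>

/* Hypothetical error record shared by the float and double wrappers.
   The fields are double so that one handler can serve both. */
struct math_error {
    const char *name;    /* name of the failing function */
    double      arg;     /* offending argument, widened to double */
    double      retval;  /* value the wrapper should return */
};

/* Hypothetical shared handler: sets errno and chooses the result. */
static void report_domain_error(struct math_error *e)
{
    errno = EDOM;
    /* Double arithmetic: without a double-precision FPU this can
       become a call to a software double division helper. */
    e->retval = 0.0 / 0.0;
}

/* Error path of a float wrapper (sketch only). */
float sqrtf_error_path(float x)
{
    struct math_error e;
    e.name = "sqrtf";
    e.arg  = (double)x;       /* float -> double conversion */
    report_domain_error(&e);
    return (float)e.retval;   /* double -> float conversion */
}
```

Because the conversions and the double arithmetic sit in the same function as the fast path, the linker has to keep their helpers even if the error path never runs, which matches the size increase reported earlier in the thread.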

benjamin (blackswords) said :
#10

Okay, thank you for the clear explanation, the link and the optimization advice.

It was very instructive.

benjamin (blackswords) said :
#11

Thanks Joey Ye, that solved my question.

benjamin (blackswords) said :
#12

Well, just out of curiosity, is there a way to skip the error handling part without recompiling newlib?

Joey Ye (jinyun-ye) said :
#13

As it is in the same function, I cannot think of a way to skip it without recompilation.

benjamin (blackswords) said :
#14

Ok, and is it complex to recompile newlib?

Joey Ye (jinyun-ye) said :
#15

Please follow How-to-build-tools.pdf on the release website.