C++ temporary object storage

Asked by Jonathan Dumaresq

Hi,

We have observed that our stack usage (measured with -fstack-usage) grows very quickly when temporary C++ objects are needed. It looks like the compiler doesn't reuse the storage of those temporary objects.

Here is a small example:

#include <cstdint>
#include <string>

uint32_t * A(const std::string &a, const std::string &b)
{
 return 0;
}

void B()
{
 A("test", "test2");
// A("test", "test2");
// A("test", "test2");
// A("test", "test2");
}

The stack usage of B increases by 16 bytes for every call to A (i.e. for each commented-out line that is re-enabled).

When I look at the disassembly, I see that the std::string objects are constructed and destroyed around every call to A. The problem is that the storage for these temporary objects is not reused.

In this case, storage for only two temporary std::string objects is needed.

Is it possible to enable this optimisation?

I have looked at different optimisation levels, and the stack grows with every call to A.

Regards

Jonathan

Question information

Language: English
Status: Answered
For: GNU Arm Embedded Toolchain
Assignee: No assignee
Jonathan Dumaresq (jdumaresq) said :
#1

I'm using the 4.9 2014Q4 compiler.

Jonathan

Thomas Preud'homme (thomas-preudhomme) said :
#2

Hi Jonathan,

I tried your example with the 4.9 2014Q4 toolchain release and see the same sub sp, #52 at the beginning of B no matter how many calls to A it contains, when compiling with -O3 or even -O2. To compile it, I had to only declare A via a prototype; otherwise GCC would optimize the calls away to the constant 0. Is there something I missed? Can you share with us the command line you used to compile, as well as the assembly output you get?

Best regards.

Jonathan Dumaresq (jdumaresq) said :
#3

Hi Thomas,

Here is the information you requested.

Here is the full source code. I have changed it to simplify my test.

#include <cstdio>
#include <cstring>
#include <cstdint>
void C();
void D();
void E();

class myObject
{
private:
 char buf[8];
public:
 myObject(const char *p)
 {
  strcpy(buf, p);
 }

 const char *c_str() const
 {
  return buf;
 }

};

uint32_t * A(const myObject &a, const myObject &b);
uint32_t * B(const char *a, const char *b);

int main(void)
{
   C();
   D();
   while(1);
}

uint32_t * A(const myObject &a, const myObject &b)
{
 printf("%s%s", a.c_str(), b.c_str());
 return 0;
}

uint32_t * B(const char *a, const char *b)
{
 printf("%s%s", a, b);
 return 0;
}

void C()
{
 myObject t("t"), tt("tt");
 A(t, tt);
 A(t, tt);
}

void D()
{
 B("t", "tt");
 B("t", "tt");
}

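// Each call builds two temporary myObject instances from the string literals.
// E() is not called from main(); its -fstack-usage figure is what is being compared.
void E()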
{
 A("t", "tt");
 A("t", "tt");
 A("t", "tt");
 A("t", "tt");
}

The compile command line:

arm-none-eabi-g++ -mcpu=cortex-m3 -mthumb -Og -fsigned-char -ffunction-sections -fdata-sections -fno-inline-functions -fstack-usage -Wunused -Wuninitialized -Wall -Wextra -Wmissing-declarations -Wconversion -Wpointer-arith -Wpadded -Wshadow -Wlogical-op -Waggregate-return -Wfloat-equal -g3 -std=gnu++11 -fabi-version=0 -fno-exceptions -fno-rtti -fno-use-cxa-atexit -fno-threadsafe-statics -Wabi -Wctor-dtor-privacy -Wnoexcept -Wnon-virtual-dtor -Wstrict-null-sentinel -Wsign-promo -Wa,-adhlns="src/main.o.lst" -MMD -MP -MF"src/main.d" -MT"src/main.o" -c -o "src/main.o" "../src/main.cpp"

I have done some more tests, and it looks like the stack is only reused at -O2 and -O3.

I tried all the flags that are enabled by -O2, and none of them produces the stack reuse. It's probably an internal optimisation.

Do you think that this can be activated with a specific flag?

Regards

Jonathan

Thomas Preud'homme (thomas-preudhomme) said :
#4

Hi Jonathan,

You can run the compiler with -Q -O2 --help=optimizers, then again with -Q -Og --help=optimizers, and diff the two outputs. This lists every difference between -Og and -O2 (the manual might not be up to date). My guess is that this optimization comes from -foptimize-sibling-calls.

Best regards.

Jonathan Dumaresq (jdumaresq) said :
#5

Hi,

I tried that optimisation flag and it changes nothing in the stack usage. In the current code, the stack pointer adjustment in E() is still #64 (64 bytes).

I had already done the diff with -Q, and no combination of flags seems to reproduce what the -O2 optimisation level does. Is it possible that -O2 enables some optimisations that cannot be enabled with flags?

Regards

Jonathan

Thomas Preud'homme (thomas-preudhomme) said :
#6

Indeed, I tried it myself: starting from -Og and enabling every pass that -O2 enables doesn't change a thing. Yes, some optimizations are not part of a pass and thus cannot be enabled manually. In this case it might also be that an optimization is *disabled* by -Og rather than *enabled* by -O2. It seems that your only choice here is to use -O2.

Best regards,

Thomas

Jonathan Dumaresq (jdumaresq) said :
#7

Hi,

I think that for an embedded compiler this should always be enabled, even with no optimisation at all. I checked, and in C the same problem is visible for variables with a small scope: the function reserves stack space for all the variables it uses, even if a variable is only declared in a small inner scope.

Do you think that this can be enabled by default in this toolchain?

Regards

Jonathan

Thomas Preud'homme (thomas-preudhomme) said :
#8

Hi Jonathan,

Compiling without any optimization enabled is useful for a fast development and debug cycle. In both of these cases you want compilation to be as fast as possible, since you compile very often. Running the compiler as a commit hook is another example. Therefore I don't think any optimization should be enabled at the default level (-O0).

Best regards.

Jonathan Dumaresq (jdumaresq) said :
#9

Hi,

It makes sense to me that -O0 should not do any optimisation. But being able to enable this feature with a flag would be very useful.

In my situation we use an RTOS, and when we compile in debug mode we need to define stacks three times as large. This is annoying for us.

At first I thought that -fstack-reuse=reuse-level (https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) would do this, but it doesn't seem to.

Is this a bug?

Regards

Jonathan
