[cortex-m7][c++] Optimisation breaks access to the packed structure in inlined constructor

Asked by Michał Fita

Host platform: Windows 7
Target platform: Cortex-M7
Affected GCC versions: 5.3.1, 5.4.1 (this one used for examples)

Compiler parameters:
-ggdb -ffunction-sections -fdata-sections -std=gnu++14 -fno-rtti -specs=nosys.specs -mcpu=cortex-m7 -mthumb -Wall -Wextra -Werror -Wno-unused-function -Wno-maybe-uninitialized -Wno-unused-parameter
 (-Og and -O0 used appropriately)
Linker flags (indirect through gcc; doesn't matter probably):
-Wl,-gc-sections -specs=nosys.specs -mcpu=cortex-m7 -mthumb

We have C++ project targeted to Cortex-M7 and we faced problem with the code compiled at the -Og optimisation level. I tried to narrow down the particular optimisation, but enabling and disabling (binary search) some sets of particular optimisation that differ -Og from -O0, but with no luck. Further digging shown that if we disable structure packing, the problem doesn't appear.

We have very long packed structure:
#define PACKED __attribute__((packed))
typedef struct _DomainModel {
    const struct pb_rtti *__rtti;
    pb_size_t __parent_offset;
    uint8_t __child_index;
    uint8_t __isSetOrDirtyMask[8]; // bit mask of fields that have been set/changed
// [...]
    pb_size_t m_NodeIdentifiers_count;
    uint16_t m_NodeIdentifiers[5];
// [...]
} PACKED DomainModel;

Whole FieldArray template class for clarity (after all those years Lauchpad still doesn't offer any form of formatting):
typedef pb_size_t uint16_t;

template <typename T_Field>
class FieldArray
{
   const pb_size_t &m_count;
   T_Field *m_array;

public:

   FieldArray(const pb_size_t &count, T_Field *data) :
      m_count(count),
      m_array(data)
   {}

   size_t size() const
   {
      return m_count;
   }

   const T_Field& operator[](size_t index) const
   {
      return m_array[index];
   }

   class ConstIterator;
   const ConstIterator begin() const { return ConstIterator(*this); }
   const ConstIterator end() const { return ConstIterator(*this, m_count); }

   /*
    * Iterator class
    */
   class ConstIterator
   {
      const FieldArray<T_Field> &m_fieldArray;
      mutable pb_size_t m_currentIndex;

   public:
      explicit ConstIterator(const FieldArray<T_Field> &fieldArray, pb_size_t startingIndex = 0) :
         m_fieldArray(fieldArray),
         m_currentIndex(startingIndex)
      {
      }

      bool operator==(const ConstIterator &rhs) const { return m_currentIndex == rhs.m_currentIndex; }
      bool operator!=(const ConstIterator &rhs) const { return !(operator==(rhs)); }
      const T_Field operator*() const { return m_fieldArray[m_currentIndex]; }
      const ConstIterator& operator++() const { m_currentIndex++; return *this; }
      const ConstIterator operator++(int) const
      {
         ConstIterator temp = *this;
         ++(*this);
         return temp;
      }
   };
};

This is the problematic method that is optimised wrong:
   const FieldArray<uint16_t> DomainModelAccessor::GetNodeIdentifiers() const
   {
       return FieldArray<uint16_t>(m_pData->m_NodeIdentifiers_count, m_pData->m_NodeIdentifiers);
   }

Correct disassembly for unpacked structure:
   const FieldArray<uint16_t> DomainModelAccessor::GetNodeIdentifiers() const
   {
       return FieldArray<uint16_t>(m_pData->m_NodeIdentifiers_count, m_pData->m_NodeIdentifiers);
0x004201ac ldr r3, [r1, #4]
0x004201ae add.w r1, r3, #28
0x004201b2 adds r3, #30
0x004201b4 str r1, [r0, #0]
0x004201b6 str r3, [r0, #4]
   }
0x004201b8 bx lr
0x004201ba nop

Incorrect disassembly for packed structure:
   const FieldArray<uint16_t> DomainModelAccessor::GetNodeIdentifiers() const
   {
0x004201ac sub sp, #8
       return FieldArray<uint16_t>(m_pData->m_NodeIdentifiers_count, m_pData->m_NodeIdentifiers);
0x004201ae ldr r3, [r1, #4]
0x004201b0 adds r3, #26
0x004201b2 add.w r1, sp, #6
0x004201b6 str r1, [r0, #0]
0x004201b8 str r3, [r0, #4]
   }
0x004201ba add sp, #8
0x004201bc bx lr
0x004201be nop

There is stack pointed involved and the indirect arithmetic seem to be wrong for m_pData->m_NodeIdenfiers_count. The m_pData is pointer to the DomainModel in the memory (for the sake of example static global variable).

This is the code that calls GetNodeIdentifiers(), seem to be the same in both cases:
   volatile uint8_t rtsCount = 0;
0x00420422 strb.w r4, [sp, #31]
   DomainModelAccessor modelAccessor;
0x00420426 add r0, sp, #20
0x00420428 bl 0x42019c <DomainModelAccessor::DomainModelAccessor()>

   auto identifiers = modelAccessor.GetNodeIdentifiers();
0x0042042c add r1, sp, #20
0x0042042e add r0, sp, #12
0x00420430 bl 0x4201ac <DomainModelAccessor::GetNodeIdentifiers() const>
0x00420434 strh.w r4, [sp, #8]
0x00420438 ldr r3, [sp, #12]
0x0042043a ldrh r3, [r3, #0]
0x0042043c ldrh.w r2, [sp, #8]

   for (auto nodeId = identifiers.begin(); nodeId != identifiers.end(); ++nodeId)
0x00420440 cmp r2, r3
0x00420442 beq.n 0x420458 <main()+68>
   {
      rtsCount++;
0x00420444 ldrb.w r3, [sp, #31]
0x00420448 adds r3, #1
0x0042044a uxtb r3, r3
0x0042044c strb.w r3, [sp, #31]
0x00420450 adds r2, #1
0x00420452 strh.w r2, [sp, #8]
0x00420456 b.n 0x420438 <main()+36>
   }

The wrong value here is R3 holding the m_count field of identifiers variable (in my case it's 66 instead of 0). When we faced this problem for the first time the code optimised code with packed structures iterated couple thousand times leading to weird results (for the sake of an example everything else than addition was removed from the loop).

I believe it's a bug related to overcome limitations of unaligned access, however I cannot tell if it's ARM specific or an upstream problem. I can raise bugs in both cases with full working examples (I can't attach files to questions) if you confirm.

Question information

Language:
English Edit question
Status:
Answered
For:
GNU Arm Embedded Toolchain Edit question
Assignee:
No assignee Edit question
Last query:
Last reply:
Revision history for this message
Andre Vieira (andre-simoesdiasvieira) said :
#1

Hi Michal,

I couldnt quite get your examples to work, so I tried compiling some getters for your packed data structure, using the same options. The resulting code seems to access the member fields just fine. Could you please post a reduced, but working example? Feel free to attach them as files.

Kind Regards,
Andre

Revision history for this message
Michał Fita (michal.fita) said :
#2

Questions doesn't let me attach any files (I tried to attach something to
the e-mail reply, let's see it works). The trick to get this result in
assembler is to have inlined constructor accepting a reference to this
structure member. I didn't found any other way to reproduce similar result.

If the file went in successfully, this is full project in VisualGDB/VS2015
I was using to diagnose the problem (for GCC 5.4.1).

Regards / Pozdrawiam,
Michał Fita

2016-08-22 14:17 GMT+01:00 Andre Vieira <
<email address hidden>>:

> Your question #345145 on GNU ARM Embedded Toolchain changed:
> https://answers.launchpad.net/gcc-arm-embedded/+question/345145
>
> Status: Open => Needs information
>
> Andre Vieira requested more information:
> Hi Michal,
>
> I couldnt quite get your examples to work, so I tried compiling some
> getters for your packed data structure, using the same options. The
> resulting code seems to access the member fields just fine. Could you
> please post a reduced, but working example? Feel free to attach them as
> files.
>
> Kind Regards,
> Andre
>
> --
> To answer this request for more information, you can either reply to
> this email or enter your reply at the following page:
> https://answers.launchpad.net/gcc-arm-embedded/+question/345145
>
> You received this question notification because you asked the question.
>

Revision history for this message
Michał Fita (michal.fita) said :
#3

Attachments in e-mails aren't visible here so they probably land in /dev/null.

So, the link to the downloadable archive containing the project: https://drive.google.com/file/d/0B5Z9jSVYjczFellTNENwQ1c0QzA/view?usp=sharing

Revision history for this message
Andre Vieira (andre-simoesdiasvieira) said :
#4

Hi Michal,

I am now able to reproduce the generated code you refer to and it does seem weird. I can see m_pData->m_NodeIdentifiers is correctly addressed, 26 bytes into the struct, but the stackpointer operations for m_pData->m_NodeIdentifiers_count do seem weird...

I will try to look into this.

Cheers,
Andre

Revision history for this message
Michał Fita (michal.fita) said :
#5

Hi Andre,

Thanks. I'm glad you've managed to reproduce the problem. It hit us quite badly fortunately caught before started doing more harm. Now we have structure packing disabled, but it requires more memory which is a precious resource in our project. Surprisingly -Og was enough to trigger wrong code generation.

Please keep me updated on progress of your research. Thank you.

Cheers,
Michał

Revision history for this message
Andre Vieira (andre-simoesdiasvieira) said :
#6

Hi Michal,

I wasnt quite able to identify the problem myself, so I reduced the problem and opened a bugzilla ticket for it: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77388

The problem shows up with -O1 for any member type larger than a 'char'.

Cheers,
Andre

Revision history for this message
Michał Fita (michal.fita) said :
#7

Regarding Richard's explanation Am I correctly understanding the alignment mismatch here is caused by the alignment assumption related to the type of the A::A() argument, like every reference to type bigger than char is word aligned?

If this can be detected as a problem emission of a warning would be nice to have. Is there a way to express our intention so that compiler would handle this scenario with correct addresses?

I'm not happy compiler silently allows that, what works in unit tests in MSVC++ and then fails on target in G++.

Revision history for this message
Thomas Preud'homme (thomas-preudhomme) said :
#8

Hi Michal,

Normally, a short should have 2byte alignment and an int 4byte alignment. When using the packed attribute, you are telling the compiler it can throw away that alignment requirement to save as much space as possible by avoiding padding. In the reduced example Andre posted, a reference to a short from a packed structure is passed to a function that expects a simple reference to short and thus expects 2byte alignment. Because the structure is packed, the reference you pass might not be 2byte aligned and thus the compiler creates a temporary that respect the condition.

In doing so, you end up extending the lifetime of the temporary for as long as the object A is alive which is undefined behavior.

I hope my explanation was clear enough.

Best regards.

Can you help with this problem?

Provide an answer of your own, or ask Michał Fita for more information if necessary.

To post a message you must log in.