char * stack variable not initialized at any optimization level

Asked by Roland King

Tracking down an issue on a cortex-m4 I've distilled down to the following code

#include <stdint.h>

struct UART
{
    __volatile uint32_t padding[200];
    __volatile uint32_t TXD;
};

#define NRF_UARTE0 ((__volatile__ struct UART*)0x40002000)

int main(void)
{
    char str[] = "hello\n";
    NRF_UARTE0->TXD = (uint32_t)(str);
}

The original code was somewhat longer but TXD is a pointer to a buffer to be sent out of a UART, the full code, after setting TXD, starts the UART and waits for completion.

Compiled with -Os (or -O1, O2, O3) like this

gcc-arm-none-eabi-4_9-2015q2/bin/arm-none-eabi-gcc -Os -pedantic -mcpu=cortex-m4 -mthumb -o main.o -c main.c

yields the following assembler

00000000 <main>:
   0: 4b02 ldr r3, [pc, #8] ; (c <main+0xc>)
   2: b082 sub sp, #8
   4: f8c3 d320 str.w sp, [r3, #800] ; 0x320
   8: b002 add sp, #8
   a: 4770 bx lr
   c: 40002000 .word 0x40002000

8 bytes have been allocated on the stack for 'str' and then the sp is written to r3+0x320 which is the TXD field. However the stack is never initialized with the 'hello' string. In the full code the UART is then started and sends random garbage from the uninitialized memory.

The .o file doesn't even contain the string 'hello', it's been completely removed.

It's rather gross to initialize an on-stack string like that, but as far as I can tell it's valid C, the memory is allocated on stack, the address is written to TXD, the compiler has no way of knowing what that write does and so should initialize the memory correctly as the code specifies before using it.

if you make str volatile, it initializes properly, which makes no more sense. There are of course other workarounds like using a char* but I think this code as-written should work.

Bug I should file, or my imperfect understanding of C?

Question information

Language:
English Edit question
Status:
Solved
For:
GNU Arm Embedded Toolchain Edit question
Assignee:
No assignee Edit question
Solved by:
Tejas Belagod
Solved:
Last query:
Last reply:
Revision history for this message
Tejas Belagod (belagod-tejas) said :
#1

Hi Roland, sorry for the delay in getting back to you on this. When your code is compiled, the compiler has no way to figure out that the address you're writing to is a peripheral that is memory mapped. It does not know what the functionality of that particular address is - all it sees is that an address is being written with the address of a local object ('str' in this case) that will go out of scope when main returns. Therefore, as long as address 0x40002000 has no use in 'main' it doesn't matter what str points to. When main returns, 'str' will go out of scope anyway and its stack will effectively have garbage - and that's why the compiler does not fill up the stack space with "hello".

OTOH, when you use 0x40002000 in main, now it knows that the address is used before main goes out of scope and it fills the stack with 'hello'. For eg. if you add this line to the end of main, you will see 'hello' appear in the object.

   bar ((__volatile__ struct UART*)0x40002000);

$ arm-none-eabi-gcc -c -Os -pedantic -mcpu=cortex-m4 -mthumb -o uart-49.o uart.c
$ strings uart-49.o
@hello
..

Revision history for this message
Roland King (rols-x) said :
#2

Thanks for the answer - that's a persuasive one however I'm still not convinced it's right.

First off, although the compiler doesn't know 0x40002000 is a hardware address neither does it know that it's not. It's been asked to set the address of an initialized block of memory into it, it's passing the address, but it's not initializing it. I could understand if it didn't set the memory address at all and removed the code entirely, if it uses the address of the data, it should initialize it first.

Secondly, just changing the definition of the variable from char[] to volatile char [] changes the behaviour and the memory is initialized properly. If the compiler was deciding the contents were unused in one case, just adding volatile shouldn't affect that choice.

It does also allocate the memory on the stack for the variable, if you make the string longer you'll see the sub sp #8 change to sub sp #12, #16 etc. It allocates enough memory for the data, but doesn't init it. If it considers that pointer unused why allocate it at all? If it allocates it incase something should write to it, equally it should initialize it incase something reads from it, which in this case is exactly what happens. Just to be clear what the rest of the real code looked like, it was like this

    NRF_UARTE0->TXD = (uint32_t)(str);
    NRF_UARTE0->START_TX=1
    while( NRF_UARTE0->END_TX == 0 ) {}

where START_TX and END_TX are other registers which start the UART transmission and signal its completion. This clocks out the contents of str and in this case clocked out uninitialized garbage.

This issue actually showed up when a project built with Keil failed optimized with gcc and was eventually tracked down to this distilled case. I ran it through Clang as well which also leaves the initialization intact. Not proof of anything, those compilers may just be extra conservative. I don't have a non-ARM gcc to see what it does in say x86.

I can believe the compiler has decided the initialization is unnecessary because the data is unused, but it's a wrong decision which causes incorrect execution. I believe the code is valid and should execute correctly optimized or not, the fact that the address of that data is written to somewhere, and the compiler has no idea what's there, memory, a register, a piece of hardware, should mean that it cannot make the determination the data is unused, because it cannot know what it just wrote it to, and must therefore both allocate space for it and initialize it.

Revision history for this message
Best Tejas Belagod (belagod-tejas) said :
#3

I had to dive into C90 for this one.

Regarding your first point about - though you do set the address (TXD = str), you don't explicitly access it (no read /write). Now the C standard 6.7.3 (para 6) says that

"An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.114) What constitutes an access to an object that has volatile-qualified type is implementation-defined."

Because there is no explicit access to str, the compiler construes(implementation-defined) this as a write and so just allocates the buffer on the stack and hopes some asynchronous thread will write to it. This could explain why there is stack space allocated, but not filled. Though I will try to verify this by delving deeper into gcc's implementation-defintion for such cases and this is indeed the case.

Regarding your second point. I think your code violates the standard here because of 6.7.3(para 5)

"If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined.113)"

So, if you're assigning volatile-qualified 'TXD' with unqualified 'str' and then use it as an lvalue, the program behavior is undefined.

With footnote 113 saying:

"113) This applies to those objects that behave as if they were defined with qualified types, even if they are never actually defined as objects in the program (such as an object at a memory-mapped input/output address)."

which applies to the constant address 0x40002000 which is not really an object in the program.

Revision history for this message
Joey Ye (jinyun-ye) said :
#4

Roland,

Typical UART designs we've seen work in a way that char is fed into UART one by one. For the UART you are using, may we know which design it is from? Also can it be fed char one by one too?

Regards,
Joey

Revision history for this message
Roland King (rols-x) said :
#5

Tejas, apologies for making you dive into the C90 spec and thank you for both answers. I'm not surprised that this falls under the category of implementation defined or even undefined, that's why I asked the question. The code in question is an edge case, and rather a horrid piece of test code, in any sane case the compiler would be doing the right thing here and saving space.

My point 2 was bad. I was trying various 'small changes' to see if I could discern what caused the compiler to elide, or not elide, the string init. That was one of the changes which caused it not to be dropped. There were others, including, if I remember correctly, just adding a piece of assembler at the end of the function which put 0 into r0. None of that however is very helpful and just confuses the whole issue so, sorry for that.

Joey. The chip in this case is the new Nordic nRF52. Yes it's quite usual to have a single or double buffered register you put one character at a time in, the previous incarnation, the nRF51 has exactly that. The '52' now has DMA on it so you can just point the register at a chunk of memory, tell it how long it is, start the task and it will automatically churn it out and tell you when it's done. My experience is that any of the chips which use DMA have registers which take an address in memory and take data from, or put data to, that chunk of memory. Usually they would of course be long-lived heap-allocated buffers, not a piece of stack memory, this code was distilled down from someone just testing where they made a local string and tried to write it and were surprised when, optimized and compiled with gcc, they got random data.

You can still use the old one-character-in-a-register way of sending but DMA is much better, and a long-awaited feature and the old char-at-a-time method is deprecated.

The nRF52 uses DMA for lots of peripherals now, so there's a lot of writing memory addresses to registers where there wasn't before. That's why it was worth understanding what happens under optimization and why as someone else is probably going to do the same thing and ask the same question.

I did search for something I could annotate the register with which is better than 'volatile' which hinted it was a hardware register, or even better a hardware register which pointed to memory, which might tell the compiler to be ultra-extra-conservative, but I didn't find one.

Revision history for this message
Roland King (rols-x) said :
#6

Thanks Tejas Belagod, that solved my question.