Generate PIC for TEXT but absolute addresses for DATA/BSS

Asked by Antoine

Hello,

I am using a Cortex-M4 CPU with flash starting at 0x0 and RAM starting at 0x20000000.

I would like to have several firmware partitions in the flash (e.g. at 0x00000, 0x10000, 0x20000, ...) where different versions of the same firmware are written (by JTAG or user update), and let the bootloader choose the preferred partition to boot from.

This would mean making the text section position independent while keeping data/RAM at a fixed position.

I have tried many options ( -fpic -fPIC -pie -fPIE -fno-plt -msingle-pic-base -mpic-register=X / even not working -mpic-on v6), but I could not achieve the right behavior.

Ideally I would get rid of any indirection or GOT for data/bss, but I cannot even get correct addressing:
- with -fpiX, all static/global variables are addressed relative to the PC, so the code will point to 0x20000000, 0x20010000, 0x20020000, ... depending on the partition it runs from.
- with -msingle-pic-base -mpic-register=X, global variables are OK but static variables still use PC-relative addressing.
- the solution that works best is actually to use no flags and compile for partition 0x0: all calls are PC relative in Thumb mode, except when using function pointers (=> BLX reg). I then have to OR these pointers on the fly with the upper part of the PC. I also have to apply the same workaround to all pointers to rodata such as string literals, which is quite difficult to maintain.

I can see, however, that I am not the first with this kind of problem:
https://answers.launchpad.net/gcc-arm-embedded/+question/253272
https://answers.launchpad.net/gcc-arm-embedded/+question/236744
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00630.html

Is what I want to do not feasible currently, or am I missing some features? Would a new feature enabling this interest other people?

Global options I am using and GCC version:
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mabi=aapcs
gcc version 6.2.1 20161205 (release) [ARM/embedded-6-branch revision 243739] (GNU Tools for ARM Embedded Processors)

Antoine

Question information

Language: English
Status: Expired
For: GNU Arm Embedded Toolchain
Assignee: No assignee

Leo Havmøller (leh-p) said :
#1

From your description there is no need to use position independence at all.
Please elaborate on why you think this is needed.

Antoine (acalando) said :
#2

Let me give more details.

If I do not use any PIC flags, and I compile my FW for a 0 based text segment, but I write it in partition at 0x10000:
- function calls seem OK since Thumb mode (mainly?) uses PC-relative jumps
- function pointers do not work directly
- the same problem occurs with string literals or anything located in .rodata

For the last two cases, there is a workaround: use a wrapper function which converts the addresses, e.g.:

printf(wrapper("my string %d"), 123);

with wrapper() here returning (PC&0xFFFF0000)|(address&0x0000FFFF).
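
For illustration, the arithmetic behind such a wrapper can be sketched in portable C. This is only a sketch under assumptions: the function name relocate_flash_addr and the explicit pc parameter are hypothetical, chosen so the calculation can be checked on a host; on the target the PC would have to be read with inline assembly. The masks assume 64 KiB partitions and an image linked at address 0, as in the formula above.

```c
#include <stdint.h>

/* Assumed layout: 64 KiB partitions, image linked for partition 0. */
#define PARTITION_MASK 0xFFFF0000u  /* bits that select the partition */
#define OFFSET_MASK    0x0000FFFFu  /* bits that select the offset inside it */

/* Combine the partition base taken from the running PC with the
 * in-partition offset of an address that was linked for partition 0. */
uint32_t relocate_flash_addr(uint32_t pc, uint32_t linked_addr)
{
    return (pc & PARTITION_MASK) | (linked_addr & OFFSET_MASK);
}
```

With a PC of 0x00010234 (code running from the partition at 0x10000), a string linked at 0x00000ABC would be read from 0x00010ABC.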

The problem is that this solution is quite complicated to use. I would need to test the FW in a partition other than 0, and for each bug check the disassembly to see if/where the wrapper is needed. Bugs may also frequently go undetected if another FW with .rodata objects at the same addresses is written in partition 0. It is also possible that I will discover new failing cases later, when new functions exercise different C features.

If I use any of the -fpiX options, all the pointers to the .text/.rodata sections work fine... but in this case it is the addressing of .data/.bss that no longer works (I can expand on this if needed).

The problem here is that the -fpiX options have been implemented for Von Neumann architectures, with .text/.data/.bss going in the same piece of RAM, but this implementation does not make sense for Harvard architectures where .text and .data/.bss are completely independent.

Leo Havmøller (leh-p) said :
#3

> I compile my FW for a 0 based text segment, but I write it in partition at 0x10000
If the code is to be located at 0x10000, then build it for 0x10000 - don't fight the toolchain ;-)

Antoine (acalando) said :
#4

The final location of the image is not known at compilation time: when updating it through UART, it is the running FW that chooses which partition the new image will be flashed to, depending on which partition is empty/broken/the oldest.
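
To make the selection step concrete, here is a rough C sketch of how a running firmware might pick the slot to overwrite. Everything in it is an assumption for illustration: the slot_info structure, the version field, the priority order (empty first, then broken, then the oldest valid image) and the rule that the running slot is never chosen are not taken from any real bootloader.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot states, ordered by how readily a slot may be overwritten. */
enum slot_state { SLOT_EMPTY, SLOT_BROKEN, SLOT_VALID };

struct slot_info {
    enum slot_state state;
    uint32_t version;            /* only meaningful for SLOT_VALID */
};

/* Return the index of the slot to flash, or -1 if there is no candidate.
 * The slot the firmware is currently running from is never selected. */
int pick_update_slot(const struct slot_info *slots, size_t count, size_t running)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (i == running)
            continue;
        if (best < 0) {
            best = (int)i;
            continue;
        }
        const struct slot_info *b = &slots[best];
        const struct slot_info *c = &slots[i];
        /* Prefer empty over broken over valid; among valid slots, the oldest. */
        if (c->state < b->state ||
            (c->state == SLOT_VALID && b->state == SLOT_VALID &&
             c->version < b->version))
            best = (int)i;
    }
    return best;
}
```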

Leo Havmøller (leh-p) said :
#5

> The final location of the image is not known at compilation time: when updating it through UART, it is the running FW that chooses which partition the new image will be flashed to, depending on which partition is empty/broken/the oldest.
So build the firmware for each of 0x10000, 0x20000, etc. and have the running firmware ask for the appropriate one.
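
A minimal sketch of what "ask for the appropriate one" could look like on the receiving side, assuming each build carries a small header recording the address it was linked for. The image_header layout, its field names and the 64 KiB slot size are assumptions made for this example, not an existing format.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLOT_SIZE 0x10000u       /* assumed: slots at 0x00000, 0x10000, 0x20000, ... */

/* Hypothetical header placed in front of each build of the firmware. */
struct image_header {
    uint32_t link_base;          /* flash address this build was linked for */
    uint32_t size;               /* image size in bytes */
};

/* Flash base address of a given slot index. */
uint32_t slot_base(unsigned slot)
{
    return slot * SLOT_SIZE;
}

/* Accept an image only if it was linked for the slot it will be written to
 * and does not exceed the slot size. */
bool image_fits_slot(const struct image_header *hdr, unsigned slot)
{
    return hdr->link_base == slot_base(slot) && hdr->size <= SLOT_SIZE;
}
```

The updater would first decide on the target slot, then request (or accept) only the build whose header passes this check.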

Antoine (acalando) said :
#6

This discussion is going nowhere.

The point here is to discuss GCC support for generating position-independent code for architectures with separate ROM and RAM (the usual case in embedded systems).

What I understand from your answers is that it is not possible with the current version.

So what about future versions? I saw that options like -mno-pic-data-is-text-relative have been removed; does it mean that GCC is going backward regarding features related to PIC for embedded, and that any patch will be refused?

Thomas Preud'homme (thomas-preudhomme) said :
#7

Hi Antoine,

The toolchain supports separate ROM and RAM just fine. All you need is to specify an LMA different from the VMA in the linker script. Absolute addresses are then the VMA ones; the bootup code must take care not to use any absolute address (such as pointers) before the data has been copied into RAM, since until then it has to refer to the LMA.
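
As a rough illustration of the LMA/VMA split, the startup copy usually looks something like the sketch below. The function names are made up for this example; on a real target the three pointers passed to copy_data would come from symbols defined in the linker script (for instance a load-address symbol for .data plus its start and end in RAM), whose names vary from project to project.

```c
#include <stdint.h>

/* Copy initialised data from its load address (LMA, in flash)
 * to its run address (VMA, in RAM), one word at a time. */
void copy_data(const uint32_t *load_start, uint32_t *run_start, uint32_t *run_end)
{
    while (run_start < run_end)
        *run_start++ = *load_start++;
}

/* Clear the zero-initialised region (.bss), which has no load image. */
void zero_bss(uint32_t *start, uint32_t *end)
{
    while (start < end)
        *start++ = 0u;
}
```

Only the load address moves with the image; the run addresses in RAM stay the same whichever partition is booted.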

Antoine (acalando) said :
#8

Hi Thomas,

Sorry, but this is not working on my side, and the problem is before the linker. Let's take an example:

-------armpic.c---------------------------------
int foo = 0;

void bar(void) { foo = 1; }

------compiled with-----------------------------
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mabi=aapcs -fPIC -Os -c armpic.c

------objdump-----------------------------
00000000 <bar>:
   0: 4b03 ldr r3, [pc, #12] ; (10 <bar+0x10>)
   2: 4a04 ldr r2, [pc, #16] ; (14 <bar+0x14>)
   4: 447b add r3, pc
   6: 589b ldr r3, [r3, r2]
   8: 2201 movs r2, #1
   a: 601a str r2, [r3, #0]
   c: 4770 bx lr
   e: bf00 nop
  10: 00000008 .word 0x00000008
  14: 00000000 .word 0x00000000

We can see here that the address of foo is calculated by first computing the GOT address as PC + (offset at 10),
then loading from GOT + (offset at 14). But this obviously cannot work with different values of the PC: the offset set by the linker allows a PC around 0x0000xxxx to find the GOT around 0x2000yyyy, but with a PC around 0x0001xxxx the code will look for the GOT at 0x2001yyyy, which makes no sense here.
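
The arithmetic can be checked on a host with a small C model. This is a sketch only: the function names are invented for the example and the GOT address used in the note below (0x20000100) is hypothetical. It relies on the fact that in Thumb state reading the PC gives the address of the current instruction plus 4.

```c
#include <stdint.h>

/* In Thumb state, reading the PC yields the instruction address plus 4. */
uint32_t thumb_pc_read(uint32_t insn_addr)
{
    return insn_addr + 4u;
}

/* Offset the linker stores in the literal pool so that "add r3, pc"
 * yields the GOT address when the code runs at its link address. */
uint32_t link_time_offset(uint32_t got_addr, uint32_t add_insn_link_addr)
{
    return got_addr - thumb_pc_read(add_insn_link_addr);
}

/* GOT address the code actually computes when the "add r3, pc"
 * instruction executes at add_insn_run_addr. */
uint32_t computed_got(uint32_t offset, uint32_t add_insn_run_addr)
{
    return thumb_pc_read(add_insn_run_addr) + offset;
}
```

With the GOT at 0x20000100 and the add instruction linked at 0x4, the stored offset is 0x200000F8. Running at 0x4 gives 0x20000100 as intended, but running the same bytes at 0x10004 gives 0x20010100: the GOT appears to move with the code.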

Note that adding the options "-msingle-pic-base -mpic-register=r10" resolves the problem by using r10 as the GOT base... except that this does not work for local variables ("static int ..."), which are still addressed PC-relative.

Thomas Preud'homme (thomas-preudhomme) said :
#9

Hi Antoine,

I was talking about the general case. It's perfectly possible to have VMA != LMA. The linker will then use the address the data has once it is in RAM in all computations, instead of its address at startup.

Now as to your actual problem, I don't quite understand what the issue is with local variables. They should be accessed via the SP or FP, not the PC. Can you give an example?

Antoine (acalando) said :
#10

Sorry, this was unclear, I actually meant "global variables with internal linkage" when writing "local".

But here is an example:
---armpic.c-----------------
static int foo = 0;
void set(int x) { foo = x; }
int get(void) { return foo; }

----compile-----------------
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mabi=aapcs -fPIC -msingle-pic-base -mpic-register=r10 -Os -c armpic.c

----objdump (set() only, get() is almost the same)-----------------
00000000 <set>:
   0: 4b01 ldr r3, [pc, #4] ; (8 <set+0x8>)
   2: 447b add r3, pc
   4: 6018 str r0, [r3, #0]
   6: 4770 bx lr
   8: 00000002 .word 0x00000002

So here foo is simply addressed with a PC-relative offset.

The options "-msingle-pic-base -mpic-register=r10" are useless here, but for a global variable without the "static", they would allow r10 to be used instead of a GOT found by PC offset, as explained above.

One more comment, as I am not sure my explanations are clear: when checking the options of the ARM compiler, I found -fropi and -fno-rwpi:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0774g/sam1445439435970.html

I could not test them (I do not have this compiler), but they seem to be exactly the feature I am missing: use PIC addressing for .text, and normal addressing for .data/.bss (or optionally PIC, but PC independent).

Launchpad Janitor (janitor) said :
#11

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

Drew (drewcg) said :
#12

Antoine, did you ever find a solution to your issue? I am working on a project which requires the same FLASH structure, a bootloader downloads the image to partition A or partition B and then boots the system.

Antoine (acalando) said :
#13

Hey,

Well, the solution explained in the thread was more or less working for me. I am travelling at the moment and will be back at my PC in 10 days if you need more precise details.

Antoine

On September 7, 2018 4:43:26 PM GMT+03:00, Drew <email address hidden> wrote:
>Your question #585437 on GNU Arm Embedded Toolchain changed:
>https://answers.launchpad.net/gcc-arm-embedded/+question/585437
>
>Drew posted a new comment:
>Antoine, did you ever find a solution to your issue? I am working on
>a
>project which requires the same FLASH structure, a bootloader downloads
>the image to partition A or partition B and then boots the system.
>
>--
>You received this question notification because you asked the question.

Launchpad Janitor (janitor) said :
#14

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

Antoine (acalando) said :
#15

I am getting multiple requests by mail about this problem from people trying to achieve the same thing, so here is an answer I gave which might be useful. There has been no other progress on my side, and I do not know whether the latest GCC offers better solutions.
---
Well, I partly resolved the problem. I just compile my code without any special flag for gcc, and it works almost out of the box.

The problem is with some absolute addresses, for instance function pointers or static strings. For these I need to use some special tricks to convert to the right addresses. This works OK and I can use 4 partitions in my FW and quickly write them to flash and run from them, but it is not really satisfying: one problem is that you can only use the debugger for partition 0, to avoid address mismatches, and the other is that you often forget to use the right macro at the right moment and therefore waste time on stupid bugs. And of course there is a slight overhead in terms of memory and execution.

I guess your problem is that you are using some absolute address somewhere (e.g. the thread start address) and forgot to convert it first.

Below is the kind of macros I am using. The basic idea is to get the current partition from the MSBs of the current PC, and to OR them with the absolute addresses compiled for partition 0.

Good luck,
Antoine

#include <stdint.h>

#define PART_FLASH_SIZE 0x80000
#define PART_COUNT 4
#define PART_SIZE (PART_FLASH_SIZE/PART_COUNT)
#define PART_MASK_SLOW (~(PART_SIZE - 1))
// Allows ARM optimization, as PART_MASK_SLOW (0xFFFE0000) does not fit in an immediate.
// Keeps only the partition-select bits (0x00060000 here) so the mask matches PART_SIZE;
// a fixed 0x000F0000 would also keep bit 16, which is part of the in-partition offset.
#define PART_MASK ((PART_FLASH_SIZE - 1) & PART_MASK_SLOW)

// Convert flash compilation address to flash runtime address
#define PART_ADDR0(addr) ((void*)(part_pc()|((uint32_t)(addr))))

// Convert any address to runtime address if in flash
#define PART_ADDR(addr) part_addr((void*)(addr))

// Get partition address from runtime address
#define PART_BASE(addr) ((void*)(PART_MASK&(uint32_t)(addr)))

// Get partition address from the current PC
static inline uint32_t part_pc(void)
{
        uint32_t ret;
        // Read the PC, then keep only the partition-select bits
        __asm__ ( "mov %0, pc\n\t"
                "and %0, %0, %1" : "=r" (ret) : "I" (PART_MASK));
        return ret;
}

// Relocate addr if it lies in flash; RAM and peripheral addresses pass through
static inline void *part_addr(void *addr_in)
{
        uint32_t addr = (uint32_t) addr_in;
        if (addr < PART_FLASH_SIZE)
                addr |= part_pc();
        return (void*) addr;
}

Robert Palmer (robert1356) said :
#16

Has anyone figured this out? I'm trying to do something VERY similar, but for a different purpose: something like a shared library. I have "plug-ins" that need to be used by the main embedded application. There are 5 slots for these plug-ins in flash, just like your application image slots. The issue causing hang-ups is the global variables. I also just noticed that with ARM Thumb, position-independent and non-position-independent code both seem to use PC-relative addressing. If you have solved this, please let me know.