Different compiler output on Windows and Linux when optimizing for size

Asked by Christian Frey on 2019-10-09

Hello everyone

I was getting different output files when compiling on Linux vs. Windows. I was able to narrow it down to a single function that produces different output on Windows and Linux when I optimize for size (-Os).

I checked with version 8.3.1 of the toolchain:
arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 8-2019-q3-update) 8.3.1 20190703 (release) [gcc-8-branch revision 273027]

The function in question is a routine that multiplies a 128-bit value by a 32-bit value (I did not write it, but at first glance the implementation looks correct):
// begin testcode
#include <stdint.h>

typedef union psp_128_bit_union
{
   uint64_t LLW[2];
   uint32_t LW[4];
   uint16_t W[8];
   uint8_t B[16];
} PSP_128_BIT_UNION, * PSP_128_BIT_UNION_PTR;

uint32_t _psp_mul_128_by_32
   (
      PSP_128_BIT_UNION_PTR m_ptr,
      uint32_t mul,
      PSP_128_BIT_UNION_PTR r_ptr

   )
{ /* Body */
   PSP_128_BIT_UNION tmp;
   uint64_t w,r;
   uint32_t w0;
   unsigned int i;

   tmp.LLW[0] = 0;
   r = 0;
   if (!mul || (!m_ptr->LLW[0] && !m_ptr->LLW[1])) {
      tmp.LLW[1] = 0;
   } else if (mul == 1) {
      *r_ptr = *m_ptr;
      return r;
   } else {
      for ( i = 0; i < 3; i++ ) {
         w = (uint64_t)mul * (uint64_t)m_ptr->LW[i];
         w0 = (uint32_t)w;
         tmp.LW[i] += w0;
         tmp.LW[i+1] = (w >> 32) + (tmp.LW[i] < w0);
      } /* Endfor */

      w = (uint64_t)mul * (uint64_t)m_ptr->LW[3];
      w0 = (uint32_t)w;
      tmp.LW[3] += w0;
      r = (w >> 32) + (tmp.LW[3] < w0);
   } /* Endif */

   *r_ptr = tmp;
   return r;
} /* Endbody */
// end testcode
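
For cross-checking the arithmetic on a host machine, here is a minimal sketch of an independent reference. It is not the routine from the test case: it carries through a 64-bit accumulator instead of comparing against w0, and it assumes the value is passed as four 32-bit limbs with limb 0 least significant (the same order as the LW[] member above). The name ref_mul_128_by_32 is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical reference: multiply a 128-bit value (four 32-bit limbs,
 * least significant first) by a 32-bit value. The low 128 bits of the
 * product go to out[]; the return value is the overflow above bit 127.
 */
uint32_t ref_mul_128_by_32(const uint32_t m[4], uint32_t mul, uint32_t out[4])
{
    uint64_t carry = 0;

    for (unsigned int i = 0; i < 4; i++) {
        /* 32x32 product plus a 32-bit carry always fits in 64 bits */
        uint64_t t = (uint64_t)m[i] * (uint64_t)mul + carry;
        out[i] = (uint32_t)t;   /* low word is the result limb */
        carry = t >> 32;        /* high word feeds the next limb */
    }
    return (uint32_t)carry;
}
```

Feeding the same inputs to both routines and comparing the result limbs and the return value should show whether the original carry handling behaves as expected.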

The source file was built and then disassembled with the following commands:
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -Os -g -Wall -Wextra -pedantic -c test.c -o test-Os.o
arm-none-eabi-objdump -d test-Os.o > test-Os.lss

The outputs are slightly different, although both seem to be correct. The difference is:
On Windows:
  5e: 6023 str r3, [r4, #0]
  60: bf2c ite cs
  62: 2301 movcs r3, #1
  64: 2300 movcc r3, #0
  66: 444b add r3, r9
  68: 4295 cmp r5, r2
  6a: f844 3f04 str.w r3, [r4, #4]!

On Linux:
  5e: f844 3b04 str.w r3, [r4], #4
  62: bf2c ite cs
  64: 2301 movcs r3, #1
  66: 2300 movcc r3, #0
  68: 444b add r3, r9
  6a: 4295 cmp r5, r2
  6c: 6023 str r3, [r4, #0]
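
To illustrate why both sequences look correct, here is a rough C model of the two store pairs. The Windows build uses a plain store followed by a pre-indexed store with writeback ([r4, #4]!), while the Linux build uses a post-indexed store ([r4], #4) followed by a plain store. Both occupy 6 bytes in total. The struct and function names below are hypothetical, memory is modelled as an array of words, and r4 is modelled as a word index rather than a byte address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a few words of memory plus the base register r4. */
typedef struct {
    uint32_t mem[4];
    uint32_t r4;      /* word index into mem (byte offset / 4) */
} model_t;

/* Windows listing: str r3, [r4, #0] ... str.w r3, [r4, #4]! */
static void windows_sequence(model_t *s, uint32_t first, uint32_t second)
{
    s->mem[s->r4] = first;    /* plain store at r4 */
    s->r4 += 1;               /* pre-index: base is advanced first ... */
    s->mem[s->r4] = second;   /* ... then the store uses the new base */
}

/* Linux listing: str.w r3, [r4], #4 ... str r3, [r4, #0] */
static void linux_sequence(model_t *s, uint32_t first, uint32_t second)
{
    s->mem[s->r4] = first;    /* post-index: store at the old base ... */
    s->r4 += 1;               /* ... then the base is advanced */
    s->mem[s->r4] = second;   /* plain store at the new base */
}

/* Returns 1 if both sequences leave memory and r4 in the same state. */
int sequences_match(uint32_t first, uint32_t second)
{
    model_t a = { {0, 0, 0, 0}, 1 };
    model_t b = a;

    windows_sequence(&a, first, second);
    linux_sequence(&b, first, second);
    return a.r4 == b.r4 && memcmp(a.mem, b.mem, sizeof a.mem) == 0;
}
```

In this model the two variants end in the same state, which matches the observation that the difference is in instruction selection only. The instructions between the two stores are identical in both listings and the stores do not change the condition flags, so they are left out of the model.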

Since it was mentioned here before that different output on different platforms is considered a bug, I hope that someone can either confirm my findings or point out a mistake I might have made.

Regards,
Chris

Question information

Language: English
Status: Solved
For: GNU Arm Embedded Toolchain
Assignee: No assignee
Solved by: Joey Ye
Solved: 2019-10-10
Last query: 2019-10-10
Last reply: 2019-10-10

Best Joey Ye (jinyun-ye) said : #1

Christian,

Thanks for reporting this issue. I just confirmed that it is an issue in the GCC 8 release. The upcoming GCC 9 release should have it fixed.

Thanks,
Joey

Christian Frey (freyc) said : #2

Thanks Joey Ye, that solved my question.