Test results and acceptance level

Asked by Thomas Giesel

How exactly do you run the GCC test suite, and how do you decide whether unexpected failures are harmless? The "# of unexpected failures 51" in [1] looks disturbing.

In question [1] Terry Guo answered "I am sorry that the .sum and .log files are not approved to share publicly yet." I don't know the reason for this; possibly it is a legal or liability issue, since the toolchain and test suite are built from sources written by third parties, not by ARM.

However, it would be great to know: what is your acceptance level for a binary release? Does "Tested on a variety of Cortex-M0/M0+/M3/M4/M7/A9 boards" mean that this acceptance level must be reached on at least one board with each of these cores? And what level of issue is worth listing in release.txt?

Thank you very much for your effort.

[1] https://answers.launchpad.net/gcc-arm-embedded/+question/232721

Question information

Language: English
Status: Solved
For: GNU Arm Embedded Toolchain
Solved by: Thomas Preud'homme
Best Thomas Preud'homme (thomas-preudhomme) said:
#1

Hi Thomas,

There are many cases where it is OK for a test to fail. First, a test can be meaningless on some targets yet not be disabled for them; for instance, some atomic tests make no sense for ARMv6-M processors. It could also be that the check in a test is too strict and fails to consider some valid output. Finally, the test could have been failing ever since support for a target was added. In that case the test does show that a problem exists, but it might not matter if the feature being tested is unimportant for users of that target. I don't have a real example in mind, but think of an optimization for 64-bit divisions, which is probably uncommon on embedded targets like Cortex-M0 cores.
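To make the first case concrete, here is a hypothetical testsuite-style fragment (the function, the code, and the final scan directive are invented for illustration; this is not a file from the actual suite). ARMv6-M cores such as the Cortex-M0 lack the LDREX/STREX exclusive-access instructions, so GCC expands __atomic_fetch_add into a call to a library helper rather than an inline instruction sequence; a test that scans the generated assembly for "ldrex" then fails on ARMv6-M even though the compiler output is valid:

    /* { dg-do compile } */
    /* { dg-options "-O2" } */

    int counter;

    int
    bump (void)
    {
      /* ARMv7-M: inline LDREX/STREX retry loop.
         ARMv6-M: a call to a library helper such as __atomic_fetch_add_4.  */
      return __atomic_fetch_add (&counter, 1, __ATOMIC_SEQ_CST);
    }

    /* Fails on ARMv6-M, where no "ldrex" is ever emitted.  */
    /* { dg-final { scan-assembler "ldrex" } } */

In the real testsuite such checks are normally guarded with effective-target directives (e.g. dg-require-effective-target) so the test is skipped rather than counted as a failure; a missing or too-loose guard produces exactly the kind of harmless FAIL described above.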

For all these reasons, the way we look at test results is to compare them with those of previous releases to catch regressions. If a failure was already present, it is a known issue, and either there isn't enough interest to fix it or it is a difficult one. In any case, it means a given release is no worse than the previous one (so the quality only improves over time). Besides the testsuite, we also build a few benchmarks and run them on several boards. This is where the "tested on a variety of (...)" comes from: the benchmarks compile without errors and run fine on boards with these CPUs. We may even test a given CPU on several boards.
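GCC itself ships a script for this comparison (contrib/compare_tests in the GCC source tree). As a minimal sketch of the same idea, and assuming the usual DejaGNU .sum format in which every failing test appears on a line beginning with "FAIL:", the following program prints each FAIL present in a new .sum file but absent from the old one:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_LINE 1024

    /* Return 1 if LINE occurs as a whole line in the file at PATH.  */
    static int
    line_in_file (const char *path, const char *line)
    {
      char buf[MAX_LINE];
      FILE *f = fopen (path, "r");
      if (!f)
        {
          perror (path);
          exit (1);
        }
      while (fgets (buf, sizeof buf, f))
        if (strcmp (buf, line) == 0)
          {
            fclose (f);
            return 1;
          }
      fclose (f);
      return 0;
    }

    int
    main (int argc, char **argv)
    {
      char buf[MAX_LINE];
      FILE *new_sum;

      if (argc != 3)
        {
          fprintf (stderr, "usage: %s old.sum new.sum\n", argv[0]);
          return 1;
        }
      new_sum = fopen (argv[2], "r");
      if (!new_sum)
        {
          perror (argv[2]);
          return 1;
        }
      /* A FAIL in the new results that was not already failing in the
         old results is a regression.  */
      while (fgets (buf, sizeof buf, new_sum))
        if (strncmp (buf, "FAIL:", 5) == 0
            && !line_in_file (argv[1], buf))
          printf ("regression: %s", buf);
      fclose (new_sum);
      return 0;
    }

An empty output means no new failures; comparing the "# of unexpected failures" totals that DejaGNU prints at the end of each .sum file (the number quoted in the question) is the quick first check.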

As for sharing the test results, it is something we want to do, but it would require reviewing them first, which is too heavyweight and rules out automation. However, we can tell you whether we get the same number of failures.

I hope I answered your question.

Best regards.

Thomas Giesel (skoe) said:
#2

Thank you for the explanation.