Comment 10 for bug 897583

In , David Kastrup (dak) wrote :

I agree that the real fix is to force an upgrade of the compiler to a fixed version. However, Ubuntu 11.10 has been released and is in circulation, so we can't reasonably implement that solution until the buggy compilers have had a reasonable chance to be replaced everywhere.

I have reported this bug to Ubuntu. If you are right that it can't be found in 4.6 proper, they will have acquired it via distribution-specific patches. What that means for the stability and security of the entire current Ubuntu code base, one can only guess.

Regarding Lilypond, we have chosen to use -fno-optimize-sibling-calls based on the gcc version number instead of an actual test, and without regard to the architecture. Tracking this bug down cost us several weeks of developer time and brought down our build infrastructure for a while, until the first workaround, -fkeep-inline-functions, was discovered by chance. Lilypond is a C++ application with considerable parts written in Guile, so segfaults are usually caused by forgetting garbage collection protection measures. As far as I know, I am the only active programmer with a system programming background.

When the bug manifests itself in a segfault, the responsible function is no longer visible in the stack backtrace. This makes finding the culprit extremely unfunny. In our case, the problem was exacerbated because the last visible caller in the stack backtrace made its call via a function pointer table. This table was a C++ vector, and accessing the vector in gdb was not possible because operator[] had been inlined. Specifying -fkeep-inline-functions, which according to its documentation is supposed to _only_ additionally emit the (otherwise unused) inline function instantiations that could have been used for accessing that table in the debugger, made the bug disappear.
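
To illustrate the kind of code involved, here is a minimal sketch of a call made in tail position through a function pointer table held in a C++ vector. The names Handler, add_one, twice, make_table and dispatch are hypothetical and not taken from the Lilypond sources. With sibling call optimization enabled, gcc may compile the call in dispatch() as a plain jump, so dispatch() no longer has a frame on the stack while the handler runs and does not appear in a backtrace taken inside the handler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical handler signature, chosen only for illustration.
typedef int (*Handler)(int);

int add_one(int x) { return x + 1; }
int twice(int x) { return x * 2; }

// Hypothetical dispatch table stored in a C++ vector.
std::vector<Handler> make_table() {
    std::vector<Handler> table;
    table.push_back(add_one);
    table.push_back(twice);
    return table;
}

// The indirect call is the last thing this function does, so it is a
// candidate for sibling call optimization: the compiler may reuse this
// frame and jump to the handler instead of calling it.
int dispatch(const std::vector<Handler>& table, std::size_t index, int arg) {
    assert(index < table.size());
    // operator[] is an inline member function; once inlined, gdb has no
    // out-of-line copy to call when evaluating table[index].
    return table[index](arg);
}
```

Building such a file with -fno-optimize-sibling-calls keeps the call as a real call, so the caller stays visible in the backtrace.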

There is no sane reason that -fkeep-inline-functions turns off sibling call optimization, but while sabotaging the debugging of this problem, it at least gave us a workaround.

So we simply can't afford to deal with this kind of situation more than once. We don't have the skill sets. In contrast, the positive results of this optimization are negligible for us, since we don't employ systematic call chaining (as a p-code interpreter using function pointer tables likely would).
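
As a rough sketch of what choosing the flag by version number alone amounts to, the following picks -fno-optimize-sibling-calls from the compiler's major and minor version, with no test for the bug itself and no look at the architecture. The function names are hypothetical, and treating exactly gcc 4.6.x as affected is an assumption made for illustration. The real decision would live in the build configuration rather than in C++ code.

```cpp
#include <cassert>
#include <string>

// Assumption for illustration: every gcc 4.6.x is treated as affected,
// regardless of distribution patches or target architecture.
bool needs_sibcall_workaround(int major, int minor) {
    assert(major >= 0 && minor >= 0);
    return major == 4 && minor == 6;
}

// Extra compiler flags for the given gcc version; empty if none are needed.
std::string extra_cxxflags(int major, int minor) {
    return needs_sibcall_workaround(major, minor)
        ? "-fno-optimize-sibling-calls"
        : "";
}

// Same decision for the compiler that is building this file.
// Non-gcc compilers are reported as version 0.0 and get no extra flags.
std::string extra_cxxflags_for_this_compiler() {
#if defined(__GNUC__) && !defined(__clang__)
    return extra_cxxflags(__GNUC__, __GNUC_MINOR__);
#else
    return extra_cxxflags(0, 0);
#endif
}
```

The check is deliberately coarse: it gives up the optimization on unaffected 4.6 builds too, which costs little when the optimization brings no measurable benefit.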