Build killed with signal 15 after 150 minutes of inactivity

Asked by Nico Schlömer

I'm getting build errors of the type

> Build killed with signal 15 after 150 minutes of inactivity

on some of the builds of the large numerical library I maintain. All builds are listed at <https://launchpad.net/~nschloe/+archive/trilinos-nightly/+packages>. All Quantal builds succeed; the ones that fail because of the timeout are, for example, the amd64 builds on Lucid (but not the i386 ones).
Otherwise, there are no build errors to be seen.

What could be the reason for the failure? How can I debug this?

Question information

Language:
English
Status:
Solved
For:
Launchpad itself
Assignee:
No assignee
Solved by:
Nico Schlömer
William Grant (wgrant) said:
#1

Launchpad will terminate a build if it writes nothing to stdout or stderr for 150 minutes. This is sometimes caused by swap thrashing, which is often the result of linking pathologically large C++ binaries. I'd check the peak memory usage of a Lucid build on your local machine and then compare it to Quantal: perhaps Quantal's mpicxx uses less RAM, or perhaps the Quantal build just ran on a builder with more RAM.
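
A minimal sketch of such a local check, assuming a POSIX system (Linux) with Python 3. The helper `run_with_inactivity_timeout` and the file name `watch_build.py` are hypothetical: the helper models the inactivity rule described above by sending SIGTERM (signal 15) when the wrapped command has been silent for `timeout_s` seconds, and it reports the peak resident memory of the build's processes. It is not Launchpad's own implementation, and the peak figure is that of the single hungriest process (typically the linker), not the sum of everything running in parallel.

```python
import codecs
import os
import resource
import selectors
import signal
import subprocess
import sys
import time


def run_with_inactivity_timeout(cmd, timeout_s):
    """Run cmd, echoing its combined stdout/stderr, and send it SIGTERM if it
    writes nothing for timeout_s seconds (a local model of the inactivity
    rule, not Launchpad's code).

    Returns (exit_status, killed_for_inactivity, peak_rss_kib).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ)
    # Decode incrementally so multi-byte characters split across reads survive.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    last_output = time.monotonic()
    killed = False
    while True:
        remaining = timeout_s - (time.monotonic() - last_output)
        if remaining <= 0:
            # Signal 15, as in "Build killed with signal 15". Only the direct
            # child is signalled here; a real builder would clean up the tree.
            proc.send_signal(signal.SIGTERM)
            killed = True
            break
        if sel.select(timeout=remaining):
            chunk = os.read(proc.stdout.fileno(), 4096)
            if not chunk:  # EOF: every writer has closed the pipe
                sys.stdout.write(decoder.decode(b"", final=True))
                break
            sys.stdout.write(decoder.decode(chunk))
            sys.stdout.flush()
            last_output = time.monotonic()  # any output resets the clock
    status = proc.wait()
    sel.close()
    proc.stdout.close()
    # For RUSAGE_CHILDREN, ru_maxrss is the RSS of the largest waited-for
    # descendant (KiB on Linux): the hungriest single process, not a total.
    peak_rss_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return status, killed, peak_rss_kib


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: watch_build.py TIMEOUT_SECONDS COMMAND [ARGS...]
    status, killed, peak = run_with_inactivity_timeout(sys.argv[2:], float(sys.argv[1]))
    print(f"exit={status} killed={killed} peak_rss={peak / 1024:.0f} MiB", file=sys.stderr)
    sys.exit(1 if killed or status != 0 else 0)
```

For example, `python3 watch_build.py 9000 dpkg-buildpackage -b -us -uc` (9000 s being the 150-minute limit) would run a local binary build and print its peak RSS at the end, which can then be compared between a Lucid and a Quantal chroot.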

Nico Schlömer (nschloe) said:
#2

The memory requirements are indeed substantial, and it seems that the Quantal build simply happened to land on a builder with more RAM.

What could I do to get this behavior reliably? Is there perhaps a way to restrict which builders are selected?

William Grant (wgrant) said:
#3

There's no way to choose from a subset of the available builders. It's extremely rare to see even huge projects run into this limitation; are you sure nothing is wrong, and that you can't split the offending step into two smaller calls?

Nico Schlömer (nschloe) said:
#4

> are you sure there's not something wrong

Nope, not sure. I'd like to find out where exactly the issue originates, but I don't know how to debug it. (I can't reproduce the stall locally.)

William Grant (wgrant) said:
#5

Have you tried building in a VM with perhaps only 4GiB or even 2GiB of RAM?
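
If setting up a VM is inconvenient, a rougher alternative is to cap each process's address space with `setrlimit(RLIMIT_AS)`. This is a different technique, not an equivalent: a VM with little RAM will swap and slow down, much like a struggling builder, whereas an address-space cap makes allocations fail immediately, so an oversized link step should error out quickly instead of stalling. The sketch below assumes Linux and Python 3; the helper `run_with_address_space_limit` and the file name `limit_as.py` are hypothetical.

```python
import resource
import subprocess
import sys


def run_with_address_space_limit(cmd, limit_bytes):
    """Run cmd with its virtual address space capped at limit_bytes.

    The soft RLIMIT_AS is inherited by every descendant, but it is a
    per-process cap: several parallel linkers may each use up to limit_bytes.
    """
    def apply_limit():
        # Runs in the child after fork() and before exec() (POSIX only).
        # Only the soft limit is lowered; the hard limit is left unchanged.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))

    return subprocess.run(cmd, preexec_fn=apply_limit).returncode


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: limit_as.py GIB COMMAND [ARGS...]
    limit = int(float(sys.argv[1]) * 1024**3)
    sys.exit(run_with_address_space_limit(sys.argv[2:], limit))
```

For example, `python3 limit_as.py 2 dpkg-buildpackage -b -us -uc` would attempt a binary build with a 2 GiB cap per process. Because RLIMIT_AS counts virtual rather than resident memory, treat the threshold at which the build starts failing as a rough indication only.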

Nico Schlömer (nschloe) said:
#6

I have not, but this may be a good way to check the memory usage of the build process.
For now, I got all the builds at <https://launchpad.net/~nschloe/+archive/trilinos-nightly/+packages> to pass by clicking "retry this build" until each one landed on a machine with sufficient memory.

Bottom line: we now know that memory is the problem, which explains why the build process sometimes hangs and sometimes doesn't. I'll close this as "Problem solved".

Thanks for your support, @wgrant.