branch scanner rlimit failures cause the next branch to be incorrectly scanned and fail

Bug #786804 reported by Jean-Paul Calderone
This bug affects 1 person
Affects: Launchpad itself
Status: Fix Released
Importance: Critical
Assigned to: Aaron Bentley
Milestone: none listed

Bug Description

r2675 of https://code.launchpad.net/~divmod-dev/divmod.org/trunk was never properly scanned. On #launchpad:

  < wgrant> 2011-05-21 20:24:08 INFO Updating branch scanner status: 2675 revs
  < wgrant> Fatal Python error: deletion of interned StaticTuple failed
  < wgrant> Aborted
  < wgrant> Looks like bzrlib exploded.
  < wgrant> Oh, no, it's actually a MemoryError.
  < wgrant> exarkun: A bug would be good. What happened here is that a kernel branch caused the scanner to hit its rlimit, but scan_branches.py didn't notice, so it continued trying to execute more jobs, and yours was next.

I guess divmod.org/trunk will probably be fixed by the next commit, but it would be good if Launchpad handled this case better on its own.
OOPS-1965SMS9


Changed in launchpad:
status: New → Triaged
importance: Undecided → Critical
tags: added: oops
summary: - branch scanner fails with MemoryError leaving branch page in
- intermediate state
+ branch scanner rlimit failures cause the next branch to be incorrectly
+ scanned and fail
description: updated
Robert Collins (lifeless) wrote :

This is fallout from a change we made to stop things swapping and taking down the machine. It is a regression because previously only the problematic branches got stomped on.

tags: added: regression
tags: removed: regression
tags: added: regression
Martin Pool (mbp) wrote :

The fallout is from bug 690021.

It seems there are a few possibilities:

1- outright revert the imposition of the ulimit
2- make the ulimit come from a feature flag and then configure it to unlimited (probably good anyhow, assuming it's easy to check flags at the time it's needed)
3- make sure it's the problem branch that gets killed, not the following one

I would be inclined to do 2 and then 3.
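
Option 2 could look roughly like the sketch below. It is only an illustration: the flag name `code.scanner.memory_limit`, the plain dict standing in for a feature-flag lookup, and both helper functions are hypothetical and are not Launchpad's feature-flag API. An unset flag or the value "unlimited" maps to `resource.RLIM_INFINITY`.

```python
import resource

# Hypothetical flag name, used only for this sketch.
FLAG_NAME = "code.scanner.memory_limit"


def parse_limit(value):
    """Turn a flag value into a byte count, or RLIM_INFINITY for 'no limit'."""
    if value is None or value.strip().lower() in ("", "unlimited"):
        return resource.RLIM_INFINITY
    return int(value)


def apply_memory_limit(flags):
    """Set the soft address-space limit from a dict standing in for flags."""
    soft = parse_limit(flags.get(FLAG_NAME))
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    # An unprivileged process cannot raise its soft limit above the hard limit.
    if hard != resource.RLIM_INFINITY:
        if soft == resource.RLIM_INFINITY or soft > hard:
            soft = hard
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    return soft
```

Reading the value at the moment the limit is applied means an operator could switch the limit off without a code change, which is the appeal of doing 2 before 3.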

Something like this could be behind <https://bugs.launchpad.net/bugs/761664> - although not this particular change, because that bug was reported before my ulimit change landed.

Robert Collins (lifeless) wrote :

The ulimit merely changes the failure mode; it's a good thing to have. I would focus directly on 3 here.

Aaron Bentley (abentley) wrote :

Since the branch scans are jobs, it seems reasonable to use the TwistedJobRunner to isolate each job as a process, then set the rlimit for the jobs themselves.
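
A minimal sketch of the per-job isolation idea, assuming a POSIX system. It does not use TwistedJobRunner; the standard-library `subprocess` module stands in for it, and the helper names, the job list, and the 1 GiB limit are all made up for illustration. Each job runs in its own child process with its own address-space limit, so a job that hits the limit dies alone and the next job starts from a clean process.

```python
import resource
import subprocess
import sys


def _limit_address_space(limit_bytes):
    """Build a callable that caps RLIMIT_AS in the child before it execs."""
    def apply():
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = limit_bytes
        if hard != resource.RLIM_INFINITY:
            soft = min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    return apply


def run_job_isolated(job_source, limit_bytes):
    """Run one job (Python source, for this sketch) in its own process."""
    proc = subprocess.run(
        [sys.executable, "-c", job_source],
        preexec_fn=_limit_address_space(limit_bytes),
        capture_output=True,
    )
    return proc.returncode


def run_all(jobs, limit_bytes):
    """Run every job; one job failing does not affect the ones after it."""
    results = {}
    for name, source in jobs:
        results[name] = run_job_isolated(source, limit_bytes) == 0
    return results


# Hypothetical stand-ins for a huge branch scan and an ordinary one.
JOBS = [
    ("huge-branch", "data = bytearray(4 * 1024**3)"),
    ("small-branch", "data = bytearray(1024)"),
]

if __name__ == "__main__":
    for name, ok in run_all(JOBS, 1024**3).items():
        print(name, "ok" if ok else "failed")
```

The parent only inspects exit codes, so it never has to recover from a MemoryError inside its own interpreter, which is the state the scanner was left in here.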

Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Aaron Bentley (abentley)
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
Aaron Bentley (abentley)
tags: added: qa-untestable
removed: qa-needstesting
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released