override mismatch race needs to be fixed

Bug #180218 reported by Adam Conrad
Affects | Status | Importance | Assigned to | Milestone
Launchpad itself | Fix Released | High | Colin Watson |

Bug Description

I'm spending a large chunk of my day today fixing component override mismatches between architectures (54 in total), and I think it's high time we fixed the race. A rather simple approach is outlined below (forgive me if I get the terminology wrong; I know conceptually how it works, but haven't rooted around in the internals):

As I understand it, if an ftpmaster changes a component override while a build is "in the wild", the build in question stands a chance of inheriting the old override instead of the new one. This can be fixed by doing the following:

1) When change-override.py is run, it needs to act not only on active publisher records, but on pending publisher records and, most importantly, on queue data (as if the user had also run "queue override"), by walking the appropriate suite's queues for binaries/sources matching the request and acting on them as well.

2) When a new upload comes in, we need to check queue data, pending publisher records, and active publisher records (in that priority order) to divine which overrides we need to assign to the incoming upload.

With those two changes, while there may still be a tiny race (perhaps completely removable with table or row locking) during the actual change-override.py run itself, we eliminate the huge window of opportunity for mismatched components that we currently seem to have.
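
A minimal sketch of the lookup order in step 2, assuming plain in-memory lists stand in for queue data, pending publisher records and active publisher records. The `Record` type and `resolve_component` function are hypothetical names for illustration, not Launchpad's actual API:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a queue item or a publisher record."""
    name: str
    suite: str
    component: str


def resolve_component(
    name: str,
    suite: str,
    queue: Sequence[Record],
    pending: Sequence[Record],
    published: Sequence[Record],
) -> Optional[str]:
    """Return the component an incoming upload should inherit.

    Sources are consulted in priority order: queue data first, then
    pending publisher records, then active publisher records.
    """
    for records in (queue, pending, published):
        for record in records:
            if record.name == name and record.suite == suite:
                return record.component
    # No ancestry anywhere: the caller has to treat the upload as new.
    return None
```

With this ordering, an override that has so far only been applied to a queued item still wins over the older value in the published records, which is the window the race currently falls into.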

For the record, this isn't just an aesthetic issue: having mismatched overrides completely breaks DAK for our security and autotest buildds.

Celso Providelo (cprov)
Changed in soyuz:
assignee: nobody → cprov
importance: Undecided → High
milestone: none → 1.2.1
status: New → Triaged
Changed in soyuz:
milestone: 1.2.1 → 1.2.2
Celso Providelo (cprov)
Changed in soyuz:
milestone: 1.2.2 → 1.2.3
status: Triaged → Confirmed
Adam Conrad (adconrad) wrote :

I noticed another misbehaviour (IMO) of change-override.py today. When you invoke it with -S (source and binaries), it only acts on binaries tied to the most recent published source. For instance, assuming we have this published:

foo/main (1.2.3): source, amd64, powerpc, i386, lpia, sparc
foo/main (1.2.2): ia64
foo/main (1.2.1): hppa

If I invoke "change-override -s hardy -c universe -S foo", I'll get the following:

foo/universe (1.2.3): source, amd64, powerpc, i386, lpia, sparc
foo/main (1.2.2): ia64
foo/main (1.2.1): hppa

This is incorrect, as all "foo" binaries in a given suite (in this case, hardy) should always be in the same component, regardless of version. Asking to move one should move them all.
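
The expected behaviour can be sketched as follows, assuming publications are matched by package name and suite only, never by version. `Publication` and `change_component` are hypothetical names, not the real change-override.py internals:

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class Publication:
    """Hypothetical stand-in for one published source or binary."""
    name: str
    version: str
    arch: str  # "source" or an architecture tag such as "amd64"
    suite: str
    component: str


def change_component(
    publications: Sequence[Publication],
    name: str,
    suite: str,
    component: str,
) -> List[Publication]:
    """Move every publication of `name` in `suite` to `component`.

    The version is deliberately ignored, so binaries left behind on an
    older version (for example after a failed build) move as well.
    """
    return [
        replace(pub, component=component)
        if pub.name == name and pub.suite == suite
        else pub
        for pub in publications
    ]
```

Applied to the example above, this would move the ia64 (1.2.2) and hppa (1.2.1) binaries to universe together with the 1.2.3 source and binaries.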

Celso Providelo (cprov) wrote :

Okay, after discussing the problem big-picture with Julian we have agreed on the following implementation plan:

 1. Make {IDSSPR, IDASBPR}.current_published return the latest PUBLISHED or PENDING publication in its context
      * It will reduce the ancestry race condition at upload time and override time.
      * Yes, the property name is horrible, but we don't have time to fix it right now.

 2. Before accepting an upload (source or binary, any arch) we will also try to find an ancestry in the ACCEPTED queue; if one is found, we will use its overrides.
     * It will propagate queue binary overrides across all architectures.

 3. We will look up the full binary chain by name before overriding, instead of using IDSSPR.binaries directly
     * It will fix the inconsistencies for FTBFS.

I hope it still sounds correct during the implementation.
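
A rough sketch of item 1, assuming a publication history is available as a list and that a larger `id` means a more recent record. `Status`, `Publication` and `current_publication` are hypothetical names and do not reflect the real IDSSPR/IDASBPR interfaces:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Status(Enum):
    PENDING = "PENDING"
    PUBLISHED = "PUBLISHED"
    SUPERSEDED = "SUPERSEDED"


@dataclass(frozen=True)
class Publication:
    """Hypothetical publishing-history row."""
    id: int  # assumption: a larger id means a more recent record
    status: Status
    component: str


def current_publication(history: Iterable[Publication]) -> Optional[Publication]:
    """Return the latest PUBLISHED or PENDING publication, if any."""
    candidates = [
        pub for pub in history
        if pub.status in (Status.PENDING, Status.PUBLISHED)
    ]
    return max(candidates, key=lambda pub: pub.id, default=None)
```

Because PENDING records count as current here, an override that has been requested but not yet published is already visible to the next upload or override run.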

Changed in soyuz:
status: Confirmed → In Progress
Celso Providelo (cprov) wrote :

First implementation task committed in RF 5933.

Celso Providelo (cprov)
Changed in soyuz:
milestone: 1.2.3 → 1.2.4
Celso Providelo (cprov) wrote :

The most important part is done; the upload-time changes will be postponed to the end of the 2.0 cycle.

Changed in soyuz:
milestone: 1.2.4 → 1.2.6
status: In Progress → Triaged
Celso Providelo (cprov)
Changed in soyuz:
milestone: 1.2.6 → none
status: Triaged → Confirmed
William Grant (wgrant) wrote :

Any sign of a fix for this? 2.0 is well over and this bug still eats binaries silently.

Changed in soyuz:
milestone: none → pending
Curtis Hovey (sinzui)
Changed in soyuz:
assignee: Celso Providelo (cprov) → nobody
William Grant (wgrant)
tags: added: package-overrides
Colin Watson (cjwatson)
Changed in launchpad:
status: Triaged → In Progress
assignee: nobody → Colin Watson (cjwatson)
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Colin Watson (cjwatson)
tags: added: qa-ok
removed: qa-needstesting
Colin Watson (cjwatson)
Changed in launchpad:
status: Fix Committed → Fix Released