Knit inventory corrupt in bzr.dev with bzr.dev r3452 (KnitCorrupt)

Bug #234748 reported by Matt Nordhoff
4
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
High
Unassigned

Bug Description

bzr.dev, packs, Ubuntu Gutsy, ext3, Python 2.5.1..

I was just running "bzr check" on my copy of bzr.dev for fun, and this happened:

KnitCorrupt: Knit inventory corrupt:
  sha-1 548f019618118527ae97bb092643be0d187c0cf1
  of reconstructed text does not match
  expected 89853621e92f00ae1bb075139284c4df9a0434d0
  for version <email address hidden>

Note that that was the first or one of the first versionedfiles it was checking, according to the progress bar.

I ran "bzr check" a second time and got the same thing. Unfortunately, I didn't think to run it without plugins.

sabdfl on IRC was also running it at the same time, with a similar result:

KnitCorrupt: Knit inventory corrupt:
  sha-1 a7b104d9ee824caed0e59a2989446419e230c4c2
  of reconstructed text does not match
  expected 873bfb499cc42b19241ac60c9736568abe2e4518
  for version <email address hidden>

Revision history for this message
Matt Nordhoff (mnordhoff) wrote :
Revision history for this message
Matt Nordhoff (mnordhoff) wrote :

I pushed my copy of bzr.dev to a server 4-6 weeks ago, and have been updating it since. "bzr check" fails on the same revision that my local branch does. (Same versions of everything, except I upgraded it to Hardy as soon as it came out.)

BTW, I'm pretty sure I've run "bzr check" on my local branch over the last few months without problems. When upgrading to packs, I always check/reconcile/upgrade/maybe check/reconcile/check, but I may not have on bzr.dev...

Revision history for this message
Andrew Bennetts (spiv) wrote :

Confirming this bug: jml just encounted the exact same exception (i.e. the same SHA-1 digests on the same inventory). I think we should track this down and fix it for 1.6.

Changed in bzr:
importance: Undecided → High
milestone: none → 1.6
status: New → Confirmed
Revision history for this message
Andrew Bennetts (spiv) wrote :

Also, jml reports that this affects more than "bzr check". It happens to him when branching to a standalone branch (although branching into his existing repo works ok).

Revision history for this message
Martin Pool (mbp) wrote :

I have a repository where check fails.

When I get the inventory through r.get_inventory_xml(bad_revid1) I get a text back without an error, and it has the hash reported above as 'expected', for both of the revision ids quoted above.

If I run repo.get_inventory_weave.check() that also passes, and asking the weave for the correct version has no problems.

Andrew suggests the problem may be the optimizations for getting multiple versions of a file in a single roundtrip may be incorrect.

Revision history for this message
Martin Pool (mbp) wrote :

In fact we can see that check_weaves() has previously run

        self.inventory_weave.check(progress_bar=self.progress)

and passed - it's only when it's getting the revision trees later on that we hit problems. In _iter_inventories it's trying to get all the inventories for the repository in one go. (Despite the name they're all loaded at once, then iterated.)

Running just r._generate_text_key_index() does reproduce the problem.

_get_content_maps was called with

self = KnitVersionedFile(file:///home/mbp/bzr/.bzr/repository/inventory)
version_ids = ['<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>']

It turns out the retrieved (incorrect?) version is missing a final newline compared to the expected version; this may indicate a problem in this region of get_content_maps:

            content.cleanup_eol(copy_on_mutate=multiple_versions)

Revision history for this message
Martin Pool (mbp) wrote :

A smaller reproduction of the problem:

bad_version_ids = ['<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>']
r.get_inventory_weave().get_content_maps(bad_version_ids)

whereas asking for any one of those revisions individually does work ok.

and the minimum reproduction is

iw._get_content_maps(['<email address hidden>',
 '<email address hidden>'])

Asking for the same version twice, which should cause multiple_versions=True, does not provoke a problem. And asking for them in the opposite order does not cause a problem either.

The problem is definitely a bug in extraction of multiple related versions of the same text...

Revision history for this message
Andrew Bennetts (spiv) wrote :

Judging from gannotate this was introduced in r3224.1.15 (<email address hidden>), which changed:

- if 'no-eol' in self._index.get_options(version_id):
- if multiple_versions:
- content = content.copy()
- content.strip_last_line_newline()
+ content.cleanup_eol(copy_on_mutate=multiple_versions)

copy_on_mutate=True avoids mutating _lines, but still mutates the content object itself.

Revision history for this message
Martin Pool (mbp) wrote :

This fixes the aliasing bug when getting several texts simultaneously.

Martin Pool (mbp)
Changed in bzr:
status: Confirmed → In Progress
Revision history for this message
Andrew Bennetts (spiv) wrote :

Clarification: jml's KnitCorrupt on branching is not this bug. That is due to "KnitCorrupt: Knit 86/quilt.txt-20050309044946-00df099b997fed5e corrupt: line-delta from stream for version <email address hidden>
0b458 references missing parent <email address hidden>", which IIRC is problem from ancient bzr versions that "bzr reconcile" ought to fix. jml got the error described in this bug when attempting "bzr reconcile".

So this bug only seems to occur on "bzr check"/"bzr reconcile" after all.

John A Meinel (jameinel)
Changed in bzr:
status: In Progress → Fix Committed
Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 234748] Re: Knit inventory corrupt in bzr.dev with bzr.dev r3452 (KnitCorrupt)

Did this get a test case added? I have mass changes in this area pending
and knowing its tested would be very useful...

-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Revision history for this message
John A Meinel (jameinel) wrote :

Last I saw this wasn't merged (Fix Released) pending a test case being added. So we'll have to ask Martin where he got to.

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 234748] Re: Knit inventory corrupt in bzr.dev with bzr.dev r3452 (KnitCorrupt)

On Fri, May 30, 2008 at 7:07 AM, John A Meinel <email address hidden> wrote:
> Last I saw this wasn't merged (Fix Released) pending a test case being
> added. So we'll have to ask Martin where he got to.

Here's where I got to. Thanks to Robert for working out a test with me.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
TimeHorse (timehorse) wrote :

I have also noticed this problem in the python repository, though I am not seeing it now. When it does occur, not only does "bzr check" fail, but getting negative revisions does too, e.g. "bzr log -r -1" would also generate a similar error.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 234748] Re: Knit inventory corrupt in bzr.dev with bzr.dev r3452 (KnitCorrupt)

On Fri, 2008-05-30 at 14:45 +0000, TimeHorse wrote:
> I have also noticed this problem in the python repository, though I am
> not seeing it now. When it does occur, not only does "bzr check" fail,
> but getting negative revisions does too, e.g. "bzr log -r -1" would also
> generate a similar error.

Please do file a separate bug after testing with Martins patch - the
patch does fix this particular case, but you may have encountered
something else.

-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, 2008-05-28 at 06:29 +0000, Martin Pool wrote:
> This fixes the aliasing bug when getting several texts simultaneously.
>
> ** Attachment added: "234748-check-failure.diff"
> http://launchpadlibrarian.net/14758759/234748-check-failure.diff

Martin and I debugged in detail the reason the patch works, I thought I
would brain-dump it.

knit records have a canonical serialised form:
line\n
line\n
...
line\n

when committing a change to a file the *canonical* form is used.

So changing the file "foo\n" to be "foo" results in a diff between
["foo\n"] and ["foo\n"]. The resulting changes (an empty delta) are then
written to a knit record, and the end of line status written to the
index. Call these texts A and B.

'fixing up' the content record changes the canonical form to be non
canonical *when* the user's text is missing a trailing newline. We have
code to strip an inappropriate trailing newline (the canonical->user
translation, in the text() and annotate() methods), but we don't have
code to *add a missing newline* (the user->canonical transition) outside
of the _add method (the one way user data is added to a knit). So what
the fixup call does is to cause any further use of the content record by
knit internals to be operating on the user representation of the lines,
not the internal one; and this breaks a couple of things.

The simplest thing it breaks is _add. _add is meant to do a diff on the
canonical form. The lack of a trailing \n in the canonical form causes
the basis text to never match on the last line of the file *even* if the
file is entirely unmodified. Our sha1 duplicate check bails out before
then, which is why its not apparent, but a one line change in a file
with no trailing EOL will create a delta with 2 lines changed - the one
that changed and the last line. This will mess up annotate in annotated
knits, as well as causing extra work in unannotated knits. It *also* has
the impact of preventing bzr creating the data needed to reproduce this
bug, because instead of an empty delta for a \n-only change to the last
line of a file, a delta is created if the basis is missing a \n
(regardless of the actual change).

The breakage that causes this bug is indeed aliasing; what happens is
that older bzr's that don't suffer this bug correctly omit a delta for a
change to the last line of a file that solely adds the trailing \n. If
you then ask for either side of such a change - for instance [B, A]
above, when B is extracted, it's delta is applied onto A, but that does
not alter the last line (remember the logic expected canonicalised
form), and when .text() is called the user is given a user form if that
is different (if the \n is meant to be stripped). And we then give the
user the \n-less last line preserved from the basis, even though it is
meant to have a \n, which causes the sha1 mismatch.

-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Revision history for this message
Martin Pool (mbp) wrote :

Fix in the attached branch and up for review; not in 1.6b1 but should be in b2 or rc1

Revision history for this message
John A Meinel (jameinel) wrote :

merged into bzr.dev 3475

Changed in bzr:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.