Removing Nuclear Waste

Asked by Charles Logston

We had an inadvertent add and commit of a directory containing many large binaries. I thought we were fine since we caught it in time, but even after a bzr uncommit it's obvious that this history remains in the repository. Trying to figure out what options we have. Starting fresh would be very undesirable as we have a fairly rich history we'd like to maintain, and we have 10 or so branches that need to be merged back in.

Any advice is appreciated.

Question information

Language: English
Status: Solved
For: Bazaar
Assignee: No assignee
Solved by: Charles Logston

Martin Pool (mbp) said :
#1

Probably the best thing is to make a new repository, then branch into it all of the branches that you want to keep. That will copy across the data needed for them, and leave everything else behind.

Charles Logston (charleslogston) said :
#2

Thanks for your response Martin. I'm afraid your suggestion isn't working for me. Some more detail perhaps:

projectname-trunk is in a shared repository (no trees). It had nuclear waste committed, which was then uncommitted. Branching from projectname-trunk (which is located remotely) to my development machine results in a very large standalone tree/repo. It seems like it's still getting this uncommitted data somehow. Specifically, branches were 90MB before the problem and 500MB after the binary directory commit, and now that we've uncommitted that binary directory we're looking at around 240MB, of which 190MB is taken up in .bzr/repository/packs. This is on a fresh branch.

I'm on 1.12 now. Going to try 1.11 to make sure I see the same behavior. Version on the remote "central" host is 1.10.

Charles Logston (charleslogston) said :
#3

Downgraded to 1.10 on my machine just to rule out a behavior change in 1.12, and I'm still seeing what I described above.

James Westby (james-w) said :
#4

On Tue, 2009-03-03 at 23:00 +0000, Charles Logston wrote:
> New question #63001 on Bazaar:
> https://answers.launchpad.net/bzr/+question/63001
>
> We had an inadvertent add and commit of a directory containing many large binaries.
> I thought we were fine since we caught it in time, but even after a bzr uncommit
> it's obvious that this history remains in the repository. Trying to figure out what
> options we have. Starting fresh would be very undesirable as we have a fairly rich
> history we'd like to maintain, and we have 10 or so branches that need to be merged back in.

This is a different case, and much easier to solve.

As you uncommitted the revision in question it is no longer referenced.
However, an "uncommit" doesn't delete the data. This means all that
unwanted data is still stored in the repository, but is not referenced
by anything.

bzr currently doesn't support garbage collection directly, but it is
easy enough to simulate, if a little tedious. You create a new
repository, branch all of your branches into it, and then replace the
old with the new.

  bzr init-repo /tmp/new-repo

  bzr branch old-repo/branch1 /tmp/new-repo/branch1
  bzr branch old-repo/branch2 /tmp/new-repo/branch2
  .
  .

  mv old-repo /tmp
  mv /tmp/new-repo old-repo

and you will see that the new repository (now at "old-repo") is smaller
than the original (now at "/tmp/old-repo").
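
With ten or so branches, the steps above could be scripted. The following is only a minimal sketch, not a tested migration tool. It assumes every branch is an immediate subdirectory of the old repository and contains a .bzr/branch control directory, and that paths contain no spaces. The function name plan_rebuild is made up for illustration. It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Print (do not run) the commands needed to rebuild a shared repository.
# plan_rebuild is a hypothetical helper name, not part of bzr.
# Assumption: each branch is an immediate subdirectory of the old
# repository and contains a .bzr/branch control directory.
plan_rebuild() {
    old=$1
    new=$2
    echo "bzr init-repo $new"
    for dir in "$old"/*/; do
        # Skip anything that is not a branch.
        [ -d "${dir}.bzr/branch" ] || continue
        name=$(basename "$dir")
        # Each branch gets its own directory inside the new repository.
        echo "bzr branch $old/$name $new/$name"
    done
}
```

Calling plan_rebuild old-repo /tmp/new-repo lists one "bzr branch" line per branch found. Once the list looks right, the commands can be run by hand, followed by the two mv steps.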

If the mistake happened in a standalone branch then simply branching
that somewhere else, removing the old one, and branching back would
fix it.

In all of this you should be careful to preserve uncommitted changes
in any working trees that you have.
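
One way to check for such changes before swapping anything is to look at the output of "bzr status" in each working tree. This is a rough sketch under the assumption that "bzr status" prints nothing for a clean tree. The helper name has_uncommitted and the BZR override variable are made up for illustration.

```shell
#!/bin/sh
# has_uncommitted is a hypothetical helper name, not part of bzr.
# Assumption: "bzr status" prints nothing for a clean working tree,
# so any output means there is something to preserve or inspect.
# BZR may be set to point at a different bzr executable.
has_uncommitted() {
    [ -n "$("${BZR:-bzr}" status "$1")" ]
}
```

For example, "has_uncommitted path/to/tree && echo 'commit or save changes first'" would flag a tree that needs attention. Unknown files also show up in the status output, so this errs on the side of caution.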

Thanks,

James

Charles Logston (charleslogston) said :
#5

Thanks to both Martin and James for great answers. I just discovered that someone branched from trunk while it was in its nuclear waste state, and that this branch was merged back in after we purged the waste. I think we have it all under control now. Thanks again guys.