Converted CVS repository is over 20 times bigger

Asked by Matthew A. Brannigan

So I've converted a CVS repository that is 5GB and the resulting bzr repository is about 135GB after doing pack and removing obsoletes. I used bzr 2.3.3, cvs2bzr (2.3.0) and bzr fast-import 0.11.0dev. Now this particular repository contains mostly compressed tar/zip files of open source software (and a bunch of scripts/makefiles). It's about 1500 non-text files (including deleted files) that, if I totaled them correctly, take up about 3GB of actual disk space.

I'm not expecting super-efficient storage for the files, but the increase of repository size seems to be so large that I must have done something wrong. There are a handful of branches (~15). A bzr info of the repository created says it is a shared repository.

Any thoughts? Should I create the shared repository first and not let bzr fast-import do that? Many of the branches are closed (I mostly want the current lines of development, not closed branches), should I try to import just them?

matt

Question information

Language: English
Status: Answered
For: Bazaar
Assignee: No assignee
John A Meinel (jameinel) said :
#1

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 9/17/2011 4:00 AM, Matthew A. Brannigan wrote:
> New question #171433 on Bazaar:
> https://answers.launchpad.net/bzr/+question/171433
>
> So I've converted a CVS repository that is 5GB and the resulting
> bzr repository is about 135GB after doing pack and removing
> obsoletes. I used bzr 2.3.3, cvs2bzr (2.3.0) and bzr fast-import
> 0.11.0dev. Now this particular repository contains mostly compressed
> tar/zip files of open source software (and a bunch of
> scripts/makefiles). It's about 1500 non-text files (including
> deleted files) that if I totaled them correctly, is about 3GB of
> actual disk space.
>

In my experience, converting from CVS generally results in smaller
repository size. If you have a lot of pre-compressed content, then I
would expect it to be ~ the same.

My guess is that for whatever reason the conversion did *not* create a
shared repository. Which means that each branch ended up with
yet-another-copy of the history.

You can confirm it by doing:

find . -name repository

Which should find "path/to/branch/.bzr/repository".

What you want, is a shared repository at the top of all the branches.
For example:

 root/
   .bzr/repository/
       shared-storage <= presence indicates $root is a shared repo
   branch1/
      .bzr/branch
   branch2/
      .bzr/branch
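If you would rather script the check than eyeball the find output, something
like the following might do. It is only a sketch: the function name
find_repositories is made up for illustration, and it relies on nothing more
than the on-disk layout shown above (a "repository" directory inside ".bzr",
with a "shared-storage" file marking a shared repo).

```python
import os


def find_repositories(root):
    """Return sorted (repository_dir, is_shared) pairs found under root."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if os.path.basename(dirpath) == ".bzr" and "repository" in dirnames:
            repo = os.path.join(dirpath, "repository")
            # The presence of this marker file is what the layout above
            # uses to indicate a shared repository.
            shared = os.path.isfile(os.path.join(repo, "shared-storage"))
            found.append((repo, shared))
            # Do not descend into packs/indices; nothing to find there.
            dirnames.remove("repository")
    return sorted(found)
```

Calling find_repositories(".") from $ROOT should give exactly one entry, with
is_shared set to True, if the import produced the layout you want. More than
one entry means some branch carries its own copy of the history.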

I don't know if 'cvs2bzr' requires you to do:

 bzr init-repo $ROOT

before you do the import for it to not create a new repository at each
branch.

> I'm not expecting super-efficient storage for the files, but the
> increase of repository size seems to be so large that I must have
> done something wrong. There are a handful of branches (~15). A
> bzr info of the repository created says it is a shared repository.
>
> Any thoughts? Should I create the shared repository first and not
> let bzr fast-import do that? Many of the branches are closed (I
> mostly want the current lines of development, not closed branches),
> should I try to import just them?
>
> matt
>

Check and see if you have a lot of repositories, if you do, then you
need to create the shared repository first. You also might want to do:

 bzr init-repo --no-trees

So that each of those branches also doesn't end up creating a full
working tree. You could also check for that with:

 find . -name checkout

If you have lots of working trees (and say only 1 shared repository),
you could run:

  bzr remove-tree

in each of those directories.

In general, for a shared location like this, you should end up with
only 1 'repository', and 1 branch per CVS branch, and no working trees
in the shared location. You can have as many working trees on your
development locations as you want.
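As a rough sketch of the working-tree check, the snippet below lists every
directory that has a ".bzr/checkout" control directory and builds the matching
remove-tree command lines without running them. The helper names are
hypothetical, and passing the location as an argument to "bzr remove-tree" is
my assumption about how you would want to invoke it; running the command from
inside each directory, as described above, works just as well.

```python
import os
import shlex


def find_working_trees(root):
    """Return directories under root that contain a .bzr/checkout dir."""
    trees = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".bzr" in dirnames:
            if os.path.isdir(os.path.join(dirpath, ".bzr", "checkout")):
                trees.append(dirpath)
            # Control directories do not contain nested branches; skip them.
            dirnames.remove(".bzr")
    return sorted(trees)


def remove_tree_commands(root):
    """Build, but do not run, one 'bzr remove-tree' command per tree."""
    return ["bzr remove-tree %s" % shlex.quote(path)
            for path in find_working_trees(root)]
```

Printing the result of remove_tree_commands(".") lets you review the list
before deciding to run anything.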

If you want more help, you could do "ls -R" and post the results,
possibly off list if you don't want to make it public.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk50hJoACgkQJdeBCYSNAANMNACfZaFtQiDUZcozThXcOds7+Acl
4xAAn0FWc44LaslV+E0NX+O2azyJt08+
=ZqUd
-----END PGP SIGNATURE-----

Matthew A. Brannigan (mbrannig) said :
#2

Well I can confirm I did get a shared repository --- I have a zero length file called shared-storage in $ROOT/.bzr/repository and the branch directories are correct (i.e. no repository directories in them).

Here is a sanitized ls -lR (is there a way to attach info?):

./BRANCH_R5/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R5/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 52 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 1083 Sep 16 01:40 tags

./BRANCH_R5/.bzr/branch/lock:
total 0

./BRANCH_R5/.bzr/branch-lock:
total 0

./BRANCH_R5/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 164 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R5/.bzr/checkout/lock:
total 0
./RNA_R2/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:40 checkout

./RNA_R2/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:40 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:40 format
-rw-r--r-- 1 user user 54 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 312 Sep 16 01:40 tags

./RNA_R2/.bzr/branch/lock:
total 0

./RNA_R2/.bzr/branch-lock:
total 0

./RNA_R2/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:40 conflicts
-rw-r--r-- 1 user user 163 Sep 16 01:40 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:40 format
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 0 Sep 16 01:40 views

./RNA_R2/.bzr/checkout/lock:
total 0
./BRANCH_R13/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R13/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 52 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 147 Sep 16 01:40 tags

./BRANCH_R13/.bzr/branch/lock:
total 0

./BRANCH_R13/.bzr/branch-lock:
total 0

./BRANCH_R13/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 164 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R13/.bzr/checkout/lock:
total 0
./BRANCH_4_5_0/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_4_5_0/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 52 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 200 Sep 16 01:40 tags

./BRANCH_4_5_0/.bzr/branch/lock:
total 0

./BRANCH_4_5_0/.bzr/branch-lock:
total 0

./BRANCH_4_5_0/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 164 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_4_5_0/.bzr/checkout/lock:
total 0
./BRANCH_R15/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R15/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 53 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 138 Sep 16 01:40 tags

./BRANCH_R15/.bzr/branch/lock:
total 0

./BRANCH_R15/.bzr/branch-lock:
total 0

./BRANCH_R15/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 164 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R15/.bzr/checkout/lock:
total 0
./BRANCH_R5_NBRANCHRTEL/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R5_NBRANCHRTEL/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 44 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 864 Sep 16 01:40 tags

./BRANCH_R5_NBRANCHRTEL/.bzr/branch/lock:
total 0

./BRANCH_R5_NBRANCHRTEL/.bzr/branch-lock:
total 0

./BRANCH_R5_NBRANCHRTEL/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 165 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R5_NBRANCHRTEL/.bzr/checkout/lock:
total 0
./BRANCH_BRANCH_LEGACY/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_BRANCH_LEGACY/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 54 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 474 Sep 16 01:40 tags

./BRANCH_BRANCH_LEGACY/.bzr/branch/lock:
total 0

./BRANCH_BRANCH_LEGACY/.bzr/branch-lock:
total 0

./BRANCH_BRANCH_LEGACY/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 165 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_BRANCH_LEGACY/.bzr/checkout/lock:
total 0
./.bzr:
total 16
-rw-r--r-- 1 user user 147 Sep 15 16:29 README
-rw-r--r-- 1 user user 35 Sep 15 16:29 branch-format
drwxr-xr-x 2 user user 4096 Sep 15 16:29 branch-lock
drwxr-xr-x 7 user user 4096 Sep 16 07:42 repository

./.bzr/branch-lock:
total 0

./.bzr/repository:
total 192
-rw-r--r-- 1 user user 163827 Sep 16 01:39 fastimport-id-map
-rw-r--r-- 1 user user 54 Sep 15 16:29 format
drwxr-xr-x 2 user user 4096 Sep 16 07:42 indices
drwxr-xr-x 2 user user 4096 Sep 16 07:42 lock
drwxr-xr-x 2 user user 4096 Sep 16 07:43 obsolete_packs
-rw-r--r-- 1 user user 149 Sep 16 07:42 pack-names
drwxr-xr-x 2 user user 4096 Sep 16 07:42 packs
-rw-r--r-- 1 user user 0 Sep 15 16:29 shared-storage
drwxr-xr-x 2 user user 4096 Sep 16 20:01 upload

./.bzr/repository/indices:
total 12812
-rw-r--r-- 1 user user 3723428 Sep 16 07:42 b7fbc43c9aa8c8f4f6d7512938fcb5e1.cix
-rw-r--r-- 1 user user 102578 Sep 16 07:42 b7fbc43c9aa8c8f4f6d7512938fcb5e1.iix
-rw-r--r-- 1 user user 102762 Sep 16 07:42 b7fbc43c9aa8c8f4f6d7512938fcb5e1.rix
-rw-r--r-- 1 user user 72 Sep 16 07:42 b7fbc43c9aa8c8f4f6d7512938fcb5e1.six
-rw-r--r-- 1 user user 9146196 Sep 16 07:42 b7fbc43c9aa8c8f4f6d7512938fcb5e1.tix

./.bzr/repository/lock:
total 0

./.bzr/repository/obsolete_packs:
total 0

./.bzr/repository/packs:
total 141010520
-rw-r--r-- 1 user user 144253755127 Sep 16 07:42 b7fbc43c9aa8c8f4f6d7512938fcb5e1.pack

./.bzr/repository/upload:
total 0
./BRANCH_R11_FUGU/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R11_FUGU/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 54 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 334 Sep 16 01:40 tags

./BRANCH_R11_FUGU/.bzr/branch/lock:
total 0

./BRANCH_R11_FUGU/.bzr/branch-lock:
total 0

./BRANCH_R11_FUGU/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 165 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R11_FUGU/.bzr/checkout/lock:
total 0
./BRANCH_R5_DB/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R5_DB/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 44 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 864 Sep 16 01:40 tags

./BRANCH_R5_DB/.bzr/branch/lock:
total 0

./BRANCH_R5_DB/.bzr/branch-lock:
total 0

./BRANCH_R5_DB/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 164 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R5_DB/.bzr/checkout/lock:
total 0
./BRANCH_R12/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R12/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 52 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 74 Sep 16 01:40 tags

./BRANCH_R12/.bzr/branch/lock:
total 0

./BRANCH_R12/.bzr/branch-lock:
total 0

./BRANCH_R12/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 165 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R12/.bzr/checkout/lock:
total 0
./BRANCH_R5_XBIVIBRANCH/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R5_XBIVIBRANCH/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 44 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 864 Sep 16 01:40 tags

./BRANCH_R5_XBIVIBRANCH/.bzr/branch/lock:
total 0

./BRANCH_R5_XBIVIBRANCH/.bzr/branch-lock:
total 0

./BRANCH_R5_XBIVIBRANCH/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 165 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R5_XBIVIBRANCH/.bzr/checkout/lock:
total 0
./BRANCH_R3/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R3/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 54 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 312 Sep 16 01:40 tags

./BRANCH_R3/.bzr/branch/lock:
total 0

./BRANCH_R3/.bzr/branch-lock:
total 0

./BRANCH_R3/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 163 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R3/.bzr/checkout/lock:
total 0
./FAILBRANCHPEN/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./FAILBRANCHPEN/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 52 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 65 Sep 16 01:40 tags

./FAILBRANCHPEN/.bzr/branch/lock:
total 0

./FAILBRANCHPEN/.bzr/branch-lock:
total 0

./FAILBRANCHPEN/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 165 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./FAILBRANCHPEN/.bzr/checkout/lock:
total 0
./trunk/.bzr:
total 16
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock

./trunk/.bzr/branch:
total 20
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 49 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 4911 Sep 16 01:40 tags

./trunk/.bzr/branch/lock:
total 0

./trunk/.bzr/branch-lock:
total 0
./BRANCH_4_9_0/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_4_9_0/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 55 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 65 Sep 16 01:40 tags

./BRANCH_4_9_0/.bzr/branch/lock:
total 0

./BRANCH_4_9_0/.bzr/branch-lock:
total 0

./BRANCH_4_9_0/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 164 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_4_9_0/.bzr/checkout/lock:
total 0
./TAG.FIXUP/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:40 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:40 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:40 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:40 checkout

./TAG.FIXUP/.bzr/branch:
total 56
-rw-r--r-- 1 user user 0 Sep 16 01:40 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:40 format
-rw-r--r-- 1 user user 44 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 44831 Sep 16 01:40 tags

./TAG.FIXUP/.bzr/branch/lock:
total 0

./TAG.FIXUP/.bzr/branch-lock:
total 0

./TAG.FIXUP/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:40 conflicts
-rw-r--r-- 1 user user 164 Sep 16 01:40 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:40 format
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 0 Sep 16 01:40 views

./TAG.FIXUP/.bzr/checkout/lock:
total 0
./BRANCH_R2/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R2/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 54 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 186 Sep 16 01:40 tags

./BRANCH_R2/.bzr/branch/lock:
total 0

./BRANCH_R2/.bzr/branch-lock:
total 0

./BRANCH_R2/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 162 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R2/.bzr/checkout/lock:
total 0
./Sourcefire/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:40 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:40 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:40 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:40 checkout

./Sourcefire/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:40 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:40 format
-rw-r--r-- 1 user user 52 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 88 Sep 16 01:40 tags

./Sourcefire/.bzr/branch/lock:
total 0

./Sourcefire/.bzr/branch-lock:
total 0

./Sourcefire/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:40 conflicts
-rw-r--r-- 1 user user 163 Sep 16 01:40 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:40 format
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 0 Sep 16 01:40 views

./Sourcefire/.bzr/checkout/lock:
total 0
./BRANCH_4_9_1/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_4_9_1/.bzr/branch:
total 20
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 55 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 4112 Sep 16 01:40 tags

./BRANCH_4_9_1/.bzr/branch/lock:
total 0

./BRANCH_4_9_1/.bzr/branch-lock:
total 0

./BRANCH_4_9_1/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 165 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_4_9_1/.bzr/checkout/lock:
total 0
./KEN_BRANCH/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./KEN_BRANCH/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 52 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 136 Sep 16 01:40 tags

./KEN_BRANCH/.bzr/branch/lock:
total 0

./KEN_BRANCH/.bzr/branch-lock:
total 0

./KEN_BRANCH/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 164 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./KEN_BRANCH/.bzr/checkout/lock:
total 0
./BRANCH_R5_NETSEC/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R5_NETSEC/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 44 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 864 Sep 16 01:40 tags

./BRANCH_R5_NETSEC/.bzr/branch/lock:
total 0

./BRANCH_R5_NETSEC/.bzr/branch-lock:
total 0

./BRANCH_R5_NETSEC/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 163 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R5_NETSEC/.bzr/checkout/lock:
total 0
./BRANCH_4_10_0/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_4_10_0/.bzr/branch:
total 20
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 49 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 5065 Sep 16 01:40 tags

./BRANCH_4_10_0/.bzr/branch/lock:
total 0

./BRANCH_4_10_0/.bzr/branch-lock:
total 0

./BRANCH_4_10_0/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 164 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_4_10_0/.bzr/checkout/lock:
total 0
./BRANCH_R5_LBRANCHCALE/.bzr:
total 20
-rw-r--r-- 1 user user 147 Sep 16 01:39 README
drwxr-xr-x 3 user user 4096 Sep 16 01:40 branch
-rw-r--r-- 1 user user 35 Sep 16 01:39 branch-format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 branch-lock
drwxr-xr-x 3 user user 4096 Sep 16 01:39 checkout

./BRANCH_R5_LBRANCHCALE/.bzr/branch:
total 16
-rw-r--r-- 1 user user 0 Sep 16 01:39 branch.conf
-rw-r--r-- 1 user user 39 Sep 16 01:39 format
-rw-r--r-- 1 user user 44 Sep 16 01:40 last-revision
drwxr-xr-x 2 user user 4096 Sep 16 01:40 lock
-rw-r--r-- 1 user user 864 Sep 16 01:40 tags

./BRANCH_R5_LBRANCHCALE/.bzr/branch/lock:
total 0

./BRANCH_R5_LBRANCHCALE/.bzr/branch-lock:
total 0

./BRANCH_R5_LBRANCHCALE/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 163 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 40 Sep 16 01:39 format
drwxr-xr-x 2 user user 4096 Sep 16 01:39 lock
-rw-r--r-- 1 user user 0 Sep 16 01:39 views

./BRANCH_R5_LBRANCHCALE/.bzr/checkout/lock:
total 0

Martin Pool (mbp) said :
#3

It looks a lot like you have a checkout (a working copy) for every single branch, which would certainly account for a lot of disk usage. Unless you especially want to have them around, you can get rid of them with 'bzr remove-tree'.

If cvs2bzr always generates these and doesn't have an option to avoid it, I'd say that's a bug.

John A Meinel (jameinel) said :
#4

Martin-

It does look like it created a bunch of checkouts, but ISTR that it creates empty checkouts. (The idea being you can then go to any directory and just "bzr up" to get the latest, though I don't know that it is easier than 'bzr co .')

You can tell because of the size of the dirstate files:
./BRANCH_R5_LBRANCHCALE/.bzr/checkout:
total 16
-rw-r--r-- 1 user user 27 Sep 16 01:39 conflicts
-rw-r--r-- 1 user user 163 Sep 16 01:39 dirstate

If it is only 163 bytes, that certainly isn't storing the history there.

It really is an import issue, because you can see:
./.bzr/repository/packs:
total 141010520
-rw-r--r-- 1 user user 144253755127 Sep 16 07:42 b7fbc43c9aa8c8f4f6d7512938fcb5e1.pack

Which is 134GB for just the one .pack file.
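For anyone wanting to repeat that reading of the listing, here is a small sketch that pulls the byte sizes out of `ls -l` style lines and converts the largest one to GiB. The function names are mine, and the parser assumes the nine-column layout seen in the listing above (mode, links, owner, group, size, month, day, time, name).

```python
def parse_ls_sizes(listing):
    """Return (size_in_bytes, name) for each regular-file line of `ls -l`."""
    entries = []
    for line in listing.splitlines():
        fields = line.split(None, 8)
        # Regular files start with '-'; directories ('d') are skipped.
        if (len(fields) == 9 and fields[0].startswith("-")
                and fields[4].isdigit()):
            entries.append((int(fields[4]), fields[8]))
    return entries


def to_gib(size):
    """Convert a byte count to GiB (2**30 bytes)."""
    return size / float(2 ** 30)


SAMPLE = """\
-rw-r--r-- 1 user user 163 Sep 16 01:39 dirstate
-rw-r--r-- 1 user user 144253755127 Sep 16 07:42 b7fbc43c9aa8c8f4f6d7512938fcb5e1.pack
"""

size, name = max(parse_ls_sizes(SAMPLE))
print("%s: %.1f GiB" % (name, to_gib(size)))
# prints: b7fbc43c9aa8c8f4f6d7512938fcb5e1.pack: 134.3 GiB
```

The 163-byte dirstate and the 134 GiB pack land in the same list, which makes it plain that the space is in the repository and not in the checkouts.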

I certainly don't know the details of the history here, but obviously something weird is going on. Here are some possibilities that I can think of:

1) The large tar.gz files aren't being linked-over-history properly. IIRC, in CVS when you have lots of branches, it can be difficult to link files in different branches back to the same ancestry. For example, if a tar.gz shows up in 2 branches without showing up in HEAD, then I think it can show up as independently added history. If we were to assign a unique file-id to each of those newly added files, and those files were larger than our 2MB cross-file-compression-group size, then we would store the full content for each copy of the file.

If this is the problem, then you could use 'bzr inventory --show-ids' in different branches, especially at old versions in history, and see if something that looks like it should be the same file is actually stored with a different id.

The possible fixes are:
  a) Teach cvs2bzr to treat these as the same file. It is ok if they get different revisions (with the same content). As long as it says "foo.tar.gz in branch A is the same file as foo.tar.gz in branch B" we should be able to try to delta compress it, and find that the version in B is exactly the same as the version in A.

  b) Play with the values in bzrlib/groupcompress.py. Specifically around line 1822:
      if (prefix == max_fulltext_prefix
          and end_point < 2 * max_fulltext_len):
          # As long as we are on the same file_id, we will fill at least
          # 2 * max_fulltext_len
          start_new_block = False
      elif end_point > 4*1024*1024:
          start_new_block = True
   => elif (prefix is not None and prefix != last_prefix
            and end_point > 2*1024*1024):
          start_new_block = True
      else:
          start_new_block = False
      last_prefix = prefix
      if start_new_block:
   If the file ids are different between branches, we are probably hitting that case. I see 2 possible changes here:
   i) Just set the numbers to be much bigger. If the tar.gz files are 10MB, you could set it to 20MB instead of 2MB, etc.
   ii) Compare 'end_point' versus end_point before we do the delta compression. I'm not sure on the specifics, but the idea is that if we are getting *really* good delta compression, allow the compressor to make a group bigger than normal. If the contents are identical, then the delta should be something like 65k:1 smaller than the full text. So we could add:

  elif (prefix is not None and prefix != last_prefix):
      if end_point < 2*1024*1024:
          start_new_block = False
      elif (end_point - old_end_point) * 10000 < len(new_text):
          # we got better than 10000:1 delta compression for this text,
          # leave it in the group
          start_new_block = False
      else:
          start_new_block = True
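
To make the difference concrete, here is a standalone toy version of the two decision rules. It is not the bzrlib code, just the branching from the excerpt and from the suggestion in ii) lifted into plain functions so they can be compared. The function names and the new_text_len parameter are invented for the illustration, and old_end_point is assumed to be the group size before the new text was added. Note that the 4MB cap in the earlier branch still fires first, so this variant on its own would only help while the group is under 4MB.

```python
MB = 1024 * 1024


def start_new_block_current(prefix, last_prefix, end_point,
                            max_fulltext_prefix, max_fulltext_len):
    """Branching as in the quoted excerpt."""
    if prefix == max_fulltext_prefix and end_point < 2 * max_fulltext_len:
        return False
    elif end_point > 4 * MB:
        return True
    elif prefix is not None and prefix != last_prefix and end_point > 2 * MB:
        return True
    return False


def start_new_block_proposed(prefix, last_prefix, end_point, old_end_point,
                             new_text_len, max_fulltext_prefix,
                             max_fulltext_len):
    """Same branching, with the ratio test from suggestion ii) added."""
    if prefix == max_fulltext_prefix and end_point < 2 * max_fulltext_len:
        return False
    elif end_point > 4 * MB:
        return True
    elif prefix is not None and prefix != last_prefix:
        if end_point < 2 * MB:
            return False
        elif (end_point - old_end_point) * 10000 < new_text_len:
            # Better than 10000:1 delta compression: keep it in the group.
            return False
        return True
    return False
```

With a 3MB group, a new file-id, and a 2MB text that only added 100 bytes to the group, the current rule starts a new block while the proposed rule keeps the text in the existing group.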

2) It is possible we aren't finding delta matches in the large files because of: 'bzr.groupcompress.max_bytes_to_index'. This is a configuration option that sets how accurately we compute the delta map. To avoid large memory usage, if we get text content that is 'large' we sample less than every byte. The default value is to effectively sample 1MB of each text, and use that for computing possible delta locations. So if you have a 20MB file, only 1/20th of the file gets sampled.

 a) You can change that value with: "bzr config bzr.groupcompress.max_bytes_to_index=104857600" (to set it to 100MB).
 b) I don't really expect this to be the problem, because we sample all bytes on one side of the delta comparison. So while we only sample 1/20th of the original source file, we would have to have a case where each of those ranges doesn't match anything in the target file, but then the intermediate text does match, otherwise you would still get delta compression.
 c) The nice thing, though, is that you can do 'bzr config... && bzr pack' and test it easily.
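
Going back to possibility 1, the file-id comparison can be scripted once you have saved the 'bzr inventory --show-ids' output for each branch. The sketch below is only that: the function names are mine, the file-ids in the usage example are invented, and it assumes each output line is a path followed by whitespace and a file-id with no spaces in the id.

```python
from collections import defaultdict


def parse_inventory(text):
    """Map path -> file-id from saved 'bzr inventory --show-ids' output."""
    ids = {}
    for line in text.splitlines():
        # Split from the right so paths containing spaces stay intact.
        parts = line.rsplit(None, 1)
        if len(parts) == 2:
            ids[parts[0].rstrip()] = parts[1]
    return ids


def conflicting_ids(inventories):
    """inventories maps branch name -> inventory text.

    Returns {path: {branch: file_id}} for paths whose file-id differs
    between at least two branches.
    """
    seen = defaultdict(dict)
    for branch, text in inventories.items():
        for path, file_id in parse_inventory(text).items():
            seen[path][branch] = file_id
    return {path: by_branch for path, by_branch in seen.items()
            if len(set(by_branch.values())) > 1}


# Hypothetical example data, not taken from the real repository.
example = {
    "BRANCH_A": "foo.tar.gz      foo.tar.gz-aaaa-1\nMakefile   makefile-1\n",
    "BRANCH_B": "foo.tar.gz      foo.tar.gz-bbbb-1\nMakefile   makefile-1\n",
}
print(sorted(conflicting_ids(example)))
```

Any tarball path that shows up in the result is a candidate for having been stored as independent full texts instead of as a delta against the other branch's copy.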
