BZR problem: out of memory with pull
I am trying to download a Launchpad branch for the first time on a Windows (Vista) system.
Running from the command line,
bzr branch lp:leo-editor
fails with the message:
7275kB 117kB/s | Fetching revisions:Inserting stream:Estimate 44471/71492
problem -- bzr: out of memory
Use -Dmem_dump to dump memory to a file.
Branching with the -r option works:
bzr branch lp:leo-editor -r100
and, iterating up through the revision numbers, I succeed with
bzr branch lp:leo-editor -r3085
but fail again with the same memory problem at
bzr branch lp:leo-editor -r3086
I assume this is one bad or too-large revision.
I was able to pull the whole revision set on a Linux (Ubuntu) machine.
Is BZR on Windows really this fickle?
Are there any workarounds, such as skipping this revision?
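The revision-by-revision search described above can be shortened to about a dozen attempts by bisecting over the revision numbers. A minimal sketch, where the hypothetical fails() callback stands in for an actual `bzr branch -r rev` attempt (run it, check the exit status, delete the partial branch):

```python
def first_failing(lo, hi, fails):
    """Binary search for the lowest revision in [lo, hi] for which
    fails(rev) is True, assuming every later revision also fails."""
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid      # mid fails, so the first failure is <= mid
        else:
            lo = mid + 1  # mid succeeds, so the first failure is > mid
    return lo

# With the numbers from this thread: -r3085 works, -r3086 fails.
print(first_failing(100, 7000, lambda rev: rev >= 3086))  # -> 3086
```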
Question information
- Language: English
- Status: Answered
- For: Bazaar
#1 (Martin Pool)
The first thing to check is that you're using the current bzr release; 2.5b2 is probably the best bet.
#2 (mdb)
Upgraded to 2.5b2, which is much faster, but it shows the exact same problem at the exact same place, -r3086.
#3
There doesn't seem to be anything particularly surprising in that revision.
Seeing what's in your .bzr.log for one of these failed branch commands might be helpful; running `bzr version` will tell you where to find it.
If you do:
> bzr branch -r 3000 lp:leo-editor
> bzr pull -d leo-editor -r 3100
> bzr pull -d leo-editor -r 3200
...and so on, does that then get you past this problem?
#4 (mdb)
> bzr pull -d leo-editor -r 3100
does not solve it. Any revision above 3086 shows a memory error.
Here is the log for a 2.5b2 run:
Wed 2011-10-19 05:57:59 -0400
0.094 bazaar version: 2.5b2
0.094 bzr arguments: [u'pull', u'-r3086']
0.136 looking for plugins in C:/Users/
0.136 looking for plugins in C:/Progra~
0.200 encoding stdout as sys.stdout encoding 'cp437'
0.251 opening working tree 'C:/leo-
8.777 Using fetch logic to copy between CHKInventoryRep
8.778 fetching: <SearchResult search:
15.631 25 bytes left on the HTTP socket
17.982 25 bytes left on the HTTP socket
18.622 25 bytes left on the HTTP socket
19.649 25 bytes left on the HTTP socket
21.070 25 bytes left on the HTTP socket
22.487 25 bytes left on the HTTP socket
25.396 25 bytes left on the HTTP socket
30.031 25 bytes left on the HTTP socket
35.774 Adding the key (<bzrlib.
39.257 Transferred: 5577kB (143.8kB/s r:5555kB w:22kB)
39.257 Traceback (most recent call last):
  [20 frames truncated in the original: File "bzrlib\ ...]
  File "bzrlib\fetch.pyo", line 76, in __init__
  File "bzrlib\fetch.pyo", line 103, in __fetch
  File "bzrlib\fetch.pyo", line 131, in _fetch_
  [7 more truncated frames: File "bzrlib\ ...]
MemoryError
BZR 2.4.1 gave a similar error:
82.656 Adding the key (<bzrlib.
#5 (John)
Long story short:
If at all possible, get an account on Launchpad, and use 'bzr
launchpad-login $USERNAME' so that 'bzr branch lp:leo-editor' will use
bzr+ssh instead of http. I can confirm that branching over http on
Windows fails, but branching over bzr+ssh succeeds.
There is a file in the repository that is a little more than
900MB when decompressed. This file is referenced in a revision that
is no longer in the ancestry of lp:leo-editor. My guess is he
committed it by mistake, pushed it to Launchpad, realized his error,
uncommitted it, and then started a new branch. However, the data is
still there in the repository.
The best way to get rid of it is to get someone to branch it into
another location (which won't include it), delete that repository, and
push the 'clean' branch back into place.
Long version:
The fact that you are accessing the repository over http might be
relevant.
35.774 Adding the key (<bzrlib. 0x02E521D0>, 7346774, 1802272) to an LRUSizeCache failed. value 906728419 is too big to fit in a the cache with size 41943040 52428800
^- This is also a bit suspicious. It indicates there is a compressed
record blob that is 906MB in size (when uncompressed, I believe).
(..., 7346774, 1802272)
says that there is a compressed blob at offset 7,346,774 that is
1,802,272 bytes long (it doesn't say which file, unfortunately).
And the error message indicates that when that record is uncompressed,
it expands to about 906MB in memory.
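The behaviour behind that log line can be sketched roughly as follows. This is a simplified, hypothetical model (not bzrlib's actual LRUSizeCache API): a single value larger than the entire cache budget is never stored, so every later request for it has to re-fetch and re-decompress the blob.

```python
class SizeCappedCache:
    """Simplified sketch of a size-limited cache: an oversized value
    is refused outright rather than evicting everything else."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._entries = {}

    def add(self, key, size, value):
        if size > self.max_size:
            # Corresponds to the "value ... is too big to fit" log line.
            return False
        # (Real LRU eviction of older entries is omitted here.)
        self._entries[key] = value
        return True

cache = SizeCappedCache(max_size=41943040)  # 40MiB, the size in the log
print(cache.add((7346774, 1802272), 906728419, b""))  # -> False: rejected
```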
Now, when I run the same command (over http), I seem to get a bit
farther, but I do eventually get an out-of-memory error:
215.657 Adding the key (<bzrlib. 0x02F43810>, 7346774, 1802272) to an LRUSizeCache failed. value 906728419 is too big to fit in a the cache with size 41943040 52428800
215.775 25 bytes left on the HTTP socket
217.046 Adding the key (<bzrlib. 0x02F43810>, 7346774, 1802272) to an LRUSizeCache failed. value 906728419 is too big to fit in a the cache with size 41943040 52428800
219.508 Transferred: 180768kB (825.0kB/s r:180616kB w:152kB)
[ 3868] 2011-10-20 14:27:16.482 INFO: Process status after command:
[ 3868] 2011-10-20 14:27:16.482 INFO: WorkingSize 101940 KiB
[ 3868] 2011-10-20 14:27:16.482 INFO: PeakWorking 627252 KiB
[ 3868] 2011-10-20 14:27:16.482 INFO: PagefileUsage 98604 KiB
[ 3868] 2011-10-20 14:27:16.483 INFO: PeakPagefileUsage 886576 KiB
[ 3868] 2011-10-20 14:27:16.483 INFO: PrivateUsage 98604 KiB
[ 3868] 2011-10-20 14:27:16.483 INFO: PageFaultCount 1365748
219.523 Traceback (most recent call last):
  [file paths truncated in the original; recoverable fragments follow]
  File "C:\dev\ ... exception_
    return the_callable(*args, **kwargs)
  File "C:\dev\ ...
    ret = run(*run_argv)
  File "C:\dev\ ... run_argv_aliases
    return self.run(
  File "C:\dev\ ...
    return self._operation
  File "C:\dev\ ...
    self.cleanups, self.func, *args, **kwargs)
  File "C:\dev\ ... _do_with_cleanups
    result = func(*args, **kwargs)
  File "C:\dev\ ... source_
  File "C:\dev\ ... create_
  File "C:\dev\ ...
    self.cleanups, self.func, self, *args, **kwargs)
  File "C:\dev\ ... _do_with_cleanups
    result = func(*args, **kwargs)
  File "C:\dev\ ... result_
  File "C:\dev\ ... find_
  File "C:\dev\ ... write_locked
    result = unbound(self, *args, **kwargs)
  File "C:\dev\ ... find_
  File "C:\dev\ ...
    self.__fetch()
  File "C:\dev\ ...
    self.
  File "C:\dev\ ... _fetch_
    stream, from_format, [])
  File "C:\dev\ ... insert_stream
    src_format, is_resume)
  File "C:\dev\ ... insert_
    self.
  File "C:\dev\ ... insert_
    for _ in self._insert_
  File "C:\dev\ ... _insert_
    bytes = record.
  File "C:\dev\ ... get_bytes_as
    self.
  File "C:\dev\ ... _prepare_
    self.
  File "C:\dev\ ... _ensure_content
    self._content = zlib.decompress
MemoryError
Note especially:
Transferred: 180768kB (825.0kB/s r:180616kB w:152kB)
and
PeakWorking 627252 KiB
Now, if I use 'bzr+ssh://' instead of http:, things seem a lot happier:
It completes successfully in 2m35s with:
Transferred: 82638kB (537.6kB/s r:82637kB w:1kB)
PeakWorking 91672 KiB
That's 91MB peak, instead of 600+MB peak.
Now, my guess is that 'lp:leo-editor' has a bit of junk data in it.
Something that compresses entirely too well (like a 900MB file of all
0 bytes). Further, that data might not actually be referenced in the
real history; it may simply be present in a revision that was pushed
into the repository and then uncommitted.
When you access the data via HTTP, the bzr client has to download the
whole blob, unpack it locally, and then use whatever content it
actually wanted from the blob. In contrast, if you use 'bzr+ssh://'
the server side can notice "oh, you only want bytes XX and YY from
that blob, I'll unpack it on my side, and then only send you the bytes
you are actually going to use".
The reason we get multiple lines about "value ... is too big" is
that we see we have a large object, decide not to cache it, and
then have to download it again and notice the same thing
again. However, if I just change the max size to 2GB, it never
complains, but it still goes OOM on Windows trying to extract the
content. (It happens to be in a different spot, but still goes OOM.)
One problem on Windows is that you rarely get to use all 2GB of
addressable memory as a contiguous block, because it maps all sorts of
DLLs into the middle of your virtual address space. (If you allocate
1MB chunks, you can usually get close to the 2GB mark.) A 32-bit Linux
server, on the other hand, usually lets you get much closer to a 3GB
addressable space. So the Launchpad servers don't have a problem
uncompressing and then throwing away the bytes from lp:leo-editor.
Note that this isn't strictly about http, though it plays a role. If I
mirror the bytes from Launchpad down exactly as they are, and then try
to do "bzr branch local-leo-editor --no-tree test" it fails for the
same reasons, albeit a lot faster.
I did a little bit more digging, and I found the record that is
specifically problematic:
spellpyx.
<email address hidden>
spellpyx.
<email address hidden>
7346774 1802272 904890145 904926181
spellpyx.
<email address hidden>
spellpyx.
<email address hidden>
7346774 1802272 2361701 904890145
^- This says, specifically, that there is a spellpyx.txt file, created
by edreamleo, which is at bytes from 2361701 to 904890145 in the
uncompressed block.
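Taking the second index entry as (offset, compressed length, start, end), per the reading above, a quick check of the numbers shows how extreme this blob is (the field meanings are inferred from that reading, not confirmed against the index format):

```python
# Second index entry, read as (offset, compressed_len, start, end):
offset, compressed_len = 7346774, 1802272
start, end = 2361701, 904890145

uncompressed = end - start
print(uncompressed)                    # 902528444: ~900MB of text
print(uncompressed // compressed_len)  # 500: roughly a 500x ratio
```

A ~500x compression ratio is consistent with the "compresses entirely too well" guess: the content is almost entirely one repeated byte.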
Logging that revision we see:
Edward K. Ream 2010-08-06
revision-
completed first draft of chapter 4
M leo/doc/LeoDocs.leo
M leo/doc/
M leo/doc/
M leo/doc/intro.txt
M leo/doc/
M leo/plugins/
Note that the revision does *not* have a revno, which means it isn't
in the ancestry of the lp:leo-editor branch.
Just writing the minimal python code to try to extract that file gives
me OOM on Windows (during zlib decompression). However, I can get the
raw zlib compressed bytes, and then use an iterating decompressor
(decompress the next 10k bytes, write them to disk, grab the next, etc.)
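That iterating-decompressor trick is available in Python's standard zlib module; a minimal sketch, with hypothetical file paths, that never holds the full ~900MB result in memory at once:

```python
import zlib

def stream_decompress(src_path, dst_path, chunk_size=10 * 1024):
    """Decompress a raw zlib stream in small chunks, writing each
    decompressed piece straight to disk."""
    decomp = zlib.decompressobj()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(decomp.decompress(chunk))
        dst.write(decomp.flush())  # emit any buffered tail bytes
```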
That leaves me with an 865MB file on disk. Then playing tricks with
truncating, etc., lets me get access to the raw content, which ends up
being...
b'b"b\'
b'b\'b"
b'b\'b\
b'b\'b\
b'b\'b\
...
And a *whole* lot more backslash characters. About 3400 lines of '\'
with the last line being 393,239 bytes long.
I have the feeling it was a generated file, and the generator went
horribly, horribly wrong.
John
=:->