Repository.get_stream_1.19 blocks for a long time before sending

Bug #526825 reported by Martin Pool
20
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
High
Unassigned

Bug Description

If you try to branch from lp:mysql-server into an empty 2a repository, then you get bug 388269 but you also get a multi-minute delay before the server starts to stream data, which looks to the user like the process is either hung or not showing a progress indicator. The bzr log on the client shows:

109.527 hpss call w/body: 'Repository.get_stream_1.19', '~mysql/mysql-server/mysql-trunk/', 'Bazaar repository format 2a (needs bzr 1.16 or later)\n' ('search\<email address hidden>-'...)
109.527 63 bytes
508.836 result: ('ok',)
508.837 42 byte part read
508.878 48 byte part read
530.368 13169574 byte part read
530.425 inserting substream: texts
966.875 8206666 byte part read
1249.593 9735616 byte part read
1553.636 8098813 byte part read
1683.444 6894780 byte part read
1902.080 19520443 byte part read

During that 400s delay before we get the start of the reply, strace shows that the client is just blocked in read(2), waiting for something to come back from ssh. The server needs to be fixed to send smaller chunks.

There are probably multiple dupes of this.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 526825] [NEW] Repository.get_stream_1.19 blocks for a long time before sending

On Wed, 2010-02-24 at 03:41 +0000, Martin Pool wrote:
> Public bug reported:
>
> If you try to branch from lp:mysql-server into an empty 2a repository,
> then you get bug 388269 but you also get a multi-minute delay before the
> server starts to stream data, which looks to the user like the process
> is either hung or not showing a progress indicator. The bzr log on the
> client shows:

bzr should be showing traffic activity, according to your log 13MB of
data is copied in that 400 second window.

> During that 400s delay before we get the start of the reply, strace
> shows that the client is just blocked in read(2), waiting for something
> to come back from ssh. The server needs to be fixed to send smaller
> chunks.

It appears to return a single 13MB section, then a stream. Thats a
pretty large block of data. It may not be possible to sensibly break it
into smaller bits, if it is the revision graph (and I suspect it is) -
but we don't need smaller bits to show a better UI.

> There are probably multiple dupes of this.

I don't think there are, this looks new.

-Rob

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 526825] [NEW] Repository.get_stream_1.19 blocks for a long time before sending

On 24 February 2010 15:02, Robert Collins <email address hidden> wrote:
> On Wed, 2010-02-24 at 03:41 +0000, Martin Pool wrote:
>> Public bug reported:
>>
>> If you try to branch from lp:mysql-server into an empty 2a repository,
>> then you get bug 388269 but you also get a multi-minute delay before the
>> server starts to stream data, which looks to the user like the process
>> is either hung or not showing a progress indicator.  The bzr log on the
>> client shows:
>
> bzr should be showing traffic activity, according to your log 13MB of
> data is copied in that 400 second window.

As I said, strace shows there is a multi-minute lag before the client
gets any data from the network, therefore it can't show activity.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 526825] [NEW] Repository.get_stream_1.19 blocks for a long time before sending

On Wed, 2010-02-24 at 05:20 +0000, Martin Pool wrote:

> As I said, strace shows there is a multi-minute lag before the client
> gets any data from the network, therefore it can't show activity.

My guess is that this is poor performance of the mysql repository on
launchpad - and likely correctable simply by switching to 2a.

Determining what texts to send (and texts are the first chunk of a
stream) requires examining all the inventories to and doing a set
difference - there isn't a partial result available until this is done.
We could look at making that core function a generator and/or special
case this code path, but I'd be very worried about the potential for
regressions in other areas.

-Rob

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 526825] [NEW] Repository.get_stream_1.19 blocks for a long time before sending

If it turns out that this is specific to pre-2a repository formats I'd
be ok with marking it wontfix and recommending an upgrade. But there
is some kind of bug here.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 526825] [NEW] Repository.get_stream_1.19 blocks for a long time before sending

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> If it turns out that this is specific to pre-2a repository formats I'd
> be ok with marking it wontfix and recommending an upgrade. But there
> is some kind of bug here.
>

I worked pretty hard to make sure that GC => GC doesn't do a pre-fetch
of all the inventories (find_texts_affected_by_...), and streams the
inventories and reads the texts it needs as it goes.

However, 2a => yet-another-format may suffer from this, because the
Generic fetcher does.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkuF2+kACgkQJdeBCYSNAAMENwCgz0mA0XlHovgmGR8FkCI0fjsX
LasAoLTauuQJKlWSLRlZ6SB0OcSOBKBf
=SUfo
-----END PGP SIGNATURE-----

Revision history for this message
Andrew Bennetts (spiv) wrote :

I think I've seen less severe forms of this with 2a branches too. After the server receives a Repository.get_stream_1.19 request it has to do some work before it can start streaming records back: figuring out the set of revisions described by the request's params at least. For an 'unordered' or 'groupcompress' stream ordering, I'd hope this preliminary work is quite fast, and then goes straight into streaming records from the repo.

But as Robert says, for cross-format streams I think the ordering sends texts first, so that means a lot of reading of revisions->inventories->texts to find the texts-changed-in-revs data before it even starts, so I suspect that's what's happening here. It may be possible to improve.

Revision history for this message
Parth Malwankar (parthm) wrote :

So I did some more experimentation with Bug #534148 (which is marked as dup of this).

The repository format for my shared repo was 2a with bzr 2.1 on ubuntu 9.10.
I waited for a long time trying to see what happens happens after the
multi-minute delay but things didn't really move even after waiting for 2hrs so I
killed bzr with a ^C.

[tmp]% bzr init-repo parrot-repo
Shared repository with trees (format: 2a)
Location:
  shared repository: parrot-repo
[tmp]% cd parrot-repo
[parrot-repo]% time bzr -Dhpss branch lp:parrot trunk
bzr: interrupted
HPSS calls: 25 (0 vfs) SmartSSHClientMedium(bzr+ssh://<email address hidden>/)
bzr -Dhpss branch lp:parrot trunk 244.09s user 2.09s system 2% cpu 2:35:01.91 total
[parrot-repo]%

bzr.log attached.
?field.comment=So I did some more experimentation with Bug #534148 (which is marked as dup of this).

The repository format for my shared repo was 2a with bzr 2.1 on ubuntu 9.10.
I waited for a long time trying to see what happens happens after the
multi-minute delay but things didn't really move even after waiting for 2hrs so I
killed bzr with a ^C.

[tmp]% bzr init-repo parrot-repo
Shared repository with trees (format: 2a)
Location:
  shared repository: parrot-repo
[tmp]% cd parrot-repo
[parrot-repo]% time bzr -Dhpss branch lp:parrot trunk
bzr: interrupted
HPSS calls: 25 (0 vfs) SmartSSHClientMedium(bzr+ssh://<email address hidden>/)
bzr -Dhpss branch lp:parrot trunk 244.09s user 2.09s system 2% cpu 2:35:01.91 total
[parrot-repo]%

bzr.log attached.

Revision history for this message
Parth Malwankar (parthm) wrote :

So I tried branching lp:parrot without killing it. It times out after about 3hrs.

[parrot-repo]% time bzr -Dhpss branch lp:parrot trunk
Read from remote host bazaar.launchpad.net: Connection timed out
bzr: ERROR: Connection closed: Unexpected end of message. Please check connectivity and permissions, and report a bug if problems persist.
HPSS calls: 25 (0 vfs) SmartSSHClientMedium(bzr+ssh://<email address hidden>/)
bzr -Dhpss branch lp:parrot trunk 248.26s user 2.69s system 2% cpu 3:07:26.69 total

Attached is the strace log and bzr.log

Revision history for this message
Parth Malwankar (parthm) wrote :
Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 526825] Re: Repository.get_stream_1.19 blocks for a long time before sending

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Parth Malwankar wrote:
> ** Attachment added: "timeout bzr.log attached"
> http://launchpadlibrarian.net/40614870/bzr.log
>

So to start with, part of the problem is that you are doing on-the-fly
conversion, and then your network connection is probably dropping since
the lp side is being slow to send data packets to you.

If you did:

bzr branch lp:parrot --no-tree local_parrot
bzr init-repo --2a 2a_parrot
bzr branch local_parrot 2a_parrot

Things would probably work better for you. However, also note that if
'lp:parrot' isn't in 2a format, then you can't merge a 2a commit back
into it. (rich-root watershed, etc.)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkuWbukACgkQJdeBCYSNAAMWTQCfQ9ogQBAikXvU8OnP+mKHBsUZ
0rgAn3wmhtxLUyf35bepx8SF8iruojVn
=is8W
-----END PGP SIGNATURE-----

Revision history for this message
Parth Malwankar (parthm) wrote : Re: [Bug 526825] Re: Repository.get_stream_1.19 blocks for a long time before sending

On Tue, Mar 9, 2010 at 9:23 PM, John A Meinel <email address hidden> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Parth Malwankar wrote:
>> ** Attachment added: "timeout bzr.log attached"
>>    http://launchpadlibrarian.net/40614870/bzr.log
>>
>
> So to start with, part of the problem is that you are doing on-the-fly
> conversion, and then your network connection is probably dropping since
> the lp side is being slow to send data packets to you.
>
> If you did:
>
> bzr branch lp:parrot --no-tree local_parrot
> bzr init-repo --2a 2a_parrot
> bzr branch local_parrot 2a_parrot
>
> Things would probably work better for you. However, also note that if
> 'lp:parrot' isn't in 2a format, then you can't merge a 2a commit back
> into it. (rich-root watershed, etc.)
>
> John
> =:->

Thanks John.
The --no-tree workaround suggested by you worked.

[tmp]% time bzr branch lp:parrot --no-tree local_parrot
Branched 36837 revision(s).
bzr branch lp:parrot --no-tree local_parrot 209.87s user 7.67s system
5% cpu 1:09:22.15 total
[tmp]%

You are right about the parrot repository format. Its Bazaar pack
repository format 1 (needs bzr 0.92).

Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
tags: removed: check-for-breezy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.