"git-ubuntu clone" should also preemptively fork the repository to the user's account?

Bug #2044575 reported by Steve Langasek
This bug affects 1 person
Affects: git-ubuntu
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

When working on a large repository, the 'git-ubuntu prepare-upload' stage can be quite expensive in terms of both data transfer and time (and therefore developer attention) because under the current workflow, we have to push a branch with full history to the repository under the user's namespace (which by default does not yet exist).

It is my understanding that doing things this way is also more expensive in terms of Launchpad server storage, because when forking the repository, Launchpad is able to deduplicate the content, but when pushing separately it is not.

Perhaps at `git-ubuntu clone` time, we should pre-emptively fork the repository (https://code.launchpad.net/~git-ubuntu-import/ubuntu/+source/<pkg>/+git/<pkg>/+fork) into the user's namespace, so that any pushes are made cheap?
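As a minimal sketch of what the clone step would need to target, the fork page URL can be derived from the source package name using the pattern above. The function name `fork_page_url` is hypothetical and is not part of git-ubuntu; how the fork would actually be triggered (web page or API) is not settled here.

```python
from urllib.parse import quote

# URL pattern taken from the bug description; {pkg} is the source package name.
FORK_URL_TEMPLATE = (
    "https://code.launchpad.net/~git-ubuntu-import/ubuntu/+source/"
    "{pkg}/+git/{pkg}/+fork"
)


def fork_page_url(pkg: str) -> str:
    """Return the Launchpad fork page URL for the importer's repository of pkg.

    Hypothetical helper for illustration only; it builds the URL and does not
    perform the fork.
    """
    if not pkg:
        raise ValueError("package name must not be empty")
    # Keep '+' literal, since it is legal in source package names; escape '/'
    # and anything else that is not URL-safe.
    return FORK_URL_TEMPLATE.format(pkg=quote(pkg, safe="+"))


if __name__ == "__main__":
    print(fork_page_url("libabigail"))
```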

Doing this only at push time would be too late, because the fork operation does not immediately populate the target repository, so we would still have the problem of slowing the developer down.

And maybe this becomes less relevant once staging branches are available.

Robie Basak (racb) wrote :

I was under the impression that Launchpad is supposed to automatically share objects and therefore the push should be quick in the case that you are pushing to a repository whose target already has a default repository (and therefore that default repository should be used as the base). IOW, I was told that this should magically work anyway. But apparently it doesn't - I've never seen push times and bandwidth use suggest that this is actually happening.

I've asked the Launchpad team for guidance: https://answers.launchpad.net/launchpad/+question/708879

Clinton Fung (clinton-fung) wrote :

Implementing the suggestion would certainly result in an increase in storage, and I don't have numbers, but it does intuitively feel like a penalty that would have to be incurred anyway at some point in time.

As mentioned in https://answers.launchpad.net/launchpad/+question/708879 I don't think there is any kind of automagic object sharing (though I have to do some work to confirm this).

Once I've determined if there is indeed meant to be some kind of automagic object sharing, if it isn't working we will work to remediate that. If not, I think it warrants further analysis and conversation, as I think the lack of data prevents useful insights about storage usage (from me, at least).

Steve Langasek (vorlon) wrote :

Even if there is magic object sharing, AIUI that is necessarily only server-side - i.e. when you fork the repository, Launchpad knows what you forked, but if you create a new repository via 'git push' it's a blank repo and contains only the bits that you're pushing... and if you're pushing a LOT, Launchpad basically has no chance to deduplicate until the push is done anyway.

And even if there's no magic object sharing, a server-side fork is still going to be a lot faster than pushing e.g. a 500MB git repo up.

Steve Langasek (vorlon) wrote :

(or a 1.1GB repo in the case of libabigail, to pick a random package...)
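For a rough sense of scale, here is a back-of-envelope estimate of how long a full-history push of those sizes would take. The 20 Mbit/s uplink is purely an assumed figure for illustration, not a measurement, and the estimate ignores protocol overhead and server-side processing.

```python
def upload_seconds(size_mb: float, uplink_mbit_per_s: float) -> float:
    """Estimate transfer time: megabytes -> megabits, divided by link speed."""
    if uplink_mbit_per_s <= 0:
        raise ValueError("uplink speed must be positive")
    return size_mb * 8 / uplink_mbit_per_s


if __name__ == "__main__":
    ASSUMED_UPLINK_MBIT = 20  # assumption, not measured
    # Sizes from the comments above: 500MB, and 1.1GB taken as 1100 MB.
    for label, size_mb in (("500MB repo", 500), ("1.1GB repo", 1100)):
        minutes = upload_seconds(size_mb, ASSUMED_UPLINK_MBIT) / 60
        print(f"{label}: about {minutes:.1f} minutes")
```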
