import without refreshing index files from the net first

Asked by Rolf Leggewie

I have a working acng installation in my LAN. It's populated with index files for a number of releases as well as some corresponding deb files.

From time to time, I'd like to import some deb files from the _import/ directory. acng always goes online first and updates all index files. Is there a way I can bypass that?

Question information

Language: English
Status: Expired
For: Ubuntu apt-cacher-ng
Assignee: No assignee

Eduard Bloch (edi-gmx) said :
#1

I am thinking about a tristate option at the moment, maybe with a radio-button UI, something like

O no update
O smart update (default)
O full update

Planned for 0.9.4

Eduard Bloch (edi-gmx) said :
#2

There is, actually, a checkbutton right on top of the admin page, called

 Don't update index files (DANGEROUS for expiration tasks!)

Did you test it?

Rolf Leggewie (r0lf) said :
#3

That option is not present in jessie and Raspbian does not do the backports. I believe the Debian armhf binaries should be compatible with the Raspbian ones so I'll test that and see how it goes.

In any case, I wonder why the indexes are updated at all. If the files to be imported are newer than the index, they'd get left behind (and possibly imported later). It seems like a waste of bandwidth to update all indexes every time an import runs. I'd think that not updating the indexes at all would be safe; you can always force an update by refreshing the index from one of the clients. I don't think it's necessary for acng housekeeping to update indexes on its own.

But anyhow, why is acng doing that? I must be overlooking something.

Rolf Leggewie (r0lf) said :
#4

The binary from Debian backports segfaults on Raspbian, so I went back to their package for now.

Eduard Bloch (edi-gmx) said :
#5

If you prefer to "leave stuff behind and reimport later manually" - feel free, you could do that easily by turning off the daily job in the cron.daily script.

Otherwise: how do you know for sure which ones are trash and which ones are "unknown because the index is not updated" and should be left behind? Comparing version strings might be error-prone.

And the old index might not be present in the archive at all, so there might be no information about the remote state whatsoever. For example, when a nomadic user only does "apt update" once on the net, apt might fetch just the Release file and maybe a couple of pdiff indexes, but not the relevant bits.

Actually, the smart updating is not that wasteful when the pdiff mechanism is in place. It works just fine for Debian (unless it runs into the SHA1/SHA256 issue, see [Bug 1589231]). However, your "nice" distribution apparently does NOT offer pdiff at all.

Eduard Bloch (edi-gmx) said :
#6

Well, that question still gave me some thought, and your input is really helpful!

I acknowledge the possibility that some bandwidth is wasted if the thing refetches index data. On the other hand, refetching periodically makes sense for Debian because it is then able to reconstruct index data cheaply via diff files (and the diff history expires after some weeks). However, if the distro does not offer that and there are no changes in the cache, then bandwidth is wasted on nightly updates.

Therefore, I think we could make some compromise here:

a) add an internal counter for the amount of data fetched by the clients
b) give the expiration code an additional option (and check) which only lets it continue if that counter has reached a specific threshold, and silently exit otherwise

For example, the option says: 300MB. If the user keeps making minor updates then the trigger is reached after a couple of weeks. In the meantime, there might be deprecated packages in the cache but we don't really care about less than 300MB.
However, on another day the user might install 400MB of updates (say, a new latex release was rolled out, and the cached versions are deprecated). Then the update would start, fetch new indexes, and wipe old packages. And then it would stay dormant again until 300MB is reached.

I think I like that approach. It would also be beneficial for Debian because over the years the index data has reached a size where processing time (CPU) is more than peanuts.
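
The compromise in a) and b) could be sketched roughly as follows. This is only an illustration in Python, not apt-cacher-ng code: the class `ExpirationGate`, its method names, and the way the counter is reset after a run are all assumptions made for the example; only the 300MB figure comes from the comment above.

```python
class ExpirationGate:
    """Hypothetical gate that lets expiration run only after enough client traffic."""

    def __init__(self, threshold_bytes=300 * 1000 * 1000):
        # Assumed default of 300 MB, taken from the example in the discussion.
        self.threshold_bytes = threshold_bytes
        # a) internal counter for the amount of data fetched by the clients
        self.fetched_bytes = 0

    def record_download(self, size_bytes):
        """Add the size of one client download to the counter."""
        self.fetched_bytes += size_bytes

    def run_expiration(self, expire):
        """b) Call expire() only if the threshold was reached.

        Returns True if expiration ran, False if it exited silently.
        """
        if self.fetched_bytes < self.threshold_bytes:
            return False  # below threshold: do nothing, fetch no indexes
        expire()
        # Assumption: the counter starts over after a run, so the job
        # stays dormant until another 300 MB has been fetched.
        self.fetched_bytes = 0
        return True


if __name__ == "__main__":
    gate = ExpirationGate()
    gate.record_download(50 * 1000 * 1000)   # minor updates
    print(gate.run_expiration(lambda: print("expiring")))  # below threshold
    gate.record_download(400 * 1000 * 1000)  # large batch of updates
    print(gate.run_expiration(lambda: print("expiring")))  # threshold reached
```

With this shape, a nightly job would call `run_expiration` unconditionally and the gate decides whether any index data is fetched at all.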

Rolf Leggewie (r0lf) said :
#7

I don't think I understand every comment you made.

I do agree that pdiffs are awesome and I wish that Ubuntu had them (bug 214612, if you care to mark it as affecting you). Raspbian doesn't use them either, so pdiff is a Debian-only thing as of now.

I don't see how cron.daily plays into my request, which is about importing, while cron.daily is about expiring. Turning off that cronjob will do nothing for importing. I would indeed like to "leave stuff behind and reimport later manually", but I do not think there is a way to do that currently. That's what this question is about.

> Otherwise: how to know which ones are trash for sure and which ones are "unknown because index is not updated"
> and shall be left behind? Comparing version strings might be error prone

Why do you need to know? AFAIK, debs to be imported are left behind by default. No harm done.

My heuristic would be: acng does no index updates on its own for housekeeping. Indexes are refreshed only when a client requests the index file and it is out of date. Why does acng need to update index files for housekeeping? For importing it might only make two "errors", both of them tolerable: a) importing a deb that's already superseded or otherwise deleted upstream, or b) not importing a deb file yet. b) would be fixed the next time the index is updated and another import is triggered. Worst case is that the file gets fetched over the network even though it's already in _import.

On top of that you might implement your 300 MB idea. Be careful, though, to also update index files whenever a 404 occurs. Imagine the following: the index is updated on day 1. Clients request 50 MB of files daily (to keep things simple). On day 3, foo_1.2-1_all.deb is replaced with foo_1.3-1_all.deb upstream and the former package is deleted. On day 4 (the index file is still out of date) a client requests foo_1.2-1_all.deb and the server responds with 404. This would need to trigger an update of the corresponding index file, or else the client will see a faulty 404 error. The client would still see the error but be able to recover from it upon updating its local index.
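
The 404 rule in the scenario above could look roughly like this. It is a sketch in Python under stated assumptions, not how apt-cacher-ng is implemented: `handle_package_request`, `fetch_upstream`, and `refresh_index` are hypothetical names, and the upstream fetch is reduced to returning an HTTP status code.

```python
def handle_package_request(path, fetch_upstream, refresh_index):
    """Hypothetical request handler for a package that is not in the cache.

    fetch_upstream(path) is assumed to return the upstream HTTP status code.
    refresh_index(path) is assumed to refetch the index covering this path.
    """
    status = fetch_upstream(path)
    if status == 404:
        # The package is gone upstream, so the cached index is probably
        # stale. Refresh it now so the client's next index update sees the
        # current state, even if the traffic threshold was not reached yet.
        refresh_index(path)
    # The client still gets the original status (including the 404) and
    # can recover after updating its local index.
    return status


if __name__ == "__main__":
    refreshed = []
    status = handle_package_request(
        "pool/main/f/foo/foo_1.2-1_all.deb",  # illustrative path
        lambda path: 404,                     # upstream no longer has the file
        refreshed.append,
    )
    print(status, refreshed)
```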

Rolf Leggewie (r0lf) said :
#8

I forgot to discuss the second case of housekeeping, expiry. Worst case with an outdated index file is that it would keep files in the local cache even though they are already gone upstream. No real harm done.

Launchpad Janitor (janitor) said :
#9

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

Rolf Leggewie (r0lf) said :
#10

Any comment, Eduard?

Launchpad Janitor (janitor) said :
#11

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

Eduard Bloch (edi-gmx) said :
#12

Hi, sorry, maybe I misunderstood the original question.
And I am still confused. There are now different questions in the discussion, which one do you mean?

The one in the topic -> easy, set the checkbox

Index file in the _import folder -> I will add to the wishlist. The code could attempt to include them, there are now better means for that after the last overhaul.

Files remaining in the _import folder -> whatever you want... remove them if they could not be picked up, or store them in your safe, ...

Updating index files regularly for daily jobs -> probably a waste of bandwidth in some cases and should be changed, as described.

Rolf Leggewie (r0lf) said :
#13

> There is, actually, a checkbutton right on top of the admin page, called
>
> Don't update index files (DANGEROUS for expiration tasks!)

When was that introduced? I compiled the jessie-backports package for Raspbian now and that knob isn't there.

We touched on a few related areas, but the original question is still how I can import stuff without downloading a bunch of index files (on a jessie system).

Rolf Leggewie (r0lf) said :
#14

> The one in the topic -> easy, set the checkbox

Nope, as the checkbox isn't there.

Launchpad Janitor (janitor) said :
#15

This question was expired because it remained in the 'Open' state without activity for the last 15 days.