squid not caching ecap-modified response

Asked by Jonathan Wolfe on 2011-01-26

Using libecap 0.0.3 and squid 3.1.10, I am adjusting the adapter_modifying.cc example to do a string replacement that either vastly reduces (or greatly increases) the size of my response. When I modify very little or nothing, squid properly caches the second request after the first MISS. When I change the content significantly, it doesn't cache - I get MISSes every time. In my test, I have a string, say "asdf", repeated a few thousand times in the body, and my modifying adapter (victim of "asdf", replacement of, say, just "a") replaces and shortens the total response significantly.

Essentially I'm trying to get squid to cache the gzip ecap module's output, which is significantly smaller than the original response, so I created this test case to represent the change in content size and to make sure the ecap module wasn't varying in some other way from the modifying example code. It sure seems that the modifying example code only caches when the response is the same size (or close?) to the original body. Without passing through ecap, the request properly caches after the first MISS - the Expires and Cache-Control headers pass through just fine.

Question information

Language:
English Edit question
Status:
Solved
For:
eCAP Edit question
Assignee:
No assignee Edit question
Solved by:
Jonathan Wolfe
Solved:
2011-02-04
Last query:
2011-02-04
Last reply:
2011-01-26
Alex Rousskov (rousskov) said : #1

I am not aware of any specific eCAP effects on caching. The best thing to do here, IMO, is to study Squid cache.log with full debugging enabled (debug_options ALL,9) and see why the response is not cached during the first MISS. It could be the lack of Content-Length header and/or the timing of the body pieces arrival at the caching decision point.

Note that the second request is irrelevant assuming your response does not expire soon. You should be able to see the problem/difference by looking at reaction to the first request only. You might try "egrep -A 3 -i 'hit|miss' cache.log" as a starting point.

Please update this question if you figure out the reason or get more relevant info.

Right on:

2011/01/26 15:27:20.137| StoreEntry::checkCachable: NO: wrong content-length

Well that explains it. However, when replacing the body with the same length (replacing "asdf" with "fdsa", for example), the response caches even though the Content-Length header is still removed by the plugin. Apparently squid is checking the length against the original, or checking that a correct length header is present (I haven't tested this one yet).

One option is to accumulate without yet returning headers, zip, then include a length header and the new body, but that's not very efficient. (Interestingly, having the whole body would likely end up with slightly better compression.) Any suggestions for a more efficient means of caching a zipped response? Zipping before getting to squid is one option, I imagine. Capping the zipping to apply only below a certain original length is another, maybe.

Alex Rousskov (rousskov) said : #3

Squid does not (and should not) compare virgin and adapted responses. The adapted response should be treated as if it was received from the origin server.

It is possible that Squid incorrectly restores the content-length information after getting the adapted response. What does
  fgrep -10 storeEntryValidLength: cache.log
say for the relevant cases?

When it works ("asdf" to "fdsa" for example):

2011/01/26 16:07:21.312| storeEntryValidLength: Checking '1078B4E8EC1D17CFEBCD533EE19F7FD6'
2011/01/26 16:07:21.312| storeEntryValidLength: object_len = 20366
2011/01/26 16:07:21.312| storeEntryValidLength: hdr_sz = 360
2011/01/26 16:07:21.312| storeEntryValidLength: content_length = 20006

When it doesn't work ("asdf" to just "a"):

2011/01/26 16:05:59.878| storeEntryValidLength: Checking '1078B4E8EC1D17CFEBCD533EE19F7FD6'
2011/01/26 16:05:59.878| storeEntryValidLength: object_len = 14843
2011/01/26 16:05:59.878| storeEntryValidLength: hdr_sz = 360
2011/01/26 16:05:59.878| storeEntryValidLength: content_length = 20006
2011/01/26 16:05:59.878| storeEntryValidLength: 5523 bytes too small; '1078B4E8EC1D17CFEBCD533EE19F7FD6'

The headers returned in both cases don't actually include a Content-Length header, which is removed by the module. It looks like squid is restoring the content length in the second case, and declaring it too small.

Alex Rousskov (rousskov) said : #5

Accumulate-all-then-zip is the simplest approach. It is indeed rather RAM-inefficient and user-unfriendly from response time point of view. The best solution, IMO, is accumulate-some-and-zip-as-you-go, with a minimum accumulation size and maximum accumulation delay as optimization parameters.

The host application, such as Squid, can cache and forward responses without knowing their final length. However, there can be solvable technical limitations such as some cache storage systems that require the length upfront because they cannot cache large responses.

Alex Rousskov (rousskov) said : #6

This is a Squid bug. HttpMsg cached values such as content_length and cache_control are not kept in sync when the message header is modified by the adapter (e.g., a Content-Length header is removed and/or added). Consider filing a bug report with the Squid project (with a link to this Question).

To keep you going, try the two untested additions below (add those new lines by hand if needed), but this is _not_ the right or comprehensive fix:

=== modified file 'src/adaptation/ecap/MessageRep.cc'
--- src/adaptation/ecap/MessageRep.cc 2010-05-26 03:06:02 +0000
+++ src/adaptation/ecap/MessageRep.cc 2011-01-26 21:43:36 +0000
@@ -44,24 +44,30 @@
 void
 Adaptation::Ecap::HeaderRep::add(const Name &name, const Value &value)
 {
     const http_hdr_type squidId = TranslateHeaderId(name); // HDR_OTHER OK
     HttpHeaderEntry *e = new HttpHeaderEntry(squidId, name.image().c_str(),
             value.toString().c_str());
     theHeader.addEntry(e);
+
+ // XXX: this is too often and not enough, on many levels
+ theMessage.content_length = theHeader.getInt64(HDR_CONTENT_LENGTH);
 }

 void
 Adaptation::Ecap::HeaderRep::removeAny(const Name &name)
 {
     const http_hdr_type squidId = TranslateHeaderId(name);
     if (squidId == HDR_OTHER)
         theHeader.delByName(name.image().c_str());
     else
         theHeader.delById(squidId);
+
+ // XXX: this is too often and not enough, on many levels
+ theMessage.content_length = theHeader.getInt64(HDR_CONTENT_LENGTH);
 }

 libecap::Area
 Adaptation::Ecap::HeaderRep::image() const
 {
     MemBuf mb;
     mb.init();

Accumulate-some-and-zip-as-you-go is what the gzip module is doing right now, I think. Is squid restoring the original content length after getting the adapted response here, per the "X bytes too small" message? If it is, that's going to prevent it from caching the response without knowing the final length, isn't it? Or are you saying that squid's cache size configuration means it has to check length up front?

Nice, thanks for the help in narrowing down what's going on. I'll definitely take this up with the squid folks.

Squid bug #3149 filed with patch attached and submitted to squid dev team. Thanks Alex!