eCAP

Adapting gzip content

Asked by Matt Cochran on 2011-10-18

I've created an adapter that works fine when the content is plain text, but of course when I get gzip'ed content I don't find what I'm looking for. Are there any simple examples of modifying gzip'ed content anywhere? What I need to do is extract the content to text, insert a string, and send the content back to the requestor either as text or gzip'ed again.

One of the things I'm concerned about is whether I can just uncompress every chunk, or whether I need to collect all the chunks and then uncompress the whole set together.

Question information

Language: English

Status: Solved

For: eCAP

Assignee: No assignee

Solved by: Alex Rousskov

Solved: 2011-10-23

Last query: 2011-10-23

Last reply: 2011-10-23

Alex Rousskov (rousskov) said on 2011-10-18:

There is a gzip eCAP adapter at http://code.google.com/p/squid-ecap-gzip/

You can uncompress and compress gzip content by chunks. The above eCAP adapter appears to do that, and we have done that elsewhere (e.g., Web Polygraph benchmark and various Traffic Spicer ICAP server adapters). I believe our approach did not involve as many low-level buffer manipulations as are found in the above eCAP adapter (we used public gzip APIs instead), but there may be advantages to this low-level manipulation. Both the above eCAP adapter and Polygraph licenses allow free reuse of the code.
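To illustrate the chunk-by-chunk approach, here is a minimal sketch in Python rather than in eCAP C++, assuming Python's standard zlib module stands in for the public gzip API. It is not taken from the eCAP adapter or from Polygraph. The names adapt_gzip, marker, and insertion are hypothetical; the sketch inserts a string before the first occurrence of a marker and keeps only a few trailing bytes between chunks so that a marker split across two chunks is still found.

```python
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # asks zlib for a gzip header and trailer


def adapt_gzip(chunks, marker, insertion):
    """Yield recompressed gzip data with `insertion` placed before the
    first `marker` found in the decompressed text (hypothetical helper)."""
    decoder = zlib.decompressobj(GZIP_WBITS)
    encoder = zlib.compressobj(6, zlib.DEFLATED, GZIP_WBITS)
    pending = b""   # decompressed bytes held back in case the marker is split
    done = False    # becomes True once the insertion has been made

    def edit(plain, last):
        nonlocal pending, done
        if done:
            return plain
        pending += plain
        pos = pending.find(marker)
        if pos >= 0:
            done = True
            ready, pending = pending[:pos] + insertion + pending[pos:], b""
        else:
            # Hold back len(marker) - 1 bytes unless this is the final call.
            cut = len(pending) if last else max(0, len(pending) - len(marker) + 1)
            ready, pending = pending[:cut], pending[cut:]
        return ready

    for chunk in chunks:
        out = encoder.compress(edit(decoder.decompress(chunk), last=False))
        if out:
            yield out
    # End of body: release whatever was held back and write the gzip trailer.
    yield encoder.compress(edit(decoder.flush(), last=True)) + encoder.flush()
```

Memory use stays bounded by the chunk size plus the marker length, whatever the body size. An adapter that changes the body this way would also have to remove or correct Content-Length, since the recompressed size differs from the original.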

Except for header/footer manipulation, which I find rather annoying, the [de]compression code itself is not complex. Keep in mind, however, that you cannot, in general, decompress the entire chunk you have just received from the host application: you have to "stream" incoming data through the decompressing code, which may need more data than is currently available to finish decompressing a chunk. On the other hand, accumulating the entire response body is usually not necessary and can be harmful if the body is large.
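The streaming point can be sketched as follows, again in Python with the standard zlib module as an assumed stand-in for whatever gzip API the adapter uses. One decompressor object lives for the whole body and each incoming chunk is fed to it; a chunk may produce no output at all because the decoder is still waiting for more input. The function name stream_gunzip and the 7-byte chunk size are made up for the illustration.

```python
import gzip
import zlib


def stream_gunzip(chunks):
    """Yield decompressed bytes as compressed chunks arrive."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        plain = decoder.decompress(chunk)
        if plain:  # may be empty: the decoder needs more input first
            yield plain
    tail = decoder.flush()
    if tail:
        yield tail
    if not decoder.eof:
        raise ValueError("gzip stream ended before its trailer")


# Simulate a host application delivering the body in small pieces.
body = b"<html><body>hello world</body></html>" * 50
compressed = gzip.compress(body)
chunks = [compressed[i:i + 7] for i in range(0, len(compressed), 7)]
restored = b"".join(stream_gunzip(chunks))
assert restored == body
```

Creating a fresh decompressor for every chunk does not work, because only the first chunk starts with a gzip header and later chunks depend on state built up from earlier ones.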

Finally, please keep in mind that there are other encodings besides identity and gzip, some of which are not standard.
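One way to act on this is to look at the Content-Encoding value before touching the body and to leave alone anything the adapter does not understand. The sketch below is only an assumption about how that check might look; make_decoder is a hypothetical helper, and it treats "deflate" as zlib-wrapped data, which is what the HTTP specification describes but not what every server sends.

```python
import zlib


def make_decoder(content_encoding):
    """Return a zlib decompressor for the given Content-Encoding value,
    or None when the body is not encoded (identity).

    Raises ValueError for anything this sketch does not handle,
    including stacked encodings such as "gzip, br".
    """
    token = (content_encoding or "identity").strip().lower()
    if token == "identity":
        return None
    if token in ("gzip", "x-gzip"):
        return zlib.decompressobj(16 + zlib.MAX_WBITS)
    if token == "deflate":
        return zlib.decompressobj(zlib.MAX_WBITS)  # zlib-wrapped deflate
    raise ValueError("unsupported Content-Encoding: " + token)
```

When ValueError is raised, the safest reaction is probably to pass the message through unmodified rather than to guess at the format.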

Matt Cochran (matthewcochran) said on 2011-10-22:

Thanks Alex. I looked at the Web Polygraph code available online and I couldn't find where any gzip manipulation was happening. Could you point me in the right direction?

Alex Rousskov (rousskov) said on 2011-10-23:

polygraph/src/csm/GzipEncoder.{cc,h} can be used as a starting point. You could also do something like

fgrep -RIi gzip polygraph/src

to find other related code. For configuration options, see http://www.web-polygraph.org/docs/userman/compression.html

Matt Cochran (matthewcochran) said on 2011-10-23:

Thanks Alex Rousskov, that solved my question.

sam (sadegh-sal) said on 2015-11-08:

http://www.web-polygraph.org/docs/userman/compression.html covers only compression. Is there any way to decompress chunks?

Alex Rousskov (rousskov) said on 2015-11-08:

Yes, there is a way to decompress chunks. Public Polygraph does not contain such examples (because Polygraph does not need to decompress) but you can find them elsewhere.

sam (sadegh-sal) said on 2015-11-09:

When I try to uncompress the gzip chunks, I get a segmentation fault. Would you please help me find an example?

Alex Rousskov (rousskov) said on 2015-11-09:

Sorry, I cannot help at that level.
