Adapting gzip content

Asked by Matt Cochran

I've created an adapter that works fine when the content is plain text, but of course when I get gzip'ed content I don't find what I'm looking for. Are there any simple examples of modifying gzip'ed content anywhere? What I need to do is extract the content to text, insert a string, and send the content to the requestor either as text or gzip'ed again.

One of the things I'm concerned about is whether I can just uncompress every chunk, or whether I need to collect all the chunks and then uncompress the whole set together.

Question information

Language: English
Status: Solved
For: eCAP
Assignee: No assignee
Solved by: Alex Rousskov
Alex Rousskov (rousskov) said :
#1

There is a gzip eCAP adapter at http://code.google.com/p/squid-ecap-gzip/

You can uncompress and compress gzip content by chunks. The above eCAP adapter appears to do that, and we have done that elsewhere (e.g., Web Polygraph benchmark and various Traffic Spicer ICAP server adapters). I believe our approach did not involve as many low-level buffer manipulations as the above eCAP adapter does (we used public gzip APIs instead), but there may be advantages to this low-level manipulation. The licenses of both the above eCAP adapter and Polygraph allow free reuse of the code.
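As a rough illustration of chunk-by-chunk processing, here is a minimal sketch in Python using only the standard zlib module. It is not taken from the eCAP adapter or from Polygraph; the function name `regzip_chunks` and its `insert` parameter are hypothetical, and appending the string at the end of the body is just a stand-in for whatever modification an adapter makes. A C++ adapter would presumably do the equivalent through zlib's inflate and deflate calls.

```python
import gzip
import zlib

# Adding 16 to the window size tells zlib to read/write gzip framing
# (header and trailer) instead of raw zlib framing.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def regzip_chunks(chunks, insert=b""):
    """Yield gzip'ed output chunks for gzip'ed input chunks.

    `insert` (hypothetical) is appended to the decompressed body.
    """
    decoder = zlib.decompressobj(GZIP_WBITS)
    encoder = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, GZIP_WBITS)
    for chunk in chunks:
        text = decoder.decompress(chunk)  # may be empty if more input is needed
        out = encoder.compress(text)      # may be empty while zlib buffers data
        if out:
            yield out
    tail = decoder.flush() + insert
    # flush() emits any buffered data followed by the gzip trailer.
    yield encoder.compress(tail) + encoder.flush()
```

Usage would look something like `b"".join(regzip_chunks(incoming_chunks, insert=b"<!-- added -->"))`, although a real adapter would forward each output chunk as it is produced rather than joining them.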

Except for header/footer manipulation, which I find rather annoying, the [de]compression code itself is not complex. Keep in mind, however, that you cannot, in general, decompress the entire chunk you have just received from the host application. You have to "stream" incoming data through the decompressing code, which may need more data to finish decompressing a chunk than is currently available. On the other hand, accumulating the entire response body is usually not necessary and can be harmful if the body is large.
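To make the streaming point concrete, the following Python sketch (standard zlib and gzip modules only; `inflate_stream` is a hypothetical name) keeps one decompressor for the whole message. It shows that a chunk may legitimately produce no output yet, and that a later chunk cannot be decompressed on its own as if it were a complete gzip stream.

```python
import gzip
import zlib


def inflate_stream(chunks):
    """Decompress gzip data that arrives in arbitrary pieces.

    A single decompressor object is kept for the whole message,
    so its state carries over from one chunk to the next.
    """
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
    for chunk in chunks:
        # An empty result is normal: zlib needs more input before it can emit anything.
        yield decoder.decompress(chunk)
    yield decoder.flush()


body = b"streaming example " * 200
packed = gzip.compress(body)
first, second = packed[:5], packed[5:]  # split in the middle of the gzip header

pieces = list(inflate_stream([first, second]))

# Treating the second chunk as a standalone gzip stream fails,
# because it does not begin with a gzip header.
try:
    zlib.decompress(second, 16 + zlib.MAX_WBITS)
    standalone_ok = True
except zlib.error:
    standalone_ok = False
```

Here `pieces[0]` is empty because the first chunk ends before the header is complete, while joining all pieces gives back the original body.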

Finally, please keep in mind that there are other encodings besides identity and gzip, some of which are not standard.
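One way to act on this is to check the Content-Encoding header before touching the body and to leave anything unrecognized alone. The sketch below is an assumption about how an adapter might decide, not something from the adapters mentioned above; the function name and the returned labels are hypothetical.

```python
def plan_for_encoding(content_encoding):
    """Decide how to treat a body given its Content-Encoding header value.

    Returns "modify" for unencoded bodies, "gunzip-modify" for gzip,
    and "pass-through" for every other (or combined) encoding.
    """
    # The header may list several codings, in the order they were applied.
    codings = [c.strip().lower() for c in (content_encoding or "").split(",") if c.strip()]
    if not codings or codings == ["identity"]:
        return "modify"
    if codings in (["gzip"], ["x-gzip"]):  # x-gzip is a legacy alias for gzip
        return "gunzip-modify"
    return "pass-through"
```

Passing unknown encodings through unmodified is the conservative choice here: the adapter simply does not insert its string rather than risk corrupting a body it cannot decode.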

Matt Cochran (matthewcochran) said :
#2

Thanks Alex. I looked at the Web Polygraph code available online and I couldn't find where any gzip manipulation was happening, could you point me in the right direction?

Best Alex Rousskov (rousskov) said :
#3

polygraph/src/csm/GzipEncoder.{cc,h} can be used as a starting point. You could also do something like

    fgrep -RIi gzip polygraph/src

to find other related code. For configuration options, see http://www.web-polygraph.org/docs/userman/compression.html

Matt Cochran (matthewcochran) said :
#4

Thanks Alex Rousskov, that solved my question.

sam (sadegh-sal) said :
#5

http://www.web-polygraph.org/docs/userman/compression.html covers only compression. Is there any way to decompress chunks?

Alex Rousskov (rousskov) said :
#6

Yes, there is a way to decompress chunks. Public Polygraph does not contain such examples (because Polygraph does not need to decompress) but you can find them elsewhere.

sam (sadegh-sal) said :
#7

When I try to uncompress the gzip chunks, I get a segmentation fault. Would you please help me find an example?

Alex Rousskov (rousskov) said :
#8

Sorry, I cannot help at that level.