How to decompress GZIP content?

Asked by sam on 2015-11-08

Dear friends,
Hello. I am new to eCAP and am developing a content-processing adapter based on it, but I have some problems with gzip content:

How do I determine whether a chunk is gzip or plain text?

How do I decompress the chunks?

Should I compress the content again before sending it to the user?

It would be great if you could help me with these problems.

Question information

Language:
English
Status:
Solved
For:
eCAP
Assignee:
No assignee
Solved by:
sam
Solved:
2015-11-14
Last query:
2015-11-14
Last reply:
2015-11-11

This question was reopened

  • 2015-11-09 by sam
Alex Rousskov (rousskov) said : #1

> how to determine chunk is gzip or plain text?

HTTP Content-Encoding and Transfer-Encoding headers define the encoding(s) applied to the message body. RFC 7230 and RFC 7231 describe those headers. Please note that the encodings are applied to the whole body, not individual chunks. Transfer-Encoding should be handled by the host application so you probably mostly care about Content-Encoding.
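As a rough illustration of the Content-Encoding check, here is a minimal sketch of a helper that inspects a header value that the adapter has already extracted as a string. The name hasGzipCoding() is hypothetical and is not part of libecap; the sketch assumes the value is a comma-separated list of codings and treats "x-gzip" as an alias for "gzip".

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper (not a libecap API): returns true if a
// Content-Encoding header value lists the gzip coding.
bool hasGzipCoding(const std::string &headerValue) {
    std::istringstream codings(headerValue);
    std::string coding;
    // The header value is a comma-separated list of content codings.
    while (std::getline(codings, coding, ',')) {
        // Drop whitespace around the coding name.
        coding.erase(std::remove_if(coding.begin(), coding.end(),
                         [](unsigned char c) { return std::isspace(c) != 0; }),
                     coding.end());
        // Coding names are compared case-insensitively.
        std::transform(coding.begin(), coding.end(), coding.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (coding == "gzip" || coding == "x-gzip")
            return true;
    }
    return false;
}
```

For example, hasGzipCoding("deflate, GZIP") is true, while hasGzipCoding("identity") is false. A missing header would correspond to an empty string here, which also yields false.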

> how to decompress the chunks?

You need to write code to do that. Search eCAP Answers for "gzip" to find references to some code examples. The RFCs may also contain references to documents that define gzip encoding.

> should I compress it again to send to user?

The adapter determines the encoding of the adapted message content. The adapter may change the encoding, but it is usually a good idea to preserve the original encoding if possible.

sam (sadegh-sal) said : #3

I found the solution to the first problem here:
         hostx->virgin().header().value();

For the second part, zlib's inflateInit() function does not give the correct result. After

 std::string chunk = vb.toString();

I passed chunk to UncompressData(), but the result is not correct. I use this function:

int UncompressData( const Byte* abSrc, int nLenSrc, Byte* abDst, int nLenDst )
{

    z_stream zInfo ={0};
    zInfo.total_in= zInfo.avail_in= nLenSrc;
    zInfo.total_out= zInfo.avail_out= nLenDst;
    zInfo.next_in= (Byte*)abSrc;
    zInfo.next_out= abDst;

    int nErr, nRet= -1;
    nErr= inflateInit( &zInfo ); // zlib function
    if ( nErr == Z_OK ) {
        nErr= inflate( &zInfo, Z_NO_FLUSH ); // zlib function
        if ( nErr == Z_STREAM_END ) {
            nRet= zInfo.total_out;
        }
    }
    inflateEnd( &zInfo ); // zlib function
    return( nRet ); // -1 or len of output
}
Where is my mistake?
Thanks, Alex.

Alex Rousskov (rousskov) said : #4

I cannot debug your code for you, but I believe that you need to use inflateInit2() and deflateInit2() instead of the standard inflateInit() and deflateInit() functions. IIRC, the former functions let you work with gzip headers and trailers, which differ from the zlib-format headers handled by the latter functions.

FWIW, here is a working example from the development version of Web Polygraph:

// the magic constants below are taken from zlib.h to force
// gzip header and footer for the deflated stream
static const int TheWindowBits = 15 + 16;

void zlib::Deflator::init(const int level) {
        Stream::init();
#ifdef HAVE_LIBZ
        const int res = deflateInit2(this, level, Z_DEFLATED, TheWindowBits,
                TheMemLevel, Z_DEFAULT_STRATEGY);
        if (!Should(res == Z_OK))
                theState = stError;
#endif
}

void zlib::Inflator::init() {
        Stream::init();
#ifdef HAVE_LIBZ
        const int res = inflateInit2(this, TheWindowBits);
        if (!Should(res == Z_OK))
                theState = stError;
#endif
}

sam (sadegh-sal) said : #5

The init part is done with inflateInit2(&zstream, 15 + 16);
But only the first chunk decompresses correctly. For the remaining chunks, inflate() returns Z_DATA_ERROR.
Do I need to call inflateInit2() and inflateEnd() for every chunk?

Alex Rousskov (rousskov) said : #6

No, you need to decompress the body as a whole. Individual "chunks" the adapter gets from the host application have random, meaningless boundaries.

sam (sadegh-sal) said : #7

I can copy the chunks to a buffer, but how can I find out that I have the whole body? Is there a flag?

sam (sadegh-sal) said : #8

And how can I get the sequence number of every chunk?

Alex Rousskov (rousskov) said : #9

Use the noteVbContentDone() method.

Alex Rousskov (rousskov) said : #10

> and how can I get sequence number of every chunk ?

"Chunks" do not have a notion of sequence numbers. They are just sequential pieces of a message body with essentially random sizes. If _you_ want to number them, you can maintain a counter that you will increment with each new chunk, of course.

Please note that the body pieces seen by an eCAP adapter may have nothing to do with "chunked encoding" chunks seen by the host application.
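The counter idea and the end-of-body notification can be combined in a small helper. The following is only a sketch: BodyCollector and its methods are hypothetical names, not libecap APIs. It assumes the adapter calls addPiece() from noteVbContentAvailable() and markDone() from noteVbContentDone(bool atEnd), and that atEnd is true when the body ended normally.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of libecap): collects body pieces in
// arrival order and numbers them with a simple counter.
class BodyCollector {
public:
    // Call for each received piece. Returns the piece's 1-based position.
    std::size_t addPiece(const std::string &piece) {
        body_ += piece;
        return ++pieces_;
    }

    // Call once when the host application reports the end of the body.
    void markDone(bool atEnd) {
        done_ = true;
        complete_ = atEnd;
    }

    bool done() const { return done_; }         // no more pieces will arrive
    bool complete() const { return complete_; } // the body ended normally
    std::size_t pieceCount() const { return pieces_; }
    const std::string &body() const { return body_; }

private:
    std::string body_;
    std::size_t pieces_ = 0;
    bool done_ = false;
    bool complete_ = false;
};
```

The counter only reflects the order in which this adapter saw the pieces. It says nothing about chunked-encoding chunks on the wire.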

sam (sadegh-sal) said : #11

Thank you Alex :)

@Alex

As we know, the noteVbContentAvailable() method informs the eCAP module that data is available from the host side. If the HTTP response is chunked (Transfer-Encoding), does each noteVbContentAvailable() call deliver a whole chunk?

If not, how can we determine where a whole chunk ends? And if the HTTP response has several chunks, we are unable to split them from each other.

Do we need to decode the chunked data and then decode the gzip data if the Transfer-Encoding is chunked and the Content-Encoding is gzip?

@Alex

I now know the rule: the host side de-chunks the chunks, and eCAP modules receive the gzip stream.

But I still have a problem: I buffer the gzip stream in the noteVbContentAvailable() method and decompress it in noteVbContentDone(bool atEnd). After that, what should I do to send the decompressed data to the host side?

Alex Rousskov (rousskov) said : #14

> if the http response is chunked (Transfer-Encoding), whether each noteVbContentAvailable() method call send a whole chunked data ?

Not necessarily. The body pieces seen by an eCAP adapter may have nothing to do with "chunked encoding" chunks seen by the host application. Those two mechanisms are completely separate. It may happen that the host application receives the whole chunk at once and passes that whole chunk to the adapter, but there is no guarantee and there should be no expectation of that happening.

> how could we determine a whole chunked data

You cannot and you should not need to.

> Need we decode the chunked data, then decode gzip data , if the Transfer-Encoding is chunk and Content-Encoding is gzip?

The host application should dechunk. The adapter should see dechunked bytes.

> what should I do to send the decompressed data to host-side?

See https://answers.launchpad.net/ecap/+question/255988 for hints about sending adapted bodies to the host application.