How to collect all chunks together

Asked by Sandeep Kuttal on 2010-11-08

Hi Alex,

Today I found a fundamental flaw in the code I wrote for eCAP. My code only looks at the JSON data being exchanged and filters some of it based on its contents. The code worked fine for short JSON data, but as soon as I increased the amount of data, it no longer returned all the contents.

I figured out that the buffer (in the file) that I add chunks to holds only the first chunk, so later I filter just that first chunk. Can you kindly suggest how I can get all the related chunks first and then start filtering the contents? I am using the following code in start():


    const libecap::Area vb = hostx->vbContent(0, libecap::nsize); // get all vb
    std::string chunk = vb.toString(); // expensive, but simple
    hostx->vbContentShift(vb.size); // we have a copy; do not need vb at all
    buffer += chunk; // buffer what we got
    // here I use the buffer to filter the contents

Since this code does not accumulate all the contents, it filters only the single chunk it has. Can you kindly suggest a fix?

Question information

Language: English
Project: eCAP
Assignee: none
Solved by: Sandeep Kuttal
Alex Rousskov (rousskov) said : #1

You may accumulate incoming chunks in a dedicated transaction member and then process the accumulated contents when the last chunk comes in. To handle large files efficiently, you would need to write more complex code that does not accumulate too much data.

This kind of processing is not specific to eCAP; it is very similar to processing content being read from a disk file or network socket.

Sandeep Kuttal (skuttal) said : #2

Hi Alex,

Thanks for your response. I tried to play around with the code, but I am unable to get it working.
I am not sure whether the problem is with my logic or with the way I am approaching it.

I changed the code to the following, thinking that it would add all chunks to the buffer (defined as a string) until it finds an empty chunk (i.e., all the chunks are gathered). Then I could process the buffer, which would be the sum of all the chunks.

This is what I tried:

    libecap::Area vb;
    std::string chunk;

    do {
        vb = hostx->vbContent(0, libecap::nsize); // get all vb
        chunk = vb.toString(); // expensive, but simple
        hostx->vbContentShift(vb.size); // we have a copy; do not need vb at all
        buffer += chunk; // buffer what we got
    } while (chunk != "");

But it seems the chunks are not getting added to the buffer. When I print the chunk, it shows just the first 2785 characters
(so it seems the maximum size of a single chunk is 2785).
The remaining data (which should be in separate chunks) is somehow getting lost, or I am unable to get it and add it to form a single string.

Can you please recommend the best way to add all the chunks in to a single string?

Thanks a lot.

Alex Rousskov (rousskov) said : #3

You should add chunks when they come in, one chunk per noteVbContentAvailable() call.

There should not be an explicit do-while loop like yours inside the noteVbContentAvailable() method because your hostx->vbContent() call extracts all currently available virgin body content in one call. There will be nothing to extract on the second iteration of that do-while loop.

However, there will be more noteVbContentAvailable() calls if there is more virgin content coming, until noteVbContentDone() is called. There is an implicit loop comprised of noteVbContentAvailable() calls and finished with the noteVbContentDone() call.
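
The implicit loop described above could be sketched roughly as follows. This is a self-contained illustration, not real adapter code: FakeHost is a hypothetical stand-in for the host transaction that only mimics the "read what is available, then shift it away" pattern of vbContent() and vbContentShift(), and Xaction, buffer(), and collectAll() are made-up names. In a real adapter, the two note...() methods would be the callbacks the host invokes on your transaction class, with their actual libecap signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the host side of a transaction (NOT libecap).
class FakeHost {
public:
    // The host receives more virgin body bytes from the network.
    void deliver(const std::string &bytes) { pending_ += bytes; }

    // Everything available right now, like vbContent(0, libecap::nsize).
    std::string vbContent() const { return pending_; }

    // Release bytes the adapter has copied, like vbContentShift(size).
    void vbContentShift(std::size_t size) {
        assert(size <= pending_.size());
        pending_.erase(0, size);
    }

private:
    std::string pending_;
};

// Illustrative transaction with simplified callback signatures.
class Xaction {
public:
    explicit Xaction(FakeHost &host) : host_(host) {}

    // One call per arriving chunk: copy it, release it, and return.
    // No loop here; a second vbContent() call would find nothing new.
    void noteVbContentAvailable() {
        const std::string chunk = host_.vbContent();
        host_.vbContentShift(chunk.size());
        buffer_ += chunk; // dedicated member outlives the individual calls
    }

    // Called once after the last chunk; only now is the body complete.
    void noteVbContentDone() { complete_ = true; }

    bool complete() const { return complete_; }
    const std::string &buffer() const { return buffer_; }

private:
    FakeHost &host_;
    std::string buffer_;
    bool complete_ = false;
};

// Simulates the host driving the implicit loop for a body split into chunks.
std::string collectAll(const std::vector<std::string> &chunks) {
    FakeHost host;
    Xaction x(host);
    for (const std::string &c : chunks) {
        host.deliver(c);
        x.noteVbContentAvailable();
    }
    x.noteVbContentDone();
    assert(x.complete());
    return x.buffer();
}
```

The point of the design is that the accumulation state lives in a member (buffer_ here) rather than in a local variable, so each callback can append to it and the filtering can wait until noteVbContentDone().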

Sandeep Kuttal (skuttal) said : #4

Thanks, Alex, for the brief explanation. Actually, I put the chunk-collecting code in start() because I want to collect only the chunks related to JSON data, which I detect by checking the header in start(). Only if the contents are JSON do I use them for filtering. Since I was interested only in the JSON contents, I kept the code in start().

So is it possible to do the same in noteVbContentDone(), so that I look only at the JSON data?

Thanks a lot

Sandeep Kuttal (skuttal) said : #5

Hi Alex,

I know it sounds weird that I am looking at chunks in start(), but I wanted to save the contents according to the URI. That is why I looked for the JSON data and started taking chunks there. All my code for extracting data depends on the URI and the JSON contents. Is there a way to fix this while keeping my current approach? Or do you have any other recommendations?

Thanks a Lot

Alex Rousskov (rousskov) said : #6

You cannot reliably get all the contents in start() because the contents may not be available at transaction start time. You may make the decision to collect contents in start() and then proceed with the collection as suggested above.
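
A minimal sketch of that split, assuming the decision is based on the Content-Type header value, might look like the following. Everything here is illustrative: JsonCollector, looksLikeJson(), and readyToFilter() are hypothetical names, and the callbacks take the header value and the chunk as plain string parameters, which is a simplification and not the real libecap signatures.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: does this Content-Type value describe JSON?
bool looksLikeJson(std::string contentType) {
    std::transform(contentType.begin(), contentType.end(), contentType.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return contentType.find("application/json") != std::string::npos;
}

// Illustrative transaction; method names echo the thread, but the
// signatures are simplified and are NOT the real libecap ones.
class JsonCollector {
public:
    // start(): only decide whether to collect; the body may not be here yet.
    void start(const std::string &contentType) {
        collecting_ = looksLikeJson(contentType);
    }

    // One call per chunk; keep the chunk only if start() opted in.
    void noteVbContentAvailable(const std::string &chunk) {
        assert(!done_); // no chunks are expected after the done notification
        if (collecting_)
            buffer_ += chunk;
    }

    // The body is complete; filtering can safely begin after this point.
    void noteVbContentDone() { done_ = true; }

    bool readyToFilter() const { return collecting_ && done_; }
    const std::string &buffer() const { return buffer_; }

private:
    bool collecting_ = false;
    bool done_ = false;
    std::string buffer_;
};
```

The same decision flag could also record other facts known at start time, such as the request URI, so that the later filtering step can use them once the whole body has been buffered.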

Sandeep Kuttal (skuttal) said : #7

Thanks Alex, for giving insight... I was able to collect chunks... Thanks a lot