Regarding Caching of POST messages

Asked by Sandeep Kuttal

Hi,

I am a PhD student working on mashups, with the goal of providing better mashups for end users. My research requires caching or saving GET as well as POST messages so that I can extract important information from them later. I want to save them to the hard disk, analyze them, and come up with some meaningful results.

To achieve this I am using Squid 3.1.1 as the proxy, together with eCAP to capture the POST messages (Squid already caches GET responses but not POST). From what I have gathered from Google and the RFCs, I need to look at the REQMOD and RESPMOD vectoring points, but I am not sure how to start. Could someone kindly suggest where to begin?

So far I have installed eCAP and Squid 3.1.1 and changed squid.conf to enable eCAP. Where should I go next to either cache or access the messages, so that I can get just the message body and dump it to the hard disk?

Kindly suggest. Thanks in advance for your time and patience in looking at my problem.

Thanks
Sandeep

Question information

Language:
English
Status:
Solved
For:
eCAP
Assignee:
No assignee
Solved by:
Sandeep Kuttal

This question was reopened

Revision history for this message
Alex Rousskov (rousskov) said :
#1

I would recommend using a sample adapter from e-cap.org as a starting point. You may need to patch it based on the open bug reports here on lp.

IIRC, sample adapters work with response message bodies. You will need to change Squid configuration and the adapter a little for your adapter to work with the request body (on REQMOD) and response body (on RESPMOD).

It sounds like you will need to tie request and responses together. This can be done via custom request headers that the adapter will add to requests and Squid will filter out before sending to the next hop (if needed). Not trivial, but doable.
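
A minimal sketch of the bookkeeping half of that idea, assuming a hypothetical header name (X-Capture-Id) and hypothetical helpers nextCaptureId() and dumpFileName(); none of these come from eCAP or Squid. It only shows how a per-transaction token could be generated and reused to name the dump files for a request and its response. Adding the header to the request and reading it back are left to the adapter.

```cpp
#include <atomic>
#include <string>

// Hypothetical header name; pick one that cannot clash with real headers.
const char *const CaptureIdHeader = "X-Capture-Id";

// Returns a token that is unique within this process, e.g. "cap-1", "cap-2".
// Uniqueness across restarts or across several processes is NOT guaranteed.
std::string nextCaptureId()
{
    static std::atomic<unsigned long> counter{0};
    return "cap-" + std::to_string(++counter);
}

// Hypothetical naming scheme: one dump file per message direction,
// both sharing the same capture ID so they can be paired up later.
std::string dumpFileName(const std::string &id, bool isRequest)
{
    return id + (isRequest ? ".request" : ".response");
}
```

With this scheme, the files for one transaction differ only in their suffix, which makes pairing them during analysis a simple string match.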

Sandeep Kuttal (skuttal) said :
#2

Hi Alex,

First of all, thank you very much for helping me get Squid started. My next task is to save the bodies of incoming and outgoing messages to a local file on the hard disk for analysis. Eventually I want this for the Yahoo servers specifically, but for now I am concentrating on any message going in or out. For that I made a few changes in adapter_sample-0.0.2/src/adapter_passthru.cc to buffer the messages and write them to a file on disk. However, I cannot even create the file, so I am not sure whether I am missing something in the config file or whether there is some other way to approach this.
So far I have configured, built, and installed the adapters, but I am not sure whether Squid will pick them up automatically from the config file or whether something else needs to be done.
Thanks once again for all your time and help.
The config file is as follows:

http_access allow internal_network

ecap_enable on

ecap_service eReqmod reqmod_precache 0 ecap://e-cap.org/ecap/services/sample/passthru
ecap_service eRespmod respmod_precache 0 ecap://e-cap.org/ecap/services/sample/passthru

loadable_modules /usr/local/lib/ecap_adapter_passthru.so
adaptation_service_set reqFilter eReqmod
adaptation_service_set respFilter eRespmod

adaptation_access respFilter allow all
adaptation_access reqFilter allow all

Or do I need to follow some other steps to get it going?
Thanks a lot again for your time and effort.

Thanks
Sandeep

Sandeep Kuttal (skuttal) said :
#3

Another quick question: will I be able to access both GET and POST messages with this approach?
Thanks
Have a Nice Day!

Alex Rousskov (rousskov) said :
#4

The Squid config file is read when Squid starts and during reconfiguration, but that is not directly related to libecap.

Yes, by default, an eCAP REQMOD service should receive all HTTP requests that the host application (e.g., Squid) handles.

The adapter can do virtually anything, including file creation. If yours cannot create a file, it is probably not related to eCAP because eCAP does not deal with file creation. I would recommend logging file creation errors and addressing them accordingly.
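
A minimal sketch of what logging a file creation error could look like, assuming a hypothetical helper named openDumpFile() and a POSIX system where fopen sets errno on failure. One cause worth ruling out is that the proxy process may run as an unprivileged user without write access to the target directory; that is an assumption here, not something confirmed in this thread.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Opens `path` for appending in binary mode.
// On failure, returns a null pointer and stores a readable reason in `error`.
std::FILE *openDumpFile(const std::string &path, std::string &error)
{
    errno = 0;
    std::FILE *file = std::fopen(path.c_str(), "ab");
    if (!file) {
        const int savedErrno = errno; // save before another call can overwrite it
        error = "cannot open " + path + ": " + std::strerror(savedErrno);
    }
    return file;
}
```

The adapter can then write the error string to whatever log it uses, which usually points straight at the cause (a missing directory, missing permissions, and so on).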

Sandeep Kuttal (skuttal) said :
#5

Thanks a lot. I was just able to create the file, but it does not yet contain the contents that should have been passed to the adapter. I am working on it. I modified adapter_modifying.cc.

Kindly share your valuable comments.
Thanks

Sandeep Kuttal (skuttal) said :
#6

Hi Alex, another quick question: which adapter is the better starting point for my problem, adapter_modifying.cc or passthru? I saw in older questions that someone modified the passthru code, whereas I am modifying the modifying adapter.

Alex Rousskov (rousskov) said :
#7

You need to do so much that it does not really matter which sample you start with. They are all very simple/basic. If you started with the modifying adapter, you might as well keep going with your code. By the time you are finished, there will be virtually nothing of the original sample left.

Sandeep Kuttal (skuttal) said :
#8

Hi All,

I am trying to dump the body of each message into a file. In the noteVbContentAvailable function in adapter_modifying.cc, I save the contents of the buffer as soon as they become available. However, the resulting file is in a format I cannot read. Every time I open a browser window the file gets new contents, but they are still unreadable. Am I missing something very basic, or is there some format that needs to be taken care of? One thing I noticed is that the file looks similar to what I see in the Squid cache files.

Kindly suggest what I should look at. Thanks a lot for the prompt responses.

Thanks a Lot
Sandeep

Sandeep Kuttal (skuttal) said :
#9

Another point: for any site, the browser takes a little longer than usual to open the page, and by the time it opens, the file contains the following:
C "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid^M
response from an upstream server.<br />^M
The proxy server could not handle a request <em><a href="/">POST&nbsp;/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>

Alex Rousskov (rousskov) said :
#10

I recommend that you start with a single and very basic plain text web page (no images, CSS, etc.). Do one HTTP transaction at a time, using wget, curl, or similar. Do not use a browser.

1. Make sure everything works when eCAP is disabled.
2. Make sure everything works when a sample adapter (patched but not customized) is used.
3. Start customizing the adapter, one small change at a time. Check that stuff is still working after each change.

If you encounter problems, add code to log (using eCAP logging facilities!) what your adapter receives from the host application, when it receives it. You can also log what methods are called. This will tell you whether the adapter receives what you think it should.

Sandeep Kuttal (skuttal) said :
#11

Hi Alex,

Thanks for the detailed steps. I tried it the way you described, but the output is puzzling. When I ran wget, the contents were correct once, and every other time they were unreadable characters. I then tried both a browser and wget, and to my surprise the contents of a few sites were captured correctly while others came out as unreadable characters. The file with the output captured while accessing those sites is here:
http://www.cse.unl.edu/~skuttal/test1.txt.gz
The "start here" line is only there to show how many times the noteVbContentAvailable function is called. Could you help me figure out what is going wrong?
Thanks a lot

Alex Rousskov (rousskov) said :
#13

The first few lines of your file indicate that you are either writing a dirty buffer or dealing with a GIF image. The former would be a bug in your code. The latter would be a bug in your config (because you want the eCAP adapter to deal with just one, plain text response).

It is also possible that you did not start writing the file from scratch, so the beginning of the file has responses captured earlier. The fact that the end of your file contains other response bodies suggests that it is indeed the case. I would recommend dealing with just one URL that leads to a simple, plain text response, to start with.
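
Both pitfalls can be avoided on the writing side. The following is a minimal sketch, assuming the adapter can obtain a pointer and a byte count for each body chunk; dumpChunk() is a hypothetical helper, not part of eCAP. It writes exactly the given number of bytes and lets the caller decide whether the file starts from scratch or is appended to.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Writes exactly `size` bytes from `data` to `path`.
// truncate=true starts the file from scratch; false appends to existing content.
// Returns false if the file could not be opened or written.
bool dumpChunk(const std::string &path, const char *data, std::size_t size, bool truncate)
{
    const std::ios::openmode mode = std::ios::binary | std::ios::out |
        (truncate ? std::ios::trunc : std::ios::app);
    std::ofstream out(path.c_str(), mode);
    if (!out)
        return false;
    // write() copies a fixed byte count. Streaming a char* with operator<<
    // would stop at the first NUL byte, or read past the end of a buffer
    // that is not NUL-terminated.
    out.write(data, static_cast<std::streamsize>(size));
    out.flush();
    return out.good();
}
```

Calling it with truncate set to true for the first chunk of a transaction and false for the following chunks keeps earlier responses out of the file.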

If you use eCAP logging facilities, you would be able to post a log that contains both Squid actions and the content received by the adapter.

Sandeep Kuttal (skuttal) said :
#14

Hi Alex,

Yes, I think the problem was that the files are not plain text, since for some sites the dump was an exact copy of the page source. So I now need to:
1) allow the eCAP adapter to handle site contents that are not in text format.
2) access the POST messages (so far I am only getting the GET messages).
Can you kindly suggest how I should proceed with these?

Thanks
Sandeep

Alex Rousskov (rousskov) said :
#15

I do not fully understand your question, but I recommend that you use URLs that point to plain text files, for now.
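
A quick way to check whether a captured body really is plain text is to look at its first bytes. The sketch below is only an illustration: sniffBody() and BodyKind are hypothetical names, GIF is checked because a GIF image came up earlier in this thread, and the gzip check is an added assumption (compressed responses are another plausible source of unreadable dumps, but nothing here confirms that this is what happened).

```cpp
#include <cstddef>
#include <cstring>

enum class BodyKind { Gif, Gzip, Other };

// Classifies a body by its leading bytes. Only two well-known signatures
// are checked; everything else, including plain text, is reported as Other.
BodyKind sniffBody(const char *data, std::size_t size)
{
    if (size >= 6 &&
        (std::memcmp(data, "GIF87a", 6) == 0 || std::memcmp(data, "GIF89a", 6) == 0))
        return BodyKind::Gif;
    if (size >= 2 &&
        static_cast<unsigned char>(data[0]) == 0x1f &&
        static_cast<unsigned char>(data[1]) == 0x8b)
        return BodyKind::Gzip;
    return BodyKind::Other;
}
```

Logging the result for the first chunk of each body makes it easy to tell a bug in the dumping code apart from content that was never text in the first place.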

If you want to start working with POST requests before you get the responses figured out, you can do it, of course. Just change your host (Squid) configuration to engage eCAP for POST requests instead of responses. Again, I would recommend that you post plain text information to start with.

Sandeep Kuttal (skuttal) said :
#16

Hi Alex,

I tried to figure out the eCAP logging facility but could not find any clue. Can you point me to some documentation that would help me get started with it?

Thanks

Alex Rousskov (rousskov) said :
#17

The Host class provides the following logging interface:

        // Logging
        virtual std::ostream *openDebug(LogVerbosity lv) = 0;
        virtual void closeDebug(std::ostream *debug) = 0;

To start logging, obtain a pointer to the std::ostream by calling openDebug. Write what you want to the stream using standard stream methods and operators. When done, call closeDebug with the same pointer.

Do not keep the pointer across adapter Xaction calls; return it before giving back control to the host application.

Sandeep Kuttal (skuttal) said :
#18

Thanks Alex