Squid 3.1.0.7 or modifying adapter sample mangles web pages

Asked by JohnB on 2009-04-15

Have others using the modifying adapter had any troubles with content being mangled? I've:
- Installed/built squid 3.1.0.7
- Installed/built the examples from e-cap.org "ecap_adapter_sample-0.0.2.tar.gz"
- Configured Squid to load the adapter module I created.

When I access sites (like squid-cache.org) through my proxy, the modifying adapter nicely changes every "the" to "a"! But the HTTP stream is clearly mangled: the page starts over several times with "<html>..." etc.

When I unload the modifying adapter and load the passthru adapter, the sites come through cleanly.

Thanks for hosting this site!
John

Question information

Language:
English
Status:
Solved
For:
eCAP
Assignee:
No assignee
Solved by:
JohnB
Solved:
2009-04-22
Last query:
2009-04-22
Last reply:
2009-04-16
Alex Rousskov (rousskov) said : #1

Have you patched Squid? Does the following patch help? http://www.squid-cache.org/bugs/show_bug.cgi?id=2614

JohnB (jbrady-bluemodus) said : #2

Hi Alex - Thanks for that link. It looks like the code I have is out of line with the patch delta put forth as the fix, so I think the adapter_passthru.cc I have is stale. I should get the freshest version of that adapter: what source do you recommend? The Code tab here on Launchpad says there is no eCAP code here. - John

Alex Rousskov (rousskov) said : #3

The patch I pointed to is for Squid, not an adapter.

JohnB (jbrady-bluemodus) said : #4

OK, I ended up here: http://launchpadlibrarian.net/23518464/td, which that bug report links to. I see the Squid patch there too. Do you recommend the adapter changes in http://launchpadlibrarian.net/23518464/td?

Alex Rousskov (rousskov) said : #5

Yes, I recommend both the Squid patch for Squid bug #2614 and the sample adapter patch you found elsewhere. If possible, please update the Squid bug report after testing the Squid patch.

JohnB (jbrady-bluemodus) said : #6

After patching both the Squid source files (XactionRep.cc and XactionRep.h) and the modifying adapter, there is still corruption (I assume) and a hang in the HTML stream: the browser waits forever on URLs like http://www.cnn.com and http://www.linuxquestions.org/questions/misc.php?

The passthru adapter, unchanged, still seems to work fine after the Squid patch was applied.

Alex Rousskov (rousskov) said : #7

My understanding is that your Squid and sample adapters appear to work correctly after patching but you continue to have problems with your adapter. Please confirm.

If my understanding is correct, you will need to share your adapter code for others to review it. If that is not possible or if nobody reviews/finds bugs in your code, then you would have to add debugging to your adapter and see if you can track the source of problems. For example, you may be able to check whether the adapted content produced by your adapter is mangled (before Squid has a chance to corrupt it) while the virgin content is correct.

JohnB (jbrady-bluemodus) said : #8

So far I don't have any of my own code in the content-modifying adapter. I have only compiled and loaded the sample adapters, and the only modifications I've made to the modifying adapter so far come from http://launchpadlibrarian.net/23518464/td.

The passthru adapter works as-is (from ecap_adapter_sample-0.0.2.tar.gz) with the Squid patch I applied today. I'd be happy to try the original adapter with the patched Squid code if you think that would bear fruit.

The modifying adapter (from ecap_adapter_sample-0.0.2.tar.gz plus the patch from http://launchpadlibrarian.net/23518464/td) still corrupts the stream.

Basically, I want to see these adapters work before I go changing them.

JohnB (jbrady-bluemodus) said : #9

I've also just discovered a bug in the sample adapter code provided at www.e-cap.org/Downloads.

There's a line with no effect:

void Adapter::Xaction::noteVbContentDone(bool atEnd)
{
...
receivingVb == opComplete; // nice comparison but no assignment!!
}

Could this cause problems?
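The effect of that line can be shown in isolation. A minimal sketch, assuming a hypothetical VbTracker struct in place of Adapter::Xaction (only the OperationState enum is taken from the sample): the comparison form leaves receivingVb unchanged, so any later check of it (stopVb() in the code posted further down is one) still sees opOn after the virgin body has ended. With warnings enabled (e.g. -Wall), compilers typically flag such a line as an unused value or unused comparison.

```cpp
#include <cassert>

// Same enum as the sample adapter; VbTracker itself is a hypothetical
// stand-in for Adapter::Xaction, not libecap code.
enum OperationState { opUndecided, opOn, opComplete, opNever };

struct VbTracker {
    OperationState receivingVb = opOn;

    // The line as shipped: the comparison is evaluated and its result is
    // thrown away, so receivingVb stays opOn.
    void noteVbContentDoneAsShipped() {
        receivingVb == opComplete;
    }

    // The intended line: receivingVb actually becomes opComplete.
    void noteVbContentDoneFixed() {
        receivingVb = opComplete;
    }
};
```

The patched adapter posted later in this thread already uses the assignment form.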

Alex Rousskov (rousskov) said : #10

Glad you are making progress.

Bug #362069 addresses receivingVb non-assignment in noteVbContentDone. Please see that bug for details.

It looks like your remaining issue is making the sample modifying adapter work. Can you add debugging to the adapter and see if you can track the source of problems? For example, you may be able to check whether the adapted content produced by the adapter is mangled (before Squid has a chance to corrupt it) while the virgin content is correct. Or perhaps Squid mangles the correct adapted content returned by the adapter.

Also, do you have a specific URL that always exposes the mangling problem?

Thank you,

Alex.

JohnB (jbrady-bluemodus) said : #11

Hi Alex, thanks for being responsive - here's a run-down

- Right now, every site I've tried hangs when going through the proxy with the content-modifying adapter. I applied the Squid patch you recommended and also the modifying adapter patch. The passthru adapter passes the sites through cleanly. I've tried cnn.com and squid-cache.org.
- Is there a clean version of the modifying adapter with all of the latest bug fixes integrated that I could just install and test, then make my changes to? If I start debugging (so far I haven't been able to hit breakpoints in the adapter, only in the Squid process), I'd like to start from the latest version.

thank you!

Alex Rousskov (rousskov) said : #12

Is there a version of the modifying adapter with all of the latest bug fixes integrated? Not that I know of. I need to work on the next release...

I would suggest adding debugging statements (eCAP has a debugging API that Squid implements) rather than breakpoints. If you want to use breakpoints, you may have to set them after the module loads, or you may want to link your module statically (this requires fiddling with Makefiles, and I do not have a ready-to-use recipe).

One trick you can try is disabling the modifying code in the modifying adapter by specifying a search string that will never match (or just changing the search so that it always fails). Does the modifying adapter work in that case? If not, you have a smaller, simpler problem to triage.
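The never-matching search can be tried without touching the rest of the adapter. A minimal sketch, assuming a hypothetical adaptContent() variant that takes the search and replacement strings as parameters instead of the sample's hard-coded "the"/"a". Unlike the sample, it advances past each replacement and treats an empty search string as "match nothing", because std::string::find("") always succeeds and would otherwise loop forever.

```cpp
#include <cassert>
#include <string>

// Hypothetical parameterized variant of the sample's adaptContent():
// replaces every occurrence of victim in chunk with replacement.
void adaptContent(std::string &chunk, const std::string &victim,
                  const std::string &replacement) {
    if (victim.empty())
        return; // an empty search string matches nothing here
    std::string::size_type pos = 0;
    while ((pos = chunk.find(victim, pos)) != std::string::npos) {
        chunk.replace(pos, victim.length(), replacement);
        pos += replacement.length(); // never rescan the text just inserted
    }
}
```

With a victim that cannot occur in real pages, the adapted body should equal the virgin body byte for byte. If pages are still mangled in that mode, the replacement logic is ruled out and the problem lies in how body data moves between Squid and the adapter.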

FWIW, the modifying adapter did work in limited tests at release time so either you are working with significantly different sites OR something changed in Squid. Unfortunately, I do not have enough free time to triage this right now. If you want my help, you need to supply the necessary information (by collecting debugging info that exposes the data corruption, etc.).

Thank you,

Alex.

JohnB (jbrady-bluemodus) said : #13

Hi Alex, This probably isn't your full-time job, eh?!! Thanks and thanks.

What I've done is:
- Kept the squid patch in place.
- Verified the passthru adapter works well (really well, no flaws) as unpacked from ecap_adapter_sample-0.0.2.tar.gz.
- Removed what I think is the adaptation code (code follows) from the patched (http://launchpadlibrarian.net/23518464/td) modifying adapter.
- Enabled "ALL,9" in Squid's debug_options (is this what you meant for collecting eCAP debug info?). Let me know what to look for in there.
- http://www.squid-cache.org/Intro/why.dyn comes back corrupted, showing the Squid header graphic 6 times, when fetched through the proxy with this adapter running.

adapter_modifying_patched.cc:
========================
#include "sample.h"
#include <iostream>
#include <libecap/common/registry.h>
#include <libecap/common/errors.h>
#include <libecap/common/message.h>
#include <libecap/common/header.h>
#include <libecap/common/names.h>
#include <libecap/host/host.h>
#include <libecap/adapter/service.h>
#include <libecap/adapter/xaction.h>
#include <libecap/host/xaction.h>

namespace Adapter { // not required, but adds clarity

using libecap::size_type;

class Service: public libecap::adapter::Service {
 public:
  // About
  virtual std::string uri() const; // unique across all vendors
  virtual std::string tag() const; // changes with version and config
  virtual void describe(std::ostream &os) const; // free-format info

  // Configuration
  virtual void configure(const Config &cfg);
  virtual void reconfigure(const Config &cfg);

  // Lifecycle
  virtual void start(); // expect makeXaction() calls
  virtual void stop(); // no more makeXaction() calls until start()
  virtual void retire(); // no more makeXaction() calls

  // Scope (XXX: this may be changed to look at the whole header)
  virtual bool wantsUrl(const char *url) const;

  // Work
  virtual libecap::adapter::Xaction *makeXaction(libecap::host::Xaction *hostx);
};

class Xaction: public libecap::adapter::Xaction {
 public:
  Xaction(libecap::host::Xaction *x);
  virtual ~Xaction();

  // lifecycle
  virtual void start();
  virtual void stop();

  // adapted body transmission control
  virtual void abDiscard();
  virtual void abMake();
  virtual void abMakeMore();
  virtual void abStopMaking();

  // adapted body content extraction and consumption
  virtual libecap::Area abContent(size_type offset, size_type size);
  virtual void abContentShift(size_type size);

  // virgin body state notification
  virtual void noteVbContentDone(bool atEnd);
  virtual void noteVbContentAvailable();

  // libecap::Callable API, via libecap::host::Xaction
  virtual bool callable() const;

 protected:
  void adaptContent(std::string &chunk) const; // converts vb to ab
  void stopVb(); // stops receiving vb (if we are receiving it)
  libecap::host::Xaction *lastHostCall(); // clears hostx

 private:
  libecap::host::Xaction *hostx; // Host transaction rep

  std::string buffer; // for content adaptation

  typedef enum { opUndecided, opOn, opComplete, opNever } OperationState;
  OperationState receivingVb;
  OperationState sendingAb;
};

} // namespace Adapter

std::string Adapter::Service::uri() const {
 return "ecap://e-cap.org/ecap/services/sample/modifying";
}

std::string Adapter::Service::tag() const {
 return PACKAGE_VERSION;
}

void Adapter::Service::describe(std::ostream &os) const {
 os << "A modifying adapter from " << PACKAGE_NAME << " v" << PACKAGE_VERSION;
}

void Adapter::Service::configure(const Config &) {
 // this service is not configurable
}

void Adapter::Service::reconfigure(const Config &) {
 // this service is not configurable
}

void Adapter::Service::start() {
 libecap::adapter::Service::start();
 // custom code would go here, but this service does not have one
}

void Adapter::Service::stop() {
 // custom code would go here, but this service does not have one
 libecap::adapter::Service::stop();
}

void Adapter::Service::retire() {
 // custom code would go here, but this service does not have one
 libecap::adapter::Service::stop();
}

bool Adapter::Service::wantsUrl(const char *url) const {
 return true; // no-op is applied to all messages
}

libecap::adapter::Xaction *Adapter::Service::makeXaction(libecap::host::Xaction *hostx) {
 return new Adapter::Xaction(hostx);
}

Adapter::Xaction::Xaction(libecap::host::Xaction *x): hostx(x),
 receivingVb(opUndecided), sendingAb(opUndecided) {
}

Adapter::Xaction::~Xaction() {
 if (libecap::host::Xaction *x = hostx) {
  hostx = 0;
  x->adaptationAborted();
 }
}

void Adapter::Xaction::start() {
 Must(hostx);
 if (hostx->virgin().body()) {
  receivingVb = opOn;
  hostx->vbMake(); // ask host to supply virgin body
 } else {
  receivingVb = opNever;
 }

 /* adapt message header */

 libecap::shared_ptr<libecap::Message> adapted = hostx->virgin().clone();
 Must(adapted != 0);

 // delete ContentLength header because we may change the length
 // unknown length may have performance implications for the host
 //adapted->header().removeAny(libecap::headerContentLength);

 // add a custom header
 //static const libecap::Name name("X-Ecap");
 //const libecap::Header::Value value =
 // libecap::Area::FromTempString(libecap::MyHost().uri());
 //adapted->header().add(name, value);

 if (!adapted->body()) {
  sendingAb = opNever; // there is nothing to send
  lastHostCall()->useAdapted(adapted);
 } else {
  hostx->useAdapted(adapted);
 }
}

void Adapter::Xaction::stop() {
 hostx = 0;
 // the caller will delete
}

void Adapter::Xaction::abDiscard()
{
 Must(sendingAb == opUndecided); // have not started yet
 sendingAb = opNever;
 // we do not need more vb if the host is not interested in ab
 stopVb();

}

void Adapter::Xaction::abMake()
{
 Must(sendingAb == opUndecided); // have not yet started or decided not to send
 Must(hostx->virgin().body()); // that is our only source of ab content

 // we are or were receiving vb
 Must(receivingVb == opOn || receivingVb == opComplete);

 sendingAb = opOn;
 if (!buffer.empty())
  hostx->noteAbContentAvailable();
}

void Adapter::Xaction::abMakeMore()
{
 Must(receivingVb == opOn); // a precondition for receiving more vb
 hostx->vbMakeMore();
}

void Adapter::Xaction::abStopMaking()
{
 sendingAb = opComplete;
 // we do not need more vb if the host is not interested in more ab
 stopVb();

}

libecap::Area Adapter::Xaction::abContent(size_type offset, size_type size) {
 Must(sendingAb == opOn || sendingAb == opComplete);
 return libecap::Area::FromTempString(buffer.substr(offset, size));
}

void Adapter::Xaction::abContentShift(size_type size) {
 Must(sendingAb == opOn || sendingAb == opComplete);
 buffer.erase(0, size);
}

void Adapter::Xaction::noteVbContentDone(bool atEnd)
{
 Must(receivingVb == opOn);
 receivingVb = opComplete;
 if (sendingAb == opOn) {
  hostx->noteAbContentDone(atEnd);
  sendingAb = opComplete;
 }
}

void Adapter::Xaction::noteVbContentAvailable()
{
 Must(receivingVb == opOn);

 const libecap::Area vb = hostx->vbContent(0, libecap::nsize); // get all vb
 std::string chunk = vb.toString(); // expensive, but simple
 //adaptContent(chunk);
 buffer += chunk; // buffer what we got

 if (sendingAb == opOn)
  hostx->noteAbContentAvailable();
}

void Adapter::Xaction::adaptContent(std::string &chunk) const {
 // this is oversimplified; production code should worry about content
 // split by arbitrary chunk boundaries, efficiency, and other things

 // another simplification: victim does not belong to replacement
 static const std::string victim = "the";
 static const std::string replacement = "a";

 std::string::size_type pos = 0;
 while ((pos = chunk.find(victim, pos)) != std::string::npos)
  chunk.replace(pos, victim.length(), replacement);
}

bool Adapter::Xaction::callable() const {
 return hostx != 0; // no point to call us if we are done
}

// tells the host that we are not interested in [more] vb
// if the host does not know that already
void Adapter::Xaction::stopVb() {
 if (receivingVb == opOn) {
  hostx->vbStopMaking();
  receivingVb = opComplete;
 } else {
  // we already got the entire body or refused it earlier
  Must(receivingVb != opUndecided);
 }
}

// this method is used to make the last call to hostx transaction
// last call may delete adapter transaction if the host no longer needs it
// TODO: replace with hostx-independent "done" method
libecap::host::Xaction *Adapter::Xaction::lastHostCall() {
 libecap::host::Xaction *x = hostx;
 Must(x);
 hostx = 0;
 return x;
}

// create the adapter and register with libecap to reach the host application
static const bool Registered = (libecap::RegisterService(new Adapter::Service), true);

Alex Rousskov (rousskov) said : #14

Sorry, I cannot triage this for you right now. If you can collect debugging output, find the first place where data corruption occurs, and point me to it, then I should be able to either fix the bug or ask for more debugging details.

To accomplish this, you most likely need to add debugging statements to Squid or the adapter, and those statements have to include the raw data sent to and received from the adapter. At a minimum, you can add debugging to the calls that add or consume virgin and adapted message bodies. The debug log will then record what was adapted and where the corruption occurred.
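Raw body bytes usually need escaping before they go into a line-oriented debug log, otherwise newlines and binary data break up the record. A minimal sketch, assuming hypothetical helper names (escapeForLog, formatBodyEvent) and a made-up log-line format; neither comes from Squid or libecap:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Renders a body chunk on one line: printable ASCII is kept, backslash and
// double quote are backslash-escaped, every other byte becomes \xNN.
std::string escapeForLog(const std::string &data) {
    std::string out;
    for (unsigned char c : data) {
        if (c == '\\' || c == '"') {
            out += '\\';
            out += static_cast<char>(c);
        } else if (std::isprint(c)) {
            out += static_cast<char>(c);
        } else {
            char buf[5]; // "\xNN" plus the terminating NUL
            std::snprintf(buf, sizeof(buf), "\\x%02x", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// One log record per body event: which call, where in the stream, which bytes.
std::string formatBodyEvent(const std::string &event, std::size_t offset,
                            const std::string &data) {
    return event + " offset=" + std::to_string(offset) +
           " size=" + std::to_string(data.size()) +
           " data=\"" + escapeForLog(data) + "\"";
}
```

Logging one such record wherever virgin content is read and wherever adapted content is handed back gives two streams that can be compared piece by piece.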

Since you are writing your own adapter, it may be a good idea to learn the eCAP debugging API (Host::*Debug) and add debugging statements to the adapter, but it is your call. If you use this approach, make sure you open and close the debugging stream for each debug message.
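The open-write-close discipline for each message can be sketched independently of libecap. This is a minimal sketch under stated assumptions: DebugHost, openDebug(), closeDebug(), debugMessage() and RecordingHost are hypothetical names loosely modeled on the open/close pattern mentioned above, and the real libecap Host debugging methods may have different signatures (for example, they may take a verbosity argument).

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical host-side debugging interface (not the real libecap API).
class DebugHost {
public:
    virtual ~DebugHost() = default;
    virtual std::ostream *openDebug() = 0;         // may return nullptr if debugging is off
    virtual void closeDebug(std::ostream *os) = 0; // ends one message
};

// Writes exactly one message: open, write, close. The stream is never held
// open across calls, so each message is complete before the next one starts.
void debugMessage(DebugHost &host, const std::string &text) {
    if (std::ostream *os = host.openDebug()) {
        *os << text;
        host.closeDebug(os);
    }
}

// Test double that records each closed message as one entry.
class RecordingHost : public DebugHost {
public:
    std::vector<std::string> messages;
    std::ostream *openDebug() override { current.str(""); return &current; }
    void closeDebug(std::ostream *) override { messages.push_back(current.str()); }
private:
    std::ostringstream current;
};
```

Checking the returned pointer before writing lets the host switch adapter debugging off without the adapter needing to know.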

You can then search the log for the corrupted pieces that you see in your browser (or you can even diff the virgin and adapted responses collected with wget, curl, or "save as"). Once you find the corruption signs in the logs, we can look at what caused it.
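When diffing saved copies, the useful number is the offset where the virgin and adapted responses first diverge, since that is where to start searching the debug log. A minimal sketch, assuming both responses have already been saved and loaded into strings; firstDifference() is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Returns the offset of the first byte where the two responses differ, or
// std::string::npos if they are identical. If one is a prefix of the other,
// the difference starts at the end of the shorter one.
std::size_t firstDifference(const std::string &virgin, const std::string &adapted) {
    const std::size_t n = std::min(virgin.size(), adapted.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (virgin[i] != adapted[i])
            return i;
    }
    return virgin.size() == adapted.size() ? std::string::npos : n;
}
```

With the replacement disabled, the two copies should be identical, so any reported offset points straight at the first corrupted byte.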

This will take some time, which is exactly why I cannot offer to do this for you right now.

JohnB (jbrady-bluemodus) said : #15

Hi Alex - I'm going to put this on pause since our ICAP server is behaving. Unfortunately, I don't have permission to work this angle of content modification through at this time. I like the eCAP model better, though, and I hope to get back to eCAP adapters later.

THANK YOU for being so helpful.

Ali Mohammad (ali-mohammad) said : #16

JohnB,

Try the following patch; I hope it will solve your problem.

http://launchpadlibrarian.net/22098875/consume-vb.patch

Thanks
Ali
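The patch contents are not reproduced here, but its name suggests the adapter was not consuming the virgin body. In the adapter posted above, noteVbContentAvailable() reads all virgin content from offset 0 and appends it to buffer, yet never tells the host those bytes were used (libecap's host transaction offers vbContentShift() for that). If the host keeps unconsumed bytes, every notification hands back the page from the beginning, which would match the "starts over several times with <html>" symptom. A minimal simulation, assuming a hypothetical MockHost whose method names only echo libecap's; it is not the libecap API:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, greatly simplified model of a host's virgin-body buffer.
struct MockHost {
    std::string vb; // virgin body bytes the adapter has not consumed yet

    std::string vbContent(std::size_t offset, std::size_t size) const {
        return vb.substr(offset, size);
    }
    void vbContentShift(std::size_t size) { vb.erase(0, size); }
};

// Mimics the sample's noteVbContentAvailable(): copy all available virgin
// bytes into the adapted buffer, optionally telling the host they are consumed.
void noteVbContentAvailable(MockHost &host, std::string &abBuffer, bool consume) {
    const std::string chunk = host.vbContent(0, std::string::npos);
    abBuffer += chunk;
    if (consume)
        host.vbContentShift(chunk.size());
}

// Delivers the body in pieces, as a host does while data arrives from the
// network, and returns everything the "adapter" produced.
std::string runTransaction(const std::string &body, std::size_t pieceSize, bool consume) {
    MockHost host;
    std::string abBuffer;
    for (std::size_t pos = 0; pos < body.size(); pos += pieceSize) {
        host.vb += body.substr(pos, pieceSize);
        noteVbContentAvailable(host, abBuffer, consume);
    }
    return abBuffer;
}
```

Without the shift, the output repeats "<html>" once per delivered piece; with it, the adapted body matches the virgin body.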