Can I inject ads into pages?

Created by Alex Rousskov
Last updated by: Alex Rousskov

Yes, an eCAP service can modify HTML content to inject advertisements and such. However, it is difficult to do that well due to many technical and non-technical issues.

The problems often start even before your eCAP service gets a chance to do anything: An increasing number of web sites use SSL/TLS encryption. The host application (e.g., an HTTP proxy) has to decrypt traffic for an eCAP adapter to be able to inspect and modify HTTP messages. By design, seamless in-proxy decryption without the consent of at least one message endpoint is virtually impossible. And even with one side consenting, such in-proxy decryption is essentially a man-in-the-middle attack on the HTTPS traffic. Modern proxies do support attacks on consenting users (e.g., see the SslBump feature in Squid), but making them work reasonably well in production requires both serious expertise and ongoing babysitting of various exception lists.

Besides encryption (a problem outside the eCAP domain), the biggest underlying technical problem may be that the host application does not operate on HTML pages. It operates on HTTP responses. A browser "page" is often a collection of several HTTP responses that may contain encoded HTML and that have little or no information about each other and their containing context. This causes the following side-effects:

* If your adaptation service modifies every HTTP response, it will corrupt responses that are not really HTML. You can limit adaptation to responses that have one of the supported content types, but then you will miss responses that lack a content type or carry the wrong one and are nevertheless rendered as HTML by the browser.

* If your adaptation service modifies every HTML object, you will get multiple <iframe> tags for pages that use an HTML frameset. HTTP does not contain the information necessary to identify HTML frames. Identification and corrective actions are best done inside the browser, using JavaScript tricks. The adaptation service can inject JavaScript code, but somebody needs to develop and maintain the injected code. The skill set required to create a good adaptation service is rather different from what is needed to write good JavaScript.

* Modifying HTML pages requires parsing HTML. Some HTML is malformed and cannot be parsed or modified correctly. Thus, some objects may be incorrectly skipped and/or corrupted by the adaptation service. The service can try to deal with malformed HTML but that increases complexity and not all problems can be solved reliably.

* Some responses come through the proxy in encoded form. Gzip encoding is the most popular one but there are others. You will need to decide whether the adaptation service should decode HTTP responses and what encodings to support.
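
The decisions listed above can be sketched in a few lines. This is only an illustration, written in Python rather than against the eCAP API, and it makes several assumptions that a real adapter cannot: the whole response body is available at once (adapters usually see the body in pieces), header names arrive lowercased in a plain dictionary, and the host expects an updated Content-Length. The names adapt, media_type, HTML_TYPES, and SNIPPET are hypothetical. The sketch skips anything that is not labeled as HTML, supports only identity and gzip encodings, gives up on markup without a closing body tag, and leaves frame detection to the injected script.

```python
import gzip
import re
import zlib

# Content types this hypothetical service agrees to modify.
HTML_TYPES = {"text/html", "application/xhtml+xml"}

# Hypothetical injected code; the top-window check keeps the ad out of frames.
SNIPPET = (
    b"<script>if (window.top === window.self) {"
    b" /* insert the ad here */ }</script>"
)

# Matches a closing body tag, if the markup has one.
BODY_CLOSE = re.compile(rb"</body\s*>", re.IGNORECASE)


def media_type(headers):
    """Return the lowercased media type without parameters, or ''."""
    value = headers.get("content-type", "")
    return value.split(";", 1)[0].strip().lower()


def adapt(headers, body):
    """Return (headers, body), modified only when that looks safe.

    headers: dict with lowercased field names (an assumption of this sketch).
    body: the complete response body as bytes (also an assumption).
    """
    if media_type(headers) not in HTML_TYPES:
        return headers, body  # missing or non-HTML type: leave alone

    encoding = headers.get("content-encoding", "identity").strip().lower()
    if encoding not in ("identity", "gzip"):
        return headers, body  # unsupported encoding: leave alone

    try:
        html = gzip.decompress(body) if encoding == "gzip" else body
    except (OSError, EOFError, zlib.error):
        return headers, body  # corrupt or truncated gzip: leave alone

    matches = list(BODY_CLOSE.finditer(html))
    if not matches:
        return headers, body  # malformed or partial HTML: leave alone

    # Insert the snippet just before the last closing body tag.
    pos = matches[-1].start()
    html = html[:pos] + SNIPPET + html[pos:]

    new_body = gzip.compress(html) if encoding == "gzip" else html
    new_headers = dict(headers)
    new_headers["content-length"] = str(len(new_body))
    return new_headers, new_body
```

Every early return in the sketch is a response that goes out unmodified, which is the trade-off described above: the stricter the checks, the fewer corrupted responses and the more missed pages.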

On the non-technical front, content owners and web site operators may become very upset if their content is being modified without their consent. Site owners may become upset if their pages do not look or function as intended or if their ad revenue stream is decreasing because of your adaptations. Content consumers may have similar objections. You might be sued or otherwise attacked. This page does not give legal advice, but we have heard of legal threats directed at companies that inject their content into third-party web sites.

There are other subtle problems, some of which we probably have not seen in the wild yet. The overall solution will work in most cases, but will not work in some, and will require some babysitting/maintenance.

Related links:

* http://arstechnica.com/tech-policy/2013/04/how-a-banner-ad-for-hs-ok/