How is the printing of an HTML page done?

Asked by Uqbar

I'd like to know how an HTML page gets printed from, say, a KDE or GNOME application.
Is it done by an external helper program or by a library?

Question information

Language: English
Status: Answered
For: Ubuntu firefox
Assignee: No assignee

madbiologist (me-again) said :
#1

You can open the web page with either Firefox or LibreOffice and print it from there. Alternatively, save the page, install html2text and use it to convert the page to plain text, then open the result with gedit, leaf, nano or your favourite text editor and print it.

madbiologist (me-again) said :
#2

.

Uqbar (uqbar) said :
#3

I've been unclear.
I want to know how to get from an HTML page to a PS/PDF/whatever file format that can be printed.
I do know that I can use a browser to print any local or remote page. What I want to know is how the browser does it: with a library or with a helper program?

For example, you can print a JPEG image from the command line with a simple "lp image.jpg".
The image is converted to PS by CUPS, then possibly translated into the proper raster format for the specific printer, and finally sent to the printer itself.
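
To illustrate that pipeline from a program's point of view, here is a minimal sketch in C that hands a file to CUPS by running lp, just like typing "lp image.jpg". The helper names run_command and print_file are made up for this example, and it assumes a POSIX system with the CUPS lp command in the PATH.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given arguments and wait for it to finish.
 * Returns the child's exit status (127 if the program could not be
 * executed), or -1 if fork/waitpid failed or the child was killed. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);   /* only returns on failure */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: queue a file on the default printer,
 * equivalent to running "lp path" in a shell. */
int print_file(const char *path)
{
    char *const argv[] = { "lp", (char *)path, NULL };
    return run_command(argv);
}
```

Calling print_file("image.jpg") would queue the image; everything after that (conversion, rasterising, sending to the printer) is left to CUPS.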

I know about htmldoc, which can produce either PS or PDF, but it lacks support for STYLEs and has a few bugs with tables.

madbiologist (me-again) said :
#4

You can use Firefox's Print to File to print to PDF or PostScript. One of those options (I think it was PS) was removed a while back because it was broken, but it seems to be back now. I haven't tested this myself. I believe cairo is involved.

There is also html2ps. From its description - "This program converts HTML directly to PostScript. The HTML code can be retrieved from one or more URLs or local files, specified as parameters on the command line. A comprehensive level of HTML is supported, including inline images, CSS 1.0, and some features of HTML 4.0."

If you are familiar with programming in Python there is the Python module called python-pisa. From its description - "pisa is an html2pdf converter using the ReportLab Toolkit, HTML5lib and pyPdf. It supports HTML 5 and CSS 2.1 (and some of CSS 3). It is completely written in pure Python so it is platform independent. The main benefit of this tool that a user with Web skills like HTML and CSS is able to generate PDF templates very quickly without learning new technologies. Easy integration into Python frameworks like CherryPy, KID Templating, TurboGears, Django, Zope, Plone, Google AppEngine (GAE) etc."

Uqbar (uqbar) said :
#5

I have an application (in C) that needs to print some tabular data.
So far I have successfully used the external program htmldoc.
Recently I've hit a few bugs that make some printouts useless.
So I was wondering how browsers produce their HTML printouts, as both Mozilla Firefox and Google Chrome can display and print those HTML files correctly.
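
For context, the external-helper approach looks roughly like the sketch below. It is only an illustration, not my actual code: the function names build_htmldoc_argv and html_to_pdf are made up, and it assumes htmldoc is in the PATH and is called with its --webpage, -t and -f options.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Fill argv (8 slots) with the command line
 * "htmldoc --webpage -t pdf -f <pdf> <html>". */
void build_htmldoc_argv(const char *html, const char *pdf,
                        const char *argv[8])
{
    argv[0] = "htmldoc";
    argv[1] = "--webpage";  /* treat the input as a plain web page */
    argv[2] = "-t";
    argv[3] = "pdf";        /* output format */
    argv[4] = "-f";
    argv[5] = pdf;          /* output file */
    argv[6] = html;         /* input file */
    argv[7] = NULL;
}

/* Hypothetical helper: convert an HTML file to PDF by spawning htmldoc.
 * Returns htmldoc's exit status, or -1 if it could not be started or
 * did not exit normally. */
int html_to_pdf(const char *html, const char *pdf)
{
    const char *argv[8];
    pid_t pid;
    int status;

    build_htmldoc_argv(html, pdf, argv);
    if (posix_spawnp(&pid, argv[0], NULL, NULL,
                     (char *const *)argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The resulting PDF can then be handed to lp. The weak point is that the output is only as good as the helper's HTML support, which is exactly where the table bugs show up.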

madbiologist (me-again) said :
#6

If you install Mercurial you can get the Firefox source code from Mozilla's Mercurial code repository (this may take a while; it's a lot of code!):

hg clone http://hg.mozilla.org/mozilla-central
