While HTML is an excellent medium for distributing and consuming information on the web, it is not an ideal format for printing or archiving. For those purposes PDF is a better choice, as PDF documents have a well-defined page layout and embed all of their images in the file itself. If you would like to convert HTML pages to PDF on Linux, follow this guide.

You can use a command-line utility called wkhtmltopdf to convert any HTML web page or URL to a PDF file. wkhtmltopdf uses the WebKit browser rendering engine to do the HTML-to-PDF conversion.

You can install wkhtmltopdf on Debian/Ubuntu as follows.
$ sudo apt-get install wkhtmltopdf
Be aware that the wkhtmltopdf installed via apt-get has reduced functionality. First of all, it cannot run without an X11 server. Also, it cannot add hyperlinks or a table of contents to the converted PDF file.

To convert an HTML page to PDF using wkhtmltopdf, run it as follows.
$ wkhtmltopdf http://www.cnn.com cnn.pdf
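
To convert several pages in one go, the command above can be wrapped in a small shell function. This is only a minimal sketch: url_to_pdf_name and convert_all are hypothetical helper names, not part of wkhtmltopdf, and the file naming scheme is just one possible choice.

```shell
# Derive a PDF file name from a URL (hypothetical helper).
url_to_pdf_name() {
    # Strip the scheme and any trailing slashes, then replace every
    # character other than letters, digits, dots and hyphens with "_".
    name=$(printf '%s' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|/*$||' -e 's|[^A-Za-z0-9.-]|_|g')
    printf '%s.pdf\n' "$name"
}

# Convert every URL passed as an argument, one PDF per URL.
convert_all() {
    for url in "$@"; do
        out=$(url_to_pdf_name "$url")
        # Report a failed page on stderr and carry on with the next one.
        wkhtmltopdf "$url" "$out" || echo "failed: $url" >&2
    done
}
```

After loading these functions into your shell, you could run them as follows. With this naming scheme the first page would be saved as www.cnn.com.pdf.
$ convert_all http://www.cnn.com http://www.example.com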
If you would like to use wkhtmltopdf without an X11 server while enjoying its full feature set, you need to use a static binary of wkhtmltopdf, which is built against a patched version of Qt. You can download these binaries from the official wkhtmltopdf website.
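
If a static binary is not an option, one workaround that is sometimes used for the X11 requirement is to run the packaged wkhtmltopdf under a virtual X server with xvfb-run (provided by the xvfb package on Debian/Ubuntu). This goes beyond the steps in this guide and is only a sketch: it does not restore the missing features such as the table of contents, and needs_xvfb and wk_headless are hypothetical helper names.

```shell
# Succeeds when no X display is configured in the environment.
needs_xvfb() {
    [ -z "${DISPLAY:-}" ]
}

# Run wkhtmltopdf, wrapping it in xvfb-run only when there is no display.
wk_headless() {
    if needs_xvfb; then
        # -a lets xvfb-run pick a free display number automatically.
        xvfb-run -a wkhtmltopdf "$@"
    else
        wkhtmltopdf "$@"
    fi
}
```

On a server without a desktop, you could then run it as follows.
$ wk_headless http://www.cnn.com cnn.pdf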

Note that if you want to capture web pages served over HTTPS, you need to install openssl first, and then run wkhtmltopdf.

$ sudo apt-get install openssl
If wkhtmltopdf does not work for some reason, an alternative way to convert HTML web pages to PDF files is to use the Google Chrome browser.

In Google Chrome, go to the URL of the web page you would like to convert to PDF. Then choose the “Print a page” menu of Google Chrome, and change “Destination” to “Save as PDF”. Once you click the print button, the web page will be saved as a local PDF file under the name you choose.
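
The same conversion can also be attempted from the command line using Chrome's headless mode and its --print-to-pdf option. The following is a sketch under assumptions: the name of the Chrome or Chromium binary differs between distributions, so the candidate list below is a guess, and first_available and chrome_to_pdf are hypothetical helper names.

```shell
# Print the first command from the argument list that exists on PATH.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Save a web page as a PDF with headless Chrome: chrome_to_pdf URL OUTPUT
chrome_to_pdf() {
    url=$1
    out=$2
    # Assumed binary names; adjust for your distribution.
    browser=$(first_available google-chrome google-chrome-stable chromium chromium-browser) || {
        echo "no Chrome or Chromium binary found" >&2
        return 1
    }
    "$browser" --headless --disable-gpu --print-to-pdf="$out" "$url"
}
```

You could then run it as follows.
$ chrome_to_pdf http://www.cnn.com cnn.pdf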

