Is there a library (preferably a Python one) that shortens an HTML page? By that I mean one that produces a smaller HTML page (measured in number of characters, including line breaks, i.e. the length of the string) that renders exactly the same as the original.

For instance:

<b>
    Silly example
</b>

could be changed to:

<b>Silly example</b>

and the final result would be the same:

Silly example

averageman
  • possible duplicate of [Remove whitespaces in XML string](http://stackoverflow.com/questions/3310614/remove-whitespaces-in-xml-string) – cwallenpoole Jun 26 '14 at 16:37
  • The difference is that I am talking about HTML and not XML... – averageman Jun 26 '14 at 16:39
  • Read the answers. Most of them deal with HTML. If you know that your HTML is not well-formed, then you can use BeautifulSoup or even the HTMLParser class. – cwallenpoole Jun 26 '14 at 16:56
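
The HTMLParser route mentioned in the comment above can be sketched with the standard library alone. This is a minimal illustration, not a full minifier: the names `WhitespaceCollapser`, `collapse_whitespace` and `PRESERVE` are hypothetical, and the choice of elements to leave untouched is an assumption. It collapses each run of whitespace in text to a single space rather than stripping it completely, because whitespace next to inline elements can affect rendering.

```python
import re
from html.parser import HTMLParser

# Assumption: whitespace inside these elements is significant, so leave it alone.
PRESERVE = {"pre", "textarea", "script", "style"}


class WhitespaceCollapser(HTMLParser):
    """Re-emit HTML, collapsing whitespace runs in text nodes to one space."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; verbatim.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.preserve_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in PRESERVE:
            self.preserve_depth += 1
        # Emit the start tag exactly as it appeared in the source.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in PRESERVE and self.preserve_depth:
            self.preserve_depth -= 1
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.preserve_depth:
            data = re.sub(r"\s+", " ", data)
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def collapse_whitespace(html):
    parser = WhitespaceCollapser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(collapse_whitespace("<b>\n    Silly example\n</b>"))
```

A dedicated minifier handles many more cases (attribute quoting, optional tags, conditional comments), so this is only meant to show the idea.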

2 Answers

You can use BeautifulSoup to prettify (not minify) HTML or XML code in Python.

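from bs4 import BeautifulSoup

# Pass the file's contents, not its name: BeautifulSoup('file.html') would parse the literal string.
with open('file.html') as f:
    soup = BeautifulSoup(f, 'html.parser')
prettified = soup.prettify(encoding="utf8")  # returns bytes because an encoding is given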

For minifying HTML in Python you can use htmlmin. More parameters for htmlmin.minify can be found in the documentation.

import htmlmin

with open('file.html', 'r') as f:
    content = f.read()
    minified = htmlmin.minify(content, remove_empty_space=True)
Christian Berendt
  • Doesn't "prettify" do the exact opposite of what I want? I think "prettify" will make it longer (and not shorter). Am I right? – averageman Jun 26 '14 at 16:34
  • This answer is wrong. OP wants to compress the HTML. – vaultah Jun 26 '14 at 16:35
  • Also, "prettify" does not seem to remove line breaks. On the contrary, it will add them and possibly make my files longer... – averageman Jun 26 '14 at 16:41
  • Please use htmlmin (and other minifiers) with care. I had a case where it moved the title tag from head into body... – tdma May 22 '16 at 19:31
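
One way to act on the caution in the last comment is to compare the structure of the page before and after minifying. The sketch below uses only the standard library; `outline` and `same_outline` are hypothetical helper names, and the check is an assumption about what "unchanged" should mean here. It compares the sequence of start and end tags plus the words of the text, so it can catch a tag that was moved, but it does not prove that the two pages render identically, and it will also flag harmless rewrites such as dropped optional tags.

```python
from html.parser import HTMLParser


class _Outline(HTMLParser):
    """Record the tag sequence and the whitespace-separated words of a document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))

    def handle_data(self, data):
        # Splitting on whitespace ignores how much whitespace there was.
        self.text.extend(data.split())


def outline(html):
    parser = _Outline()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.text


def same_outline(original, minified):
    """True if both documents have the same tag sequence and the same words."""
    return outline(original) == outline(minified)


# The example from the question keeps its outline when the whitespace is removed.
print(same_outline("<b>\n    Silly example\n</b>", "<b>Silly example</b>"))
```

Running such a check over a few representative pages after minifying is cheap, and a `False` result is a prompt to inspect the output by hand.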

You could use htmlmin.

import htmlmin

input_html = '<b>\n\tSilly example\n</b>'

minified_html = htmlmin.minify(input_html)

print(input_html)

# <b>
#   Silly example
# </b>


print(minified_html)

# <b> Silly example </b>
Jacob Beauchamp