How can I format/tidy/beautify HTML in JavaScript? I have tried doing a search/replace for angle brackets (<, >) and indenting accordingly, but of course that does not take into account any JS or CSS inside the HTML.

The reason I want to do this is that I have made a content editor (CMS) which has both WYSIWYG and source code views. The problem is that the code written by the WYSIWYG editor is normally a single line, so I would like a JavaScript function that can format it into a more readable form on demand.

Here's what I have so far:

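// Returns the indentation for `level` (four spaces per level).
function getIndent(level) {
    var result = '',
        i = level * 4;
    if (level < 0) {
        throw new Error("Level is below 0");
    }
    while (i--) {
        result += ' ';
    }
    return result;
}

// Puts each tag on its own line: indent after an opening tag, outdent
// before a closing tag, and never indent after <img>, <hr> or <br>.
function style_html(html) {
    html = html.trim();
    var result = '',
        indentLevel = 0,
        tokens = html.split(/</);
    for (var i = 0, l = tokens.length; i < l; i++) {
        // Skip the empty token that appears when the markup starts with '<'.
        if (tokens[i].trim() === '') {
            continue;
        }
        var parts = tokens[i].split(/>/);
        if (parts.length === 2) {
            if (tokens[i][0] === '/') {
                indentLevel--;
            }
            result += getIndent(indentLevel);
            if (tokens[i][0] !== '/') {
                indentLevel++;
            }

            if (i > 0) {
                result += '<';
            }

            result += parts[0].trim() + ">\n";
            // Text between this tag and the next one goes on its own line.
            if (parts[1].trim() !== '') {
                result += getIndent(indentLevel) + parts[1].trim().replace(/\s+/g, ' ') + "\n";
            }

            if (parts[0].match(/^(img|hr|br)/)) {
                indentLevel--;
            }
        } else {
            // Text with no '>' (or a token with several '>', e.g. inline JS:
            // only the part before the first '>' is kept).
            result += getIndent(indentLevel) + parts[0] + "\n";
        }
    }
    return result;
}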
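One way to deal with the JS/CSS problem described above is to hide the contents of raw-text elements before splitting on angle brackets, then put them back afterwards. This is a sketch, not part of the question's code: the helper names protectRawText and restoreRawText and the placeholder form <raw-N/> are assumptions, and the regex will still be fooled by a literal "</script>" string inside a script.

```javascript
// Hypothetical helper: replaces every <script>, <style> and <pre> element
// (tags and contents) with a placeholder like <raw-0/> so a bracket-based
// formatter never looks inside them.
function protectRawText(html) {
    var saved = [];
    var masked = html.replace(/<(script|style|pre)\b[^>]*>[\s\S]*?<\/\1>/gi, function (block) {
        saved.push(block);
        return '<raw-' + (saved.length - 1) + '/>';
    });
    return { html: masked, saved: saved };
}

// Hypothetical helper: swaps the placeholders back for the saved blocks.
// A replacer function is used so "$" sequences in the saved code stay literal.
function restoreRawText(html, saved) {
    return html.replace(/<raw-(\d+)\/>/g, function (_, index) {
        return saved[Number(index)];
    });
}
```

Run the masked string through the formatter, then call restoreRawText on the result. The formatter has to treat the placeholder as a leaf; with style_html above, that means extending the check to /^(img|hr|br|raw-)/.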
JasonMArcher
Petah
  • sometimes the best questions/answers are off-topic. – NilsB Jul 23 '15 at 12:44
  • @NilsB rubbish, this is on topic; in fact it had already been closed as off topic and then reopened. – Petah Jul 23 '15 at 20:34
  • [Here's an ultra simple HTML formatter in JavaScript](https://jsfiddle.net/buksy/rxucg1gd/) – Buksy May 31 '16 at 08:21
  • Your code works well, but it still needs some improvements to support more singleton (void) tags. Try changing the match to if (parts[0].match(/^(area|base|br|col|command|embed|hr|img|input|link|meta|param|source)/)). Also use return result.trim(); instead of html = html.trim(); – Amr Dec 02 '16 at 07:51
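A sketch of the check suggested in the last comment, with one adjustment: Amr's pattern has no boundary after the tag name, so col also matches colgroup and base matches basefont. Requiring whitespace, "/" or the end of the string after the name avoids that. The helper name isVoidTag is hypothetical, and track and wbr are added to the list.

```javascript
// Void (self-closing) HTML elements; the lookahead stops "col" from
// matching "colgroup" and "base" from matching "basefont".
var VOID_TAG_RE = /^(area|base|br|col|command|embed|hr|img|input|link|meta|param|source|track|wbr)(?=[\s\/]|$)/i;

// Hypothetical helper: `tagContent` is the text between '<' and '>',
// e.g. 'img src="a.png"' or 'br/'.
function isVoidTag(tagContent) {
    return VOID_TAG_RE.test(tagContent);
}
```

In style_html, the line if (parts[0].match(/^(img|hr|br)/)) would then become if (isVoidTag(parts[0])).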

I use this method to format HTML. Simple, but does the job:

function format(html) {
    var tab = '\t';
    var result = '';
    var indent= '';

    html.split(/>\s*</).forEach(function(element) {
        if (element.match( /^\/\w/ )) {
            indent = indent.substring(tab.length);
        }

        result += indent + '<' + element + '>\r\n';

        if (element.match( /^<?\w[^>]*[^\/]$/ ) && !element.startsWith("input")  ) { 
            indent += tab;              
        }
    });

    return result.substring(1, result.length-3);
}
Mario Petrovic
michal.jakubeczy
  • Great solution. While it doesn't indent or format CSS, it's a great quick-and-dirty solution, and sweetly short. – johny why Apr 26 '20 at 17:40
  • @johnywhy Seems to handle indenting fine. – Dustin Poissant Nov 16 '20 at 20:38
  • I suggest replacing \t with 4 spaces. – hossein sedighian Dec 09 '20 at 23:13
  • This works great for HTML, thanks a ton! Is there a similarly quick solution for CSS? – damnitrahul Feb 21 '21 at 06:00
  • It's very simple yet good, but it miscalculates the indent for tags like br and hr. – AKIB AKRAM May 18 '21 at 16:35
  • Indentation is miscalculated somewhere, since the closing tags at the bottom of our long HTML files are not at position 1; we had to scroll miles to the right. We used js-beautify as suggested by Cybernetic, and it works perfectly with a simple command: `html_beautify(htmlDoc.documentElement.innerHTML, { indent_size: 2})`. – Martijn Jul 15 '21 at 11:08
  • @Martijn send me the HTML you used this for. I can take a look at it. – michal.jakubeczy Jul 15 '21 at 15:28
  • @michal.jakubeczy Thanks for your offer to help. The HTML contains confidential information, but I managed to obfuscate the strings by hashing. Newbie question: how do I send you the files? – Martijn Jul 16 '21 at 20:37
  • @michal.jakubeczy Update: I think I've spotted the root cause. Your script expects a closing tag, but sometimes there isn't one, such as ``, or it comes later in the case of nesting. Anyway, I have 3 files ready for your analysis: 1) the raw, badly formatted string from the DOM object, 2) the result of your function (wrong indent), 3) the result of `html_beautify` (correct indent). – Martijn Jul 16 '21 at 20:45
  • @Martijn I created a chat room for this one: https://chat.stackoverflow.com/rooms/info/235038/room-for-michal-jakubeczy-and-martijn?tab=general ... I think we can finish the investigation there. – michal.jakubeczy Jul 19 '21 at 06:48
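The br/hr problem reported in the comments can be addressed by not indenting after void elements. The sketch below is a variation on the answer's format(), not the author's code; the name formatHtml and the VOID_TAGS list are assumptions. It also relaxes the opening-tag test: the original /^<?\w[^>]*[^\/]$/ needs at least two characters after the optional '<', so it never indents after one-letter tags such as <p> or <b>. Like the original, it only splits where tags are adjacent (>\s*<), so mixed text and markup is still only roughly indented.

```javascript
// Elements that never have a closing tag, so they must not increase the indent.
var VOID_TAGS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
    'link', 'meta', 'param', 'source', 'track', 'wbr'];

function formatHtml(html, tab) {
    tab = tab === undefined ? '\t' : tab;
    var lines = [];
    var indent = '';
    // Drop the outermost '<' and '>' so every piece looks like "tag attrs".
    var body = html.trim().replace(/^</, '').replace(/>$/, '');

    body.split(/>\s*</).forEach(function (element) {
        if (/^\/\w/.test(element)) {
            // Closing tag: outdent before writing it.
            indent = indent.substring(tab.length);
        }
        lines.push(indent + '<' + element + '>');

        var name = (/^(\w+)/.exec(element) || ['', ''])[1].toLowerCase();
        // An opening tag with no inline text ('>') and no trailing "/".
        var isOpenTag = /^\w[^>]*$/.test(element) && !/\/$/.test(element);
        if (isOpenTag && VOID_TAGS.indexOf(name) < 0) {
            indent += tab;
        }
    });
    return lines.join('\n');
}
```

Called as formatHtml(html, '    '), it also covers the suggestion above to indent with four spaces instead of a tab.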

@lovasoa's tidy-html5 for JavaScript (tidy.js) is an excellent answer to "How to format/tidy/beautify in JavaScript": rock-solid, much better than vkBeautify or even CodeMirror (which is hard to use with AMD), and VERY easy to use:

<script src='http://lovasoa.github.io/tidy-html5/tidy.js'></script>
<script>
  // HTML Tidy configuration options.
  var options = {
    "indent": "auto",
    "indent-spaces": 2,
    "wrap": 80,
    "markup": true,
    "output-xml": false,
    "numeric-entities": true,
    "quote-marks": true,
    "quote-nbsp": false,
    "show-body-only": true,
    "quote-ampersand": false,
    "break-before-br": true,
    "uppercase-tags": false,
    "uppercase-attributes": false,
    "drop-font-tags": true,
    "tidy-mark": false
  };

  // Tidy the page's <body> markup and log the result.
  var html = document.querySelector("body").outerHTML;
  var result = tidy_html5(html, options);
  console.log(result);
</script>
rickdog
  • Home: http://www.htacg.org/tidy-html5/ | GitHub: https://github.com/htacg/ | config options: http://www.htacg.org/tidy-html5/quickref.html – rickdog May 16 '15 at 21:14

I find js-beautify far superior to any solution posted so far.

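Add the js-beautify scripts to your lib folder and include them in the header as usual. For HTML you need beautify-html.js, which provides html_beautify (beautify.js and beautify-css.js handle embedded scripts and styles):

<script src="libs/beautify.js"></script>
<script src="libs/beautify-css.js"></script>
<script src="libs/beautify-html.js"></script>

Target the code wherever it is on your page (e.g. a pre or code tag) and format it with html_beautify (js_beautify is the JavaScript formatter):

$(".my_class").text(html_beautify($(".my_class").text()))

All kinds of configuration options are documented in the repo.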

Cybernetic

I needed something similar, and here is my solution, inspired by the method provided by michal.jakubeczy. It is slightly more complicated in order to preserve formatting within <pre> tags. Hope this helps someone.

function formatHTML(html) {
    var indent = '\n';
    var tab = '\t';
    var i = 0;
    var pre = [];
    // Elements without a closing tag; they must not increase the indent.
    var voidTags = ['area', 'base', 'br', 'col', 'command', 'embed', 'hr', 'img', 'input', 'keygen', 'link', 'menuitem', 'meta', 'param', 'source', 'track', 'wbr'];

    html = html
        // Swap every <pre> block (non-greedy, all of them) for a placeholder.
        .replace(/<pre>[\s\S]*?<\/pre>/g, function (x) {
            pre.push({ indent: '', tag: x });
            return '<--TEMPPRE' + i++ + '/-->';
        })
        // Visit each tag plus at most one following text character.
        .replace(/<[^<>]+>[^<]?/g, function (x) {
            var ret;
            var tag = /<\/?([^\s/>]+)/.exec(x)[1];
            var p = /<--TEMPPRE(\d+)\/-->/.exec(x);

            if (p)
                pre[p[1]].indent = indent;

            if (tag.charAt(0) === '!' || voidTags.indexOf(tag.toLowerCase()) >= 0) // void tag, comment or doctype
                ret = indent + x;
            else {
                if (x.indexOf('</') < 0) { // open tag
                    if (x.charAt(x.length - 1) !== '>')
                        ret = indent + x.slice(0, -1) + indent + tab + x.slice(-1);
                    else
                        ret = indent + x;
                    !p && (indent += tab);
                }
                else { // close tag
                    indent = indent.slice(0, indent.length - tab.length);
                    if (x.charAt(x.length - 1) !== '>')
                        ret = indent + x.slice(0, -1) + indent + x.slice(-1);
                    else
                        ret = indent + x;
                }
            }
            return ret;
        });

    // Put the <pre> blocks back; a replacer function keeps any "$" in them literal.
    for (i = pre.length; i--;) {
        var restored = pre[i].tag.replace('<pre>', '<pre>\n').replace('</pre>', pre[i].indent + '</pre>');
        html = html.replace('<--TEMPPRE' + i + '/-->', function () { return restored; });
    }

    return html.charAt(0) === '\n' ? html.slice(1) : html;
}

function unformatHTML(html) {
    var i = 0;
    var pre = [];

    // Protect every <pre> block, then strip the newlines and tabs added by formatHTML.
    html = html.replace(/<pre>[\s\S]*?<\/pre>/g, function (x) {
        pre.push(x);
        return '<--TEMPPRE' + i++ + '/-->';
    }).replace(/\n/g, '').replace(/\t/g, '');

    for (i = pre.length; i--;) {
        var original = pre[i];
        html = html.replace('<--TEMPPRE' + i + '/-->', function () { return original; });
    }

    // Undo the newline after <pre> and the indentation before </pre>.
    html = html.replace(/<pre>\n/g, '<pre>').replace(/\n\t*<\/pre>/g, '</pre>');
    return html;
}
Gabriel

You can also use a command-line tool if you have Node.js installed.

Run npm install -g uglify-js to install uglifyjs globally; see the uglify-js documentation for details.

Then you can beautify a minified JavaScript file with uglifyjs index.min.js -b -o index.js

nickleefly
  • My guess is that it's because your solution is for development and not for runtime inside the browser, which I think is what the question intends. – Risord Aug 31 '17 at 10:49

jQuery creator John Resig wrote a fast and lightweight HTML parser in JavaScript. If you're looking for a solution you can add directly to your CMS, you could write a simple beautifier using this parser as a base. All you'd need to do is re-output the elements, adding spaces and line breaks as you like, using the built-in API:

HTMLParser(htmlString, {
  start: function(tag, attrs, unary) {},
  end: function(tag) {},
  chars: function(text) {},
  comment: function(text) {}
});

An added benefit of this approach is that you could use the same HTMLParser to read HTML back into your WYSIWYG editor, or otherwise interact with your user's HTML tree. HTMLParser also comes prebuilt with an HTMLtoDOM method.
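A minimal sketch of such a beautifier, assuming HTMLParser calls start(tag, attrs, unary), end(tag), chars(text) and comment(text) as shown above, and that each entry in attrs has name and value properties (check this against the copy of htmlparser.js you use). createBeautifyHandler and getOutput are hypothetical names. Text is only trimmed, so whitespace inside <pre> or <script> is not reflowed, though it is not re-indented either.

```javascript
// Hypothetical helper: returns a handler object for HTMLParser(htmlString, handler).
// Assumption: attrs is an array of { name, value } objects.
function createBeautifyHandler(tab) {
    tab = tab === undefined ? '  ' : tab;
    var lines = [];
    var depth = 0;

    // Append one line at the current nesting depth.
    function emit(text) {
        lines.push(tab.repeat(depth) + text);
    }

    return {
        start: function (tag, attrs, unary) {
            var attrText = attrs.map(function (attr) {
                return ' ' + attr.name + '="' + String(attr.value).replace(/"/g, '&quot;') + '"';
            }).join('');
            emit('<' + tag + attrText + '>');
            if (!unary) {
                depth++; // only elements that will receive an end() call nest
            }
        },
        end: function (tag) {
            depth = Math.max(0, depth - 1);
            emit('</' + tag + '>');
        },
        chars: function (text) {
            // Trim only; interior whitespace is left alone.
            var trimmed = text.trim();
            if (trimmed !== '') {
                emit(trimmed);
            }
        },
        comment: function (text) {
            emit('<!--' + text + '-->');
        },
        getOutput: function () {
            return lines.join('\n');
        }
    };
}
```

Usage would then be: var handler = createBeautifyHandler('  '); HTMLParser(htmlString, handler); var pretty = handler.getOutput();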

Daniel Mendel

I believe that both Chrome's and Firebug's code display engines for debugging are written in JS. That's probably heavier-duty than you really want to be messing with, though.

Paul McMillan

Writing the HTML on one line would download faster to the browser, so I am not sure I would want it formatted. Maybe offer an option for a formatted version or an optimized version.

As for the question... you could make an AJAX call after a certain number of actions and send the code to the server to be formatted and shown in a different box on the screen. Basically it would be a real-time version of this site: http://infohound.net/tidy/

Eric
  • Yes, it would be ever so slightly faster (like 0.0001 of a second). But considering a WYSIWYG editor is aimed at clients who know only a little about HTML, formatted HTML makes it a lot easier. Also, sending the data to the server for formatting is far from ideal. – Petah Oct 20 '10 at 12:15