The HTML/concept:

    <textarea id="input"></textarea>        
    <button onclick="format()">submit</button>
    <textarea id="output"></textarea>

I regularly have to convert docs into HTML for clients, and I'm tired of having to find/replace and manually add the appropriate HTML. I looked for my dream formatter but couldn't find anything (please post if you know of one that fits), so I figured I'd just write my own in JavaScript. It's very straightforward, but I'm unfamiliar with regular expressions and am having some trouble. Here's what I've been able to piece together using regexps I found in other posts:

    var email = /(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})/gim;
    var url = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
    var br = /(\r\n|\n|\r)/gm;

    function format() {

        var input = $('#input').val();

        var check1 = input.replace(br,"<br>");
        var check2 = check1.replace(url,'<a href="$1" target="_blank">$1</a>');
        var check3 = check2.replace(email, '<a href="mailto:$1">$1</a>' );
        // further replacements (bold, italic, special characters, etc.) would be chained here

        var output = check3;
        $('#output').val(output);
    }

There are a couple more things I want to do, but I can't seem to find or write the correct regexps. These are:

  1. find any bold characters, and replace them with appropriate html/css
  2. find any italic characters, and replace them with appropriate html/css
  3. find particular characters (©,“,”,ñ,etc) and replace them with the appropriate characters/entities ie:

     [&#169; , " , " , &ntilde; , etc]
    

My apologies if this has been answered, but I can't seem to find these bits (perhaps I'm asking the wrong questions?). Any help finding bold/italic text, as well as replacing specific characters with entities, would be great! Also, if I'm going about this the wrong way, please call me out. Thanks so much!

Nick Briz
  • Textareas only support plain text. So you won't be able to detect bold because there's no (sane) way to put bold text in them. So maybe your real question is how to convert from word processor documents to decent HTML? – sourcejedi Aug 28 '12 at 19:02
  • hmmm, this is a good point. Do you happen to know how to do a simple character swap? for example the quote (“) to ", I've tried /“/gi, /\“\/gi, etc but am having no luck, I imagine I'm making a very stupid mistake – Nick Briz Aug 28 '12 at 19:05
  • You mean `/“/"/gi`. Dunno. Might be a character encoding issue. I would just make sure I was using UTF-8 for everything and not bother about entities. Use UTF-8, it's the [law^H^H^H highly recommended](https://tools.ietf.org/html/rfc2277#section-3.1). – sourcejedi Aug 28 '12 at 21:39
  • @sourcejedi that doesn't work, but I was able to get it working using unicode+regex : /\u201c|\u201d|\u201e/g = curly quotes :) – Nick Briz Aug 28 '12 at 23:40

1 Answer

Well, it looks like bold/italic isn't really an option, as @sourcejedi points out, but I figured out how to find/replace everything else I needed with regexps. Again, this is a very specific task that I have to do quite often (converting what are usually very long Google Docs files, often in Spanish with accented characters, into HTML). In case someone else is in the same or a similar boat, here's what I ended up with:

HTML:

    <textarea id="input" cols="50" rows="10"></textarea><br>
    <button onclick="format()">format!</button><br>
    <textarea id="output" cols="50" rows="10"></textarea><br>

Javascript:

    function format() {

        var input = document.getElementById('input').value; // get input text
        var output = document.getElementById('output');     // target output text box
        var i = input;                                      

        var email = /(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})/gim;
        var url = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
        var br = /(\r\n|\n|\r)/gm;

        i = i.replace(br, '<br>'+'\n'); 
        i = i.replace(url, '<a href="$1" target="_blank">$1</a>'); 
        i = i.replace(email, '<a href="mailto:$1">$1</a>' );
        i = i.replace( /\u2018|\u2019|\u201A|\uFFFD/g, "'" );
        i = i.replace( /\u201c|\u201d|\u201e/g,  '"' );
        i = i.replace( /\u02C6/g, '^' );
        i = i.replace( /\u2039/g, '<' );
        i = i.replace( /\u203A/g, '>' );
        i = i.replace( /\u2013/g, '-' ); // en dash
        i = i.replace( /\u2022/g, '<span style="padding-left:15px;">&#38;'+'#8226;</span>' ); 
        i = i.replace( /\u00C9/g, '&#38;'+'Eacute;' ); // E w/accent
        i = i.replace( /\u00CD/g, '&#38;'+'Iacute;' ); // I w/accent
        i = i.replace( /\u00D3/g, '&#38;'+'Oacute;' ); // O w/accent
        i = i.replace( /\u00DA/g, '&#38;'+'Uacute;' ); // U w/accent
        i = i.replace( /\u00DD/g, '&#38;'+'Yacute;' ); // Y w/accent
        i = i.replace( /\u00D1/g, '&#38;'+'Ntilde;' ); // N w/tilde (eñe)
        i = i.replace( /\u00E1/g, '&#38;'+'aacute;' ); // a w/accent
        i = i.replace( /\u00E9/g, '&#38;'+'eacute;' ); // e w/accent
        i = i.replace( /\u00ED/g, '&#38;'+'iacute;' ); // i w/accent
        i = i.replace( /\u00F3/g, '&#38;'+'oacute;' ); // o w/accent
        i = i.replace( /\u00FA/g, '&#38;'+'uacute;' ); // u w/accent
        i = i.replace( /\u00FD/g, '&#38;'+'yacute;' ); // y w/accent
        i = i.replace( /\u00F1/g, '&#38;'+'ntilde;' ); // n w/tilde (eñe)
        i = i.replace( /\u2014/g, '&#38;'+'#8212;' );  // mdash
        i = i.replace( /\u2026/g, '...' );             // ellipsis
        i = i.replace( /\u00A9/g, '&#38;'+'#169;' );   // copyright logo
        i = i.replace( /\u00AE/g, '&#38;'+'#174;' );   // registered trademark logo
        i = i.replace( /\u2122/g, '&#38;'+'#8482;' );  // trade mark logo
        i = i.replace( /\u00BC/g, '&#38;'+'#188;' );   // 1/4
        i = i.replace( /\u00BD/g, '&#38;'+'#189;' );   // 1/2
        i = i.replace( /\u00BE/g, '&#38;'+'#190;' );   // 3/4
        i = i.replace(/[\u02DC\u00A0]/g, " "); // special space characters (no "|" inside a character class, or it matches a literal pipe)

        output.innerHTML = i;
    }
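
The long chain of character replacements above could also be driven by a lookup table. What follows is only a rough sketch, not the code I actually use: `ENTITY_MAP` and `encodeSpecialChars` are names made up for this example, the table lists just a few of the characters from the chain, and the fallback to a numeric entity for anything not in the table is an assumption about what you'd want.

```javascript
// Hypothetical lookup table (illustration only): a few of the
// characters handled by the replace chain in format().
var ENTITY_MAP = {
    '\u2018': "'", '\u2019': "'", '\u201A': "'",   // curly single quotes
    '\u201C': '"', '\u201D': '"', '\u201E': '"',   // curly double quotes
    '\u2013': '-',                                 // en dash
    '\u2014': '&#8212;',                           // em dash
    '\u2026': '...',                               // ellipsis
    '\u00A0': ' ',                                 // non-breaking space
    '\u00E9': '&eacute;',                          // e w/accent
    '\u00F1': '&ntilde;',                          // n w/tilde
    '\u00A9': '&#169;'                             // copyright logo
};

// Hypothetical helper: one pass over every non-ASCII character.
// The "u" flag makes the regex treat characters outside the Basic
// Multilingual Plane as one character instead of two surrogates.
function encodeSpecialChars(text) {
    return text.replace(/[^\x00-\x7F]/gu, function (ch) {
        if (Object.prototype.hasOwnProperty.call(ENTITY_MAP, ch)) {
            return ENTITY_MAP[ch];
        }
        // Assumed fallback: a numeric entity for anything not listed.
        return '&#' + ch.codePointAt(0) + ';';
    });
}
```

Inside `format()` this would look something like `i = encodeSpecialChars(i);` in place of the individual character replacements. Because the entities are written directly (no `'&#38;'+` trick), the result would need to be assigned with `output.value = i` rather than `output.innerHTML`.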

...and for those really interested, here's a version with adjustable parameters (preloading email, adding wrapper div with custom CSS, target=blank toggle, mailto toggle, minify, etc) http://jsfiddle.net/N4vrE/

Nick Briz