I have a string similar to "<p></p>". Now, I want to split this string, so I have 2 tags. If I do

var arr = "<p></p>".split("><") , I get an array that looks like

["<p", "/p>"]

Is there a simple way to keep the separator in this split, without using a regex (so this is not a dupe)? I want:

["<p>","</p>"]
mplungjan
johnny_mac
  • What if there is some content between them? Maybe you need `<.*?>` – Tushar Jan 31 '17 at 06:32
  • presume there will not be. I am just curious about keeping the separator within the string for now. – johnny_mac Jan 31 '17 at 06:33
  • @jmcgui05 see http://stackoverflow.com/questions/12001953/javascript-and-regex-split-string-and-keep-the-separator – london-deveoper Jan 31 '17 at 06:34
  • If you want to grab something like HTML tags from JavaScript, I suggest using RegExp. – modernator Jan 31 '17 at 06:35
  • @mplungjan thanks for the google link. How many of those solutions are Regex? My question was if there was a way to do this on a split. I did not ask for a regex. Your profile says you like to help people, many thanks for that. – johnny_mac Jan 31 '17 at 06:41
  • @modernator—regular expressions are not a good tool for parsing HTML, see [*this answer*](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – RobG Jan 31 '17 at 06:47
  • The link was intended as a help. You cannot keep the delimiter in split without a regex or a subsequent processing of the array as seen in the answer below. – mplungjan Jan 31 '17 at 06:55
  • [Here is a cool hack from the google search](http://stackoverflow.com/a/4514241/295783) `var string = ' asdoasidhaois adaosdja '; var delim = '><', parts = string.split(delim); for (var i = parts.length; i-- > 1;) parts.splice(i, 0, delim); console.log(parts)` – mplungjan Jan 31 '17 at 07:02
  • @mplungjan "You cannot keep the delimiter without regex or processing the array". Why not just say that, which would answer the question, instead of incorrectly closing it as a duplicate of questions with regex solutions? It seems I have to explicitly state that I want to know a) if something is possible, and b) how to code it myself without help from regex or some library. That's the whole point of SO: to learn. – johnny_mac Jan 31 '17 at 07:22
  • Ok ok... Reopened... – mplungjan Jan 31 '17 at 07:23

2 Answers

JavaScript regular expressions did not support lookbehind assertions until ES2018, so a split "after every `>`" cannot be written with the String#split method in older engines. Use the String#match method to get each complete tag instead.

var arr = "<p></p>".match(/[\s\S]+?>(?=<|$)/g)

console.log(arr)
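
As a side note, lookahead (unlike lookbehind) has long been available in JavaScript, so for this particular input a split on the zero-width position before each `<` may also work. This is only a sketch: it assumes every piece begins with `<`, and any text between tags would stay attached to the tag before it. The variable name `parts` is just for illustration.

```javascript
// Split at every position that is immediately followed by '<'.
// The lookahead is zero-width, so no characters are consumed by the separator.
var parts = "<p></p>".split(/(?=<)/);

console.log(parts); // ["<p>", "</p>"]
```

This still uses a regex, so it does not meet the "no regex" requirement in the question; it only shows that split itself can keep the delimiter.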

Without regex and using split you can do something like this.

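// split drops the '><' delimiter, so put '<' back on every piece
// except the first and '>' back on every piece except the last
var arr = "<p></p>".split('><').map(function(v, i, parts) {
  if (i !== 0)
    v = '<' + v;
  if (i < parts.length - 1)
    v += '>';
  return v;
});

// same logic using ternaries
var arr1 = "<p></p>".split('><').map(function(v, i, parts) {
  return (i !== 0 ? '<' : '') + v + (i < parts.length - 1 ? '>' : '');
});

console.log(arr);  // ["<p>", "</p>"]
console.log(arr1); // ["<p>", "</p>"]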
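
The same idea can be wrapped in a small reusable function. The sketch below is not part of the answer above: `splitKeep` and its `leftLen` parameter are hypothetical names, and it assumes you know how many characters of the delimiter belong to the piece on the left.

```javascript
// Hypothetical helper: split on `delim`, then hand the first `leftLen`
// characters of the delimiter back to the piece on the left and the
// remaining characters to the piece on the right. No regex involved.
function splitKeep(str, delim, leftLen) {
  var left = delim.slice(0, leftLen);  // goes at the end of the earlier piece
  var right = delim.slice(leftLen);    // goes at the start of the later piece
  return str.split(delim).map(function (piece, i, pieces) {
    var prefix = i > 0 ? right : '';
    var suffix = i < pieces.length - 1 ? left : '';
    return prefix + piece + suffix;
  });
}

console.log(splitKeep('<p></p>', '><', 1)); // ["<p>", "</p>"]
console.log(splitKeep('a, b, c', ', ', 2)); // ["a, ", "b, ", "c"]
```

Passing `leftLen` explicitly keeps the helper general: `1` splits `'><'` into `'>'` and `'<'`, while a value equal to the delimiter length leaves the whole delimiter on the left piece.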
Pranav C Balan

To do this without a regular expression, you'll need some kind of parser. Inspect every character, build up chunks and store them in an array. You may then want to process the bits, looking for tokens or doing other processing. E.g.

/* Break string into chunks of <...>, </...> and anything in between.
** @param {string} s - string to parse
** @returns {Array} chunks of string
*/
function getChunks(s) {
    var parsed = [];
    var limit = s.length - 1;

    s.split('').reduce(function(buffer, char, i) {
      var startTag = char == '<';
      var endTag   = char == '/';
      var closeTag = char == '>';

      if (startTag) {
        if (buffer.length) {
          parsed.push(buffer);
        }
        buffer = char;

      } else if (endTag) {
        buffer += char;

      } else if (closeTag) {
        parsed.push(buffer + char);
        buffer = '';

      } else {
        buffer += char;
      }

      if (i == limit && buffer.length) {
        parsed.push(buffer);
      }

      return buffer;
    }, '');
    return parsed;
}


['<p></p>',
 '<div>More complex</div>',
 '<span>broken tag</sp'
].forEach(function(s){
  console.log(s + ' => [' + getChunks(s) + ']')
});

Note that this is very simple and just looks for <...> and </...> where ... can be anything.
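
If you only need to cut after each `>` and do not care about separating text from the tag that follows it, a shorter regex-free loop over indexOf might be enough. This is a sketch under that assumption, not a replacement for the parser above; `splitAfter` is a hypothetical name.

```javascript
// Hypothetical helper: cut `s` immediately after every occurrence of `mark`,
// keeping `mark` at the end of each piece. Uses only indexOf and slice.
function splitAfter(s, mark) {
  if (mark === '') {
    // An empty marker would match everywhere and never advance the loop.
    throw new Error('mark must be a non-empty string');
  }
  var out = [];
  var start = 0;
  var idx;
  while ((idx = s.indexOf(mark, start)) !== -1) {
    var end = idx + mark.length;
    out.push(s.slice(start, end));
    start = end;
  }
  // Keep whatever is left after the last marker (e.g. a broken tag).
  if (start < s.length) {
    out.push(s.slice(start));
  }
  return out;
}

console.log(splitAfter('<p></p>', '>'));                 // ["<p>", "</p>"]
console.log(splitAfter('<div>More complex</div>', '>')); // ["<div>", "More complex</div>"]
```

Unlike getChunks, text between tags stays glued to the closing tag here, which is the trade-off for the shorter code.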

RobG