
Hello everyone. I am writing a script that processes text with a fixed structure, as follows:

"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX4.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX6.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX8.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXA.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXC.NAYE" "SAMBORNSiPOSSSTHRa"

I want to split that text on the following symbols: the pipe (|), the double quote (") and the tilde (~). The resulting values should be stored in an array, as follows:

splitWords = [RBD,X,RBD,C,92173,GJHGWO.NAYE,SAMBORNSiPOSSSTHRa]

In order to achieve it I tried:

var splitWords = document.getElementById("texto").value.split("|");
document.write(splitWords.toString());

and I get:

"RBD,X,RBD,C,92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa" "RBD,X,RBD,C,92173~GJHGX4.NAYE" "SAMBORNSiPOSSSTHRa" "RBD,X,RBD,C,92173~GJHGX6.NAYE" "SAMBORNSiPOSSSTHRa" "RBD,X,RBD,C,92173~GJHGX8.NAYE" "SAMBORNSiPOSSSTHRa" "RBD,X,RBD,C,92173~GJHGXA.NAYE" "SAMBORNSiPOSSSTHRa" "RBD,X,RBD,C,92173~GJHGXC.NAYE" "SAMBORNSiPOSSSTHRa"

The problem is that this only splits the text on the pipe. I would like to split it on the other symbols too, in order to get my desired output. The complete code looks as follows:

<!DOCTYPE html>
<html>

<body>
<p id="demo"></p>

<textarea cols=150 rows=15 id="texto">
"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX4.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX6.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX8.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXA.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXC.NAYE" "SAMBORNSiPOSSSTHRa"
</textarea>

<script>
var splitWords = document.getElementById("texto").value.split("|");
document.write(splitWords.toString());
</script>

</body>
</html>

I would appreciate any suggestion of a regular expression to achieve this.

neo33
  • text.split(/[\["\|~]/) – le_m Jun 01 '16 at 17:33
  • Split accepts regular expressions. `/\||"|~|\./g` – ndugger Jun 01 '16 at 17:47
  • You ask for a regex to split text by specific characters, but this won't solve your problem: you want to get a multidimensional array containing entries for each line where each line is separated into two "words|with|your|data" by a space. – le_m Jun 01 '16 at 17:49
  • @user138717 Are you sure that your individual values can never contain any of the characters used for splitting, e. g. instead of "SAMBORNSiPOSSSTHRa" there can never be "SA BORNSiPOSSTHRa" etc.? If not, better use a more refined regex matching strategy. – le_m Jun 01 '16 at 19:10
  • Ok, thanks I will verify it. – neo33 Jun 01 '16 at 19:15

3 Answers

Use a regular expression:

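var str = '"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"';
// filter(Boolean) drops the empty strings produced by the leading and trailing quotes
str.split(/[|"~\s]+/).filter(Boolean);
// Output: ["RBD", "X", "RBD", "C", "92173", "GJHGWO.NAYE", "SAMBORNSiPOSSSTHRa"]

If you want to split on the period as well, add it inside the square brackets of the regex. Inside a character class the period does not need a backslash, although escaping it is harmless.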
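As a rough sketch of how this behaves on more than one line: because `\s` also matches the newline, running the same split over the whole text gives one flat array rather than one array per line. The variable names `text`, `flat` and `flatNoDots` are only illustrative, and the sample is shortened to two of the question's lines.

```javascript
// Sample text copied from the question (two of the six lines).
var text = '"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"\n' +
           '"RBD|X|RBD|C|92173~GJHGX4.NAYE" "SAMBORNSiPOSSSTHRa"';

// \s also matches the newline, so both lines end up in one flat array.
var flat = text.split(/[|"~\s]+/).filter(Boolean);

// Adding the period to the class also splits "GJHGWO.NAYE" into two values.
var flatNoDots = text.split(/[|"~.\s]+/).filter(Boolean);

console.log(flat.length);       // 14 (7 values per line)
console.log(flatNoDots.length); // 16 (8 values per line)
```

If one array per line is needed instead, the text has to be split on the newline first and each line processed separately.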

Makaze

OK, let's begin. Get the textarea value and trim it:

var splitWords = document.getElementById("texto").value.trim();

First of all, remove every double quote ("):

splitWords = splitWords.replace(/"/g, '');

Then split the text into lines, since each line is like a table row:

splitWords = splitWords.split('\n');

Then split each row on the possible delimiters: |, ~ and the space:

splitWords.forEach(function(rowValue,rowIndex) {
    splitWords[rowIndex] = rowValue.split(/[|~ ]/);
    console.log(rowIndex, splitWords[rowIndex]);
});

The console.log output will be:

0 ["RBD", "X", "RBD", "C", "92173", "GJHGWO.NAYE", "SAMBORNSiPOSSSTHRa"]
1 ["RBD", "X", "RBD", "C", "92173", "GJHGX4.NAYE", "SAMBORNSiPOSSSTHRa"]
2 ["RBD", "X", "RBD", "C", "92173", "GJHGX6.NAYE", "SAMBORNSiPOSSSTHRa"]
3 ["RBD", "X", "RBD", "C", "92173", "GJHGX8.NAYE", "SAMBORNSiPOSSSTHRa"]
4 ["RBD", "X", "RBD", "C", "92173", "GJHGXA.NAYE", "SAMBORNSiPOSSSTHRa"]
5 ["RBD", "X", "RBD", "C", "92173", "GJHGXC.NAYE", "SAMBORNSiPOSSSTHRa"]

Then do whatever you want with the two-dimensional array splitWords.
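The steps above can be collected into a single function that works on a plain string, which makes them easier to try outside the browser. This is only a sketch: `parseRows` is a hypothetical name, and the blank-line filter and the `\r?\n` line split are small additions that are not in the steps above.

```javascript
// parseRows is a hypothetical helper name, not part of the steps above.
function parseRows(text) {
    return text
        .trim()
        .replace(/"/g, '')      // drop the double quotes
        .split(/\r?\n/)         // one entry per line (also tolerates Windows line endings)
        .filter(function (line) { return line.trim().length > 0; }) // skip blank lines
        .map(function (line) {
            return line.trim().split(/[|~ ]+/); // split on |, ~ and runs of spaces
        });
}

var rows = parseRows(
    '"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"\n' +
    '"RBD|X|RBD|C|92173~GJHGX4.NAYE" "SAMBORNSiPOSSSTHRa"\n'
);
console.log(rows.length); // 2
console.log(rows[1][5]);  // GJHGX4.NAYE
```

In the page from the question it could be called as `parseRows(document.getElementById("texto").value)`.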

Sergey Khalitov

My proposal is:

<p id="demo"></p>

<textarea cols=150 rows=15 id="texto">
"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX4.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX6.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGX8.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXA.NAYE" "SAMBORNSiPOSSSTHRa"
"RBD|X|RBD|C|92173~GJHGXC.NAYE" "SAMBORNSiPOSSSTHRa"
</textarea>

<script>
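    var lines = document.getElementById("texto").value.split('\n');
    // Skip empty lines, then turn each remaining line into an array of values
    var splitWords = lines.filter(function(v) { return v.trim().length > 0; })
                          .map(function(currentValue) {
        // Join the two quoted fields with "|" so the second field becomes its own element
        return currentValue.trim().replace(/^"([^"]+)"\s"([^"]+)"$/, '$1|$2').split(/[|~]/);
    });
    console.log(JSON.stringify(splitWords, null, 4));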
</script>
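One of the comments under the question asks whether a value could ever contain a splitting character, for example a space inside the second field. A possible variation for that case, shown here only as a sketch, is to capture the two quoted fields first and split only the first one. `parseLine` is a hypothetical helper; it assumes each line holds exactly two double-quoted fields and that neither field contains a double quote.

```javascript
// parseLine is a hypothetical helper, not part of the answer above.
// Assumption: every line has the shape "field1" "field2".
function parseLine(line) {
    var m = /^\s*"([^"]*)"\s+"([^"]*)"\s*$/.exec(line);
    if (m === null) {
        return null; // the line does not have the expected shape
    }
    // Only the first field is split on | and ~; the second is kept whole.
    return m[1].split(/[|~]/).concat(m[2]);
}

var parsed = parseLine('"RBD|X|RBD|C|92173~GJHGWO.NAYE" "SA BORNSiPOSSTHRa"');
console.log(parsed.length); // 7
console.log(parsed[6]);     // SA BORNSiPOSSTHRa (the space is preserved)
```

Returning `null` for lines that do not match makes malformed input visible instead of silently producing a wrong array.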
gaetanoM