
I would like to split a string into an array using delimiters and keep those delimiters. I tried using IFS, but it removes the delimiters.

For example:

ligne="this.is/just(an]example"
IFS='}|//|)|(| |{|[|]|.|;|/"|,' read -ra ADDR <<< "$ligne"
for i in "${ADDR[@]}"; do
   echo $i
done

I want the result to be like this:

this
.
is
/
just
(
an
]
example

Thanks for your help!

M. Ebner
  • `bash` isn't really intended for this level of data processing; whatever features it has are typically meant for simple filename manipulation. – chepner Sep 21 '18 at 15:17

2 Answers


You can use grep with the -o option, which prints each match on its own line:

grep -oE '[^][^./(){};:,"]+|[][^./(){};:,"]' <<< "$ligne"

this
.
is
/
just
(
an
]
example

The regex is an alternation with two branches:

  • [^][^./(){};:,"]+: matches one or more characters that are not in the character class, i.e. a run of non-delimiters
  • |: OR
  • [][^./(){};:,"]: matches a single character that is in the character class, i.e. one delimiter
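
Since the question asks for an array, the grep output can be collected with mapfile. This is only a sketch: it assumes Bash 4 or later (for mapfile) and reuses the ADDR name from the question.

```shell
#!/usr/bin/env bash
ligne='this.is/just(an]example'

# mapfile reads one array element per input line; -t strips the trailing newline.
# The process substitution feeds it the matches printed by grep -o.
mapfile -t ADDR < <(grep -oE '[^][^./(){};:,"]+|[][^./(){};:,"]' <<< "$ligne")

for i in "${ADDR[@]}"; do
    printf '%s\n' "$i"
done
```

Because grep -o emits one match per line, this assumes the input itself contains no newlines.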
anubhava
  • OP might want to remove the `+` from the second alternation: empty fields are often represented by consecutive delimiters, in which case it would be better to match every delimiter as a separate occurrence rather than group them. As a side note, I'm very surprised your character classes aren't broken by the unescaped `]`. Is it because neither `[]` nor `[^]` would be valid character classes? – Aaron Sep 21 '18 at 14:26
  • @Aaron If you want `]` in your bracket expression, it has to be the first character, optionally preceded by `^` for a negated class. – Benjamin W. Sep 21 '18 at 14:27
  • @Aaron: Good point on not using quantifier `+` for 2nd alternation (edited). – anubhava Sep 21 '18 at 14:28

There is no trivial solution to this with Bash builtins as far as I know, but if that's what you need, you could do something like this.

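ligne="this.is/just(an]example"
array=()
while [ -n "$ligne" ]; do
    head=$ligne found=
    # Pick the delimiter that occurs earliest, i.e. the one leaving the shortest head.
    # Multi-character delimiters come before '/' so that they win ties.
    # '/' is added to the list so that "is/just" splits as in the expected output.
    for delim in '}' '//' ')' '(' ' ' '{' '[' ']' '.' ';' '/"' ',' '/'; do
        candidate=${ligne%%"$delim"*}    # text before the first $delim
        if [ "${#candidate}" -lt "${#head}" ]; then
            head=$candidate found=$delim
        fi
    done
    [ -n "$head" ] && array+=("$head")    # skip empty fields
    [ -n "$found" ] || break              # no delimiter left: done
    array+=("$found")
    ligne=${ligne#"$head$found"}
done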
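
For comparison, here is a sketch that also uses only Bash builtins but relies on regex matching with `[[ =~ ]]` and BASH_REMATCH instead of parameter expansion. The function name split_keep and the array name parts are hypothetical, and the bracket expressions are borrowed from the grep answer, so the delimiter set is that answer's single-character set and not the list above.

```shell
#!/usr/bin/env bash
# split_keep: hypothetical helper, not part of either answer.
# Splits $1 into tokens and delimiters and stores them in the global array "parts".
split_keep() {
    local rest=$1
    # Group 1: a run of non-delimiters, or exactly one delimiter. Group 2: the remainder.
    local re='^([^][^./(){};:,"]+|[][^./(){};:,"])(.*)$'
    parts=()
    while [[ -n $rest && $rest =~ $re ]]; do
        parts+=("${BASH_REMATCH[1]}")
        rest=${BASH_REMATCH[2]}
    done
}

split_keep 'this.is/just(an]example'
printf '%s\n' "${parts[@]}"
```

The pattern is kept in a variable and expanded unquoted on the right-hand side of `=~`, so that Bash treats it as a regex and not as a literal string.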
tripleee