bash parse space delimited text file

Question

Example file:

aaa [bbb bb] ccc "ddd dd" eee
bbb [ccc cc] ddd "eee ee" fff

Expected output:

line1
s1="aaa", s2="bbb bb", s3="ccc", s4="ddd dd", s5="eee"
line2
s1="bbb", s2="ccc cc", s3="ddd", s4="eee ee", s5="fff"

Thanks in advance!

What have you tried yourself? And what are the variables `s1` to `s5`? Do you want them stored in separate variables? Why not an array? - Inian, Nov 30 '18 at 20:03
I tried awk: `awk -F" " '{print $1, $2, $3, $4, $5}'`, but I'm not sure what delimiter to pass to -F. - Alex Tang, Nov 30 '18 at 20:07
It would help to use a standard file format, and a language that already has a parser for that format. This type of data processing really isn't what the shell is intended for. - chepner, Nov 30 '18 at 20:09
Agreed. I am a Java guy; however, I have to use bash to parse this type of log file. Thank you. - Alex Tang, Nov 30 '18 at 20:12
Did you post input and output, or do you want bash variables set that way? I find this question unclear. Also, bash does not use `,` as a separator anywhere; do you want a `,` character appended to every variable? - KamilCuk, Nov 30 '18 at 20:31

score 1 · Answer 1 · answered Nov 30 '18 at 20:24

With GNU awk, you can set FPAT (a gawk extension) to describe what a field looks like instead of what separates fields:

awk -v OFS=", " -v FPAT='\\[[^]]*\\]|"[^"]*"|[^[:space:]]+' '{
   for (i=1; i<=NF; i++) {
      gsub(/^[["]|[]"]$/, "", $i)
      $i = "s" i "=\"" $i "\""
   }
   $0 = "line" NR ORS $0
} 1' file

Output:

line1
s1="aaa", s2="bbb bb", s3="ccc", s4="ddd dd", s5="eee"
line2
s1="bbb", s2="ccc cc", s3="ddd", s4="eee ee", s5="fff"
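
For comparison, the same field pattern can be approximated in bash itself with `[[ =~ ]]`. This is only a sketch: `tokenize` and the global array `fields` are names made up for illustration, and it assumes brackets and quotes are never nested or escaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split one line into bare words, [bracketed groups]
# and "quoted groups", storing the results in the global array "fields".
tokenize() {
  local s=$1 tok
  # Optional leading whitespace, then one token (same alternatives as FPAT).
  local re='^[[:space:]]*(\[[^]]*]|"[^"]*"|[^[:space:]]+)'
  fields=()
  while [[ $s =~ $re ]]; do
    tok=${BASH_REMATCH[1]}
    s=${s:${#BASH_REMATCH[0]}}        # drop what was just consumed
    case $tok in
      \[*\]|\"*\") tok=${tok:1:${#tok}-2} ;;   # remove the enclosing pair
    esac
    fields+=( "$tok" )
  done
}
```

After `tokenize "$some_line"`, something like `printf '[%s]\n' "${fields[@]}"` shows one field per line.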

score 0 · Answer 2 · answered Nov 30 '18 at 22:10

A bash-only approach:

$: IFS=']"[' read -a line < infile # read the "groups"
$: line=( "${line[@]% }" )         # strip trailing spaces
$: line=( "${line[@]# }" )         # strip leading spaces

The `line` array now holds the scrubbed fields of the first line of infile (`read` consumes one line per call).

Step by step:

$: IFS=']"[' read -a line < infile
$: printf "[%s]\n" "${line[@]}"
[aaa ]
[bbb bb]
[ ccc ]
[ddd dd]
[ eee]
$: line=( "${line[@]% }" )
$: printf "[%s]\n" "${line[@]}"
[aaa]
[bbb bb]
[ ccc]
[ddd dd]
[ eee]
$: line=( "${line[@]# }" )
$: printf "[%s]\n" "${line[@]}"
[aaa]
[bbb bb]
[ccc]
[ddd dd]
[eee]
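
To handle every line rather than one, the same idea can be wrapped in a `while read` loop. The following is a minimal sketch, not part of the original answer: `parse_groups` is a hypothetical helper name, and it assumes every line looks like the sample (bare words alternating with bracketed or quoted groups, and no line starting with a group).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines from stdin and print them in the
# "lineN / s1=..., s2=..." layout asked for in the question.
parse_groups() {
  local n=0 i out
  local -a line
  while IFS=']"[' read -r -a line; do   # split on ], " and [
    n=$((n + 1))
    line=( "${line[@]% }" )             # strip one trailing space per field
    line=( "${line[@]# }" )             # strip one leading space per field
    out=""
    for i in "${!line[@]}"; do
      out+="s$((i + 1))=\"${line[i]}\", "
    done
    printf 'line%d\n%s\n' "$n" "${out%, }"   # drop the final ", "
  done
}
```

It would be called as `parse_groups < infile`. Two adjacent bare words would end up in a single field with this splitting, so it only suits input shaped like the example.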
