
I am parsing XML with regex. The XML's format is well known, so there is no need to worry about escaping etc. or proper XML parsing.

grep returns multiple matches, and I want to store each match in its own file.

However, with array=( $list ) the text between my tags is split into separate elements (one per line or word), and with array=( "$list" ) I get the whole output as a single element.

How can I loop over each match from grep?

My script currently looks like this:

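#!/bin/bash

# -z treats input as NUL-separated records (a text file becomes one record), so a
# match can span lines; -o prints only the matched text; -P enables Perl regex
# ((?s) lets . match newlines)
list=$(grep -ozP '(?s)<tagname.*?tagname>' result.xml)
# Quoted, the whole output becomes one element; unquoted, it is split on whitespace
array=( "$list" )
arraySize=${#array[@]}
# Indices run from 0 to arraySize - 1, so the test is < rather than <=
for ((i = 0; i < arraySize; i += 1)); do
  match="${array[$i]}"
  echo "$match" > "$i".xml
done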
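The difference between the two array assignments in the question can be reproduced with a small value standing in for the grep output. The sample string below is hypothetical; it just contains two matches, the first spanning two lines:

```bash
#!/bin/bash
# Hypothetical stand-in for the grep output: two matches, the first spanning two lines
list=$'<tagname a="1">\nfoo</tagname>\n<tagname>bar</tagname>'

split=( $list )    # unquoted: word splitting on IFS (space, tab, newline), plus globbing
whole=( "$list" )  # quoted: the entire string becomes a single element

echo "unquoted: ${#split[@]} elements"
echo "quoted: ${#whole[@]} element"
```

The unquoted form yields 4 elements (it even splits inside a line, at the space before a="1"), while the quoted form yields 1. Neither gives one element per match.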
opticyclic

3 Answers


According to this answer, the upcoming version of grep will change the meaning of the -z flag so that both input and output are NUL-terminated. That will do what you want automatically, but for now it is only available if you download and build grep from its git repository.
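With a grep whose -z also NUL-terminates its output (the current GNU grep manual describes -z as applying to both input and output data), each match can be read directly with read -d ''. A minimal sketch, assuming GNU grep and bash; the sample result.xml and its contents are hypothetical:

```bash
#!/bin/bash
# Assumes GNU grep, where -z makes both input and output NUL-terminated.
cd "$(mktemp -d)" || exit 1
# Hypothetical sample input: one multi-line match and one single-line match
printf '<root>\n<tagname>\n  one\n</tagname>\n<tagname>two</tagname>\n</root>\n' > result.xml

i=0
# read -d '' stops at each NUL, so every chunk is exactly one match
while IFS= read -rd '' chunk; do
  printf '%s\n' "$chunk" > "$i".xml
  ((++i))
done < <(grep -ozP '(?s)<tagname.*?tagname>' result.xml)
```

Feeding the loop through process substitution instead of a pipe keeps it in the current shell, so i still holds the number of matches after the loop ends.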

Meanwhile, a rather hackish alternative is to use the -Z flag, which terminates the file name with a NUL character. That means you need to print a "filename", which you can do with -H --label=. That prints an empty filename followed by a NUL before each match, which is not ideal since you really want the NUL after each match. So the loop has to skip the empty first chunk and still process the last chunk, which has no trailing NUL. The following should work:

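# -H --label= makes grep print an empty filename before each match, and -Z
# replaces the usual ':' after that filename with a NUL
grep -ozZPH --label= '(?s)<tagname.*?tagname>' < result.xml | {
  i=0
  # Read NUL-delimited chunks; "|| [[ $chunk ]]" also handles the final
  # chunk, which has no trailing NUL
  while IFS= read -rd '' chunk || [[ $chunk ]]; do
    # Chunk 0 is the empty text before the first NUL, so skip it
    if ((i)); then
      echo "$chunk" > "$i".xml
    fi
    ((++i))
  done
}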
rici
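Since the question collects the matches in an array, bash 4.4 or later can fill one directly from NUL-delimited output with mapfile -d ''. A sketch assuming bash 4.4+ and GNU grep (whose -z NUL-terminates output); result.xml below is a hypothetical sample:

```bash
#!/bin/bash
# Assumes bash >= 4.4 (for mapfile -d) and GNU grep (-z NUL-terminates output).
cd "$(mktemp -d)" || exit 1
# Hypothetical sample input
printf '<root>\n<tagname>\n  one\n</tagname>\n<tagname>two</tagname>\n</root>\n' > result.xml

# -d '' splits on NUL; -t strips that trailing NUL from each element
mapfile -t -d '' array < <(grep -ozP '(?s)<tagname.*?tagname>' result.xml)

# Index-based loop in the style of the question's script
for ((i = 0; i < ${#array[@]}; i++)); do
  printf '%s\n' "${array[i]}" > "$i".xml
done
```

The loop test uses < rather than <=, because the indices run from 0 to ${#array[@]} - 1.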

If each match fits on a single line, you can pipe the lines directly into a while loop:

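i=0
my_spliting_command | grep something | while IFS= read -r line
do
    # One numbered file per line; reusing one name with ">" would overwrite it on every pass
    printf '%s\n' "$line" > "myoutputfile$i.txt"
    ((++i))
done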
quazardous
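For matches that each fit on one line, a concrete version of this pattern can use grep -o, which prints every match on its own line. A sketch with a hypothetical sample result.xml; the [^<]* in the pattern assumes the tag contains no nested elements:

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1
# Hypothetical sample input with one match per line
printf '<root>\n<tagname>one</tagname>\n<tagname>two</tagname>\n</root>\n' > result.xml

# grep -o prints each match on its own line; [^<]* keeps a match from
# running past its closing tag
grep -o '<tagname>[^<]*</tagname>' result.xml | {
  i=0
  while IFS= read -r line; do
    printf '%s\n' "$line" > "$i".xml
    ((++i))
  done
}
```

A match that spans several lines would be split across iterations here, which is why the -z based answers are needed for multi-line matches.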

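You could use grep to grab all the matches first, and then use awk to save each matched pattern into a separate file (e.g. file1.xml, file2.xml, etc.). The (.) on each side makes every printed match include one extra character before and after it (for example a newline). Note that the multi-character record separator RS='\n\n' works in GNU awk but is not guaranteed by POSIX awk: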

grep -Pzo '(?s)(.)<tagname.*?tagname>(.)' result.xml |
  awk '{ out = "file" NR ".xml"; print $0 > out; close(out) }' RS='\n\n'
Quinn