I have one file with several elements <elem>...</elem>. I need to split this file into n files with m elements each (m is an argument passed to the awk command I am using). For example, if my original file has 40 elements, I might want to split it into 3 files (10 elements, 13 elements and 17 elements).
The problem is that the original file has elements with different structures.
EDITED AFTER fedorqui's comment:
I run as many awk commands as files I want to get at the end of the process. That means that if I need 3 files with m1, m2 and m3 elements, I will run awk 3 times with different parameters.
Example of input (file.txt) (5 elements)
<elem>aaaaaaaa1</elem>
<elem>aaaaaaaa2</elem>
<elem>bbbbbbbb
bbbbbbbbb
bbbbbbbbb</elem>
<elem>bbbbbbbb2</elem>
<elem>ccccc

cccc</elem>
As you can see, the 1st, 2nd and 4th elements are each on one line, the 3rd element spans 3 lines without blank lines, and the 5th element spans 3 lines, one of which is blank.
Blank lines between elements are not a problem, but blank lines inside an element make the command fail.
Example of desired output:
file_1.txt (2 elements)
<elem>aaaaaaaa1</elem>
<elem>aaaaaaaa2</elem>
file_2.txt (2 elements)
<elem>bbbbbbbb
bbbbbbbbb
bbbbbbbbb</elem>
<elem>bbbbbbbb2</elem>
file_3.txt (1 element)
<elem>ccccc

cccc</elem>
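To make the desired result concrete, here is a minimal sketch of one way such a split could be expressed. It is not one of my attempts and it rests on several assumptions: it runs awk once with a hypothetical `sizes` variable listing the element count per file (instead of one awk call per file), it assumes an element boundary never falls in the middle of a line, and it gives any leftover elements to the last file. It reads line by line and counts closing `</elem>` tags, so blank lines inside an element are simply copied through.

```shell
# Hypothetical scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the 5-element sample input, including the blank line in element 5.
printf '%s\n' '<elem>aaaaaaaa1</elem>' '<elem>aaaaaaaa2</elem>' \
    '<elem>bbbbbbbb' 'bbbbbbbbb' 'bbbbbbbbb</elem>' '<elem>bbbbbbbb2</elem>' \
    '<elem>ccccc' '' 'cccc</elem>' > file.txt

# sizes is an assumed parameter: number of elements wanted in each output file.
awk -v sizes="2 2 1" '
BEGIN { n = split(sizes, size, " "); file = 1; count = 0 }
{
    out = "file_" file ".txt"
    print > out                      # copy the line unchanged, blank or not
    tmp = $0
    count += gsub(/<\/elem>/, "", tmp)   # gsub returns how many tags it matched
    # Move to the next file once this one holds enough elements;
    # the last file keeps whatever remains.
    if (count >= size[file] && file < n) {
        close(out)
        file++
        count = 0
    }
}' file.txt
```

With the sample above this writes file_1.txt, file_2.txt and file_3.txt into the scratch directory, and concatenating them gives back file.txt.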
AWK command
(suffixFile is the suffix number of the file. For example fileAux_1.txt, fileAux_2.txt...)
Attempt 1
awk -v numElems=$1 -v suffixFile=$2 '{
for(i=1;i<=numElems;i++) {
printf "<doc>"$i > "fileAux_" suffixFile".txt"
}
}' RS='' FS='<doc>' file.txt
This works except for blank lines inside an element. I understand why it fails: RS='' tells awk to split records on blank lines.
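The effect is easy to reproduce in isolation. The small check below is only an illustration: the temporary file and the `record_count` variable are made up for it. It feeds awk a single element that contains a blank line and counts the records awk sees with RS set to the empty string.

```shell
# One element whose body contains a blank line (hypothetical temp file).
sample=$(mktemp)
printf '<elem>ccccc\n\ncccc</elem>\n' > "$sample"

# RS = "" selects paragraph mode, so the blank line ends the first record.
record_count=$(awk 'BEGIN { RS = "" } END { print NR }' "$sample")
echo "records seen by awk: $record_count"

rm -f "$sample"
```

awk reports two records for what is a single element, which is why the element ends up cut in half.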
Attempt 2
awk -v numElems=$1 -v suffixFile=$2 '{
for(i=1;i<=numElems;i++) {
printf $i > "fileAux_" suffixFile".txt"
}
}' RS='<doc>' FS='<doc>' file.txt
This is another approach, but it also fails.
Can anyone help me?
Thanks in advance!