If the DataKey and DataValue items can't contain `<`, `>`, `-`, spaces or tabs, you can do something like this:
Read your file into a string and do a replace-all with a regex similar to `([^- \t]+)-([^- \t]+)`, using `<$1>$2</$1>` as the replacement (in .NET, `Regex.Replace` replaces every match). This converts something like `DataKey01-DataValue01` into `<DataKey01>DataValue01</DataKey01>`.
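After that, run a second global replace, this time with the regex `^([^ \t]+)(\s+(?:<[^>]+>[^<]+</[^>]+>[\s\n]*)+)`, and again use `<$1>$2</$1>` as the replacement. This wraps each keyword line, together with its continuation lines, in an element named after the keyword.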
This should do the trick.
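I don't program in VB.NET, so I can't vouch for the exact syntax. VB string literals don't treat `\` as an escape character, so the patterns should work there as written; in languages whose string literals do (Java, or C# without `@`), you need to double the backslashes. Make sure you enable the Multiline option for the second pass (`RegexOptions.Multiline` in .NET) so that `^` matches at the start of every line, not just at the start of the file.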
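Put together, the two passes might look like the sketch below. It is Python rather than VB.NET, so the backreferences are written `\1` and `\2` instead of `$1` and `$2`, and `re.MULTILINE` plays the role of .NET's Multiline option. The input is a hypothetical sample reconstructed from the intermediate output shown below; the real file layout may differ. One deliberate change: the first pattern uses `[^- \t\r\n]` instead of `[^- \t]` so a match cannot run across a line break. The names `SAMPLE`, `PAIR`, `RECORD` and `to_xml` are made up for illustration.

```python
import re

# Hypothetical input, reconstructed from the intermediate output in this answer.
SAMPLE = (
    "KEYWORD0 DataKey00-DataValue00 DataKey01-DataValue01\n"
    "  DataKey02-DataValue02\n"
    "KEYWORD1 DataKey10-DataValue10 DataKey11-DataValue11\n"
)

# Pass 1: key-value -> <key>value</key>. Besides space, tab and '-',
# \r and \n are excluded too, so a single match never spans two lines.
PAIR = re.compile(r"([^- \t\r\n]+)-([^- \t\r\n]+)")

# Pass 2: the answer's second pattern, unchanged. re.MULTILINE makes ^ match
# at the start of every line, like RegexOptions.Multiline in .NET.
RECORD = re.compile(r"^([^ \t]+)(\s+(?:<[^>]+>[^<]+</[^>]+>[\s\n]*)+)", re.MULTILINE)


def to_xml(text: str) -> str:
    # \1 and \2 in the replacement correspond to $1 and $2 in .NET.
    text = PAIR.sub(r"<\1>\2</\1>", text)
    return RECORD.sub(r"<\1>\2</\1>", text)


if __name__ == "__main__":
    print(to_xml(SAMPLE))
```

For the sample, `KEYWORD0` and its indented continuation line end up inside a single `<KEYWORD0>...</KEYWORD0>` element. The result has two top-level elements, so wrap it in a root element if you need a well-formed XML document.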
To explain:
`([^- \t]+)-([^- \t]+)`
- `([^- \t]+)` matches any run of characters that are not a space, `-` or tab (`\t`). It is captured as `$1` (notice the parentheses around it).
- `-` matches the `-` character.
- `([^- \t]+)` again matches any run of characters that are not a space, `-` or tab. It is captured as `$2` (again, notice the parentheses).
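- The replacement simply turns a matched `ab-cd` string into `<ab>cd</ab>`.

One caveat: the class `[^- \t]` does not exclude line breaks, so a value at the end of a line can swallow the newline (and, if the next line is not indented, the first word of that line too). If that happens with your file, add `\r\n` to the class: `([^- \t\r\n]+)-([^- \t\r\n]+)`.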
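A quick Python check on a hypothetical two-line input shows the difference between `[^- \t]` and a class that also excludes `\r` and `\n` (Python writes the backreferences as `\1` and `\2` where .NET uses `$1` and `$2`):

```python
import re

REPLACEMENT = r"<\1>\2</\1>"
ORIGINAL = re.compile(r"([^- \t]+)-([^- \t]+)")          # class from the answer
NO_NEWLINE = re.compile(r"([^- \t\r\n]+)-([^- \t\r\n]+)")  # also excludes \r, \n

# Hypothetical input: the second line is not indented.
sample = "KEYWORD0 ab-cd\nKEYWORD1 ef-gh"

# $2 runs through the newline up to the next space, swallowing KEYWORD1.
print(repr(ORIGINAL.sub(REPLACEMENT, sample)))
# 'KEYWORD0 <ab>cd\nKEYWORD1</ab> <ef>gh</ef>'

# Each pair is converted on its own line.
print(repr(NO_NEWLINE.sub(REPLACEMENT, sample)))
# 'KEYWORD0 <ab>cd</ab>\nKEYWORD1 <ef>gh</ef>'
```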
After this step the file looks like:
KEYWORD0 <DataKey00>DataValue00</DataKey00> <DataKey01>DataValue01</DataKey01>
<DataKey02>DataValue02</DataKey02> <DataKey0N>DataValue0N</DataKey0N>
KEYWORD1 <DataKey10>DataValue10</DataKey10> <DataKey11>DataValue11</DataKey11>
<DataKey12>DataValue12</DataKey12> <DataKey13>DataValue12</DataKey13>
<DataKey14>DataValue12</DataKey14> <DataKey1N>DataValue1N</DataKey1N>
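`^([^ \t]+)(\s+(?:<[^>]+>[^<]+</[^>]+>[\s\n]*)+)`
- `^([^ \t]+)` marks and matches a run of characters other than a space or tab (`\t`) at the beginning of a line; this is `$1`, the keyword
- `(` begins a second mark
- `\s+` matches one or more whitespace characters
- `(?:` starts a non-capturing (unmarked) group
- `<[^>]+>` matches an opening XML tag, e.g. `<ab>`
- `[^<]+` matches the text between the tags, e.g. `cd`
- `</[^>]+>` matches a closing tag, e.g. `</ab>`
- `[\s\n]*` matches optional whitespace or newlines (`\s` already includes `\n`, so `\s*` is equivalent)
- `)+` closes the non-capturing group and repeats it one or more times
- `)` closes the second mark; this is `$2`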
The replacement is now straightforward: `<$1>$2</$1>` wraps everything captured in `$2` in a tag named after the keyword.
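To see what `$1` and `$2` capture, and why the Multiline option matters, here is a small Python check (`re.MULTILINE` corresponds to .NET's `RegexOptions.Multiline`). The input is a hypothetical snippet in the state after the first pass:

```python
import re

# The second-pass pattern from the answer, unchanged.
RECORD = r"^([^ \t]+)(\s+(?:<[^>]+>[^<]+</[^>]+>[\s\n]*)+)"

# Hypothetical text in the state after the first pass.
after_pass_one = (
    "KEYWORD0 <DataKey00>DataValue00</DataKey00> <DataKey01>DataValue01</DataKey01>\n"
    "  <DataKey02>DataValue02</DataKey02>\n"
    "KEYWORD1 <DataKey10>DataValue10</DataKey10>\n"
)

# With re.MULTILINE, ^ matches at every line start, so every record is found.
with_multiline = [m.group(1) for m in re.finditer(RECORD, after_pass_one, re.MULTILINE)]
# Without it, ^ matches only at the very start of the string.
without_multiline = [m.group(1) for m in re.finditer(RECORD, after_pass_one)]

print(with_multiline)     # ['KEYWORD0', 'KEYWORD1']
print(without_multiline)  # ['KEYWORD0']

# $2 of the first record also covers the indented continuation line.
first = re.search(RECORD, after_pass_one, re.MULTILINE)
print(first.group(2).count("</"))  # 3
```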
Hope it helps.
But if this is not a one-off job, you should probably write a simple parser instead :)
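A simple parser could look something like this Python sketch. It rests on assumptions about the input format that the question doesn't confirm: tokens are separated by whitespace, a token containing `-` is a key-value pair split at its first `-`, and any other token starts a new keyword element. The function name `parse_records` and the root element name `root` are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_records(text: str) -> ET.Element:
    """Build <root><KEYWORD><Key>Value</Key>...</KEYWORD>...</root> from the text."""
    root = ET.Element("root")  # hypothetical name for the document element
    current = None
    for token in text.split():  # split() treats spaces, tabs and newlines alike
        if "-" in token:
            if current is None:
                raise ValueError(f"pair {token!r} appears before any keyword")
            key, value = token.split("-", 1)  # split at the first '-' only
            ET.SubElement(current, key).text = value
        else:
            current = ET.SubElement(root, token)  # a new keyword record
    return root


if __name__ == "__main__":
    sample = "KEYWORD0 DataKey00-DataValue00\n  DataKey01-DataValue01\nKEYWORD1 DataKey10-DataValue10\n"
    print(ET.tostring(parse_records(sample), encoding="unicode"))
    # <root><KEYWORD0><DataKey00>DataValue00</DataKey00><DataKey01>DataValue01</DataKey01></KEYWORD0><KEYWORD1><DataKey10>DataValue10</DataKey10></KEYWORD1></root>
```

Splitting at the first `-` lets values contain `-`, and ElementTree escapes `<`, `>` and `&` in element text, so some of the restrictions the regex approach needs go away; keys still have to be valid XML element names.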