
I have a pair of large XML files that I'm using nawk to break apart, so that I can work more easily with the pieces that are actually relevant to my project. The code I have does what I want, but the files it produces lack descriptive filenames, which makes it much more time-consuming to identify which of the child XML files contain the data I want to work with. Here is what I have now:

First XML file source

Code that's splitting this file apart:

nawk ' {print > "kingresult"(NR%1?i:i++)".txt"; }' i=1 PI.txt

Second XML file source

Code that's splitting this file apart:

nawk -v RS="</?Results>" -v FS="<Result>" '{ for(N=1; N<=NF; N++) if($N ~ /<[/]/) print FS $N > "stateresult00"++C".xml" }' 20140805_AllState.xml

The first XML file is being split on a line-by-line basis; the second is being split apart wherever nawk finds a new "Result" element. In both cases, however, the resulting filenames look like this:

result1.xml result2.xml result3.xml

... and so on.

It would save a lot of time if the filenames were more descriptive, and looked like this:

result1-John.xml result2-Jane.xml result3-Jake.xml

In the case of the first file, it would be acceptable if only the first word of the line were incorporated into the filename.

In the case of the second XML file, it would be ideal if the first word in the <CandidateName> element could be added to the filename. How do I go about modifying my code to get nawk to create more descriptive filenames?

Seascape

1 Answer


XSLT 2.0 solution:

<xsl:for-each select="/*/Result">
  <xsl:result-document 
      href="result{position()}-{tokenize(CandidateName, '\s+')[1]}.xml">
    <xsl:copy-of select="."/>
  </xsl:result-document>
</xsl:for-each>
Michael Kay
  • Thanks for your answer, but I'm not sure how I would integrate this into my shell script. My server has the libxslt library installed and I added the xsltproc program, but it's throwing errors when I attempt to use this: http://pastebin.com/7PEimWde XSLT stylesheet: http://pastebin.com/b12ztxZa – Seascape Aug 01 '14 at 03:38
  • I did say it was an XSLT 2.0 stylesheet, so you will need an XSLT 2.0 processor such as Saxon to run it. Also, I try to avoid giving complete code that people can execute without attempting to understand it. Your stylesheet attempts to use xsl:for-each at the top level; it needs to be wrapped in an xsl:template. Please work through some XSLT tutorials before you try to use the language; it's too powerful a tool to use without any kind of training. – Michael Kay Aug 01 '14 at 22:01
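
For anyone who would rather stay in awk, as the question asks, here is a minimal sketch for the first (line-by-line) case only. It is not the XSLT approach from the answer above. The sample lines written to PI.txt are made up for illustration, plain awk stands in for nawk (only POSIX features are used), and stripping everything except letters, digits, underscores and hyphens from the first word is my own assumption about what makes a safe filename.

```shell
# Hypothetical sample input standing in for PI.txt (real contents are unknown).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'John Smith 101' 'Jane Doe 102' 'Jake Roe 103' > PI.txt

# Write one output file per input line, named after the line number
# and the first word of that line.
awk '{
    name = $1                           # first whitespace-delimited word
    gsub(/[^A-Za-z0-9_-]/, "", name)    # drop characters unsafe in filenames (assumed rule)
    if (name == "") name = "unnamed"    # blank line, or a word with no safe characters
    out = "kingresult" NR "-" name ".txt"
    print > out
    close(out)                          # keep the number of open files small
}' PI.txt

ls
```

With the sample input this creates kingresult1-John.txt, kingresult2-Jane.txt and kingresult3-Jake.txt next to PI.txt. Building the name in a variable first also sidesteps the ambiguity of concatenating directly after the > redirection, and close() matters once the input runs to thousands of lines.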