
I have an XML file, call it myXML.xml, that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<Metrics info1="1" info2="2" info3="3" xmlns="http://metrics.sourceforge.net/2003/Metrics-First-Flat">
    <Metric id = "NORM" description ="Number of Overridden Methods">
      <Values per = "type" total = "135" avg = "0.452" stddev = "0.94" max = "5">
        <Value name="a" source ="a.java" package ="package.a" value ="1"/>
        <Value name="b" source ="b.java" package ="package.b" value ="34"/>
        <Value name="c" source ="c.java" package ="package.c" value ="4"/>
        <Value name="d" source ="d.java" package ="package.d" value ="99"/>
        <Value name="e" source ="e.java" package ="package.e" value ="99"/>
        <Value name="f" source ="f.java" package ="package.f" value ="99"/>
        <Value name="g" source ="g.java" package ="package.g" value ="99"/>
      </Values>
    </Metric>

    <Metric id = "NOI" description ="Number of Overridden Methods">
      <Values per = "type" total = "135" avg = "0.452" stddev = "0.94" max = "5">
        <Value name="a" source ="a.java" package ="package.a" value ="10"/>
        <Value name="b" source ="b.java" package ="package.b" value ="340"/>
        <Value name="c" source ="c.java" package ="package.c" value ="40"/>
        <Value name="d" source ="d.java" package ="package.d" value ="990"/>
      </Values>
    </Metric>
</Metrics>

Because I have to evaluate dozens of such files (like myXML.xml) over dozens of metrics (here id=NORM and id=NOI), I tried to automate this with Apache Ant.

Ideally, for a given file (myXML.xml) I would get a CSV file in return, saved as myXML.csv, that looks something like this:

NORM 1, 34, 4, 99, 99, 99, 99
NOI 10, 340, 40, 990

To approach this, I thought of creating a property file, loaded with <property file="metrics.properties"/>, that looks like this:

p_1 = NORM
p_2 = NOI
...
p_N = VG

where N is arbitrary, so Ant has to figure out N (in the small example here N=2) and create the CSV file described above over all p_i's. I also guess I should rewrite the XQuery below as a function of the file (myXML.xml) and the metric id (NORM) and run it from the command line. But I don't see how to do either of these.

The following XQuery partially does what I am interested in:

declare option db:stripns 'true';
for $x in doc("myXML.xml")/Metrics/Metric[@id="NORM"]/Values//Value/@value
return data($x)

but both myXML.xml and NORM are hard-coded, and the output is simply 1 34 4 99 99 99 99. I saved this query in query.xq and ran it from Ant:

<target name="ant" depends="#1">
 <echo> ant </echo>
 <exec executable="${pathToAnt}/basex.bat" dir="${basedir}" error="${basedir}/output/error.txt">
  <arg value = "query.xq"/> 
  <redirector output="${basedir}/output/myXML.csv" alwayslog="true"/>
 </exec>
</target>

That's what I have so far, which is still some way from what I intend to get.

I hope it's clear what I am trying to achieve. I am new to both XQuery and Ant, and I am using BaseX (not a must) under Windows, so this is quite challenging for me ;-).

Thanks a lot for any help, hints, questions, etc.

amix
  • ANT is a build tool, not a scripting language.... I must ask why? – Mark O'Connor Jul 22 '14 at 18:15
  • So you suggest solving the automation differently. What tool would you recommend? What do you mean by "I must ask why"? – user162037 Jul 23 '14 at 07:06
  • @user162037 if you want powerful programming ability, there are two choices: 1. **write Ant tasks** -- you are actually writing Java and just calling it from Ant. This separates complex logic from your build file, so let Ant do its job and offload complex things to tasks; 2. go with Gradle (or another build tool based on a programming language) so that you can use Groovy (or another language) -- but still make sure you can separate your build logic from your scripting logic before you go. – Dante WWWW Jul 23 '14 at 08:27
  • So you would not recommend doing the "automation" in Ant with the help of ant-contrib instead of Gradle/Groovy? – user162037 Jul 23 '14 at 10:17
  • @user162037 I'm not a fan of ant-contrib. If you want to do complex scripting within ANT I recommend embedding a scripting language like javascript or my favourite groovy. See: http://stackoverflow.com/questions/13990723/how-to-rename-n-files-and-n-folders-with-ant/13998196#13998196 . That's why I asked "why?". Why do this in ANT? – Mark O'Connor Jul 23 '14 at 20:28
  • @user162037 well, you can use the ant-contrib task `` + regex to select 1 ~ N into a comma-separated list, use a `` to iterate through the list, and use `` to rebuild each property name in each iteration. However, it's complicated and against the nature of Ant. If you find a combination of several Ant tasks hard to understand and maintain, hide it behind an Ant task, or use an embedded script. – Dante WWWW Jul 25 '14 at 07:55

2 Answers


Thanks for your help. I figured it out:

A for loop can be done with the ant-contrib for task: http://ant-contrib.sourceforge.net/tasks/tasks/for.html. I iterate over all my source files (their names are stored in fileNames) like this:

<for list="${fileNames}" delimiter="," param="nameIter">
 <sequential>
  <echo> loop over fileNames: nameIter=@{nameIter} </echo>
  <exec executable="${pathToAnt}/basex.bat" dir="${basedir}" error="${basedir}/output/error_baseX/@{nameIter}Error.txt">
   <arg value="-b$importList=${metricsList}" />
   <arg value="-b$name=@{nameIter}"/>
   <arg value="./source_data/data/query.xq"/>
   <redirector output="${basedir}/output/@{nameIter}.csv" alwayslog="true"/>
  </exec>
 </sequential>
</for>

The exec part runs the following XQuery from the command line, where the variable metricsList consists of all the metrics I am interested in. For the XML above, for instance, this would be metricsList=NORM,NOI. The XQuery file query.xq is

declare option db:stripns 'true';
declare variable $name external;
declare variable $importList external;
declare variable $list as xs:string* := tokenize($importList, ',');
for $i in $list
let $x := doc($name)/Metrics/Metric 
let $nl := "&#10;" (: this is a newline:)
return ($nl,data($x[@id=$i]/Values/../@id), data($x[@id=$i]/Values/Value/@value))
0

I see that this is nearly five years old, but for anyone coming later with a similar question in mind, here is a way to solve it using only XQuery, without Ant.

This should be processor agnostic (I use BaseX here), as long as the processor supports the EXPath file module (the major ones do). The collection() function may behave differently between processors: BaseX either reads all XML files it finds in a directory (that's the method we use here) or interprets the path as a path within its own internal database.

Since the XML declares a default namespace ("http://metrics.sourceforge.net/2003/Metrics-First-Flat"), we must account for it in our XPath expressions. There are two ways to do so: we can declare a default namespace for elements in the prolog (our approach here), or we can add a prefix wildcard in front of each element's name in our XPath expression (*:Values/*:Value).
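For comparison, here is a minimal sketch of the wildcard variant. It assumes the sample file from the question is available as myXML.xml relative to the query's base URI; nothing is declared in the prolog.

```xquery
(: No namespace declaration: the *: prefix wildcard matches an element
   with the given local name in any namespace. Attributes such as @id
   and @value are in no namespace, so they need no wildcard. :)
for $value in doc("myXML.xml")/*:Metrics/*:Metric[@id = "NORM"]/*:Values/*:Value/@value
return data($value)
```

For the sample file this should return the same seven values as the query in the question (1 34 4 99 99 99 99), without relying on any namespace-stripping option.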

Since the result will be a sequence of strings (and we need a single string for our CSV), we concatenate the segments and add a literal comma after all but the last segment via a little inline function, compose the final string via string-join() and write the CSV to disk.

declare default element namespace "http://metrics.sourceforge.net/2003/Metrics-First-Flat";

let $path := "/path/to/folder/with/XML/files/"
let $docs := collection($path)
let $decorate := function($sequence) {
  for $i in subsequence($sequence, 1, count($sequence) - 1)
    return $i || ","
 ,subsequence($sequence, count($sequence))
}
for $doc in $docs/Metrics
count $cnt                       (: this helps to create sequential file names:)
let $norm := ( "NORM",
               for $metric in $doc/Metric[@id="NORM"]
               return $metric/Values/Value/@value/data()
             )
let $noi := ( "NOI",
              for $metric in $doc/Metric[@id="NOI"]
              return $metric/Values/Value/@value/data()
            )
return
  file:write(
    concat("/path/to/file-", $cnt, ".csv")
   ,concat(
    string-join($decorate($norm))
   ,out:nl()                      (: BaseX specific, creates a 'newline' :)
   ,string-join($decorate($noi))
  ))
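As a possible simplification of the query above, the two-argument form of string-join() can insert the separator itself, which makes the helper function unnecessary, and the character reference "&#10;" gives a newline without a processor-specific function. This is only a sketch under assumptions: it reads the single sample file myXML.xml instead of a collection, and the variable $ids is a hypothetical list of the metric ids of interest.

```xquery
declare default element namespace "http://metrics.sourceforge.net/2003/Metrics-First-Flat";

(: Hypothetical list of metric ids; both ids come from the sample file. :)
let $ids := ("NORM", "NOI")
let $lines :=
  for $id in $ids
  let $values := doc("myXML.xml")/Metrics/Metric[@id = $id]/Values/Value/@value
  (: string-join() puts ", " between items only, so no trailing comma appears :)
  return $id || " " || string-join($values, ", ")
(: "&#10;" is a newline character reference :)
return string-join($lines, "&#10;")
```

For the sample file this should produce the two lines asked for in the question (NORM 1, 34, 4, 99, 99, 99, 99 and NOI 10, 340, 40, 990). The resulting string could then be written to disk with the file module, for example with file:write-text() instead of returning it.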
amix