6

Hear is XML. I am trying to get Number of titles published by an author in a date range 15/02/2012 to 24/02/2012 order by highest to lowest (number of titles).

<entries>
<entry>
    <id>1</id>
    <published>23/02/2012</published>
    <title>Title 1</title>
    <content type="html">This is title one</content>
    <author>
        <name>Pankaj</name>
    </author>
</entry>
<entry>
    <id>2</id>
    <published>22/02/2012</published>
    <title>Title 2</title>
    <content type="html">This is title two</content>
    <author>
        <name>Pankaj</name>
    </author>
</entry>
<entry>
    <id>3</id>
    <published>21/02/2012</published>
    <title>Title 3</title>
    <content type="html">This is title three</content>
    <author>
        <name>Rob</name>
    </author>
</entry>
<entry>
    <id>4</id>
    <published>20/02/2012</published>
    <title>Title 4</title>
    <content type="html">This is title four</content>
    <author>
        <name>Bob</name>
    </author>
</entry>
<entry>
    <id>5</id>
    <published>19/02/2012</published>
    <title>Title 1</title>
    <content type="html">This is title five</content>
    <author>
        <name>Pankaj</name>
    </author>
</entry>

I am trying to get output from xquery:

<?xml version="1.0" encoding="UTF-8"?>
<results>
<result>
    <author>
        <name>Pankaj</name>
    </author>
    <numberOfTitles>3</numberOfTitles>
</result>
<result>
    <author>
        <name>Rob</name>
    </author>
    <numberOfTitles>1</numberOfTitles>
</result>
<result>
    <author>
        <name>Bob</name>
    </author>
    <numberOfTitles>1</numberOfTitles>
</result>

Please help me..

Pankaj
  • 177
  • 1
  • 9
  • It may depend on XQuery version you use. What XQuery processor/database is supposed to run that query? – Shcheklein Feb 22 '12 at 19:06
  • Im using oxygen (Saxon-PE Xquery9.2.0.6) for developement. Finally i have to run this query through XCC api on Marklogic. – Pankaj Feb 22 '12 at 19:22

5 Answers5

4

Here's my go at a solution:

<results>{
  for $entry in //entry
  let $date := xs:date(string-join(reverse(tokenize($entry/published, '/')), '-')),
      $author := $entry/author/string()
  where xs:date('2012-02-15') le $date and $date le xs:date('2012-02-24')
  group by $author
  order by count($entry) descending
  return <result>{
    <author>
      <name>{$author}</name>
    </author>,
    <numberOfTitles>{count($entry)}</numberOfTitles>
  }</result>
}</results>

When executed with BaseX, it yields the correct result.

It uses XQuery 3.0 features like group by, otherwise it would be more complicated. I don't know if MarkLogic supports that.

Leo Wörteler
  • 4,191
  • 13
  • 10
4

This XQuery 1.0 solution is executable by any compliant XQuery 1.0 processor:

Note: No group by and no distinct-values() are used.

<results> 
 {
 let $entries := 
    /*/entry
           [for $d in 
                    xs:date(string-join(reverse(tokenize(published, '/')), '-'))
                return
                   xs:date('2012-02-15') le $d and $d le xs:date('2012-02-24')
             ],

  $vals := $entries/author/name
      return
         for $a in  $vals[index-of($vals, .)[1]],
                $cnt in count(index-of($vals, $a)) 
           order by $cnt descending
             return
              <result>
                <author>
                  {$a}
                 </author>
                 <numberOfTitles>
                   {count(index-of($vals, $a))}
                 </numberOfTitles>
              </result>
    }
</results>

when applied on the provided XML document:

<entries>
    <entry>
        <id>1</id>
        <published>23/02/2012</published>
        <title>Title 1</title>
        <content type="html">This is title one</content>
        <author>
            <name>Pankaj</name>
        </author>
    </entry>
    <entry>
        <id>2</id>
        <published>22/02/2012</published>
        <title>Title 2</title>
        <content type="html">This is title two</content>
        <author>
            <name>Pankaj</name>
        </author>
    </entry>
    <entry>
        <id>3</id>
        <published>21/02/2012</published>
        <title>Title 3</title>
        <content type="html">This is title three</content>
        <author>
            <name>Rob</name>
        </author>
    </entry>
    <entry>
        <id>4</id>
        <published>20/02/2012</published>
        <title>Title 4</title>
        <content type="html">This is title four</content>
        <author>
            <name>Bob</name>
        </author>
    </entry>
    <entry>
        <id>5</id>
        <published>19/02/2012</published>
        <title>Title 1</title>
        <content type="html">This is title five</content>
        <author>
            <name>Pankaj</name>
        </author>
    </entry>
</entries>

produces the wanted, correct result:

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <result>
      <author>
         <name>Pankaj</name>
      </author>
      <numberOfTitles>3</numberOfTitles>
   </result>
   <result>
      <author>
         <name>Rob</name>
      </author>
      <numberOfTitles>1</numberOfTitles>
   </result>
   <result>
      <author>
         <name>Bob</name>
      </author>
      <numberOfTitles>1</numberOfTitles>
   </result>
</results>
Dimitre Novatchev
  • 240,661
  • 26
  • 293
  • 431
4

Here is a solution specific to MarkLogic, using maps to implement grouping efficiently. The input XML has been declared as $INPUT, but you could replace that with a call to doc() or any other accessor.

I also explored this topic in a blog post last year: http://blakeley.com/blogofile/archives/560/

element results {
  let $m := map:map()
  let $start := xs:date('2012-02-15')
  let $stop := xs:date('2012-02-24')
  let $group :=
    for $entry in $INPUT/entry
    let $key := $entry/author/name/string()
    let $date := xs:date(xdmp:parse-yymmdd("dd/MM/yyyy", $entry/published))
    where $date ge $start and $date le $stop
    return map:put($m, $key, 1 + (map:get($m, $key), 0)[1])
  for $key in map:keys($m)
  let $count := map:get($m, $key)
  order by $count
  return element result {
    element author { element name { $key }},
    element numberOfTitles { $count } } }
mblakele
  • 7,782
  • 27
  • 45
2

The following should work in most processors. There are likely more efficient queries in MarkLogic you could make, but this will get you started.

let $doc := <entries>
<entry>
    <id>1</id>
    <published>23/02/2012</published>
    <title>Title 1</title>
    <content type="html">This is title one</content>
    <author>
        <name>Pankaj</name>
    </author>
</entry>
<entry>
    <id>2</id>
    <published>22/02/2012</published>
    <title>Title 2</title>
    <content type="html">This is title two</content>
    <author>
        <name>Pankaj</name>
    </author>
</entry>
<entry>
    <id>3</id>
    <published>21/02/2012</published>
    <title>Title 3</title>
    <content type="html">This is title three</content>
    <author>
        <name>Rob</name>
    </author>
</entry>
<entry>
    <id>4</id>
    <published>20/02/2012</published>
    <title>Title 4</title>
    <content type="html">This is title four</content>
    <author>
        <name>Bob</name>
    </author>
</entry>
<entry>
    <id>5</id>
    <published>19/02/2012</published>
    <title>Title 1</title>
    <content type="html">This is title five</content>
    <author>
        <name>Pankaj</name>
    </author>
</entry>
</entries>

return
 <results>
    {
        for $author in distinct-values($doc/entry/author/name/string())
        return
        <result><author>
            <name>{$author}</name>
            <numberOfTitles>{count($doc/entry[author/name/string() eq $author])} </numberOfTitles>
        </author></result>
    }
 </results>
Eric Bloch
  • 2,882
  • 2
  • 20
  • 26
  • You could add a date constraint to the predicate on entries like $doc/entry[author/name/string() eq $author and XXXX ] ; replace XXX with logic that parses the date format you have and does the requisite comparisons. – Eric Bloch Feb 22 '12 at 19:31
  • This doesn't filter the dates and it doesn't sort, does it? – Leo Wörteler Feb 22 '12 at 19:33
  • Nah, I was lazy, but I'd do it by doing something similar to your answer. Adding another bit to the predicate to filter to the date range and then adding an order by count($doc/entry[author/name/string() eq $author]) to sort. – Eric Bloch Feb 22 '12 at 19:55
2

Here's another solution that is similar to Leo Wörteler:

declare function local:FormatDate($origDate as xs:string) as xs:date 
  {
      xs:date(string-join(reverse(tokenize($origDate, '/')), '-'))
  };

<results>
  {
  for $author in distinct-values(/entries/entry/author/name)
  let $startDate := xs:date('2012-02-15')
  let $endDate := xs:date('2012-02-24')
  order by count(/entries/entry[author/name=$author][$startDate <= local:FormatDate(published) and local:FormatDate(published) <= $endDate]) descending
  return
    <result>
      <author>
        <name>{$author}</name>
      </author>
      <numberOfTitles>{count(/entries/entry[author/name=$author][$startDate <= local:FormatDate(published) and local:FormatDate(published) <= $endDate])}</numberOfTitles>
    </result>
  }
</results>
Daniel Haley
  • 51,389
  • 6
  • 69
  • 95