1

I got some dataset with ads of diffrent companys.

for example

<jobs>
    <job>
        <company>A</company>
        <value>Testvalue</value>
    </job>
    <job>
        <company>A</company>
        <value>Testvalue</value>
    </job>
    <job>
        <company>B</company>
        <value>Testvalue</value>
    </job>
    <job>
        <company>C</company>
        <value>Testvalue</value>
    </job>
</jobs>

What I´m trying to do is generate a custom output. I want to have only 1 record for each company

Example output wanted:

<jobs>
    <job>
        <company>A</company>
        <value>Testvalue</value>
    </job>
    <job>
        <company>B</company>
        <value>Testvalue</value>
    </job>
    <job>
        <company>C</company>
        <value>Testvalue</value>
    </job>
</jobs>

What I try'ed is the following:

If company is not within array, append it to an array & append the item to another array.

(: loop through job in jobs :)
for $ad in //jobs/job
  (: firmenarray, "unique" ads :)
  let $companys := ()
  let $ads := ()
  (: declare company of ad:)
  let $company := $ad//company[1]
  (: if ad/company not within companyarray > add & concat to ads :)
  let $test := if(not(fn:index-of($companys, $company))) then(
    (: add ad/company to companys :)
    $companys = fn:insert-before($companys, 0, $company),
    (: add jobs/job to ads :)
    $ads = fn:insert-before($ads, 0, $ad)
  )
return $ads

somehow It does not work and im kind of stuck finding out why...

Snackaholic
  • 590
  • 7
  • 27

2 Answers2

5

The group by solution by Martin Honnen is the obvious and best one. But if you want to iteratively populate a sequence or array in XQuery, then it is important to understand that your approach cannot work in a functional language like XQuery, since all variables are immutable. Understanding the basics of functional programming is really important if you want to go beyond simple XPath and FLWOR expressions.

The "equivalent" to iteration in functional languages is recursion, so here is a recursive solution to your task using a useer-defined function:

declare function local:unique($companies, $unique) {
  if(empty($companies)) then $unique
  else if($companies[1]/company = $unique/company)
  then local:unique(tail($companies), $unique)
  else local:unique(tail($companies), ($unique, $companies[1]))
};

<jobs>{
  local:unique(/jobs/job, ())
}</jobs>

This specific pattern of iterating through a sequence and aggregating a result is so common that it is even abstracted out into its own standard function, namely fn:fold-left($sequence, $start-value, $aggregation-function). With its help the solution becomes pretty short:

<jobs>{
  fn:fold-left(/jobs/job, (), function($companies, $company) {
    if($company/company = $companies/company) then $companies
    else ($companies, $company)
  })
}</jobs>

But since you compare each new entry with all previously found unique company entries, this soution is still rather inefficient. A well implemented group by will probably always beat it.

Leo Wörteler
  • 4,191
  • 13
  • 10
  • Thank you for your in depth explanation why my attempt was not possible. I still dont understand though how recursion solves my problem here: First iteration with company A will be true since companies is empty. (if(empty($companies)) then $unique) Second iteration should deliver the same result since I dont see anything that adds the company to companies :S can you explain this to me aswell? – Snackaholic Dec 12 '18 at 09:31
  • 1
    The data movement is the other way around. The first argument `$companies` starts out with all `` elements from you input data (at `local:unique(/jobs/job, ())`), and then each of them is taken off the front of the sequence and potentially added to `$unique`. the recursion stops when all `job`s have been removed. – Leo Wörteler Dec 12 '18 at 10:31
4

Text book grouping example:

<jobs>
{
    for $job in jobs/job
    group by $company := $job/company
    return $job[1]
}
</jobs>

https://xqueryfiddle.liberty-development.net/b4GWVb

Martin Honnen
  • 160,499
  • 6
  • 90
  • 110