
Is there any way to ignore "stop words" while sorting?

For example, I have values like:

dixit

singla

the marklogic

Sorted in descending order, the result should be: singla, the marklogic, dixit

In the above example, "the" is ignored.

Is there any way to achieve this?

Update:

A stop word can occur anywhere in the text, for example:

the MarkLogic

MarkLogic is the best

the MarkLogic is awesome

The sort should not consider any stop word in the text.

The above is just a small example to describe the problem.

In practice I am using the search:search API and sorting with the sort-order search option. The element I have to sort on is dynamic; there are approximately 30-35 candidate elements.

Is there any way to customize the collation at this level, for example by configuring a list of words (stop words) that will be ignored while sorting?

Dixit Singla

2 Answers


There is no standard collation URI that is going to do this for you (at least none that I've ever seen). You can do it dynamically, of course, by sorting on the result of a function invocation, but if you want it done efficiently at scale (and available to search:search), then you need to materialize the sortable string into your document. I've often done this as an attribute on the element:

<title sortable="Great Gatsby, The">The Great Gatsby</title>

Then you put a range index on the title/@sortable attribute.
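Once that index exists, search:search can sort on it. The following is a minimal sketch, not a definitive configuration: it assumes a string attribute range index on title/@sortable in no namespace, created with the default MarkLogic collation, and the query text "gatsby" is only a placeholder.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Assumption: a string range index on title/@sortable already exists and
   uses the collation named below. The sort fails without a matching index. :)
search:search("gatsby",
  <options xmlns="http://marklogic.com/appservices/search">
    <sort-order type="xs:string"
                collation="http://marklogic.com/collation/"
                direction="descending">
      <element ns="" name="title"/>
      <attribute ns="" name="sortable"/>
    </sort-order>
  </options>)
```

If the element to sort on is chosen at request time, the options node would presumably have to be built per request, with one such index for each candidate element.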

You can also use the "envelope pattern" where materialized metadata like this is maintained in its own section of the document with the original kept in its own section. For things like this, I think it's a bit more elegant to decorate the elements directly, to keep the context.
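To illustrate the decoration approach, here is a rough sketch in plain XQuery of computing a sortable attribute that drops stop words wherever they occur. The function names local:sort-key and local:decorate and the stop-word list are hypothetical, chosen only for this example. Note that it removes the stop words entirely rather than moving the article to the end as in "Great Gatsby, The".

```xquery
xquery version "1.0";

(: Hypothetical helper: rebuild the text without any stop word,
   comparing tokens case-insensitively. :)
declare function local:sort-key(
  $text as xs:string,
  $stop-words as xs:string*
) as xs:string {
  fn:string-join(
    for $token in fn:tokenize(fn:normalize-space($text), " ")
    where fn:not(fn:lower-case($token) = $stop-words)
    return $token,
    " ")
};

(: Hypothetical helper: copy an element and add (or refresh) its
   sortable attribute, leaving the original content untouched. :)
declare function local:decorate(
  $e as element(),
  $stop-words as xs:string*
) as element() {
  element { fn:node-name($e) } {
    $e/@* except $e/@sortable,
    attribute sortable { local:sort-key(fn:string($e), $stop-words) },
    $e/node()
  }
};

(: Assumed stop-word list, for illustration only. :)
local:decorate(<title>The Great Gatsby</title>, ("the", "a", "an", "is"))
(: returns <title sortable="Great Gatsby">The Great Gatsby</title> :)
```

In practice the decoration would run when the document is loaded or updated, so that the range index always sees the current sort key.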

hunterhacker

If I understand your question correctly, you're trying to ignore the definite article when sorting your result set.

To do this, you need to compute a sort key with some additional functions and order by it. My solution would look like this (I've also included some sample documents so that you can test it by copy-pasting):

(:
xdmp:document-insert("/peter.xml", <person><firstName>Peter</firstName><lastName>O'Toole</lastName><age>60</age></person>);
xdmp:document-insert("/john.xml", <person><firstName>John</firstName><lastName>Adams</lastName><age>18</age></person>);
xdmp:document-insert("/simon.xml", <person><firstName>Simon</firstName><lastName>Petrov</lastName><age>22</age></person>);
xdmp:document-insert("/mark.xml", <person><firstName>Mark</firstName><lastName>the Lord</lastName><age>25</age></person>);
:)

for $person in /person
(: sort key: the last space-separated token of lastName, so "the Lord" sorts as "Lord" :)
let $sort := fn:reverse(fn:tokenize($person/lastName, ' '))[1]
order by $sort
(: return $person :)
return $person/lastName/text()

Notice that the sort order is now:

 - Adams
 - the Lord
 - O'Toole
 - Petrov
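The same idea can be extended to stop words that appear anywhere in the value. This is only a sketch on inline strings taken from the question, with an assumed stop-word list; it sorts dynamically at query time and does not address sorting through search:search.

```xquery
xquery version "1.0";

(: Assumed stop-word list, for illustration only. :)
let $stop-words := ("the", "is")
for $text in ("dixit", "singla", "the marklogic")
(: sort key: the text with every stop word removed, wherever it occurs :)
let $key := fn:string-join(
  fn:tokenize($text, "\s+")[fn:not(fn:lower-case(.) = $stop-words)],
  " ")
order by $key descending
return $text
(: returns "singla", "the marklogic", "dixit" :)
```

The original strings are returned unchanged; only the comparison uses the stripped key.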

I hope this will help.

Tamas
  • Thanks for your reply. Stop words can occur anywhere: at the start, in the middle, or at the end of the text. I have updated the question accordingly. – Dixit Singla May 31 '17 at 05:17
  • In this case you need to update this logic to suit your needs. – Tamas May 31 '17 at 07:58
  • I can't use the above logic because I am using the search:search API and sorting by adding the sort-order option to the search options. The element I need to sort on is dynamic. I thought there would be an option to customize the collation to fulfill my needs. Thanks for your valuable time :) – Dixit Singla May 31 '17 at 08:03
  • Updated the question. Please let me know if you need more detail. – Dixit Singla May 31 '17 at 08:16