
I am working on a final refinement requested by the client, which requires a case-insensitive query. Let me walk through how this simple program works.

First of all, in my Java class, I do a fairly simple web page parse:

title=(String)results.get("title");
doc = docBuilder.parse("http://" + server + ":" + port + "/exist/rest/db/wb/xql/media_lookup.xql?" + "&title="  + title);

This Java statement references an XQuery file "media_lookup.xql" which is stored on localhost, and the only parameter we are passing is the string "title".
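
As a hedged aside, the request URL above can be sketched as a small helper. The class and method names below (`MediaLookupUrl`, `build`) are hypothetical, `port` is assumed to be an `int`, and the sketch assumes a Java 10+ runtime for the `URLEncoder.encode(String, Charset)` overload. It percent-encodes the title so that values containing spaces or `&` do not break the query string:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original program.
final class MediaLookupUrl {

    // Builds the REST URL for media_lookup.xql with an encoded title parameter.
    static String build(String server, int port, String title) {
        // Encode the user-supplied value so that characters such as
        // spaces and '&' cannot change the structure of the query string.
        String encoded = URLEncoder.encode(title, StandardCharsets.UTF_8);
        return "http://" + server + ":" + port
                + "/exist/rest/db/wb/xql/media_lookup.xql?title=" + encoded;
    }
}
```

The resulting string could then be passed to `docBuilder.parse(...)` in place of the concatenation shown earlier.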

Secondly, let's take a look at that XQuery file:

$title := request:get-parameter('title',""),

$mediaNodes := doc('/db/wb/portfolio/media_data.xml'),
$query := $mediaNodes//media[contains(title,$title)],

Then it evaluates that query. This XQuery gets the "title" parameter that is passed from our Java class and queries the "media_data" XML file stored in the database, which contains a bunch of media nodes, each with a 'title' element node. As you may expect, this simple query just matches those media nodes whose 'title' element contains the value of the string 'title' as a substring. So if our 'title' is "Chi", it will return media nodes whose title may be "Chicago" or "Chicken".

The refinement request posted by the client is that there should be NO case-sensitivity. The very intuitive way is to modify the XQuery statement by using a lower-case function in it, like:

$query := $mediaNodes//media[contains(lower-case(title/text()), lower-case($title))],

However, the question comes: this modified query will run my machine into memory overflow. Since my "media_data.xml" is quite huge and contains thousands of millions of media nodes, I assume the lower-case() function will run on each of the entries, thus causing the machine to crash.

I've talked with some experienced XQuery programmers, and they think I should use an index to solve this problem, and I will definitely look into that. But before that, I am posting this problem here to get other ideas or suggestions. Do you think any other approach may help? For example, could I tweak the Java parse statement to achieve case-insensitivity? I think I have seen people do some string concatenation using "contains." in Java before passing it to the server.

Any ideas or help are welcome.

BenMorel
Kevin
  • Good question, +1. See my answer for an explanation why your concerns are not justified. :) – Dimitre Novatchev Jan 02 '11 at 16:25
  • Did you test it? Why would a predicate with `fn:contains` not crash, while the same predicate with the result of `fn:lower-case` as an argument to `fn:contains` would? It should be slower, yes. That's why I would take the constant `lower-case($title)` out of the expression. But a really smart XQuery engine would do that for you anyway. –  Jan 02 '11 at 20:44

2 Answers

2

The refinement request posted by the client is that there should be NO case-sensitivity. The very intuitive way is to modify the XQuery statement by using a lower-case function in it, like:

$query := $mediaNodes//media
            [contains(lower-case(title/text()), lower-case($title))],

However, the question comes: this modified query will run my machine into memory overflow. Since my "media_data.xml" is quite huge and contains thousands of millions of media nodes, I assume the lower-case() function will run on each of the entries, thus causing the machine to crash.

Such fears are not justified.

Any sane implementation of XPath uses automatic memory for its functions. This means that the memory required for evaluating a particular predicate, including the result of lower-case(), is freed (in languages with no garbage collection) or becomes unreferenced and ready for garbage collection immediately after the predicate has been evaluated.
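
As a rough analogy only (this is plain Java, not how eXist or any other XQuery engine is implemented), the evaluation described above can be sketched as a filter over titles. The class name `CaseInsensitiveFilter` and the in-memory list are assumptions made for illustration. Each lower-cased copy exists only while its own title is being tested, and the search string is lower-cased once, outside the per-title test:

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical illustration of per-item, short-lived temporaries.
final class CaseInsensitiveFilter {

    static List<String> matching(List<String> titles, String needle) {
        // The search string is constant for the whole query, so lower-case it once.
        String loweredNeedle = needle.toLowerCase(Locale.ROOT);
        return titles.stream()
                // The lower-cased copy of each title is a temporary that becomes
                // unreachable as soon as this predicate returns.
                .filter(title -> title.toLowerCase(Locale.ROOT).contains(loweredNeedle))
                .collect(Collectors.toList());
    }
}
```

Only the matching titles are retained; the temporaries never accumulate, which is the point the paragraph above makes about predicate evaluation.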

BenMorel
Dimitre Novatchev
0

A table index is probably not the solution, as the absence of an index will slow things down but not trigger a memory overflow.

I think your best bet is to duplicate the title in your database, copying it into an all-lowercase field (or an all-uppercase one, which makes it clearer that it was converted), and to query the alternate title while presenting the normal title.
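
A minimal Java sketch of this idea, assuming a hypothetical `MediaTitle` class rather than the real database schema: the lower-cased copy is computed once when the record is created, matching runs against that copy, and the original title is what gets returned for display.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical record type; the field names are illustrative only.
final class MediaTitle {
    final String display;   // original title, shown to the user
    final String searchKey; // lower-cased copy, used only for matching

    MediaTitle(String display) {
        this.display = display;
        // Case conversion happens once per record, not once per query.
        this.searchKey = display.toLowerCase(Locale.ROOT);
    }

    // Returns the display titles whose search key contains the query.
    static List<String> search(List<MediaTitle> items, String query) {
        String loweredQuery = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (MediaTitle item : items) {
            if (item.searchKey.contains(loweredQuery)) {
                hits.add(item.display);
            }
        }
        return hits;
    }
}
```

The trade-off is extra storage for the duplicated field in exchange for no per-query case conversion of the stored titles.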

To save some processing, you can do the case conversion of $title before the query.
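
On the Java side, that conversion could look like the following sketch. `TitleParam.normalize` is a hypothetical helper, and mapping `null` to an empty string is an assumption that mirrors the empty default in `request:get-parameter('title', "")`. `Locale.ROOT` is used so that the result does not depend on the JVM's default locale (in a Turkish locale, for example, `"I"` lower-cases to a dotless `ı`).

```java
import java.util.Locale;

// Hypothetical helper for preparing the search string before it is sent.
final class TitleParam {

    // Lower-cases the search string once, independent of the default locale.
    static String normalize(String title) {
        if (title == null) {
            return ""; // assumption: treat a missing title like an empty one
        }
        return title.toLowerCase(Locale.ROOT);
    }
}
```

This only handles the search string; the stored titles still have to be compared in lower case, either in the query or through a pre-converted copy.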

You can drop the ampersand in your URL; I'm not sure all web servers parse the ?& correctly.

rsp
  • I agree this seems the most elegant solution. I don't have experience with XQuery, but this doesn't look like something you can fix in the Java part. XQuery apparently does not have a containsIgnoreCase method? That would come in handy. – Sebastiaan van den Broek Jan 02 '11 at 14:27