
I am using Zend Search Lucene to index a number of DOCX files.

$index = Zend_Search_Lucene::create($indexpath);
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive());
$doc = Zend_Search_Lucene_Document_Docx::loadDocxFile($file);
$index->addDocument($doc);

This indexes the last-modified date under a field called modified, in the following format:

2012-01-19T11:56:00Z

If I attempt to perform a range search on this value, for example:

Zend_Search_Lucene_Search_QueryParser::parse('modified:[2012-01-01 TO 2012-04-01]');

I receive the following error message

Uncaught exception 'Zend_Search_Lucene_Search_QueryParserException' with message 'Range query boundary terms must be non-multiple word terms'

Does anyone know how to perform a range search on the date field created by the Zend DOCX parser?

Andrew

2 Answers


According to the documentation, Zend_Search_Lucene_Search_QueryParserException is thrown when there is an error in the query syntax.

So I checked out the source code, and this is where that error is thrown from:

$tokens = Zend_Search_Lucene_Analysis_Analyzer::getDefault()->tokenize($this->_rqFirstTerm, $this->_encoding);
if (count($tokens) > 1) {
   require_once 'Zend/Search/Lucene/Search/QueryParserException.php';
   throw new Zend_Search_Lucene_Search_QueryParserException('Range query boundary terms must be non-multiple word terms');
} else if (count($tokens) == 1) {
   require_once 'Zend/Search/Lucene/Index/Term.php';
   $from = new Zend_Search_Lucene_Index_Term(reset($tokens)->getTermText(), $this->_context->getField());
} else {
   $from = null;
}

$tokens = Zend_Search_Lucene_Analysis_Analyzer::getDefault()->tokenize($this->_currentToken->text, $this->_encoding);
if (count($tokens) > 1) {
   require_once 'Zend/Search/Lucene/Search/QueryParserException.php';
   throw new Zend_Search_Lucene_Search_QueryParserException('Range query boundary terms must be non-multiple word terms');
} else if (count($tokens) == 1) {
   require_once 'Zend/Search/Lucene/Index/Term.php';
   $to = new Zend_Search_Lucene_Index_Term(reset($tokens)->getTermText(), $this->_context->getField());
} else {
   $to = null;
}

This code is in the openedRQLastTerm() function, which processes the last term of a range query (opened interval).
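To see why a hyphenated date trips the check above, here is a minimal sketch of the tokenizing step. The function names sketchTokenize and isSingleTermBoundary are hypothetical, and the regular expression only approximates what a letters-and-digits analyzer such as Utf8Num_CaseInsensitive does; it is not the library's own code.

```php
<?php
// Hypothetical stand-in for the analyzer's tokenize() step: split the text
// into runs of letters and digits, lower-cased. This is an approximation
// for illustration only.
function sketchTokenize(string $text): array
{
    preg_match_all('/[\p{L}\p{N}]+/u', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

// Mirrors the parser's check: a range boundary may yield at most one token.
function isSingleTermBoundary(string $boundary): bool
{
    return count(sketchTokenize($boundary)) <= 1;
}

// The hyphens act as separators, so the date becomes three tokens.
var_dump(sketchTokenize('2012-01-01'));        // 2012, 01, 01
var_dump(isSingleTermBoundary('2012-01-01'));  // false: the parser would throw
var_dump(isSingleTermBoundary('20120101'));    // true
```

Under this approximation, any boundary containing a hyphen yields more than one token, which is the condition that raises the exception.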

Looking into what is wrong with the query and why the tokenizer rejects it, I found a possible solution in the documentation on range queries:

Range queries allow the developer or user to match documents whose field(s) values are between the lower and upper bound specified by the range query. Range Queries can be inclusive or exclusive of the upper and lower bounds. Sorting is performed lexicographically.

mod_date:[20020101 TO 20030101]
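Because the comparison is lexicographic, the date format matters. The sketch below uses plain strcmp() as a stand-in for that ordering; inLexicographicRange is a hypothetical helper, not part of Zend_Search_Lucene.

```php
<?php
// Returns true when $value falls inside [$lower, $upper] under byte-wise
// string comparison, used here as a stand-in for lexicographic ordering.
function inLexicographicRange(string $value, string $lower, string $upper): bool
{
    return strcmp($value, $lower) >= 0 && strcmp($value, $upper) <= 0;
}

// Zero-padded yyyymmdd strings sort in the same order as the dates.
var_dump(inLexicographicRange('20120119', '20120101', '20120401')); // true

// Without zero padding the string order no longer matches the date order:
// "2012119" (19 January) compares greater than "20120401".
var_dump(inLexicographicRange('2012119', '20120101', '20120401'));  // false

// A value sharing the upper bound as a prefix sorts after that bound.
var_dump(inLexicographicRange('20120419', '201201', '201204'));     // false
```

The last case suggests writing both bounds at the same length as the stored values, otherwise values that start with the upper bound are excluded.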

So, you may have some luck by removing the hyphens in your date. Also, consider something mentioned in a forum:

You have to switch default analyzer to TextNum before indexing and search. Default analyzer skips numbers:

Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());

And also:

'publishDate' fields need to be set as 'keyword' while indexing, otherwise the range query fetches no results.
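Putting the two suggestions together, one option is to store a separator-free copy of the date as a keyword field. This is only a sketch: toSortableDate and the field name modified_date are assumptions for illustration, not something the DOCX loader provides, and the Zend calls run only if the framework is actually loaded.

```php
<?php
// Hypothetical helper: convert an ISO 8601 timestamp, like the one stored
// in the modified field, into a compact yyyymmdd string.
function toSortableDate(string $isoTimestamp): string
{
    $date = new DateTimeImmutable($isoTimestamp);
    return $date->format('Ymd');
}

$sortable = toSortableDate('2012-01-19T11:56:00Z'); // "20120119"

// Guarded so the snippet still runs without Zend Framework 1 installed.
// modified_date is an assumed field name chosen for this example.
if (class_exists('Zend_Search_Lucene_Field')) {
    $doc = new Zend_Search_Lucene_Document();
    // Keyword fields are stored and indexed but not tokenized.
    $doc->addField(Zend_Search_Lucene_Field::keyword('modified_date', $sortable));
}
```

A range such as modified_date:[20120101 TO 20120401] could then be tried against that field.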

Hopefully all of that information helps you solve your problem! Good luck.

Jeremy Harris
  • Thanks for the assistance. Just looking at the code made me realise that it was splitting my text into multiple words and that I needed to escape the hyphen character. Without the hyphen, the search does not work unless your ranges are only in years, i.e. 198001 TO 201212 would return 0 results. – Andrew Apr 18 '12 at 08:53

I found the answer, and it is so simple that I feel a little foolish.

I needed to put each date in quotes so that it passes through the parser as a single term, e.g.

modified:["2012-01" TO "2012-04"]
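For completeness, a small sketch of building such a query string in PHP. buildQuotedRange is a hypothetical helper, not part of Zend_Search_Lucene; it puts the lower bound first and refuses bounds that contain a double quote, since escaping is not handled here.

```php
<?php
// Hypothetical helper: build a range query string with both bounds quoted,
// placing the lexicographically smaller bound first.
function buildQuotedRange(string $field, string $from, string $to): string
{
    foreach ([$from, $to] as $bound) {
        if (strpos($bound, '"') !== false) {
            throw new InvalidArgumentException('Bounds must not contain double quotes');
        }
    }
    if (strcmp($from, $to) > 0) {
        [$from, $to] = [$to, $from];
    }
    return sprintf('%s:["%s" TO "%s"]', $field, $from, $to);
}

$queryString = buildQuotedRange('modified', '2012-01', '2012-04');

// Guarded so the snippet still runs without Zend Framework 1 installed.
if (class_exists('Zend_Search_Lucene_Search_QueryParser')) {
    $query = Zend_Search_Lucene_Search_QueryParser::parse($queryString);
}
```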
Andrew