Solr search for exact phrase / substring

Question

I am using Solr for my work and it's excellent. However, I am having trouble generating more elaborate search results.

I am searching for products by their title, brand, gender, and category (dress shoes, jackets, etc). Brands live in a "Brands" DB table, and the same for categories and genders. Products live in a "Products" DB table which is foreign-keyed to the Brands, Categories and Genders tables.

I am loading all of these into Solr, and I can do a weighted ranked search across them without trouble. This gives the most similar products, weighted by certain fields. What I would like to do next is find exact matches from each field for any search string. For example:

SEARCH STRING: "Michael Kors Light Green Men's Dress Shoes"

SHOULD MATCH:

Brands:

Michael Kors

Colours:

Light Green
Green

Gender:

Mens

Category:

Dress Shoes
Shoes

I can then do a more restrictive - but categorised - intersect search. E.g. all products that are [light green] AND [michael kors] AND [Dress Shoes OR Shoes]

Thanks :)

I think this is related: https://stackoverflow.com/questions/12395990/exact-field-search-with-solr-lucene – mils Sep 29 '15 at 04:43

score 1 · Answer 1 · answered Sep 28 '15 at 09:32

You can try a Boolean query, which combines multiple clauses:

http://localhost:8983/solr/query?q=(Brands:"Michael Kors") AND (Colours:"Light Green") AND (Category:("Dress Shoes" OR Shoes))
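
A minimal sketch of building that request from Python, assuming the same local Solr endpoint as the URL above and the field names Brands, Colours and Category (substitute whatever your schema uses). The helper names are made up for illustration. Multi-word values are quoted so they are parsed as phrases, and the query string is URL-encoded rather than sent with raw spaces.

```python
from urllib.parse import urlencode


def phrase(value):
    """Quote a value as a phrase, escaping backslashes and double quotes."""
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'


def field_clause(field, values):
    """Build one clause of the form field:("a" OR "b")."""
    return f"{field}:({' OR '.join(phrase(v) for v in values)})"


def build_query(filters):
    """AND together one clause per field; `filters` maps field -> values."""
    return " AND ".join(field_clause(f, vs) for f, vs in filters.items())


q = build_query({
    "Brands": ["Michael Kors"],
    "Colours": ["Light Green"],
    "Category": ["Dress Shoes", "Shoes"],
})

# Endpoint taken from the URL above; adjust host and path for your setup.
url = "http://localhost:8983/solr/query?" + urlencode({"q": q})
print(q)
print(url)
```

Quoting every value, including single words, keeps the helper simple and does no harm on a field that matches whole values.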

answered Sep 28 '15 at 09:32

Abhijit Bashetti


@mils: I didn't follow. What exactly is the issue? – Abhijit Bashetti Sep 28 '15 at 10:19
I don't know how better to explain it. We want to find exact substring matches of a user's query. E.g. "Men's Light Green Shoes" will return colours "Light Green" and "Green" and NOT "Light Blue" – mils Sep 30 '15 at 05:08
In that case you need to look at how you build your fieldType. Try ShingleFilterFactory; read more at https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory – Abhijit Bashetti Sep 30 '15 at 05:45
I don't think that makes sense. If we use a min shingle of 2, we won't match "green", only "light green", which is not what we want – mils Sep 30 '15 at 06:40
Try having two fields, each indexed with a different fieldType: one whose fieldType uses the shingle filter, and another whose fieldType uses a whitespace tokeniser. – Abhijit Bashetti Sep 30 '15 at 07:32
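
To make the shingle discussion in these comments concrete, here is a small pure-Python imitation of what a shingle filter emits. It is not Solr code. The parameter names mirror the minShingleSize, maxShingleSize and outputUnigrams options of ShingleFilterFactory, but the function itself is a hypothetical illustration.

```python
def shingles(tokens, min_size=2, max_size=2, output_unigrams=True):
    """Return word n-grams ("shingles") for a token list, in position order.

    With output_unigrams=True the single tokens are emitted as well, so a
    value such as "light green" yields both the pair and its individual words.
    """
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


tokens = "light green".split()
print(shingles(tokens))                         # single words plus the pair
print(shingles(tokens, output_unigrams=False))  # the pair only
```

With unigrams suppressed, only "light green" is indexed, which is the situation described in the objection above. Emitting unigrams too, or keeping a second whitespace-tokenised field, makes "green" matchable as well.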

score 1 · Answer 2 · answered Sep 28 '15 at 19:18

@mils For this kind of search you should consider using a different query parser. I think this link is worth a read to see whether any of the available query parsers works for you: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
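
As a rough sketch of what a Terms Query Parser filter could look like here, assuming a field named colour (a hypothetical name) that stores whole values such as "Light Green" as single terms. The `{!terms f=...}` local-params syntax with comma-separated values is the parser's documented form. The Python helper around it is only an illustration.

```python
def terms_filter(field, values, separator=","):
    """Build a Terms Query Parser expression matching any of `values`.

    Values are joined with `separator`, so a value that contains the
    separator itself cannot be represented and is rejected.
    """
    for value in values:
        if separator in value:
            raise ValueError(f"value contains the separator: {value!r}")
    return "{!terms f=%s}%s" % (field, separator.join(values))


fq = terms_filter("colour", ["Light Green", "Green"])
print(fq)
```

The resulting string would typically be sent as an `fq` parameter next to the main query.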

score 1 · Answer 3 · answered Sep 29 '15 at 11:31

You can change the schema fields from text to string. That would give you exact matching, but at the expense of having to handle upper/lower case yourself.

The Dismax and Edismax parsers would give you the easiest option to search across several fields.
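
A minimal sketch of the request parameters for an Edismax search across several fields. `defType` and `qf` are standard Solr parameters. The field names and boost values are placeholders, not taken from the question's schema.

```python
from urllib.parse import urlencode


def edismax_params(user_query, field_boosts):
    """Build Edismax parameters that search `user_query` across boosted fields."""
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"q": user_query, "defType": "edismax", "qf": qf}


params = edismax_params(
    "Michael Kors Light Green Men's Dress Shoes",
    {"title": 1, "brand": 3, "category": 2},  # hypothetical fields and boosts
)
print(params["qf"])
print(urlencode(params))
```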

answered Sep 29 '15 at 11:31

Uri Shtand


Although it sounds like the right solution, the string type does not appear to be working for us – mils Sep 30 '15 at 05:06
String does not work with EDisMax. That is, the ENTIRE query has to match the indexed value exactly. – mils May 25 '16 at 02:17

score 1 · Answer 4 · answered Oct 01 '15 at 03:39

This is really a question about "text tagging" (also sometimes called "named entity recognition").

In the context you're pursuing, Daniel Tunkelang considers this an essential part of "Query Understanding".

Lucene has some data-structures which can be used to implement this sort of feature (see the OpenSextant project as an example), but Solr doesn't offer this feature (beyond approximate solutions using shingles as described above).

This is hard because you need document frequency information for each term/phrase in your query, across every field you care about, before you run your query!

Slow, inelegant Solr solution:

If you're willing to run two queries, you can approximate your goal using facets:

1. Run your normal text query, Q1, requesting term facets on brand, colour, gender and category (stored as strings).
2. Tokenize the Q1 query string into 1- and 2-term shingles.
3. Compare those shingles with the top facet values returned for each field requested in the Q1 results.
4. Whenever you see an exact match, add the corresponding intersecting filter to a new query, Q2: the original query Q1 plus your new, restrictive criteria.
5. Run Q2.

(A nice side-effect here is that your query-narrowing code can see the total count and facet counts returned from Q1 while constructing Q2, so you can decide to omit or relax certain restrictions should the number of matching results drop too low.)
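
The matching step between the two queries can be sketched roughly as follows. Everything here is illustrative: the facet values are invented sample data standing in for the Q1 facet response (simplified to plain lists of values without counts), the field names are guesses, and the normalisation (lower-casing and dropping apostrophes so "Men's" matches "Mens") is an assumption you would tune for your data.

```python
def normalise(text):
    """Lower-case and drop apostrophes so "Men's" and "Mens" compare equal."""
    return text.lower().replace("'", "")


def query_shingles(query, max_size=2):
    """Return the set of 1..max_size word shingles of the query string."""
    tokens = normalise(query).split()
    grams = set()
    for n in range(1, max_size + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def match_facets(query, facets):
    """Keep, per field, the facet values that appear as a shingle of the query."""
    grams = query_shingles(query)
    matches = {}
    for field, values in facets.items():
        hits = [v for v in values if normalise(v) in grams]
        if hits:
            matches[field] = hits
    return matches


def to_filter_queries(matches):
    """Turn matches into one fq string per field, OR-ing values within a field."""
    return [
        field + ":(" + " OR ".join('"%s"' % v for v in values) + ")"
        for field, values in matches.items()
    ]


q1 = "Michael Kors Light Green Men's Dress Shoes"
q1_facets = {  # invented sample facet values, not real Solr output
    "brand": ["Michael Kors", "Nike"],
    "colour": ["Light Green", "Green", "Light Blue"],
    "gender": ["Mens", "Womens"],
    "category": ["Dress Shoes", "Shoes", "Jackets"],
}

matches = match_facets(q1, q1_facets)
for fq in to_filter_queries(matches):
    print(fq)
```

Each printed string would be sent as a separate `fq` parameter on Q2, so the fields are intersected while values within a field are OR-ed. Dropping one of the strings is how a restriction would be relaxed when the Q1 counts suggest Q2 would return too little.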
