3

Forgive me if this is a silly question.

I have the following code in one of my migrations:

    DB::statement("
    CREATE MATERIALIZED VIEW searchable_stores AS
    SELECT
    chain_stores.id as chain_store_id,
    local_chain_stores.id as local_chain_store_id,
    chain_stores.website as website,
    chain_stores.name as name,
    chain_stores.cname as cname,
    chain_stores.flyer_url as flyer_url,
    local_chain_stores.city as city,
    local_chain_stores.shopping_mall as shopping_mall,
    local_chain_stores.postal_code as postal_code,
    local_chain_stores.street_address as street_address,
    to_tsvector('sv', chain_stores.name) ||
    to_tsvector('sv', replace(chain_stores.name, ' ', '')) ||
    to_tsvector('sv', local_chain_stores.city) ||
    to_tsvector('simple', to_char(local_chain_stores.postal_code, '99999')) ||
    to_tsvector('sv', coalesce(local_chain_stores.shopping_mall)) ||
    to_tsvector('sv', local_chain_stores.street_address) ||
    to_tsvector('sv', chain_stores.bio)
    as document
    FROM
        local_chain_stores, chain_stores
    WHERE
        local_chain_stores.chain_store_id = chain_stores.id
    ");

And this is how I query the table:

    $results = SearchableStore::
        whereRaw("document @@ plainto_tsquery('simple', ?)", array($searchQuery->getQuery()))
        ->orWhereRaw("document @@ plainto_tsquery('sv', ?)", array($searchQuery->getQuery()))
        ->orderBy('name', 'asc')
        ->get();
    return $results;

This works perfectly fine, unless someone spells some word(s) wrong. And that's what I'm trying to solve. You will also notice these 2 lines:

    to_tsvector('sv', chain_stores.name) ||
    to_tsvector('sv', replace(chain_stores.name, ' ', '')) ||

I know this is ugly, but what I am essentially doing is making sure that if someone does a search for e.g "BurgerKing" I want to return results for "Burger King". I know this looks like a silly example, but these mistakes happen a lot in the swedish language (not necessarily for Burger King, but other terms). Aside from that, I also want to make sure that when someone does a typo I still want to return relevant results.

So, my question is. How do I implement this? I tried playing around with fuzzy search, but I'm kind of stuck because I don't know how to query a column (document) consisting of tsvector values.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
user1904218
  • 237
  • 1
  • 6

2 Answers2

3

Full Text Search works with dictionaries and stemming and is not optimized for fuzzy search and general pattern matching.
Consider the additional module pg_trgm for that. Its operator classes allow GIN or GiST indexes that support LIKE and related pattern-matching operators.

See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks for your valuable input. I actually played around with that module. The problem is that I don't know what the user will be searching for. He could be searching for chain stores, cities, shopping malls, or all three at once. I don't see how using LIKE solves my problem. – user1904218 Jul 04 '14 at 14:29
1

What you looking for are Synonyms and Thesaurus dictionaries, it is a lot of work to compile them, yet is will make your searches way more cost effective in terms of CPU and disk usage.

For some languages you can find Thesaurus dictionaries that can be easily converted in the OpenOffice for others you are on your own. I am still searching for good sources and posted a question about this topic here: https://dba.stackexchange.com/questions/80632/where-to-download-dutch-postgresql-fulltext-search-dictionaries , no answers so far.

If you are running Linux it can also prove worth your will to install the myspell or hunspell packages of your language. For installation of these packages see: https://askubuntu.com/questions/72099/how-to-install-a-libreoffice-dictionary-spelling-check-thesaurus

If you want to know how to use them in PostgreSQL you should read up on configuring full text search, I really like this guy since here writes about it in a light and understandable manner: a full text search engine

Community
  • 1
  • 1
St. Jan
  • 284
  • 3
  • 17