
This question has been asked before:

Postgresql full text search in postgresql - japanese, chinese, arabic

but there are no answers for Chinese as far as I can see. I took a look at the OpenOffice wiki, and it doesn't have a dictionary for Chinese.

Edit: As we are already successfully using PG's internal FTS engine for English documents, we don't want to move to an external indexing engine. Basically, what I'm looking for is a Chinese FTS configuration, including parser and dictionaries for Simplified Chinese (Mandarin).

Mike Chamberlain
  • As we were unable to find a solution for this (even with the bounty I offered) we eventually moved to SQL Server, which natively supports Chinese FTS. Luckily our application was designed to be fairly DB vendor agnostic, so this wasn't a huge problem for us. – Mike Chamberlain Dec 20 '10 at 10:35

3 Answers


I know it's an old question but there's a Postgres extension for Chinese: https://github.com/amutu/zhparser/

Rui Pacheco
  • I'm getting `text-search query contains only stop words or doesn't contain lexemes, ignored` issues. See https://stackoverflow.com/questions/41659909/fts-non-latin-text-search-query-contains-only-stop-words-or-doesnt-contain-lex – user3871 Jan 17 '17 at 15:33
  • @Growler page not found. – Weihang Jian Jan 22 '20 at 02:06

I've just implemented a Chinese FTS solution in PostgreSQL. I did it by generating n-gram tokens from the Chinese input and building the necessary tsvectors with an embedded function (in my case, plpythonu). It works very well (and was massively preferable to moving to SQL Server!).
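The answer doesn't include code, but the bigram (n = 2) variant of the approach can be sketched in plain Python; the same logic would sit inside a plpythonu function whose output is fed to `to_tsvector`. The helper name `cjk_bigrams` and the use of the `simple` configuration are illustrative assumptions, not the answerer's actual implementation.

```python
def cjk_bigrams(text):
    """Split a string into overlapping character bigrams.

    Chinese has no whitespace between words, so indexing every
    adjacent character pair lets tsvectors built with the 'simple'
    configuration match substrings without a dictionary-based
    word segmenter.
    """
    # Keep single characters as-is so one-character queries still match.
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]

# Joining the tokens with spaces yields a string that PostgreSQL's
# built-in parser can tokenize, e.g. (hypothetical usage):
#   to_tsvector('simple', ' '.join(cjk_bigrams(doc)))
print(cjk_bigrams("全文搜索"))  # → ['全文', '文搜', '搜索']
```

The same bigramming must be applied to query strings before building the tsquery, so that both sides of the match use identical tokens.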

simon

Index your data with Solr, an open source enterprise search server built on top of Lucene.

You can find more info on Solr here:

http://lucene.apache.org/solr/

A good how-to book (with an immediate PDF download) is here:

https://www.packtpub.com/solr-1-4-enterprise-search-server/book

And be sure to use a Chinese tokenizer, such as solr.ChineseTokenizerFactory, because Chinese is not whitespace-delimited.

Chris Adragna
  • We need to use the FTS engine built into Postgres. We have already successfully implemented English FTS, and want to continue to use the same system for Chinese documents. – Mike Chamberlain Oct 24 '10 at 23:10
  • 1
    Oh, I see. Well, then my answer isn't helpful to you. I see your clarification/edit on the question since your original post. I'm not sure what your timeline will accomodate, but the Solr solutions are open source. You *may* be able to borrow from the ChineseTokenizerFactory -- it's logic overcomes the inherent problem as I understand it to be, that the language is not whitespace delimeted. Best of luck to you. – Chris Adragna Oct 25 '10 at 14:14