
I need to configure Thinking Sphinx with Spanish stemming, and I can't get it to work.

I learned [1] that I needed to compile the Sphinx source code with the libstemmer_c library and install it. Additionally, I had to change the Thinking Sphinx configuration by adding the libstemmer_es stemmer to morphology.

In detail, this is what I did:

  1. Remove existing sphinx installation with apt-get

     apt-get remove sphinxsearch
    
  2. Download and unpack the source code of sphinx and the libstemmer_c library, then copy the contents of the latter into sphinx's libstemmer_c directory

     wget http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz
     tar xvf sphinx-2.2.11-release.tar.gz
     wget http://snowball.tartarus.org/dist/libstemmer_c.tgz
     tar xvf libstemmer_c.tgz
     cp -rf libstemmer_c/* sphinx-2.2.11-release/libstemmer_c/
    
  3. Configure, compile and install sphinx with the libstemmer_c library

    cd sphinx-2.2.11-release
    ./configure --with-mysql-includes=/usr/include/mysql --with-mysql-libs=/usr/lib/x86_64-linux-gnu --with-libstemmer
    make          
    make install
    
  4. Add libstemmer_es stemmer to morphology in thinking_sphinx.yml

    development:
      mysql41: 3563
      address: <%= ENV['SPHINX_HOST'] || '' %>
      enable_star: true
      charset_type: utf-8
      min_infix_len: 2
      morphology: libstemmer_es
      ...
    
  5. Reconfigure sphinx and regenerate indices

    bundle exec rake ts:configure
    bundle exec rake ts:generate
    
  6. Restart docker containers and rails server

I'm working on a website with various products that are indexed with sphinx. With stemming enabled searching for "cameras" should yield all products with "cameras" or "camera". Currently, searching "cameras" only returns products with "cameras" in the string, but no products with "camera" only.

I'm using Rails 3.2, thinking-sphinx 3.2 and sphinx 2.2.11 on Ubuntu 14.04.4 LTS. It may be worth mentioning that I'm using Docker containers: searchd runs in its own container, separate from the Rails application.

UPDATE 1: I can't run rake ts:regenerate because searchd runs in a separate Docker container, i.e. my sphinx container. Instead, I stop the sphinx container, enter a worker container, run rake ts:clear_rt and rake ts:configure, restart the sphinx container (which also restarts searchd), enter the sphinx container, and finally run rake ts:generate.

UPDATE 2: Content of log/development.searchd.log is

[Thu Mar 16 12:24:59.147 2017] [  127] listening on all interfaces, port=3563
[Thu Mar 16 12:24:59.161 2017] [  127] binlog: replaying log .../development/binlog.001
[Thu Mar 16 12:24:59.161 2017] [  127] binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 indexes
[Thu Mar 16 12:24:59.162 2017] [  127] binlog: finished replaying /opt/sharetribe/tmp/binlog/development/binlog.001; 0.0 MB in 0.000 sec
[Thu Mar 16 12:24:59.162 2017] [  127] binlog: finished replaying total 1 in 0.001 sec
[Thu Mar 16 12:24:59.163 2017] [  127] DEBUG: SaveMeta: Done.
[Thu Mar 16 12:24:59.163 2017] [  127] accepting connections
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
... /* many more ReadLock and Unlock */
[Thu Mar 16 12:28:50.467 2017] [  128] listening on all interfaces, port=3563
[Thu Mar 16 12:28:50.478 2017] [  128] DEBUG: SaveMeta: Done.
[Thu Mar 16 12:28:50.478 2017] [  128] accepting connections
[Thu Mar 16 12:28:55.503 2017] [  128] DEBUG: ReadLock 0x1522ef8
[Thu Mar 16 12:28:55.503 2017] [  128] DEBUG: Unlock 0x1522ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
... /* many more ReadLock and Unlock */
[Thu Mar 16 12:29:09.806 2017] [  128] caught SIGHUP (seamless=1, in queue=1)
[Thu Mar 16 12:29:09.806 2017] [  128] DEBUG: CheckRotate invoked
[Thu Mar 16 12:29:09.806 2017] [  128] DEBUG: /opt/sharetribe/db/sphinx/development/custom_field_value_core.new.sph is not readable. Skipping
[Thu Mar 16 12:29:09.806 2017] [  128] DEBUG: /opt/sharetribe/db/sphinx/development/listing_core.new.sph is not readable. Skipping
[Thu Mar 16 12:29:09.806 2017] [  128] WARNING: nothing to rotate after SIGHUP ( in queue=0 )
[Thu Mar 16 12:29:10.541 2017] [  128] DEBUG: ReadLock 0x1522ef8
[Thu Mar 16 12:29:10.541 2017] [  128] DEBUG: Unlock 0x1522ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
... /* many more ReadLock and Unlock */

UPDATE 3: I'm defining a real-time index on product listings, with attributes such as title, description, author name, etc.

ThinkingSphinx::Index.define :listing, :with => :real_time do
  indexes title
  indexes description
  indexes custom_field_values_sphinx
  indexes origin_loc.google_address
  indexes author.given_name
  indexes author.username
  indexes location.province
...

This is the underlying model:

class Listing < ActiveRecord::Base 

  after_save ThinkingSphinx::RealTime.callback_for(:listing)
...

The Listing.search method is called in a public method of the model

  Listing.search(
      escaped_query,
      :select => "*, #{SPHINX_WEIGHT_FUNCTION} as w",
      :sql => {:include => params[:include]},
      :star => true,
      :with => with,
      :with_all => with_all,
      :order => params[:sort],
      :per_page => per_page,
      :page => page
  )

[1] http://freelancing-gods.com/thinking-sphinx/advanced_config.html#word-stemming--morphology

forste

1 Answer


From what I can see, you've got everything configured correctly.

You may want to run rake ts:regenerate to ensure Sphinx has the new configuration loaded correctly (ts:generate is for updating data, but doesn't update configuration).

If that doesn't change anything (and the morphology setting is appearing in the generated configuration file), then the problem may not be with TS, but with Sphinx itself? I wonder if the Sphinx logs have any clues.
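
As a rough way to check whether the morphology setting made it into the generated configuration, a small script along these lines could list the setting per index. This is only a sketch: the helper name `morphology_by_index` and the sample text are made up for illustration, and the file path mentioned in the comment (`config/development.sphinx.conf`) is an assumption about where the generated file lives. The index names are the ones that appear in the searchd log in the question.

```ruby
# Sketch: report the morphology setting of each index in a Sphinx config.
# morphology_by_index is a hypothetical helper, not part of Thinking Sphinx.
def morphology_by_index(config_text)
  settings = {}
  current = nil
  config_text.each_line do |line|
    stripped = line.strip
    if (match = stripped.match(/\Aindex\s+(\w+)/))
      # Start of an "index <name>" block; remember which index we are in.
      current = match[1]
      settings[current] = nil
    elsif stripped == "}"
      # End of the current block.
      current = nil
    elsif current && (match = stripped.match(/\Amorphology\s*=\s*(.+)\z/))
      settings[current] = match[1].strip
    end
  end
  settings
end

# Made-up sample text. In a real app you would read the generated file,
# e.g. File.read("config/development.sphinx.conf") (path is an assumption).
SAMPLE_CONFIG = <<~CONF
  index listing_core
  {
    type = rt
    path = /tmp/listing_core
    morphology = libstemmer_es
    min_infix_len = 2
  }

  index custom_field_value_core
  {
    type = rt
    path = /tmp/custom_field_value_core
  }
CONF

RESULT = morphology_by_index(SAMPLE_CONFIG)
RESULT.each { |name, value| puts "#{name}: #{value || '(not set)'}" }
```

An index reported as "(not set)" would be indexed without stemming, which would be one thing to rule out before looking at Sphinx itself.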

pat
  • Thanks @pat. I can't do `rake ts:regenerate` since I'm running _searchd_ in a separate docker-container, i.e. my sphinx container. Instead I stop the sphinx container, enter a worker container, run `rake ts:clear_rt` and `rake ts:configure`, then restart the sphinx container which also restarts _searchd_, enter the sphinx container and then finally run `rake ts:regenerate` – forste Mar 16 '17 at 12:16
  • last command is `rake ts:generate` and NOT `rake ts:regenerate` – forste Mar 16 '17 at 12:29
  • I added the content of log/development.searchd.log – forste Mar 16 '17 at 12:35
  • Hrm, sounds a bit complex, but Docker's a tool I've really not grokked. Granted, if you're sure that the Sphinx daemon has restarted with the new configuration and the new data has been populated, then it sounds like more of a Sphinx issue. Can you talk through a specific example of what's being indexed and what you're searching for (i.e. is "cameras" how you're testing? Or something else?) – pat Mar 16 '17 at 15:25
  • yes, sphinx daemon has restarted with the new configuration – forste Mar 16 '17 at 19:59
  • regarding specific example, I'm indexing product listings through a real time index. products have various attributes such as title, description, author name, etc. for my concrete example, I have 3 documents that have "camera" in their title. When I search for "camera" using _Listing.search_, all 3 documents are returned. When I search for "cameras", none of them is returned, i.e. the result is empty – forste Mar 16 '17 at 20:02
  • I've updated my question with some code from the index and underlying model – forste Mar 16 '17 at 20:08
  • So, I've just tested this locally with both stem_en and libstemmer_es (and compared it to no morphology setting), and with both en and es, searches for 'cameras' return both 'camera' and 'cameras', and so do searches for 'camera'. I followed your compiling instructions (I've recently had to reformat my machine, hence the fresh start), and I'm using real-time indices. In short: what you're doing _should_ work. – pat Mar 17 '17 at 14:32
  • Also: created a demo app to confirm the behaviour I'm seeing. The README outlines usage: https://github.com/pat/ts-realtime-morphology – pat Mar 17 '17 at 15:14
  • Thanks @pat, I feel it might have to do with docker. I will investigate and let you know – forste Mar 20 '17 at 09:39
  • Seems like the issue was enabling search with "star": Listing.search(query) works while Listing.search(query, :star => true) doesn't. I will do some more tests and confirm later – forste Mar 20 '17 at 21:03
  • Ah, you're right. I've just added wildcard tests to the demo app, and plural-with-wildcards only returns the plural match, not the singular. It sounds like using morphology and wildcards together isn't something Sphinx supports quite correctly: see http://sphinxsearch.com/forum/view.html?id=10882#46907 and http://stackoverflow.com/a/28763638/54500 – pat Mar 21 '17 at 08:34
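
The behaviour reported in the last two comments can be pictured with a toy model. It is not Sphinx's implementation and it does not use the Snowball stemmer. It assumes that plain terms are stemmed before matching, while wildcard terms are compared against the original words without stemming. `toy_stem`, `star`, `build_index` and `search` are hypothetical helpers, `star` is only a simplified stand-in for what `:star => true` does, and the sample titles are made up.

```ruby
# Toy model only: illustrates the observed behaviour, not Sphinx internals.

# Hypothetical stand-in for a real stemmer: strips one trailing "s".
def toy_stem(word)
  word.downcase.sub(/s\z/, "")
end

# Simplified stand-in for :star => true: wrap every term in wildcards.
def star(query)
  query.split.map { |term| "*#{term}*" }.join(" ")
end

# Index time: keep both the original words and their stems for each title.
def build_index(titles)
  titles.map do |title|
    words = title.downcase.split
    { title: title, words: words, stems: words.map { |word| toy_stem(word) } }
  end
end

# Query time: plain terms are stemmed and compared with stems; wildcard
# terms are compared as raw substrings of the original words (assumption).
def search(docs, query)
  terms = query.downcase.split
  matches = docs.select do |doc|
    terms.all? do |term|
      if term.include?("*")
        needle = term.delete("*")
        doc[:words].any? { |word| word.include?(needle) }
      else
        doc[:stems].include?(toy_stem(term))
      end
    end
  end
  matches.map { |doc| doc[:title] }
end

DOCS = build_index(["Vintage camera", "Digital cameras", "Tripod"])

p search(DOCS, "cameras")        # plain query: stemming applies
p search(DOCS, star("cameras"))  # wildcard query: plural form only
p search(DOCS, star("camera"))   # wildcard query: substring of both forms
```

In this model the plain query for "cameras" finds both the singular and the plural title, while the starred query finds only the plural one, which mirrors the difference between `Listing.search(query)` and `Listing.search(query, :star => true)` described above.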