3

I am currently using Sphinx for indexing a MySQL query with 20+ million records.

I am using a delta index to update the main index and add all new records.

Unfortunately allot of changes to the tables are deleted.

I understand that I can use sql_query_killlist to get all document ID's that need to be deleted or updated. Unfortunately I don't understand how this actually works and the documentation from Sphinx does not have a good enough example for me to understand.

If I use the following example, how could I implement the killlist?

in MySQL

CREATE TABLE sph_counter
(
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);

in sphinx.conf

source main
{
    # ...
    sql_query_pre = SET NAMES utf8
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
    sql_query = SELECT id, title, body FROM documents \
        WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}

source delta : main
{
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, title, body FROM documents \
        WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}

index main

{
    source = main
    path = /path/to/main
    # ... all the other settings
}

note how all other settings are copied from main, but source and path are overridden (they MUST be) index delta : main

{
    source = delta
    path = /path/to/delta
}
gt-info
  • 139
  • 2
  • 12
  • Do you have columns to capture updated and deleted on your documents table? Without it the killlist will be difficult to implement. – Imraan Aug 11 '12 at 04:04

1 Answers1

6

The specifics depend a lot on how you mark deleted documents. But would just add something like

 sql_query_killist = SELECT id FROM documents 
                     WHERE status='deleted' 
                           AND id<=( SELECT max_doc_id FROM sph_counter 
                                     WHERE counter_id=1 )

to the delta index. That would capture ids of deleted record that are in the main index, and add them to the killlist so they would never appear in search results.

If want to capture updated records, need to arrange for the new rows to be included in the main sql_query of the delta, AND their ids to be in the kill-list.

Alex
  • 16,739
  • 1
  • 28
  • 51
barryhunter
  • 20,886
  • 3
  • 30
  • 43
  • Hi Barry. How can I select something if it is deleted out of my table? – gt-info Aug 09 '12 at 22:05
  • Well if you really do 'delete' rather than just changing some sort of status flag. Then you will need another way to get a list of deleted documented. When you delete a document in the application, could insert the id into a new table. And use that? – barryhunter Aug 10 '12 at 12:41
  • It is clear to me now Barry, thanks. I am going to add a table which consists of all deleted ID's from the main table. I can do a select * from table, for my kill-list. Does the row actually get deleted from the index? Or only ignored? – gt-info Aug 12 '12 at 08:27
  • 1
    The kill list just kills it from the result set, its still in the actual index. searchd doesnt in general modify indexes once they have been created by indexer. (UpdateAttributes is a basically the only exception) – barryhunter Aug 13 '12 at 11:08
  • Worth to mention that " Kill-list for a given index suppresses results from other indexes, depending on index order in the query" [source: http://sphinxsearch.com/docs/current.html#conf-sql-query-killlist] – Alex Jun 26 '15 at 15:46