0

I've got three tables in my DB

organization
| id | name | attribute_1 | attribute_2 |

document
| id | org_id | content |

person
| id | org_id | name | status |

Relation between organization and document is 1:1. Organization may have only one document.

Relation between organization and person is 1:n. Organization may have many persons.

Table document has about 12 000 000 records and takes more than 500 Gb on disk. Records in the table never change.

Table organization has about 30 000 000 records but takes only 8 Gb on disk. Records in the table are changing quite often, so I need do full re-indexing.

Here are my indices:

source organization {
        sql_query = select * from organization
        sql_attr_string attribute_1
        sql_attr_string attribute_2
}

source document {
        sql_query = select * from document
        sql_attr_uint org_id
}

source person {
        sql_query = select * from person
        sql_attr_uint org_id
        sql_attr_uint status
}

The typical request looks like this:

Find the first 100 organization containing "Acme" in name, "bankrupt" in document and has person with name "John".

How can I do this?

P.S. I use Sphinx API, but I can switch to SphinxQL

1 Answers1

0

Well Sphinx itself can't do JOINs between indexes. Its only multi-index capability, is to do UNION style queries.

... so to do this need to build a 'prejoined' index. A single index combineing data from all three indexes. So can then search in any combination.

sql_query is just an arbitrary database query, and most database engines support joins, so can do the JOIN there. As the query is probably going to be expensive, can break it down with Ranged Queries. http://sphinxsearch.com/docs/current.html#ex-ranged-queries

Sphinx does have GROUP BY, so can group by either organization or document, if just want the one row for either. (as the index may contain multiple rows, becuase of multiple people

barryhunter
  • 20,886
  • 3
  • 30
  • 43