10

I am just starting with Sphinx. So far I got it installed successfully, got a table called profiles on my MySQL database indexed and am able to get the correct results back using the PHP API. I am using CodeIgniter so I wrapped the default PHP API as a CodeIgniter library.

Anyway this is how my code looks like:

$query = $_GET['q'];
$this->load->library('sphinxclient');
$this->sphinxclient->setMatchMode(SPH_MATCH_ANY);
$result = $this->sphinxclient->query($query);

$to_fetch = array();
foreach($result['matches'] as $key => $match) {
  array_push($to_fetch, $key);
}

The array $to_fetch contains the ids of the matched table rows. Now I can use a typical MySQL query to get all the relevant users to display on the search page like so:

$query = 'SELECT * FROM profiles WHERE id IN('. join(',', $to_fetch) . ')';

My question are:

  1. is this the right way to go about it? or is there a default "Sphinx way of doing it" that would be better for performance .

  2. secondly, all I get back at the moment is the id of the matched table rows. I also want the part of the text in the column that matched. For example if a someone searches for the keyword dog and a user on the profiles table had in their about column the following text:

    I like dogs. I also like ice cream.

I would like Sphinx to return:

I like <strong>dogs</strong>. I also like ice cream.

How can I do that? I tried to play around with the buildExcerpts() function but can't get it to work.

EDIT

This is how I am getting excerpts now:

// get matched user ids
$to_fetch = array();
foreach($result['matches'] as $key => $match) {
  array_push($to_fetch, $key);
}

// get user details of matched ids
$members = $this->search_m->get_users_by_id($to_fetch);

// build excerpts
$excerpts = array();
foreach($members as $member) {

    $fields = array(
        $member['about'],
        $member['likes'],
        $member['dislikes'],
        $member['occupation']
    );

    $options = array(
        'before_match'      => '<strong class="match">',
        'after_match'       => '</strong>',
        'chunk_separator'   => ' ... ',
        'limit'             => 60,
        'around'            => 3,
    );

    $excerpt_result = $this->sphinxclient->BuildExcerpts($fields, 'profiles', $query, $options);
    $excerpts[$member['user_id']] = $excerpt_result;
}

$excerpts_to_return = array();
foreach($excerpts as $key => $excerpt) {
    foreach($excerpt as $v) {
        if(strpos($v, '<strong class="match">') !== false) {
            $excerpts_to_return[$key] = $v;
        }
    }
}

As you can see I am searching each query across 4 different mysql columns:

about
likes
dislikes
occupation

Because of this I don't know which of the 4 columns contains the matched keyword. It could be any of them or even more than one. So I have no choice but to run the contents of all 4 columns through the BuildExcerpts() function.

Even then I don't know which one the BuildExcerpts() returned with the <strong class="match"> tags. So I run a stpos check on all values returned by BuildExcerpts() to finally get the proper excerpt and map it to the user whose profile it belongs to.

Do you see a better way than this given my situation where I need to match against the contents of 4 different columns?

1 Answers1

6

Yes that looks good way. One thing to remember the rows coming back from Mysql probably won't be in the order from sphinx.

See the FAQ on sphinx site for how to use FIELD() but personally I like to put the rows from sphinx into associative array, then just loop though the sphinx I'd list and get the row from the array. Avoids a sorting phase altogether at the expense of memory!

As for highlighting, yes do persevere with buildExcerpts - that's is the way to do it.


edit to add, this demo http://nearby.org.uk/sphinx/search-example5-withcomments.phps demonstrates both getting rows from mysql and "sorting" in the app. And buildExcerpts.

barryhunter
  • 20,886
  • 3
  • 30
  • 43
  • Regarding the ORDER, why not use `FIND_IN_SET` function? Like this: `SELECT * FROM profiles WHERE id IN (7,10,5,3,8) ORDER BY FIND_IN_SET(id, '7,10,5,3,8')` – J. Bruni Jun 18 '12 at 14:19
  • 1
    You could use find_in_set, behaves virtually the same as Field(). Dont know if the string parsing makes it slower or not. But in the more general sense, I found it best to avoid the sorting - particully in mysql - which will probbaly end up doing a filesort. Basically on our old server could sustain about 50 queries a second with mysql doing the sorting. Moving the sorting to the application (and in fact avoiding a sort at all) gave us closer to 120 queries a second. (testing using ab against the front end application) – barryhunter Jun 18 '12 at 16:40
  • @barryhunter Thanks a lot for your response Barry. I updated my question with the way I am currently getting excerpts. Please let me know if this is a good way to go about it. It just feels really dirty to do it this way since I have to feed all 4 columns to the `BuildExcerpts()` function and then on top of that run a `strpos` check on all values returned by that function. And do all this inside a loop for each and every single returned match. –  Jun 19 '12 at 00:13
  • 1
    Thats generally, ok, but could add "allow_empty" to the options. That way easy to check which match, as non-matching will be empty. But if you just displaying the excerpt anyway, it doesnt matter which it came from, so can just forward one string to the function. Also can just use one BuildExcerpts call, rather than one per result. Less API roundtrips. – barryhunter Jun 19 '12 at 14:54
  • Nice trick to avoid using field() ... hadn't considered that. – gabe. Oct 21 '12 at 18:49
  • I love Barry!! Over the past two to three days, he's helped me on a few Sphinx issues. Top man. – TheCarver Aug 12 '14 at 15:24