2

I'm trying to show "related articles" block on my website. For this purpose I use the following query:

SELECT *, MATCH(title, content) AGAINST('search string') AS score 
FROM news_items 
WHERE MATCH(title, content) AGAINST('search string') 
ORDER BY score DESC LIMIT 4

However, it often shows me very old articles while I want to show the latest ones so I fixed my query as following:

SELECT *, MATCH(title, content) AGAINST('search string') AS score 
FROM news_items 
WHERE MATCH(title, content) AGAINST('search string') > 4 
ORDER BY ctime DESC, score DESC LIMIT 4

But in this case I can't get most relevant articles :(

Any ideas on how to find MOST RELEVANT AND FRESH article at the same time?

Thanks in advance!

Gregory
  • 51
  • 1
  • 9

1 Answers1

4

You could change to an aggregate score... something like this:


SELECT *, 
  (
    MATCH(title, content) AGAINST('search string')
    - 
    (ABS(DATEDIFF(`timestampfield`, NOW())) / 365)
  ) AS score 
FROM news_items 
WHERE 
  MATCH(title, content) AGAINST('search string') > 4 
ORDER BY score DESC LIMIT 4

In that there's one kinda funky addition, which you'd want to clean up:

- (ABS(DATEDIFF(`timestampfield`, NOW())) / 365)

This is your age component of the score... currently scaled by <year> = 1 point

To get that, we start by getting the number of days between the timestamp field and now (absolute value):

ABS(DATEDIFF(`timestampfield`, NOW()))

Then we scale...

I decided you probably didn't want to loose score based on number of days, because if something was 30 days old it would be -30... seems too harsh. So I chose years... if you want to scale on number of weeks, divide by 52 instead of 365... and so forth.

This scaling factor will be how you control value between scoring matching and age.

So it ends up being something like: <match score> - <yearsAgo>


If you do that:

  1. 5 (match score) - 0.1 (<1 year ago) = 4.9 (ok match, but newest)
  2. 5 (match score) - 0.01 (<1 year ago) = 4.99
  3. 5 (match score) - 1 (1 year ago) = 4
  4. 6 (match score) - 2 (2 years ago) = 4
  5. 9 (match score) - 5 (5 years ago) = 4 (best match, but old)
  6. 7 (match score) - 10 (10 years ago) = -3

NOTE this assumes your timestamp field is a full date-time field... if otherwise, you will need to re-cast to a date, or the logic to manipulate the unix timestamp directly.

And here's a debugging version of the query:

SELECT
    `created`,
    MATCH(title, content) AGAINST('awesome') as match_score,
    (ABS(DATEDIFF(`created`, NOW())) / 365) as years_ago,
  (
    MATCH(title, content) AGAINST('awesome')
    - 
    (ABS(DATEDIFF(`created`, NOW())) / 365)
  ) AS score 
FROM news_items 
WHERE 
  MATCH(title, content) AGAINST('awesome') > 4 
ORDER BY score DESC LIMIT 4
zeroasterisk
  • 2,199
  • 1
  • 23
  • 28
  • OK, if I delete ">4" then I can get the result. However, this query gives me OLDEST articles while I need newest. Here is the result without your solution -- http://clip2net.com/s/3cQ1F0u, here is using your solution -- http://clip2net.com/s/3cQ22cG As you can see it gives totally different results but what is most important it gives me the oldest articles I have in DB, not newest ones. – Gregory Feb 20 '15 at 13:57
  • The funniest thing is that if MATCH...AGAINST gives me 12 points and ABS(DATEDIFF()) gives me 1 point the result is like 2.5, not 11. O_o – Gregory Feb 20 '15 at 14:26
  • 1
    It looks like your `timestamp` field is not a date, but instead a unix epoch... either cast it as a date, or just manipulate it's value directly. http://puu.sh/g53Ww/a5a1eb4fce.png I've edited my answer with a debugging version of the query... – zeroasterisk Feb 20 '15 at 15:04