0

I have a problem in making SQL query. I am making a small Search Engine in which the word to page mapping or indexes are kept like this. Sorry I wasn't able to post images here so I tried writing the output like this.

+---------+---------+-----------+--------+

| word_id | page_id | frequency | degree |

+---------+---------+-----------+--------+

|    2331 |      29 |         2 |      1 |

|    2332 |      29 |         7 |      1 |

|    2333 |      29 |         4 |      1 |

|    2334 |      29 |         1 |      1 |

|    2335 |      29 |         1 |      1 |

|    2336 |      29 |         1 |      1 |

|    2337 |      29 |         2 |      1 |

|    2338 |      29 |         7 |      1 |

|    2343 |      29 |         1 |      3 |

|    2344 |      29 |         1 |      3 |

......
......
...... and so on.

Word_id points to Words present in other table and page_id points to URLs present in other table.

Now Suppose I want to search "Rapid 3D Prototyping Services". I brought the union of results corresponding to individual words by query ->

select * from words_detail where word_id=2353 or word_id=2364 or word_id=2709 or word_id=2710;

In above query the word_ids corresponds to the 4 words in the search query and the results are as below.

Union of page_id corresponding to individual words...

mysql>

select * from words_detail where word_id=2353 or word_id=2364 or word_id=2709 or word_id=2710;


+---------+---------+-----------+--------+

| word_id | page_id | frequency | degree |

+---------+---------+-----------+--------+

|    2353 |      29 |         2 |      4 |

|    2353 |      33 |         2 |      2 |

|    2353 |      36 |         5 |      9 |

|    2353 |      40 |         1 |      4 |

|    2353 |      41 |         1 |      9 |

|    2353 |      45 |         4 |      9 |

|    2353 |      47 |         2 |      9 |

|    2353 |      49 |         4 |      9 |

|    2353 |      52 |         1 |      4 |

|    2353 |      53 |         1 |      9 |

|    2353 |      66 |         2 |      9 |

|    2364 |      29 |         1 |      4 |

|    2364 |      34 |         1 |      4 |

|    2364 |      36 |         9 |      2 |

|    2709 |      36 |         1 |      9 |

|    2710 |      36 |         1 |      9 |

+---------+---------+-----------+--------+

16 rows in set (0.00 sec)

But I want the result to be sorted according to maximum match. The earlier result should be where all 4 words match, next result should be with 3 match and so on. In other words earlier results should have those page_id which are common to 4 word_ids, next should be those which are common in 3 words_ids and so on.

I checked here but this is not working in my case because in my case OR conditions are not matched in a single row.

How can such a query can be designed?

Cœur
  • 37,241
  • 25
  • 195
  • 267

3 Answers3

0

Use the occurence of you page_id as your matching count and then order by it.

select * from words_detail A
inner join 
(SELECT PAGE_ID
, COUNT(PAGE_ID) matchCount
from words_detail 
where word_id=2353 or word_id=2364 or word_id=2709 or word_id=2710
group by PAGE_ID) B
on A.PAGE_ID=B.PAGE_ID
where word_id=2353 or word_id=2364 or word_id=2709 or word_id=2710
order by matchCount desc
SoulTrain
  • 1,904
  • 1
  • 12
  • 11
  • Thanks and If in the result of the above query, we want to sort the result again on the basic of 'frequency' or say 'degree' attribute in the table WHERE the 'matchCount' is same then how it can be done ? The sorting should only be done internally where matchCount is equal. – Sanchit Saxena Feb 25 '15 at 07:33
0

Try this

   select p.*
  from words_detail p
 , (select word_id, count(1) as count
  from words_detail where 
   word_id in (2353,2364,2709,2710) group by word_id) t
 where p.word_id = t.word_id
 order by t.count desc;
Vivek Gupta
  • 955
  • 4
  • 7
0

You can do a subquery to get the number of apperances for each page. Then you have to join the subquery with your table and you will be able to order the results by the number of page appearances.

Your final query should look like this:

  SELECT *
    FROM words_detail,
         (
              SELECT page_id, 
                     COUNT(*) AS npages
                FROM words_detail 
               WHERE word_id IN (2353, 2364, 2709, 2710)
            GROUP BY page_id
         ) AS matches

   WHERE words_detail.page_id = matches.page_id
     AND word_id IN (2353, 2364, 2709, 2710)
ORDER BY matches.npages DESC
Jordi Llull
  • 810
  • 6
  • 17
  • If in the result of the above query, we want to sort the result again on the basic of 'frequency' or say 'degree' attribute in the table WHERE the 'npages' is same then how it can be done ? – Sanchit Saxena Feb 25 '15 at 07:29