11

I am making a synonyms list that I will store it in database and retrieve it before doing full text search.

When users enters like: word1

I need to lookup for this word in my synonyms table. So if the word is found, I would SELECT all the synonyms of this word and use it in the fulltext search on the next query where I contruct the query like

MATCH (columnname) AGAINST ((word1a word1b word1c) IN BOOLEAN MODE)

So how do I store the synonyms in a table? I found 2 choices:

  1. using key and word columns like

    val  keyword
    -------------
    1    word1a
    1    word1b
    1    word1c
    2    word2a
    2    word2b
    3    word3a
    etc.
    

So then I can find exact match of the entered word in one query and find it's ID. In the next select I get all the words with that ID and somehow concate them using a recordset loop in server side langauge. I can then construct the real search on the main table that I need to look for the words.

  1. using only word columns like

    word1a|word1b|word1c
    word2a|word2b|word2c
    word3a
    

Now I so the SELECT for my word if it is inside any record, if it is, extract all the record and explode it at | and I have my words again that I can use.

This second approach lookes easier to maintain for the one who would make this database of synonyms, but I see 2 problems:

a) How do I find in mysql if a word is inside the string? I can not LIKE 'word1a' it because synonims can be very alike in a way word1a could be strowberry and strowberries could be birds and word 2a could be berry. Obviously I need exact match, so how could a LIKE statement exact match inside a string?

b) I see a speed problem, using LIKE would I guess take more mysql take than "=" using the first approach where I exact match a word. On the other hand in the first option I need 2 statements, one to get the ID of the word and second to get all the words with this ID.

How would you solve this problem, more of a dilemma which approach to take? Is there a third way I don't see that is easy for admin to add/edit synonyms and in the same time fast and optimal? Ok I know there is no best way usually ;-)

UPDATE: The solution to use two tables one for master word and second for the synonym words will not work in my case. Because I don't have a MASTER word that user types in search field. He can type any of the synonyms in the field, so I am still wondering how to set this tables as I don't have master words that I would have ID's in one table and synonims with ID of the master in second table. There is no master word.

Jerry2
  • 2,955
  • 5
  • 30
  • 39
  • Thanx for fixing my text and made it clearer, I'll try to do the same next time! – Jerry2 Oct 20 '10 at 04:18
  • No, 1 table is problematic, since word B can be in more grups and if WORD is unique, that cannot be done with this approach. –  May 23 '12 at 13:14

3 Answers3

16

Don't use a (one) string to store different entries.

In other words: Build a word table (word_ID,word) and a synonym table (word_ID,synonym_ID) then add the word to the word table and one entry per synonym to the synonyms table.

UPDATE (added 3rd synonym)

Your word table must contain every word (ALL), your synonym table only holds pointers to synonyms (not a single word!) ..

If you had three words: A, B and C, that are synonyms, your DB would be

WORD_TABLE            SYNONYM_TABLE
ID | WORD             W_ID | S_ID
---+-----             -----+-------
1  | A                  1  |  2
2  | B                  2  |  1
3  | C                  1  |  3
                        3  |  1
                        2  |  3
                        3  |  2  

Don't be afraid of the many entries in the SYNONYM_TABLE, they will be managed by the computer and are needed to reflect the existing relations between the words.



2nd approach

You might also be tempted (I don't think you should!) to go with one table that has separate fields for word and a list of synonyms (or IDs) (word_id,word,synonym_list). Beware that that is contrary to the way a relational DB works (one field, one fact).

lexu
  • 8,766
  • 5
  • 45
  • 63
  • You mean 2 tables: word table with WordID and word columns and a synonims table with WordID and synonim column? And I can search a word table and join it with synonyms table to get the results in a recordset that I loop then and put them together somehow? – Jerry2 Oct 20 '10 at 04:12
  • You are correct ... I was trying to add that to my answer while you commented. – lexu Oct 20 '10 at 04:13
  • SELECT synonim FROM wordtable w LEFT JOIN synonymstable sy ON sy.WordID=w.WordId where word = 'mysearchword'? Is there a way to get results back in ONE row not number of synonims rows? To concat the rows somehow in the same SELECT? – Jerry2 Oct 20 '10 at 04:14
  • I think you made a very good point yes about the tables. I can then select only once. If there would only be a way to somehow CONCAT the rows into one row to get the result as word1a word1b word1c somehow so it is easier to enter into a main search query (no RS loop would be needed then) – Jerry2 Oct 20 '10 at 04:16
  • I think I found answer to my own question: It may be group_concat is what I am looking for ;-) – Jerry2 Oct 20 '10 at 04:28
  • Seems my problem is still not solved, please take a look at my update post. – Jerry2 Oct 20 '10 at 09:44
  • @Jerry2: please see my explanation/example how the two table approach was intended – lexu Oct 20 '10 at 11:22
  • Thanx, this works well if you have 2 synonims, as you need 2 entries in synonims table. But if you have like 10 synonims you would have to store... all the combinations in the second table? That would make hundreds... or I don't understand properly. Can you please make example with 3 synonims of a word? – Jerry2 Oct 20 '10 at 11:37
  • @Jerry2: I've added a 3rd synonym... and yes, there would/could be hundreds of entries, but since they actually reflect the relationships between the words, and you want to be able to search for each word, this can't be helped. "Space saving" approaches, like the '2nd approach' I mentined, aren't any better (space, performance, complexity), if you look at them closely. – lexu Oct 21 '10 at 06:30
  • It's only hard to manage this database if I add or remove any synonyms, but I guess this can not be helped. – Jerry2 Oct 22 '10 at 19:53
  • @Jerry2 : look at Hasan Amin Sarand's suggestion below ... very clever – lexu Apr 21 '13 at 16:23
9

I think 3 columns and only one table is better WORD_TABLE

ID | WORD | GroupID
---+----------------
1  |  A   |   1
2  |  B   |   1
3  |  C   |   1
Vincent Ramdhanie
  • 102,349
  • 23
  • 137
  • 192
  • you make a good suggestion... this would solve the OP's problrm, that he added in a comment, regarding managing the table .. – lexu Apr 21 '13 at 16:22
6

Another approach is to store meaning (this does not use master words, but a meaning table that groups instead)

would be to store the words in a words table without synonyms and with only text, like this:

Many words, one meaning

meaning_table

meaning_id
---
1
2
3

And store the words in another table, for example if A, B and C were all synonyms of 1 meaning

word_table

word_id | meaning_id | word
--------+------------+------
1       |  1         |   A
2       |  1         |   B
3       |  1         |   C

Even though it looks a lot like what Hasan Amin Sarand suggests, it has the key difference that you don't select from the WORD_TABLE but instead select from the MEANING_TABLE, this is much better and I learned that the hard way.

This way you store the meaning in one table and as many words for that meaning as you like in another.

Although it assumes that you have 1 meaning per word.

Many words, many meanings

if you want to store words with multiple meanings then you need another table for the many to many relationship and the whole thing becomes:

meaning_table
-------------
meaning_id
-------------
1
2
3

word_meaning_table
--------------------
word_id | meaning_id
--------+-----------
1       |  1         
2       |  1        
3       |  1         

word_table
--------------
word_id | word
--------+-----
1       |   A
2       |   B
3       |   C

Now you can have as many words with as many meanings as you want, where any word can mean anything you want and any meaning can have many words.

If you want to select a word and it's synonyms then you do

SELECT 
meaning_id,word_id,word
FROM meaning_table 
INNER JOIN word_meaning_table USING (meaning_id)
INNER JOIN word_table USING (meaning_id)
WHERE meaning_id=1

You can also then store meaning that does not have a word yet or that you don't know the word of.

If you don't know what meaning it belongs to then you can just insert a new meaning for every new word and fix the meaning_id in the word_table later.

You can then even store and select the words that are the same but mean different things

SELECT 
meaning_id,word_id,word
FROM meaning_table 
INNER JOIN word_meaning_table USING (meaning_id)
INNER JOIN word_table USING (meaning_id)
WHERE word_id=1
Community
  • 1
  • 1
Timo Huovinen
  • 53,325
  • 33
  • 152
  • 143