1

I have a sentence like the cat sat on the mat stored as a single sql field. I want to periodically search for keywords which are not not in a stop list, in this case cat sat mat What's the best way to store them in an SQL table for quick searching?

As far as I can see it I see the following options

  1. Up to [n] additional columns per row, one for each word.
  2. Store all of the interesting words in a single, comma separated field.
  3. A new table, linked to the first with either of the above options.
  4. Do nothing and search for a match each time I have a new word to search on.

Which is best practice and which is fastest for searching for word matches? I'm using sqlite in python if that makes a difference.

Toby
  • 131
  • 1
  • 5

3 Answers3

1

I would suggest giving your sentences a key, likely IDENTITY. I would then create a second table linking to your sentence table, with a row for each interesting word.

If you'd like to search for say, words starting with ca- if you stored these words in a comma delimited you'd have to wildcard the start and end, whereas if they are each in a separate row you can bypass the beginning wildcard.

Also, assuming you find a match, in a comma separated list you'd have to parse out which word is actually a hit. With the second table you simply return the word itself. Not to mention the fact that storing multiple values in one field a major no-no in a relational database.

Mike M.
  • 12,343
  • 1
  • 24
  • 28
  • Will this scale well? I currently have ~80k sentences to split and it could easily be an order of magnitude more. – Toby Apr 12 '11 at 05:58
  • We'd need to know how the tables are going to be used. I don't see any issues with scaling. Of course your table could have a couple of million rows, but depending on how you're doing your searching this shouldn't be an issue. If you end up searching using '%%' every time, there isn't too much of a point going through this work. – Mike M. Apr 12 '11 at 13:08
1

Your best bet is probably to do full text searching.

These questions FULL-TEXT Search in SQLite and SQLite full text search catalog will hopefully get you going in the right direction.

Community
  • 1
  • 1
Conrad Frix
  • 51,984
  • 12
  • 96
  • 155
  • Looks great but I'm likely to need fuzzy searching. Had no idea this existed though – Toby Apr 12 '11 at 18:17
1

I do something similar with SQLite too. In my experience it's not as fast as other db's in this type of situation so it pays to make your schema as simple as possible.

  • Up to [n] additional columns per row, one for each word.
  • Store all of the interesting words in a single, comma separated field.
  • A new table, linked to the first with either of the above options.
  • Do nothing and search for a match each time I have a new word to search on.

Of your 4 options, 2) and 4) may be too slow if you're looking to scale and matching using LIKE. Matching using full text is faster though, so that's worth looking into. 1) looks to be bad database design, what if there's more words than columns ? And if there's less, it's just wasted space. 3) is best IMO, if you make the words the primary key in their own table the searching speed should be acceptably fast.

Michael Low
  • 24,276
  • 16
  • 82
  • 119