0

I'm trying to export a dictionary of words in sqlite made up only of words that start with, contain, or end with specific filters.

If one filter was 'ment' and could be found anywhere in the word; it would include words such as 'moment', 'mentioned' and 'implemented'. If another was 'under' and could only be a prefix; it would match words such as 'underachieve' and 'undercharged' but not 'plunder'.

I've found a few similar questions around - however I haven't been able to get any to work, or they are for full versions of sql and contain functions not in sqlite. Mostly my issue is with the fact that it's not just 'match every substring' - there's prefixes, suffixes and phrases(matches anywhere in word)

Already Tried:
* Select rows from a table that contain any word from a long list of words in another table
* Search SQL Server string for values from another table
* SQL select rows where field contains word from another table's fields
* https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b9bb1003-80f2-4e61-ad58-f6856666bf85/how-to-select-rows-that-contain-substrings-from-another-table?forum=transactsql

My database looks like this:
dictionary_full

------------------
word
------------------
abacuses
abalone
afterthought
auctioneer
before
biologist
crafter
...
------------------

filters

------------------
name    | type_id
------------------
after   | 1
super   | 1
tion    | 2
ses     | 3
logist  | 3
...

type

------------------
name
------------------
prefix
phrase
suffix

I can select all phrases from the db by using this query:

SELECT name FROM filters WHERE type_id = (SELECT ROWID FROM type WHERE name='phrase');

however I haven't been able to work that successfully into the solutions I've found. It will either return no results, or duplicate results.

e.g.
Duplicates:

SELECT d.word FROM dictionary_full d
JOIN filters f ON instr(d.word, (
SELECT name FROM filters WHERE type_id = (SELECT ROWID FROM type WHERE name='phrase')
)) > 0

 

Expected Results:
A comination of all words that:
- start with the prefixes 'after' / 'super'
- OR contain anywhere the phrase 'tion'
- OR end with the suffix 'ses' / 'logist'

------------------
word
------------------
abacuses
afterthought
auctioneer
biologist
Scott
  • 279
  • 3
  • 14
  • 1
    Look into using full text search. – Tim Biegeleisen Apr 30 '19 at 13:02
  • Post your expected results because it's not obvious what you need to do. – forpas Apr 30 '19 at 13:24
  • @TimBiegeleisen Sqlite FTS is by default for finding words in longer sentences, not for matching substrings in single words. You'd have to write your own custom tokenizer, which I suspect is way out of OP's skill level. Plus there's no suffix search. – Shawn Apr 30 '19 at 13:58

1 Answers1

1

Sounds like you want LIKE.

After creating some sample data (skipping mapping filter type names to integers for the sake of brevity and clarity):

CREATE TABLE words(word TEXT PRIMARY KEY) WITHOUT ROWID;
INSERT INTO words(word) VALUES ('abacuses'), ('abalone'), ('afterthought'),
  ('auctioneer'), ('before'), ('biologist'), ('crafter');
CREATE TABLE filters(name TEXT, type TEXT, PRIMARY KEY(name, type)) WITHOUT ROWID;
INSERT INTO filters(name, type) VALUES ('after', 'prefix'), ('super', 'prefix'),
  ('tion', 'phrase'), ('ses', 'suffix'), ('logist', 'suffix');

This query

SELECT *
FROM words AS w
JOIN filters AS f ON (CASE f.type
                      WHEN 'prefix' THEN w.word LIKE f.name || '%'
                      WHEN 'suffix' THEN w.word LIKE '%' || f.name
                      WHEN 'phrase' THEN w.word LIKE '%' || f.name || '%'
                      END)
GROUP BY w.word -- eliminate duplicate matches
ORDER BY w.word;

results in

word          name        type      
------------  ----------  ----------
abacuses      ses         suffix    
afterthought  after       prefix    
auctioneer    tion        phrase    
biologist     logist      suffix
Shawn
  • 47,241
  • 3
  • 26
  • 60
  • Oh man thank you so much!!! I've been stuck on this for hours! This is my first time using sqlite and I didn't even realise that || was the string concat operator - I've been trying to use + the whole time – Scott Apr 30 '19 at 14:01