0

I believe the answer is no, but perhaps someone has a brilliant workaround: Can (you make) SQL Group By word variations?

Variation examples Run: Run, Ran, Running, Runs or Goose: Goose, Geese

I know there are a number of reasons why not to do this type of thing, I'm sure some will be shared here, but I have a little pet project that this would help. Short of creating my own personal dictionary table, which is not happening, or having an absurd amount of replace or case statements, does anyone have a practical way of doing this? I am currently working in SQL Server, but any SQL language would be a welcome answer.

S Ayo
  • 13
  • 5
  • You are grouping on the STEM of the word. Look for SQL stemming: https://stackoverflow.com/questions/48144890/stemming-words-in-mysql/48145080 and https://stackoverflow.com/questions/4051572/sql-word-root-matching – Y.L Dec 11 '19 at 19:09
  • What RDBMS are you on? – tim Dec 11 '19 at 19:36
  • Wow @Y.L, thank you for that ... didn't realize what it was called in order to figure out how to deal with it. Stemming it is, ran it and got the results I wanted. I am not 100% how to do this more on the fly, but that's probably just because i'm too tired to think anymore. Your answer works for me, thank you kindly. – S Ayo Dec 11 '19 at 21:14

2 Answers2

0

You are grouping on the STEM of the word. Look for SQL stemming: Stemming Words in MySQL and SQL word root matching

Or you can use SQL soundex function to find if two words are similar:

SELECT SOUNDEX(word), COUNT(*) FROM tbl GROUP BY SOUNDEX(word)
Y.L
  • 694
  • 12
  • 26
  • Good thought, but given my examples, it would not work. You will get 4 results (instead of 2). – S Ayo Dec 11 '19 at 20:05
0

@Y.L deserves the credit on this one, but since the answer is in a comment I'm not sure how to mark it as the correct answer? Anyway, here's the basics of the code now:

drop table if exists dbo.attempt1
create table dbo.attempt1 ( tempid int IDENTITY(1,1) PRIMARY KEY, word varchar(50) )

CREATE UNIQUE INDEX cx_uid ON dbo.attempt1(tempid);
CREATE FULLTEXT INDEX ON dbo.attempt1(word) 
 KEY INDEX cx_uid
 WITH STOPLIST = SYSTEM;

insert into dbo.attempt1 (word)
     values ('Run'), ('Runs'), ('Running'), ('Ran'), ('Goose'), ('Geese')

and finally

SELECT * FROM dbo.attempt1 WHERE contains(word,'FORMSOF(INFLECTIONAL, goose)')

or

SELECT * FROM dbo.attempt1 WHERE freetext(word,'run')
S Ayo
  • 13
  • 5