1

How do you get a list of all unique words and their frequencies ("tally") in one text column of one table of an SQL database? An answer for any SQL dialect in which this is possible would be appreciated.

In case it helps, here's a one-liner that does it in Ruby using Sequel:

Hash[DB[:table].all.map{|r| r[:text]}.join("\n").gsub(/[\(\),]+/,'').downcase.strip.split(/\s+/).tally.sort_by{|_k,v| v}.reverse]

To give an example, for table table with 4 rows with each holding one line of Dr. Dre's Still D.R.E.'s refrain in a text field, the output would be:

{"the"=>4,
 "still"=>3,
 "them"=>3,
 "for"=>2,
 "d-r-e"=>1,
 "it's"=>1,
 "streets"=>1,
 "love"=>1,
 "got"=>1,
 "i"=>1,
 "and"=>1,
 "beat"=>1,
 "perfect"=>1,
 "to"=>1,
 "time"=>1,
 "my"=>1,
 "taking"=>1,
 "girl"=>1,
 "low-lows"=>1,
 "in"=>1,
 "corners"=>1,
 "hitting"=>1,
 "world"=>1,
 "across"=>1,
 "all"=>1,
 "gangstas"=>1,
 "representing"=>1,
 "i'm"=>1}

This works, of course, but is naturally not as fast or elegant as it would be to do it in pure SQL - which I have no clue if that's even in the realm of possibilites...

Sixtyfive
  • 1,150
  • 8
  • 19

2 Answers2

2

I guess it would depend on how the SQL database would look like. You would have to first turn your 4 row "database" into a data of single column, each row representing one word. To do that you could use something like String_split, where every space would be a delimiter.

STRING_SPLIT('I'm representing for them gangstas all across the world', ' ')

https://www.sqlservertutorial.net/sql-server-string-functions/sql-server-string_split-function/ This would turn it into a table where every word is a row.

Once you've set up your data table, then it's easy.

Your_table:

[Word]
I'm
representing
for 
them
...
world

Then you can just write:

SELECT Word, count(*) 
FROM your_table
GROUP BY Word;

Your output would be:

Word   |    Count
I'm          1
representing 1
  • thanks for that one, @RocketJumpTrist. so it *is* possible. i had kept the question open to all SQL dialects on purpose but was most-interested in SQLite, which is missing `string_split()` or something similar to that. for what it's worth, if anyone wants to actually do this using SQLite, take a look at, e.g. https://stackoverflow.com/a/32051164/5354137 to go with this answer. – Sixtyfive May 12 '22 at 14:18
0

I had a play using XML in sql server. Just an idea :)

with cteXML as
(
select * 
,cast('<wd>' + replace(songline,' ' ,'</wd><wd>') + '</wd>' as xml) as XMLsongline
from @tSongs
),cteBase as
(
select p.value('.','nvarchar(max)') as singleword
from cteXML as x
cross apply x.XMLsongline.nodes('/wd') t(p)
)
select b.singleword,count(b.singleword)
from cteBase as b
group by b.singleword
Nathan_Sav
  • 8,466
  • 2
  • 13
  • 20