How do you get a list of all unique words and their frequencies ("tally") in one text column of one table of an SQL database? An answer for any SQL dialect in which this is possible would be appreciated.
In case it helps, here's a one-liner that does it in Ruby using Sequel:
Hash[DB[:table].all.map{|r| r[:text]}.join("\n").gsub(/[\(\),]+/,'').downcase.strip.split(/\s+/).tally.sort_by{|_k,v| v}.reverse]
To give an example, for table table
with 4 rows with each holding one line of Dr. Dre's Still D.R.E.'s refrain in a text
field, the output would be:
{"the"=>4,
"still"=>3,
"them"=>3,
"for"=>2,
"d-r-e"=>1,
"it's"=>1,
"streets"=>1,
"love"=>1,
"got"=>1,
"i"=>1,
"and"=>1,
"beat"=>1,
"perfect"=>1,
"to"=>1,
"time"=>1,
"my"=>1,
"taking"=>1,
"girl"=>1,
"low-lows"=>1,
"in"=>1,
"corners"=>1,
"hitting"=>1,
"world"=>1,
"across"=>1,
"all"=>1,
"gangstas"=>1,
"representing"=>1,
"i'm"=>1}
This works, of course, but is naturally not as fast or elegant as it would be to do it in pure SQL - which I have no clue if that's even in the realm of possibilites...