I'm trying to figure out whether there is a shorter, more Ruby-like way of finding the most frequent substrings of length n.
I wrote the following code:
def most_frequent_kmers(length)
  dna = text.each_char.to_a
  array_dna_substrings = dna.each_cons(length).to_a
  counts = Hash.new 0
  array_dna_substrings.each do |elem|
    #count[elem] += 1
    counts[elem.join] += 1
  end
  counts = counts.sort_by { |substring, count| count }.reverse
  res = []
  for i in 0..counts.length-1
    res << counts[i] if counts[i][1] >= counts[0][1]
  end
  res = Hash[res.map { |key, value| [key, value] }]
  s = Set.new(res.keys)
  p [s, res.values.first]
end
dna1 = DNA.new('ATTGATTCCG')
dna1.most_frequent_kmers(1)
dna1.most_frequent_kmers(2)
dna1.most_frequent_kmers(3)
dna1.most_frequent_kmers(4)
Example output:
>> dna1 = DNA.new('ATTGATTCCG') => ATTGATTCCG
>> dna1.most_frequent_kmers(1) => [#<Set: {"T"}>, 4]
>> dna1.most_frequent_kmers(2) => [#<Set: {"AT", "TT"}>, 2]
>> dna1.most_frequent_kmers(3) => [#<Set: {"ATT"}>, 2]
>> dna1.most_frequent_kmers(4) => [#<Set: {"ATTG", "TTGA", "TGAT", "GATT", "ATTC", "TTCC", "TCCG"}>, 1]
The code above works flawlessly, but there has to be a shorter, more concise, Ruby-like way of finding the most frequent substrings of a given length within a string.
I believe it can be done with a Set using the divide method (Set#divide), but I haven't been able to figure it out.
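For reference, here is a minimal sketch of the kind of shorter version being asked about, not a definitive answer. Because the DNA class itself is not shown above, the sketch assumes standalone methods that take the string as an argument; the names kmer_counts and most_frequent_kmers_divide are hypothetical. The first variant keeps the substrings whose count equals the maximum, and the second uses Set#divide to group substrings that share a count. Both assume the string is at least `length` characters long.

```ruby
require 'set'

# Count every substring of `length` consecutive characters in `text`.
# (Hypothetical helper; the original method reads `text` from the DNA object.)
def kmer_counts(text, length)
  text.each_char.each_cons(length).each_with_object(Hash.new(0)) do |chars, counts|
    counts[chars.join] += 1
  end
end

# Variant 1: keep only the substrings whose count equals the maximum.
def most_frequent_kmers(text, length)
  counts = kmer_counts(text, length)
  max = counts.values.max
  [counts.select { |_, count| count == max }.keys.to_set, max]
end

# Variant 2: Set#divide groups the substrings that share a count,
# then the group with the highest count is picked.
# Assumes text.length >= length, so at least one group exists.
def most_frequent_kmers_divide(text, length)
  counts = kmer_counts(text, length)
  groups = counts.keys.to_set.divide { |kmer| counts[kmer] }
  best = groups.max_by { |group| counts[group.first] }
  [best, counts[best.first]]
end

p most_frequent_kmers('ATTGATTCCG', 2)
p most_frequent_kmers_divide('ATTGATTCCG', 2)
```

Both variants return the set of winning substrings together with their count, like the original method. On Ruby 2.7 or later, the counting helper can be shortened further with Enumerable#tally.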
Any help would be great!
Cheers