
I'm trying to figure out whether there is a shorter, more Ruby-like way of finding the most frequent substrings of length n in a string.

I wrote the following code:

def most_frequent_kmers(length)
  # `text` is assumed to be the DNA string held by the instance
  dna = text.each_char.to_a
  array_dna_substrings = dna.each_cons(length).to_a

  # Count how often each substring occurs
  counts = Hash.new 0
  array_dna_substrings.each do |elem|
    counts[elem.join] += 1
  end

  # Sort by count, highest first
  counts = counts.sort_by { |substring, count| count }.reverse
  res = []

  # Keep every substring that ties for the highest count
  for i in 0..counts.length-1
    res << counts[i] if counts[i][1] >= counts[0][1]
  end

  res = Hash[res.map { |key, value| [key, value] }]
  s = Set.new(res.keys)
  p [s, res.values.first]
end

dna1 = DNA.new('ATTGATTCCG')
dna1.most_frequent_kmers(1)
dna1.most_frequent_kmers(2)
dna1.most_frequent_kmers(3)
dna1.most_frequent_kmers(4)

Example output:

>> dna1 = DNA.new('ATTGATTCCG') => ATTGATTCCG 
>> dna1.most_frequent_kmers(1) => [#<Set: {"T"}>, 4] 
>> dna1.most_frequent_kmers(2) => [#<Set: {"AT", "TT"}>, 2] 
>> dna1.most_frequent_kmers(3) => [#<Set: {"ATT"}>, 2] 
>> dna1.most_frequent_kmers(4) => [#<Set: {"ATTG", "TTGA", "TGAT", "GATT", "ATTC", "TTCC", "TCCG"}>, 1]

The code above works, but there has to be a shorter, more concise, idiomatic Ruby way of finding the most frequent substrings of a set length within a string.

I believe it can be done with a Set using the divide method, but I haven't been able to figure it out.

Any help would be great!

Cheers

Al V
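
For reference, here is one possible shorter version, offered as a rough sketch rather than a definitive answer. It assumes Ruby 2.7 or later (for Enumerable#tally) and is written as a hypothetical standalone method that takes the string as an argument, since the DNA class itself is not shown in the question:

```ruby
require 'set'

# Hypothetical standalone helper (not the original DNA class): returns the set
# of most frequent substrings of the given length, together with their count.
def most_frequent_kmers(text, length)
  # Slide a window of `length` characters over the string and count each substring.
  counts = text.each_char.each_cons(length).map(&:join).tally # tally needs Ruby 2.7+
  max = counts.values.max
  # Keep only the substrings that reach the maximum count.
  [counts.select { |_, count| count == max }.keys.to_set, max]
end

p most_frequent_kmers('ATTGATTCCG', 2)
```

For 'ATTGATTCCG' and a length of 2 this returns the set containing "AT" and "TT" with a count of 2, matching the example output in the question. If the requested length is longer than the string, the sketch returns an empty set and nil.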