Your solution scales poorly because it re-scans the input for every query, generating sub-arrays and searching them directly for the minimum value. This is inefficient when there are many queries to process. For example, if any sub-string contains "A", then a string which contains that sub-string also contains "A", but your solution throws away that prior knowledge and re-calculates. The end result is that the running time scales not only with the size of the input string, but with that size multiplied by the number of queries. When s is long and [p,q] are long too, this leads to poor performance.
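To make the cost concrete, here is a minimal sketch of the kind of per-query scan described above. It is an assumed reconstruction for illustration, not your actual code, and the method name naive_min_letters is hypothetical. Each query copies and scans its own slice of s, so M queries over a string of length N can cost up to O( N * M ).

```ruby
# Hypothetical baseline, not the asker's code: every query re-scans its slice.
def naive_min_letters(s, p, q)
  p.zip(q).map do |a, b|
    # s[a..b] copies up to N characters, and min then scans all of them
    s[a..b].chars.min
  end
end

s = "AGTCTTCGATGAAGCACATG"
result = naive_min_letters(s, [0, 3, 2, 7], [6, 4, 2, 9])
# result == ["A", "C", "T", "A"]
```

The answers are correct, but nothing learned while answering one query is reused for the next.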
You can improve the scaling of your code by pre-processing s into an indexed structure that is designed to answer the query efficiently. Discovering the right structure to use is a significant part of the challenge in the coding question. Getting code that merely produces correct output is only halfway there, so the score of 62/100 seems valid.
Here is an index structure that can efficiently find the minimum character in a given index range from a fixed string.
Start by analysing the string into a two-part index:
s = "AGTCTTCGATGAAGCACATG"
len = s.length
# Index to answer "what count of each character type comes next in s"
# E.g. next_char_instance["A"][7] returns the instance number of "A" that is
# at or after position 7 ( == 1 )
next_char_instance = Hash[ "A" => Array.new(len), "C" => Array.new(len),
                           "G" => Array.new(len), "T" => Array.new(len) ]
# Index to answer "where does count value n of this character appear in s"
# E.g. pos_of_char_instance["A"][1] returns the index position of
# the second "A" ( == 8 )
pos_of_char_instance = Hash[ "A" => Array.new, "C" => Array.new,
                             "G" => Array.new, "T" => Array.new ]
# Basic building block during iteration
next_instance_ids = Hash[ "A" => 0, "C" => 0, "G" => 0, "T" => 0 ]
# Build the two indexes - O( N )
(0...len).each do |i|
  # Record, for every letter, how many instances occur before position i
  next_instance_ids.each do | letter, next_instance_id |
    next_char_instance[letter][i] = next_instance_id
  end
  this_letter = s[i]
  pos_of_char_instance[ this_letter ] << i
  next_instance_ids[ this_letter ] += 1
end
So that's O( N ), because you have iterated over the string once and all the other effects are (effectively) constant. OK, creating the arrays is also O( N ), but probably 10 times faster, and if you find yourself thinking O( 1.4 * N ), don't panic: you throw away the constant 1.4 when considering purely scaling issues.
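It can help to look at what the two tables contain for a short string. The sketch below wraps the same loop in a helper (build_index and the sample string "CAGCCTA" are my own illustrative choices, not part of the puzzle) and shows one lookup that finds a later letter and one that runs off the end.

```ruby
# build_index is a hypothetical helper that repeats the loop above for any string.
def build_index(s)
  letters = %w[A C G T]
  next_char_instance   = letters.map { |l| [l, Array.new(s.length)] }.to_h
  pos_of_char_instance = letters.map { |l| [l, []] }.to_h
  seen = Hash.new(0) # how many of each letter occur before position i

  s.each_char.with_index do |letter, i|
    letters.each { |l| next_char_instance[l][i] = seen[l] }
    pos_of_char_instance[letter] << i
    seen[letter] += 1
  end
  [next_char_instance, pos_of_char_instance]
end

next_char_instance, pos_of_char_instance = build_index("CAGCCTA")

pos_of_char_instance["A"]    # => [1, 6]
next_char_instance["C"]      # => [0, 1, 1, 1, 2, 3, 3]

# Next "A" at or after position 2 is instance 1, which sits at index 6
next_a = pos_of_char_instance["A"][next_char_instance["A"][2]]   # => 6

# There is no "G" at or after position 3, so the lookup returns nil
next_g = pos_of_char_instance["G"][next_char_instance["G"][3]]   # => nil
```

The nil case matters: any code that uses the index has to treat "no further instance" as "not in range".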
Now that you have this index, you can ask in turn "Where is the next A (or C or G) at or after this position?" really efficiently, and use that to quickly find the minimal character inside a particular range. In fact, as each query is just a few fixed-cost lookups and comparisons, it is O( 1 ) per query, and therefore O( M ) overall:
# Example queries
p = [ 0, 3, 2, 7 ]
q = [ 6, 4, 2, 9 ]
# Search for each query - O( M )
p.zip(q).map do | a, b |
  # We need to find lowest character possible that fits into the range
  "ACGT".chars.find do | letter |
    next_instance_id = next_char_instance[ letter ][ a ]
    # nil when there is no further instance of this letter at or after a
    pos_next_instance = pos_of_char_instance[ letter ][ next_instance_id ]
    true if pos_next_instance && pos_next_instance <= b
  end
end
# => ["A", "C", "T", "A"] is output for example data
I've left this mapped to the letters; hopefully you can see that outputting 1,2,3,4 instead is a trivial addition. In fact the numbering, and the use of genome-style letters, are red herrings in the puzzle: they have no real impact on the solution (other than that generating the structure for just 4 letters is easier to code as fixed values).
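For completeness, a sketch of that last step, assuming the puzzle numbers the letters in the same A, C, G, T order used for the search (A is 1 through to T is 4). The name impact_factor is my own.

```ruby
# Assumed numbering: A => 1, C => 2, G => 3, T => 4
impact_factor = { "A" => 1, "C" => 2, "G" => 3, "T" => 4 }

# The letters produced by the query loop for the example data
letters = ["A", "C", "T", "A"]

numbers = letters.map { |letter| impact_factor[letter] }
# numbers == [1, 2, 4, 1]
```

You could equally look up the number inside the query loop, which keeps the whole thing a single pass over the queries.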
The above solution is one of possibly many. Note that it relies on the number of allowed letters being fixed, and would not work as an answer to finding a minimum value in a range where the individual entries were integers or floats.