I really need some advice on what data structure and functions to use to solve a task I'm trying to perform. I'm just not sure of the best approach here.
The problem/task: I have a list of chromosomal start and end positions. I'm trying to figure out the best way to store this data in a list of tuples (or something similar) and then bisect these coordinates given a query start/end range. I have used bisect before, but only on lists of single values, so I'm not sure how best to approach multi-value comparisons.
For example, if I have the genes below,
gene_name start_pos end_pos
gene_A 100 200
gene_B 300 400
gene_C 500 600
gene_D 700 800
gene_E 900 1000
and I want to query this list with a start and end position that fall inside a gene's range, without matching its exact start and end, to return the matching gene:
query_start = 550, query_end = 580 -> should return gene_C
query_start = 110, query_end = 180 -> should return gene_A
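To make the expected behaviour concrete, here is a rough sketch of the kind of lookup I have in mind. It assumes the genes never overlap and are sorted by start position, and that a query only counts as a match when it sits entirely inside one gene. The names genes, starts and find_gene are just placeholders I made up, and I'm not sure this is the best approach:

```python
import bisect

# Placeholder layout: (start_pos, end_pos, gene_name), sorted by start_pos.
# Assumption: gene ranges do not overlap.
genes = [
    (100, 200, "gene_A"),
    (300, 400, "gene_B"),
    (500, 600, "gene_C"),
    (700, 800, "gene_D"),
    (900, 1000, "gene_E"),
]

# Parallel list of start positions only, so bisect compares single values.
starts = [g[0] for g in genes]


def find_gene(query_start, query_end):
    """Return the name of the gene that fully contains the query, else None."""
    # Index of the last gene that starts at or before query_start.
    i = bisect.bisect_right(starts, query_start) - 1
    if i < 0:
        return None  # query begins before the first gene
    start, end, name = genes[i]
    # The query must also finish inside that same gene.
    return name if query_end <= end else None


print(find_gene(550, 580))
print(find_gene(110, 180))
```

The idea is to bisect only on the start positions and then check the end position of the single candidate gene, rather than trying to bisect on both values at once.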
I have tried to plough my way through this and have ended up with some ridiculously ugly, complicated code, but I know there must be a simple, logical way to do it, and I'm struggling to find the right terms to search the documentation and forums with.
Any helpful advice would be greatly appreciated.
Thanks