Given a list such as:

lst = ['abc123a:01234', 'abcde123a:01234', ['gfh123a:01234', 'abc123a:01234']]

is there a way of quickly returning the indices of all the items that start with a user-defined string, such as 'abc'?

Currently I can only return exact matches using:

print(lst.index('abc123a:01234'))

or by doing this in several steps: finding all the elements that start with 'abc', saving these to a new list, and then searching the original list for exact matches against them.

If the only quick way is to use regex, how could I still let the user specify what the match should be?

Martijn Pieters
PaulBarr
  • You have a nested list. What is your expected output, exactly? – thefourtheye Apr 23 '14 at 10:47
  • I haven't tried index with a nested list, but I was hoping for [0][0], [0][1], [0][2][1]. Is this wrong? – PaulBarr Apr 23 '14 at 10:50
  • @PaulBarr: it complicates matters somewhat. – Martijn Pieters Apr 23 '14 at 10:50
  • @PaulBarr Simply put, what do you say as the index of `gfh123a:01234`? – thefourtheye Apr 23 '14 at 10:51
  • @PaulBarr: also, `[0]` is not an index, it is Python indexing *syntax*; you cannot use it to retrieve the original value. – Martijn Pieters Apr 23 '14 at 10:52
  • What is your *use case* here? What problem are you trying to solve? Why is your input list arbitrarily nested? – Martijn Pieters Apr 23 '14 at 10:52
  • So how would I retrieve the original value? Apologies if I'm asking basic questions! – PaulBarr Apr 23 '14 at 10:53
  • I think you need to zoom out a little here; *why* do you need to find those values. What are you going to do with them once you find them? – Martijn Pieters Apr 23 '14 at 10:55
  • Basically, this is part of a much larger problem which I am struggling to solve. This is a genetics problem, where the list above represents a tree written in Newick format. I ultimately want to find all outgroups which match a certain character string. – PaulBarr Apr 23 '14 at 10:55
  • Do you just want to find the groups or do you *need* the indices? – Jayanth Koushik Apr 23 '14 at 10:56
  • Right, so now we are getting somewhere; any reason you are not using BioPython for this? The [`Phylo` module](http://biopython.org/wiki/Phylo) supports newick trees, for example. I am not a BioPython user myself, but it looks as if you can at least traverse such trees (and thus search). – Martijn Pieters Apr 23 '14 at 10:56
  • I want to find the groups, but to determine which ones are outgroups I was going to look at the indices. If you refer to [link](http://stackoverflow.com/questions/23172293/use-python-to-extract-branch-lengths-from-newick-format) you can see the type of list I'm dealing with. I have written code that allows me to extract just the subtree of interest, but I'm stuck on the next step. – PaulBarr Apr 23 '14 at 10:58
  • I have updated my answer. Hope it helps. – sshashank124 Apr 23 '14 at 10:59
  • I have never heard of BioPython before (I have only been using Python for the last month or so!). I appreciate that I am probably going about this in a very long-winded way. – PaulBarr Apr 23 '14 at 10:59

1 Answer

You can do this with a recursive function that walks the nested list and records the path of indices leading to each match (which I admit is quite primitive):

lst = ['abc123a:01234', 'abcde123a:01234', ['gfh123a:01234', 'abc123a:01234']]

user_in = 'abc'

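def get_ind(lst, searchterm, path=None, indices=None):
    # Collect the index path of every string that starts with searchterm.
    if indices is None:
        indices = []  # accumulator shared by all recursive calls
    if path is None:
        path = []  # indices leading to the current sublist
    for index, value in enumerate(lst):
        if isinstance(value, list):
            # Recurse into the sublist, extending the path with its position.
            get_ind(value, searchterm, path + [index], indices)
        elif value.startswith(searchterm):
            # Record the full path to the match, e.g. [2, 1] for lst[2][1].
            indices.append(path + [index])
    return indices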

new_lst = get_ind(lst, user_in)

>>> print(new_lst)
[[0], [1], [2, 1]]
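
One of the comments asks how to get the original value back from such an index path. The following is a minimal sketch, assuming the paths have the shape returned by get_ind above; get_by_path is a hypothetical helper name introduced here for illustration, not part of the original answer:

```python
from functools import reduce
import operator

lst = ['abc123a:01234', 'abcde123a:01234', ['gfh123a:01234', 'abc123a:01234']]

def get_by_path(nested, path):
    # Hypothetical helper: apply each index in turn, so [2, 1] means nested[2][1].
    return reduce(operator.getitem, path, nested)

# Index paths as returned by get_ind(lst, 'abc') above.
paths = [[0], [1], [2, 1]]

matches = [get_by_path(lst, p) for p in paths]
print(matches)
# ['abc123a:01234', 'abcde123a:01234', 'abc123a:01234']
```

Each path is consumed one index at a time, so a path of any depth works as long as every intermediate element is itself a list.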
Martijn Pieters
sshashank124