Really simple question here but bugging me for long enough to ask. Code looks like this:

f4 = open("genomekey2.txt", 'rb')
keyline = f4.readline()
keygenomes = []
for keyline in f4:
   keygenomes.append(keyline[:-1])

The genomekey2.txt file looks like this:

['Prochlorococcus marinus str. MIT 9202']
['Prochlorococcus marinus str. NATL1A']
['Synechococcus sp. RS9917']
['Nostoc sp. PCC 7120']
['Synechococcus sp. JA-2-3B'a(2-13)']

The problem is that when I print the keygenomes list, it has all of the entries I want, but with quotation marks around each of the [ ] entries in the list. I want to get rid of the quotation marks so I can compare it with another list, but so far I haven't found a way. I tried:

for a in keygenomes:
    a.replace('"', '')

But that didn't seem to work. I would rather have a solution where the quotation marks aren't added at all. What are they for anyway, and which part of the code (.append, .readline()) is responsible for adding them? Massively beginner question here, but you guys seem pretty nice.

Edit: I eventually want to compare it with a list which is formatted as such

[['Arthrospira maxima CS-328'], ['Prochlorococcus marinus str. MIT 9301'], ['Synechococcus sp. CC9605'], ['Synechococcus sp. WH 5701'], ['Synechococcus sp. CB0205'], ['Prochlorococcus marinus str. MIT 9313'], ['Synechococcus sp. JA-3-3Ab'], ['Trichodesmium erythraeum IMS101'], ['Synechococcus sp. PCC 7335'], ['Trichodesmium erythraeum IMS101'], ...

Edit: I think I got something to work with a combination of answers, thank you all for your help! The quotation marks were interfering with the list comparison, so I just added them to the first list as well. I think this only mimics the list being entered as a string (a distinction I now think I understand), but it seems to work:

f4 = open("genomekey2.txt", 'rb')
keyline = f4.readline()
keygenomes = []
for keyline in f4:
    keygenomes.append(keyline[:-1])

specieslist = " ".join(["%s" % el for el in specieslist])

nonconservedlist = [i for i in keygenomes if i not in specieslist]

Edit: Yeah, the above worked, but a more elegant solution, which I found here (http://forums.devshed.com/python-programming-11/convert-string-to-list-71857.html) after understanding the problem better thanks to your help, is this:

for keyline in f4:
    keyline = eval(keyline)
    keygenomes.append(keyline)

Thanks!

cc211
  • possible duplicate of [Python: Printing a list without the brackets and single quotes?](http://stackoverflow.com/questions/5750042/python-printing-a-list-without-the-brackets-and-single-quotes) – Wooble Apr 04 '12 at 16:09
  • If you know for sure that each line of the input file is of the format: ['...'] then you should just be able to go from character [2:-2], I think it is in Python. – Jesus is Lord Apr 04 '12 at 16:12
  • I don't think it's exactly the same problem? – cc211 Apr 04 '12 at 16:12
  • @Wooble: The OP wants to compare lists. How does the linked thread help with that? – Sven Marnach Apr 04 '12 at 16:13
  • @cc211: You are confusing what gets printed by `print my_list` with `my_list` itself. If you want to compare two lists, the output of `print my_list` is immaterial. – Sven Marnach Apr 04 '12 at 16:15
  • @SvenMarnach: maybe you're right. I suppose I was fixated on the bit about quotes showing when the list is printed. There's not much of a real question here in any event. – Wooble Apr 04 '12 at 16:15
  • @Wooble: Agreed, the question isn't clear. – Sven Marnach Apr 04 '12 at 16:16
  • So you're saying that the quotation marks are not there in the comparison Python makes, and they are only printed to show the user in the terminal? – cc211 Apr 04 '12 at 16:21
  • Also thank you for your quick responses! – cc211 Apr 04 '12 at 16:22
  • If so maybe the problem is downstream with my comparison as for the nonconserved list which I used this for nonconservedlist = [i for i in keygenomes if i not in specieslist] I get ["['Prochlorococcus marinus str. NATL1A']", "['Synechococcus sp. RS9917']", "['Nostoc sp. PCC 7120']", "['Synechococcus sp. JA-2-3B'a(2-13)']" ... with no subtraction really from keygenomes? – cc211 Apr 04 '12 at 16:25
  • @cc211 Because your strings contain '', they are printed with "" around them, so that it is clear what is part of the string. It would be useful if you added the `repr()` of both keygenomes and specieslist. I think specieslist might be a list of lists? – Douglas Leeder Apr 04 '12 at 21:58

5 Answers

2

Based on what you want to compare your list to, it seems you want a list of lists and not a list of strings. Maybe this?

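f4 = open("genomekey2.txt", 'rb')
keygenomes = []
for keyline in f4:
    keyline = keyline.strip()
    if keyline:  # skip blank lines, which would break eval
        # eval turns the text "['name']" into an actual list
        keygenomes.append(eval(keyline))
f4.close()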

You are going to have issues with lines like this:

['Synechococcus sp. JA-2-3B'a(2-13)']

The quotes are not correct and it will break the eval. Is it possible to mix the quotes? Like this instead...

["Synechococcus sp. JA-2-3B'a(2-13)"]
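As a hedged aside: `eval` executes whatever the file contains, so a more cautious variant of the same idea is `ast.literal_eval` from the standard library, which only accepts Python literals. The sketch below assumes Python 3, reads from an in-memory list `sample_lines` (a stand-in for the file), and uses a hypothetical helper name `parse_key_lines`. Malformed lines, such as the one with the stray quote, are collected instead of stopping the loop.

```python
import ast

def parse_key_lines(lines):
    """Parse lines like "['name']" into lists; return (parsed, rejected)."""
    parsed, rejected = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            value = ast.literal_eval(text)
        except (SyntaxError, ValueError):
            # The text is not a valid Python literal (e.g. broken quoting).
            rejected.append(text)
        else:
            parsed.append(value)
    return parsed, rejected

# Stand-in for the lines of genomekey2.txt (illustrative only).
sample_lines = [
    "['Nostoc sp. PCC 7120']\n",
    "['Synechococcus sp. JA-2-3B'a(2-13)']\n",    # broken quoting
    "[\"Synechococcus sp. JA-2-3B'a(2-13)\"]\n",  # mixed quotes, valid
]
parsed, rejected = parse_key_lines(sample_lines)
print(parsed)
print(rejected)
```

The rejected lines can then be fixed by hand in the file, which is roughly the manual edit suggested above.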
Robert
  • Hadn't seen this! this is what I came back to and have put in the edit. I will just manually edit those few species for which those ' are in the way I think. – cc211 Apr 05 '12 at 08:48
1

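A quick and dirty solution is to skip the first two and last two characters of the line, after stripping the trailing newline:

f4 = open("genomekey2.txt", 'rb')
keyline = f4.readline()  # as in the question, this discards the first line
keygenomes = []
for keyline in f4:
    # CHANGE HERE: strip the newline first, otherwise [2:-2] removes
    # "]\n" instead of "']" and leaves a trailing quote
    keygenomes.append(keyline.strip()[2:-2])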

Otherwise, use a regexp like:

import re
g = re.match(r"^\['(?P<value>.*)'\]", "['Synechococcus sp. JA-2-3B'a(2-13)']")
g.group(1)  # same as g.group('value')
"Synechococcus sp. JA-2-3B'a(2-13)"
fabrizioM
  • The quick and dirty solution doesn't take away the quotation marks; it takes away other characters. This is what was throwing me before – cc211 Apr 04 '12 at 16:13
  • @cc211: which quotation marks are you trying to get rid of? I think you're confusing a python list object with a list of strings which look like the __repr__ of a list object in python. – Joel Cornett Apr 04 '12 at 17:11
  • @cc211: can you edit your question to show EXACTLY how you would like your desired output to look? Preferably in a python interpreter prompt. – Joel Cornett Apr 04 '12 at 17:13
1

a.replace(...) returns the modified string; it doesn't modify a.

Therefore you need to actually replace the entries in your list, or fix them before you put them in the list.

keygenomes = [ a.replace('"', '') for a in keygenomes ]

Edit:

I think I had not read the question carefully enough - the " appears when you print a list containing the string - it's not part of the string itself.
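Both points can be checked in a few lines. This is only a small sketch, assuming Python 3 (so `print` is a function) and a single made-up entry: `str.replace` hands back a new string, and the double quotes appear only in the list's printed representation.

```python
name = "['Nostoc sp. PCC 7120']"  # a string that merely looks like a list

# str.replace returns a new string; the original is left untouched.
stripped = name.replace("'", "")
print(name)      # still contains the single quotes
print(stripped)  # the single quotes are gone in the new string

# Printing a list shows the repr of each element, so a string that
# contains single quotes is displayed wrapped in double quotes.
keygenomes = [name]
print(keygenomes)     # the element is shown with surrounding double quotes
print(keygenomes[0])  # the string itself has no double quotes

# The string and a real list are different objects and never compare equal.
print(name == ['Nostoc sp. PCC 7120'])
```

The last line is why the comparison in the question finds no matches: a string that looks like a list is not equal to the list it resembles.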

Douglas Leeder
  • Even with that code in place I still get the same output ["['Prochlorococcus marinus str. NATL1A']", "['Synechococcus sp. RS9917']", "['Nostoc sp. PCC 7120']", "['Synechococcus sp. JA-2-3B'a(2-13)']", with f4 = open("genomekey2.txt", 'rb') keyline = f4.readline() keygenomes = [] for keyline in f4: keygenomes.append(keyline[:-1]) keygenomes = [ a.replace('"', '') for a in keygenomes ] print keygenomes Which bit am I doing wrong? – cc211 Apr 04 '12 at 17:16
  • @cc211 I think I had not read the question carefully enough - the " comes when you print a string - it's not part of the string itself. – Douglas Leeder Apr 04 '12 at 21:53
0

Your replace is using the wrong string: you're trying to remove single quotes, but you're passing a double quote. Also, the replacement isn't in-place, since strings are immutable; you have to use the return value.

keygenomes.append(keyline[:-1].replace("'", ""))
Mark Ransom
  • so the slightly horrible thing is, I've just added something to the edit, the thing I'm comparing it to needs single quotations inside the [ ], I think the output I get with your suggestion above is '[Cyanothece sp. PCC 7822]', '[Oscillatoria sp. PCC 6506]', '[Cylindrospermopsis raciborskii CS-505]', '[Nostoc azollae 0708]', '[Synechococcus sp. WH 7805]'] – cc211 Apr 04 '12 at 16:20
0

Try something like this:

keygenomes = []
f4 = open("genomekey2.txt", 'rb')
keyline = f4.readline()
for keyline in f4:
    keyline = keyline.strip()
    if keyline and keyline.startswith("['") and keyline.endswith("']"):
        keygenomes.append(keyline[2:-2])
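If the goal is a list of one-element lists, matching the format of the list in the question's edit, one possible variation is to wrap each extracted name in its own list before appending. This sketch assumes Python 3, reads from an in-memory list `sample_lines` instead of the file, and uses a made-up `specieslist` purely for illustration.

```python
# Stand-in for the lines of genomekey2.txt (illustrative only).
sample_lines = [
    "['Prochlorococcus marinus str. NATL1A']\n",
    "['Synechococcus sp. RS9917']\n",
    "['Synechococcus sp. JA-2-3B'a(2-13)']\n",
]
# Made-up comparison list; not the real data.
specieslist = [['Synechococcus sp. RS9917'], ['Arthrospira maxima CS-328']]

keygenomes = []
for keyline in sample_lines:
    keyline = keyline.strip()
    if keyline.startswith("['") and keyline.endswith("']"):
        # Wrap the bare name in a list so each entry looks like ['name'].
        keygenomes.append([keyline[2:-2]])

# Entries in keygenomes that do not occur in specieslist.
nonconservedlist = [entry for entry in keygenomes if entry not in specieslist]
print(nonconservedlist)
```

Because the name is sliced out rather than evaluated, the entry with the embedded single quote does not need to be edited by hand.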
Maksym Polshcha
  • This has got me the closest to the output I need so far but I still get ['Prochlorococcus marinus str. NATL1A', 'Synechococcus sp. RS9917', 'Nostoc sp. PCC 7120', "Synechococcus sp. JA-2-3B'a(2-13)", 'Synechococcus sp. RS9916', 'Prochlorococcus marinus str. AS9601', When ideally there would be [['Prochlorococcus marinus str. NATL1A'], ['Synechococcus sp. RS9917'], ['Nostoc sp. PCC 7120'], ["Synechococcus sp. JA-2-3B'a(2-13)"], ['Synechococcus sp. RS9916'], ['Prochlorococcus marinus str. AS9601'], etc. – cc211 Apr 04 '12 at 17:04