
I have the following table in a text file, "fasta.txt":

                A               C               G               T               
0               0.195965417867  0.322766570605  0.35446685879   0.126801152738  
A1              0.25            0.1875          0.3125          0.25            
C1              0.25            0.475           0.225           0.05            
G1              0.135135135135  0.243243243243  0.405405405405  0.216216216216  
T1              0.142857142857  0.285714285714  0.285714285714  0.285714285714  
A2              0.125           0.208333333333  0.625           0.0416666666667 
C2              0.0833333333333 0.416666666667  0.305555555556  0.194444444444  
G2              0.111111111111  0.361111111111  0.388888888889  0.138888888889  
T2              0.1             0.15            0.55            0.2             
A3              0.333333333333  0.25            0.416666666667  0.0             
C3              0.314285714286  0.4             0.171428571429  0.114285714286  
G3              0.254901960784  0.372549019608  0.333333333333  0.0392156862745 
T3              0.235294117647  0.235294117647  0.470588235294  0.0588235294118 

I want to read the data from the text file into either a dict or a list of lists. I tried using the strip method:

with open('fasta.txt') as f:
    for l in f:
        print(l.strip().split("\t"))

but it doesn't work the way I want.
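A likely reason the attempt falls short, assuming the columns in fasta.txt are padded with spaces rather than separated by tabs (the question does not say which, and the comments below ask exactly this): `split("\t")` only breaks on tab characters, while `split()` with no argument breaks on any run of whitespace. The two sample lines here are made up from the table's first data row.

```python
# Two hypothetical versions of the "A1" row: one padded with spaces,
# one separated by tabs. Which one fasta.txt uses is an assumption.
space_padded = "A1              0.25            0.1875          0.3125          0.25            \n"
tab_separated = "A1\t0.25\t0.1875\t0.3125\t0.25\n"

# split("\t") only breaks on tabs, so a space-padded line stays in one piece
print(space_padded.strip().split("\t"))  # a one-element list

# split() with no argument splits on any whitespace run (tabs or spaces)
# and also drops the trailing spaces and newline
print(space_padded.split())   # ['A1', '0.25', '0.1875', '0.3125', '0.25']
print(tab_separated.split())  # ['A1', '0.25', '0.1875', '0.3125', '0.25']
```

So `line.split()` gives the same fields whichever delimiter the file turns out to use, as long as no value contains a space.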

How can I collect the stats (excluding the 0 row) into a dict like the one below?

d = {"AA": {"1": 0.25, "2": 0.125, "3": 0.333333333333},
     "AC": {"1": 0.1875, "2": 0.208333333333, "3": 0.25},
     "AG": {"1": 0.3125, "2": 0.625, "3": 0.416666666667},
     "AT": {"1": 0.25, "2": 0.0416666666667, "3": 0.0},
     "CA": {"1": ..., "2": ..., "3": ...},
     "CC": {"1": ..., "2": ..., "3": ...},
     "CG": {"1": ..., "2": ..., "3": ...},
     "CT": {"1": ..., "2": ..., "3": ...},
     "GA": {"1": ..., "2": ..., "3": ...},
     "GC": {"1": ..., "2": ..., "3": ...},
     "GG": {"1": ..., "2": ..., "3": ...},
     "GT": {"1": ..., "2": ..., "3": ...},
     "TA": {"1": ..., "2": ..., "3": ...},
     "TC": {"1": ..., "2": ..., "3": ...},
     "TT": {"1": ..., "2": ..., "3": ...},
     "TG": {"1": ..., "2": ..., "3": ...}}
# ... stands for the corresponding value from the table

Thank you in advance; I am really new to Python. Note that the data is in a plain text file, not a CSV file that could be read with the csv module.
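The question also mentions a list of lists as an acceptable result. Here is a minimal sketch of that variant, assuming whitespace-separated columns and that the first two lines are the header and the "0" row (both skipped). The function name `read_rows` is hypothetical, and the sample text is a shortened copy of the table.

```python
def read_rows(lines):
    """Return [[label, float, float, ...], ...] for every data row.

    Assumes the first line is the header and the second is the "0" row,
    both of which are skipped, as in the question's table.
    """
    rows = []
    for line in list(lines)[2:]:
        fields = line.split()
        if fields:  # ignore blank lines
            rows.append([fields[0]] + [float(x) for x in fields[1:]])
    return rows

sample = """\
                A               C               G               T
0               0.195965417867  0.322766570605  0.35446685879   0.126801152738
A1              0.25            0.1875          0.3125          0.25
C1              0.25            0.475           0.225           0.05
"""

print(read_rows(sample.splitlines()))
# [['A1', 0.25, 0.1875, 0.3125, 0.25], ['C1', 0.25, 0.475, 0.225, 0.05]]
```

With the real file, the open file object can be passed directly, e.g. `with open('fasta.txt') as f: rows = read_rows(f)`, since iterating a file yields its lines.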

mcgag
  • *doesn't work* means nothing and is unhelpful. Also, is the file tab-delimited, or is it a fixed-width file padded with spaces? – Serge Ballesta Jun 16 '15 at 21:56
  • possible duplicate of [Creating a dictionary from a CSV file](http://stackoverflow.com/questions/14091387/creating-a-dictionary-from-a-csv-file) – skrrgwasme Jun 16 '15 at 22:11

1 Answer


I think I understand your table, but let me know if the following does not work. I have tried to make this code as generic as possible (i.e. it reads the header line instead of assuming the four bases as column names, so it could also work for, say, a protein file). This code should produce the dict you want:

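from collections import defaultdict

d = defaultdict(dict)
with open('fasta.txt') as f:
    # header line, e.g. ['A', 'C', 'G', 'T']
    headerFields = f.readline().split()
    # discard the "0" line
    f.readline()
    for line in f:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        # row label such as "A1": first char is the base, the rest is the position
        base, position = fields[0][0], fields[0][1:]
        for i, stat in enumerate(fields[1:]):
            # convert to float so the values match the desired dict
            d[base + headerFields[i]][position] = float(stat)

print(dict(d))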
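To try the approach without a fasta.txt on disk, here is the same logic wrapped in a function (the name `parse_table` is hypothetical) and fed three rows of the question's table through `io.StringIO`. It is a sketch for Python 3; the only change in technique is pairing header bases with values via `zip` instead of indexing with `enumerate`.

```python
import io
from collections import defaultdict

def parse_table(f):
    """Build {"AC": {"1": 0.1875, ...}, ...} from a file-like object."""
    d = defaultdict(dict)
    header = f.readline().split()  # column bases, e.g. ['A', 'C', 'G', 'T']
    f.readline()                   # skip the "0" row
    for line in f:
        fields = line.split()
        if not fields:             # ignore blank lines
            continue
        row_base, position = fields[0][0], fields[0][1:]
        for col_base, stat in zip(header, fields[1:]):
            d[row_base + col_base][position] = float(stat)
    return dict(d)

sample = io.StringIO(
    "                A               C               G               T\n"
    "0               0.195965417867  0.322766570605  0.35446685879   0.126801152738\n"
    "A1              0.25            0.1875          0.3125          0.25\n"
    "A2              0.125           0.208333333333  0.625           0.0416666666667\n"
    "A3              0.333333333333  0.25            0.416666666667  0.0\n"
)
d = parse_table(sample)
print(d["AC"])  # {'1': 0.1875, '2': 0.208333333333, '3': 0.25}
print(d["AT"])  # {'1': 0.25, '2': 0.0416666666667, '3': 0.0}
```

Taking `fields[0][1:]` rather than a single character keeps positions of 10 or more intact, should the table grow.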

Also note that you don't need strip() if you are simply splitting on whitespace (the default for split()), as my code does. Hope this helps!

cr1msonB1ade
  • Wow! Thank you! It works perfectly! However, I don't really understand what `lambda: dict()` is doing in the line `d=defaultdict(lambda: dict())`. What does it mean? – mcgag Jun 17 '15 at 19:48
  • I actually don't need that, so I've edited it to a plain `defaultdict(dict)`. I use that syntax when I want a two-level defaultdict, which I was briefly considering. For example, `d=defaultdict(lambda: defaultdict(list))` lets you do `d['AA'][1].append(0.25)`. This is useful if there might be more than one stat for each key pair. It's not necessary here, though. – cr1msonB1ade Jun 17 '15 at 20:16
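To illustrate the two comments above: `defaultdict` calls its factory with no arguments whenever a missing key is accessed, so `defaultdict(dict)` and `defaultdict(lambda: dict())` behave the same. The nested form from the last comment is shown with a hypothetical second value appended to demonstrate the list.

```python
from collections import defaultdict

# Both factories create a fresh empty dict for a missing key.
a = defaultdict(dict)
b = defaultdict(lambda: dict())
a["AA"]["1"] = 0.25
b["AA"]["1"] = 0.25
print(a == b)  # True

# Two-level version: a missing outer key gets a defaultdict(list),
# and a missing inner key gets an empty list ready for append().
d = defaultdict(lambda: defaultdict(list))
d['AA'][1].append(0.25)
d['AA'][1].append(0.30)  # hypothetical second stat for the same key pair
print(d['AA'][1])  # [0.25, 0.3]
```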