0

I'm trying to write a script that takes all rows starting with 'HELIX', 'SHEET' and 'DBREF' from a .txt file, extracts some specific columns from those rows, and then saves the results to a new file.

#!/usr/bin/python
import sys

if len(sys.argv) != 3:
    print("2 Parameters expected: You must introduce your pdb file and a name for output file.")
    exit()

for line in open(sys.argv[1]):
    if 'HELIX' in line:
        helix = line.split()
        cols_h = helix[0], helix[3:6:2], helix[6:9:2]
    elif 'SHEET'in line:
        sheet = line.split()
        cols_s = sheet[0], sheet[4:7:2], sheet[7:10:2], sheet [12:15:2], sheet[16:19:2]
    elif 'DBREF' in line:
        dbref = line.split()
        cols_id = dbref[0], dbref[3:5], dbref[8:10]


modified_data = open(sys.argv[2],'w')
modified_data.write(cols_id)
modified_data.write(cols_h)
modified_data.write(cols_s)

My problem is that when I try to write my final results it gives this error:

Traceback (most recent call last):
  File "funcional2.py", line 21, in <module>
    modified_data.write(cols_id)
TypeError: expected a character buffer object

When I try to convert it to a string using ' '.join(), it raises another error:

Traceback (most recent call last):
  File "funcional2.py", line 21, in <module>
    modified_data.write(' '.join(cols_id))
TypeError: sequence item 1: expected string, list found

What am I doing wrong? Also, if there is some easy way to simplify my code, it'll be great. PS: I'm no programmer so I'll probably need some explanation if you do something...

Thank you very much.

Alejandra_RS
  • can you do `print cols_id` before the `modified_data.write(cols_id)`? – Kobi K Sep 17 '14 at 11:05
  • `write` expects a "character buffer object", e.g. a `str`, but you are giving it a `tuple` - convert this to a string first. See any one of the thousand other questions on SO for ["\[python\] TypeError character buffer object"](http://stackoverflow.com/search?q=%5Bpython%5D+TypeError+character+buffer+object). – jonrsharpe Sep 17 '14 at 11:07
  • Kobi K, thanks for asking that, I can only print cols_id inside my for loop. Outside only prints the first row. Do you know why? – Alejandra_RS Sep 17 '14 at 11:22

3 Answers

0

cols_id, cols_h and cols_s seem to be lists, not strings. You can only write a string to a file, so you have to convert each of them to a string first.

modified_data.write(' '.join(cols_id))

and similar.

'!'.join(a_list_of_things) converts the list into a string, separating each element with an exclamation mark.
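To illustrate why `join` still fails on the tuple from the question: the tuple holds one string and two lists, and `join` only accepts strings. The following is a minimal sketch, assuming a made-up DBREF line (the sample values are hypothetical, not taken from the asker's file):

```python
# Hypothetical DBREF record, for illustration only.
line = "DBREF  1ABC A    1   100  UNP    P12345   EXAMPLE_HUMAN    1    100"
dbref = line.split()

# The question's tuple mixes a string with two lists ...
cols_id = dbref[0], dbref[3:5], dbref[8:10]

try:
    ' '.join(cols_id)
    join_failed = False
except TypeError:
    # ... and join() only accepts strings, so this raises TypeError.
    join_failed = True

# Building one flat list of strings first makes join() work.
flat = [dbref[0]] + dbref[3:5] + dbref[8:10]
joined = ' '.join(flat)
print(joined)
```

Wrapping `dbref[0]` in a list is what allows it to be concatenated with the two slices, which are already lists.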

EDIT:

#!/usr/bin/python
import sys

if len(sys.argv) != 3:
    print("2 Parameters expected: You must introduce your pdb file and a name for output file.")
    sys.exit(1)

# One list of output lines per record type
cols_h, cols_s, cols_id = [], [], []

for line in open(sys.argv[1]):
    if 'HELIX' in line:
        helix = line.split()
        # [helix[0]] is a one-element list, so it can be added to the slices
        cols_h.append(' '.join([helix[0]] + helix[3:6:2] + helix[6:9:2]) + '\n')
    elif 'SHEET' in line:
        sheet = line.split()
        cols_s.append(' '.join([sheet[0]] + sheet[4:7:2] + sheet[7:10:2] + sheet[12:15:2] + sheet[16:19:2]) + '\n')
    elif 'DBREF' in line:
        dbref = line.split()
        cols_id.append(' '.join([dbref[0]] + dbref[3:5] + dbref[8:10]) + '\n')

modified_data = open(sys.argv[2], 'w')
cols = [cols_id, cols_h, cols_s]
for col in cols:
    # Each entry already ends with a newline, so join without a separator
    modified_data.write(''.join(col))
modified_data.close()
Hrabal
  • The same thing to the previous comment, it still doesn't work. [Traceback (most recent call last): File "funcional2.py", line 21, in modified_data.write(' '.join(cols_id)) TypeError: sequence item 1: expected string, list found] – Alejandra_RS Sep 17 '14 at 11:30
  • I just noticed that cols_id contains nested lists, so you have to do the following: `cols_id = ''.join(dbref[0])+''.join(dbref[3:5])+''.join(dbref[8:10])` – Hrabal Sep 17 '14 at 12:44
  • And also... every time your loop reads a new line, you replace the content of your cols_whatever variables, so in the end you'll have only the last one found. If you want all the cols found, you have to initialize your variables (`cols_s = cols_h = cols_id = ''`) and do `cols_whatever += new_stuff` – Hrabal Sep 17 '14 at 12:48
  • I've tried to to this: cols = '' cols_id = cols + (''.join(dbref[0])+ '-' + '-'.join(dbref[3:5])+ '-' + '-'.join(dbref[8:10])) but when I use print command out of my for loop it still prints only the last row :( – Alejandra_RS Sep 17 '14 at 13:19
  • I didn't understand what you tried :) but I added the final code to my answer. Basically, you need to initialize your variables with an empty value, then add the found data to what is already there. What you wrote in the comment reads like: "cols is empty; every time you find DBREF, cols_id becomes cols (empty) + the found thing", so the last time it finds DBREF you put empty + your last result in cols_id. – Hrabal Sep 17 '14 at 13:29
  • Now I really don't understand why it still doesn't work. I've used your suggestions but now it returns another kind of error! " File "1funcional2.py", line 17 elif 'DBREF' in line: ^ " – Alejandra_RS Sep 17 '14 at 13:33
  • Haha, well, I tried to do what I thought you were suggesting, but as I said, I'm not a programmer XD (in fact I'm biologist). I've tried your final code and now it does another strange thing... and I can't see why. – Alejandra_RS Sep 17 '14 at 13:36
  • @Hrabal There are `+` operators missing to make this syntactically correct, and repeatedly concatenating strings in a loop is an "anti-pattern" in Python. Strings are immutable, so every time you add something the old values are copied into a new string, and the strings grow bigger with each iteration, so more and more data has to be copied around. The idiomatic approach would be collecting the strings in a list and using a string `join()` at the end. – BlackJack Sep 17 '14 at 14:07
  • Thank you everybody! I've finally achieved what I was looking for. – Alejandra_RS Jan 16 '15 at 20:20
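The points raised in the comments above (plain assignment keeps only the last match, and collecting into a list is preferable to `+=` on a string) can be sketched as follows. This is only an illustration: the `sample` lines are hypothetical stand-ins for the contents of a real input file.

```python
# Hypothetical input lines, standing in for the contents of the pdb file.
sample = [
    "DBREF  1ABC A    1   100  UNP    P12345   EXAMPLE_HUMAN    1    100",
    "REMARK this line is ignored",
    "DBREF  1ABC B    1    50  UNP    P67890   OTHER_HUMAN      1     50",
]

rows = []  # created once, before the loop, so earlier matches are kept
for line in sample:
    if line.startswith('DBREF'):
        parts = line.split()
        # append() adds to the list; a plain assignment here would
        # overwrite the previous match on every iteration.
        rows.append(' '.join([parts[0]] + parts[3:5]))

# Join once at the end instead of growing a string inside the loop.
text = '\n'.join(rows)
print(text)
```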
0

Here is a solution (untested) that separates data and code a little more. There is a data structure (keyword_and_slices) describing the keywords searched in the lines paired with the slices to be taken for the result.

The code then goes through the lines and builds a data structure (keyword2lines) mapping the keyword to the result lines for that keyword.

At the end the collected lines for each keyword are written to the result file.

import sys
from collections import defaultdict


def main():
    if len(sys.argv) != 3:
        print(
            '2 Parameters expected: You must introduce your pdb file'
            ' and a name for output file.'
        )
        sys.exit(1)
    input_filename, output_filename = sys.argv[1:3]
    # 
    # Pairs of keywords and slices that should be taken from the line
    # starting with the respective keyword.
    # 
    keyword_and_slices = [
        ('HELIX', [slice(3, 6, 2), slice(6, 9, 2)]),
        (
            'SHEET',
            [slice(a, b, 2) for a, b in [(4, 7), (7, 10), (12, 15), (16, 19)]]
        ),
        ('DBREF', [slice(3, 5), slice(8, 10)]),
    ]
    keyword2lines = defaultdict(list)
    with open(input_filename, 'r') as lines:
        for line in lines:
            for keyword, slices in keyword_and_slices:
                if line.startswith(keyword):
                    parts = line.split()
                    result_line = [keyword]
                    for index in slices:
                        result_line.extend(parts[index])
                    keyword2lines[keyword].append(' '.join(result_line) + '\n')

    with open(output_filename, 'w') as out_file:
        for keyword in ['DBREF', 'HELIX', 'SHEET']:
            out_file.writelines(keyword2lines[keyword])


if __name__ == '__main__':
    main()

The code follows your text in checking whether a line starts with a keyword, instead of your code, which checks whether a keyword appears anywhere in the line.

It also makes sure all files are closed properly by using the with statement.
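The difference between the two checks matters whenever a keyword shows up inside another kind of record. A small sketch, using two made-up lines (hypothetical samples, not from the asker's file):

```python
# Hypothetical lines: one real HELIX record and one remark that only
# mentions the word HELIX.
lines = [
    "HELIX    1   1 ALA A   10  GLY A   20  1",
    "REMARK 650 HELIX DETERMINATION METHOD: AUTHOR",
]

# 'in' matches the keyword anywhere in the line, so the remark is picked up too.
contains = [l for l in lines if 'HELIX' in l]

# startswith() only matches records whose first characters are the keyword.
starts = [l for l in lines if l.startswith('HELIX')]

print(len(contains), len(starts))
```

With the `in` test, the remark would be split and sliced as if it were a HELIX record, which would put unrelated words in the output.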

BlackJack
-1

You need to convert the tuple created on the right-hand side of your assignments to a string.

 # Replace this with the statement given below
 cols_id = dbref[0], dbref[3:5], dbref[8:10]

 # Create a string out of the parts; dbref[0] is wrapped in a list so
 # that only lists are concatenated before joining
 cols_id = ' '.join([dbref[0]] + dbref[3:5] + dbref[8:10])
Ashwinee K Jha
  • It doesn't work... Traceback (most recent call last): File "funcional2.py", line 18, in cols_id = ''.join((dbref[0], dbref[3:5], dbref[8:10])) TypeError: sequence item 1: expected string, list found – Alejandra_RS Sep 17 '14 at 11:20