
I'm very new to this and honestly don't understand much yet. Can someone help me write code to get the sum of column 3? Sorry if this is too basic; I hope you can help. Thanks.

It's a tab file.

#Open file (must be a .tab file)

file = open("chromosome_length.tab")

#According to the READ ME file, chromosome 17 is the mitochondrial chromosome.

##Print line 17

lines_to_print = [16]

for index, line in enumerate(file):
  if ( index in lines_to_print):
    print("Mitochondrial chromosome:")
    print(line)

#How long are the chromosomes?

with open("chromosome_length.tab") as f:
    lines = f.read().split('\n')

values = [int(i.split()[2]) for i in lines]
print(sum(values))

#Error:

Traceback (most recent call last):
  File "/Users/vc/Downloads/assig.py", line 19, in <module>
    values = [int(i.split()[2]) for i in lines]
  File "/Users/vc/Downloads/assig.py", line 19, in <listcomp>
    values = [int(i.split()[2]) for i in lines]
IndexError: list index out of range

Process finished with exit code 1

FILE:

3   NC_001135   316620
4   NC_001136   1531933
5   NC_001137   576874
Valy1004

1 Answer


You can do this:

with open('chromosome_length.tab') as f:
    lines = f.read().split('\n')

values = [int(i.split()[2]) for i in lines if i]
print(sum(values))

Explanation:

This opens the file chromosome_length.tab in read mode, reads all of its text, and splits that text on newlines (\n).
At this point, the lines list looks something like this:

[
    "1 NC1234 1234",
    "2 NC4321 5678",
    ...
]

To get the 3rd column of each line, we iterate over each line in lines and split it on whitespace (split() with no arguments handles both spaces and tabs), which gives ["1", "NC1234", "1234"]. We then take the 3rd field with [2] and convert it to int.

So, we have all the values in our values list: [1234, 5678, ...]

Finally, we use the built-in function sum() to add up the values in the values list and print the total.
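
The steps above can be traced on a small in-memory sample. This is only a sketch: the three rows are copied from the FILE excerpt in the question, and the tab separators are an assumption based on the file being described as a tab file (the real file has more rows).

```python
# Three rows copied from the question's FILE excerpt.
# Assumption: the columns are separated by tabs.
text = "3\tNC_001135\t316620\n4\tNC_001136\t1531933\n5\tNC_001137\t576874"

# Step 1: split the whole text into one string per line.
lines = text.split("\n")

# Step 2: split one line into its fields; split() with no
# arguments splits on any run of whitespace, tabs included.
fields = lines[0].split()
print(fields)  # ['3', 'NC_001135', '316620']

# Step 3: take the third field ([2]) of every line as an int.
values = [int(line.split()[2]) for line in lines]

# Step 4: add everything up.
total = sum(values)
print(total)  # 2425427
```

Note that this sample has no trailing newline, so no empty string appears in lines here.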


UPD: The problem was the empty string '' at the end of the lines list, which appears because the file ends with a newline. Adding the filter `if i` to the list comprehension skips that empty string and fixes the error.
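
To illustrate why the empty string causes the IndexError, and as one possible alternative to the `if i` filter, here is a sketch that reads the file line by line and skips blank lines. The function name sum_third_column and the temporary demo file are hypothetical, made up for this example; the two data rows are taken from the question.

```python
import os
import tempfile


def sum_third_column(path):
    """Sum the integers in the third whitespace-separated column."""
    total = 0
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:  # blank line: nothing to index, so skip it
                continue
            total += int(fields[2])
    return total


# Splitting text that ends with a newline leaves a trailing ''.
trailing = "a\n".split("\n")
print(trailing)  # ['a', '']

# ''.split() is an empty list, so [2] on it raises IndexError.
print("".split())  # []

# Demo on a throwaway file that ends with an extra blank line.
with tempfile.NamedTemporaryFile("w", suffix=".tab", delete=False) as tmp:
    tmp.write("3\tNC_001135\t316620\n4\tNC_001136\t1531933\n\n")
    demo_path = tmp.name

demo_total = sum_third_column(demo_path)
print(demo_total)  # 1848553
os.remove(demo_path)
```

Iterating over the file object avoids building the full list of lines in memory, and checking the split result also skips lines that contain only spaces or tabs.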


Hope that helps :)

GooDeeJAY
  • Thanks for taking the time to help me. I'm sorry that I didn't post it in the right format. I did what you suggested, but I get an error. I edited my question. – Valy1004 Apr 20 '21 at 15:45
  • Seems like they are separated by tabs instead of spaces, try `int(i.split("\t")[2])`. If that doesn't work, try uploading your tab file to a file-sharing service and sharing the link here. – GooDeeJAY Apr 20 '21 at 16:29
  • OK, this was an empty string issue: there was an empty line at the end of the file. I've updated the answer. If it solves your problem, don't forget to upvote and mark the answer as the 'Accepted Answer'. – GooDeeJAY Apr 20 '21 at 17:45
  • Amazing! Now it did work, thank you so much for your help, I hope this can be of use to others. I was also able to learn something, thank you very much. – Valy1004 Apr 20 '21 at 17:55