
I have a file with a one-line header followed by one long column of values. I want to add a second column containing consecutive numbers, starting at 10981 and increasing by 1 on each line until the end of the file (skipping the header, of course). The problem is that the script needs a lot of memory and my PC crashes, probably because the script is not well written (sorry, I am new to programming!). This is the script I have written:

with open ('chr1.phyloP46way.placental2.wigFix', 'r') as file_open:
    num = 10981
    text = file_open.readlines()
    next (text)
    for line in text:
        num = num + 1
        print line.strip() + '\t' + str(num)

Since my PC crashes when I run it, I tried running it in PyCharm and got the following error, which, from what I have read, is probably caused by a lack of memory:

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

Any idea to solve this?

Thank you very much!

Pablo
JaimeG
  • Welcome to SO! Please correct the formatting of the code so we don't have to assume how far the block goes. The easiest way to do it is usually to paste the code in, select it all, and click the code formatting button (looks like `{}`). Also, if the script itself is crashing, please provide the traceback. – glibdud Aug 11 '16 at 12:07

2 Answers


If your system is running out of resources, the likely culprit is the readlines() call, which causes Python to try to load the entire file into memory. There's no need to do this... a file object can itself be used as an iterator to read the file line by line:

with open ('chr1.phyloP46way.placental2.wigFix', 'r') as file_open:
    num = 10981
    next (file_open)
    for line in file_open:
        num = num + 1
        print line.strip() + '\t' + str(num)
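
For reference, here is a minimal sketch of the same line-by-line idea in Python 3 syntax, writing the result to a second file instead of printing it. The function name `add_counter_column` and the output path are made up for illustration and are not part of the original answer. It uses `enumerate` so that the first data line receives exactly the starting value.

```python
def add_counter_column(path_in, path_out, start=10981):
    """Copy path_in to path_out, appending a tab-separated counter to each data line.

    The first line is treated as a header and copied unchanged.
    Hypothetical helper written for illustration only.
    Returns the number of data lines written.
    """
    with open(path_in, 'r') as f_in, open(path_out, 'w') as f_out:
        header = next(f_in, None)  # None if the input file is empty
        if header is None:
            return 0
        f_out.write(header)

        count = 0
        # Iterating over the file object reads one line at a time,
        # so the whole file is never held in memory.
        for num, line in enumerate(f_in, start):
            f_out.write(line.rstrip('\n') + '\t' + str(num) + '\n')
            count += 1
        return count
```

It could be called as `add_counter_column('chr1.phyloP46way.placental2.wigFix', 'with_counter.txt')`, where the output filename is again only a placeholder.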
glibdud

It is difficult to verify that this works without the input file, but give this a try:

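import os

# data_path must be set to the directory that contains the file
f = open(os.path.join(data_path, 'chr1.phyloP46way.placental2.wigFix'), 'r')
lines = f.readlines()  # note: this still loads the whole file into memory
f.close()
num = 10981

for line_num in range(len(lines)):
    line_in = lines[line_num]
    num = num + 1
    print line_in.strip() + '\t' + str(num)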

---- Update, following Rory Daulton's comment:

I had some time to do a small test. Maybe this one will help: save the following code in a file named converter.py

import os

def add_enumeration(data_path, filename_in, filename_out, num=10981):

    # compose the filenames:
    path_to_file_in  = os.path.join(data_path, filename_in)
    path_to_file_out = os.path.join(data_path, filename_out)

    # check that the input file exists:
    if not os.path.isfile(path_to_file_in):
        raise IOError('Input file does not exist.')

    # open the files:
    # if the output file does not exist, it will be created;
    # if it is not empty, its content will be deleted.
    with open(path_to_file_in, 'r') as f_in, open(path_to_file_out, 'w') as f_out:

        # copy the header line of the input file unchanged:
        f_out.write(f_in.readline())

        # append the counter to each remaining line, one line at a time:
        for line_in in f_in:
            f_out.write(line_in.strip() + '    ' + str(num) + '\n')
            num = num + 1

Then, from an IPython terminal:

In: run -i converter.py

In: add_enumeration('/Users/user/Desktop', 'test_in.txt', 'test_out.txt')

Note that if test_out.txt is not empty, its content will be deleted. This version avoids loading all the lines into a list with readlines(), since it reads and writes one line at a time. Let me know if the memory problem is still there.
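
One way to check whether the memory problem is really gone is to compare peak allocations for the two reading strategies. The following is a rough sketch in Python 3 syntax using the standard-library `tracemalloc` module. All function names here (`peak_memory`, `count_with_readlines`, `count_by_iterating`, `make_sample_file`) are hypothetical helpers, and the sample file is a throwaway stand-in, not the real .wigFix data.

```python
import os
import tempfile
import tracemalloc


def peak_memory(func, path):
    """Return the peak number of bytes Python allocated while func(path) ran."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def count_with_readlines(path):
    with open(path) as f:
        return len(f.readlines())  # the whole file is held in a list


def count_by_iterating(path):
    with open(path) as f:
        return sum(1 for _ in f)  # only one line is held at a time


def make_sample_file(n_lines=200000):
    """Write a temporary file with a header and n_lines dummy values."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'w') as f:
        f.write('header\n')
        for _ in range(n_lines):
            f.write('0.123\n')
    return path
```

Calling `peak_memory(count_with_readlines, path)` and `peak_memory(count_by_iterating, path)` on the same file should show a much larger peak for the `readlines()` variant, and the gap grows with the file size.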

SeF