import pandas as pd

chunksize = 10 ** 2
data = pd.read_csv('C:\\Users\\log.txt', sep=" ", header=None, chunksize=chunksize)

This is what I tried with a 20 GB txt file: I used chunksize to read it only 100 lines at a time, hoping to assign the first 100 lines to a variable called data. The problem is that whenever I do this, the IPython console dies immediately. Any idea how to solve it?

PS: I want to chunk the whole file into pieces so that I can process them one at a time and upload them into my database
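For context, this is roughly the workflow I am aiming for (an untested sketch; the sqlite3 connection, the log.db file, and the log table name are just placeholders for my real database):

import pandas as pd
import sqlite3

# Placeholder database: swap sqlite3 / log.db / the log table for the real target.
conn = sqlite3.connect('log.db')

chunksize = 10 ** 2  # 100 lines per chunk, as above

# With chunksize set, read_csv returns an iterator of DataFrames
# rather than a single DataFrame, so each piece is consumed in a loop.
for chunk in pd.read_csv('C:\\Users\\log.txt', sep=" ", header=None,
                         chunksize=chunksize):
    chunk.to_sql('log', conn, if_exists='append', index=False)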

  • You've not stated what you're trying to achieve here? Basically if your file is too large to fit into memory then you need to decide how to process this in chunks – EdChum Jan 20 '17 at 15:00
  • http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas – rafaelvalle Jan 20 '17 at 15:01

1 Answer


OK, so I figured it out by using this:

import codecs
import csv

reader = csv.reader(codecs.open('C:\\log.txt', 'rU', 'utf-16'))
for each in reader:
    # process each row here

The csv module in the Python standard library turns out to be pretty useful.
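If you want to push the rows into a database in batches instead of one at a time, here is a rough sketch of how the same loop can be extended (sqlite3 and the single-column log table are just stand-ins for whatever database and schema you actually use):

import codecs
import csv
import sqlite3

# Placeholder database and schema for the example.
conn = sqlite3.connect('log.db')
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS log (line TEXT)')

batch = []
with codecs.open('C:\\log.txt', 'rU', 'utf-16') as f:
    for row in csv.reader(f):
        batch.append((' '.join(row),))   # store the raw line as one column
        if len(batch) >= 1000:           # flush every 1000 rows
            cur.executemany('INSERT INTO log (line) VALUES (?)', batch)
            conn.commit()
            batch = []
if batch:                                # flush whatever is left over
    cur.executemany('INSERT INTO log (line) VALUES (?)', batch)
    conn.commit()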
