import pandas as pd

chunksize = 10 ** 2
data = pd.read_csv('C:\\Users\\log.txt', sep=" ", header=None, chunksize=chunksize)

This is what I tried with a 20 GB txt file: I used chunksize to read it only 100 lines at a time, hoping to assign the first 100 lines to a variable called data. The problem is that whenever I do this, the IPython console dies immediately. Any idea how to solve it?

PS: I want to chunk the whole file into pieces so that I can process them one at a time and upload them into my database
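For context, this is roughly the workflow I am aiming for (an untested sketch; the sqlite3 connection, the log.db file, and the log table name are just placeholders for my real database):

import pandas as pd
import sqlite3

# Placeholder database: swap sqlite3 / log.db / the log table for the real target.
conn = sqlite3.connect('log.db')

chunksize = 10 ** 2  # 100 lines per chunk, as above

# With chunksize set, read_csv returns an iterator of DataFrames
# rather than a single DataFrame, so each piece is consumed in a loop.
for chunk in pd.read_csv('C:\\Users\\log.txt', sep=" ", header=None,
                         chunksize=chunksize):
    chunk.to_sql('log', conn, if_exists='append', index=False)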

  • You've not stated what you're trying to achieve here? Basically if your file is too large to fit into memory then you need to decide how to process this in chunks – EdChum Jan 20 '17 at 15:00
  • http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas – rafaelvalle Jan 20 '17 at 15:01

1 Answer


OK, so I figured it out by using this:

import codecs
import csv

reader = csv.reader(codecs.open('C:\\log.txt', 'rU', 'utf-16'))
for each in reader:
    # process each row here

The csv module in the Python standard library turns out to be pretty useful.
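If you want to push the rows into a database in batches instead of one at a time, here is a rough sketch of how the same loop can be extended (sqlite3 and the single-column log table are just stand-ins for whatever database and schema you actually use):

import codecs
import csv
import sqlite3

# Placeholder database and schema for the example.
conn = sqlite3.connect('log.db')
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS log (line TEXT)')

batch = []
with codecs.open('C:\\log.txt', 'rU', 'utf-16') as f:
    for row in csv.reader(f):
        batch.append((' '.join(row),))   # store the raw line as one column
        if len(batch) >= 1000:           # flush every 1000 rows
            cur.executemany('INSERT INTO log (line) VALUES (?)', batch)
            conn.commit()
            batch = []
if batch:                                # flush whatever is left over
    cur.executemany('INSERT INTO log (line) VALUES (?)', batch)
    conn.commit()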
