I have a file of around 1.5 GB and I want to divide it into chunks so that I can use multiprocessing to process each chunk with the pp (parallel python) module. So far I have used f.seek in Python, but it takes a lot of time, perhaps because seek advances through the file byte by byte. What would be a faster alternative? Could I do this through mrjob (the map-reduce package for Python)?
Sample code: I am doing something like this:
def multi(i, slots, file_name, date):
    # open the file and jump straight to this worker's chunk
    f1 = open(date + '/' + file_name, "rb")
    f1.seek(i * slots * 69)        # each row is a fixed 69-byte record
    data = f1.read(69)
    counter = 0
    print 'process', i
    while counter < slots:
        ## do some processing
        counter += 1
        data = f1.read(69)
    f1.close()
Each row of my file is a 69-byte tuple, and the multi function is called in parallel n times (here n is equal to slots) to do the job.
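For context, this is roughly how the n calls are dispatched with pp. It is a simplified sketch, not my exact driver code: the names dispatch and n_procs are placeholders I'm using for illustration, and slots is derived here from the file size assuming fixed 69-byte rows.

import os
import pp

def dispatch(file_name, date, n_procs):
    # rough sketch of the driver (names are illustrative)
    total_rows = os.path.getsize(date + '/' + file_name) // 69
    slots = total_rows // n_procs              # rows handled by each worker

    job_server = pp.Server(n_procs)
    jobs = [job_server.submit(multi, (i, slots, file_name, date))
            for i in range(n_procs)]
    for job in jobs:
        job()                                  # wait for each worker to finish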