I am trying to pivot a dataset that has 1 billion rows and 3 columns. To do this, I read the file in chunks and apply pivot to each chunk. However, the following script only pivots the last row rather than the entire file. Does anyone know how to apply this to the complete data?
Input data
r_id g_id exp
c1 g1 1
c2 g1 2
c3 g1 3
c1 g2 4
c2 g2 5
c3 g2 6
c1 g3 7
c2 g3 8
c3 g3 9
Script - Working
import pandas as pd
my_data1 = pd.read_csv("test.input", sep='\t')
# read_csv already returns a DataFrame, so no extra conversion is needed
my_data3 = my_data1.pivot(index='r_id', columns='g_id', values='exp')
my_data3.to_csv("test.output", sep='\t')
Chunk Script - not working
import pandas as pd
chunker = pd.read_csv('test.input', sep='\t', chunksize=1)
tot = pd.DataFrame()
for piece in chunker:
    tot = piece.pivot(index='r_id', columns='g_id', values='exp')
tot.to_csv('test.output', sep='\t')
Desired output
r_id g1 g2 g3
c1 1 4 7
c2 2 5 8
c3 3 6 9
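One direction that might work, given here as a minimal sketch and not as a solution tested on a billion rows: pivot each chunk, then merge it into a running result instead of overwriting `tot`. The function name `pivot_in_chunks`, the chunk size, and the in-memory sample are hypothetical and only for illustration. The sketch assumes each (r_id, g_id) pair occurs only once in the file and that the final wide table fits in memory.

```python
import io
import pandas as pd

# Small tab-separated sample mirroring the input data above (illustration only).
SAMPLE = (
    "r_id\tg_id\texp\n"
    "c1\tg1\t1\nc2\tg1\t2\nc3\tg1\t3\n"
    "c1\tg2\t4\nc2\tg2\t5\nc3\tg2\t6\n"
    "c1\tg3\t7\nc2\tg3\t8\nc3\tg3\t9\n"
)

def pivot_in_chunks(source, chunksize=100_000):
    """Pivot a long table chunk by chunk and merge the partial pivots.

    Hypothetical helper: assumes every (r_id, g_id) pair appears once,
    so no chunk contains duplicates and no cell is filled twice.
    """
    result = None
    for piece in pd.read_csv(source, sep="\t", chunksize=chunksize):
        part = piece.pivot(index="r_id", columns="g_id", values="exp")
        # combine_first takes the union of rows and columns and fills
        # cells that are still missing in `result` with values from `part`.
        result = part if result is None else result.combine_first(part)
    return result

# A chunk size of 4 forces the sample to be split across three chunks.
demo = pivot_in_chunks(io.StringIO(SAMPLE), chunksize=4)
print(demo)
```

For the real file, the call would presumably be `pivot_in_chunks("test.input").to_csv("test.output", sep="\t")`. Cells missing in intermediate results are held as NaN, so values may come back as floats. Merging into a growing wide table on every chunk can also become slow, so a larger chunk size than the `chunksize=1` in the script above is likely preferable.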