I am new to Python and am trying to compare two large CSV files (300 million rows and 50 columns each). I am wondering how to do this in pandas, and whether pandas is the right tool for the job. Sample input and the expected output are given below.
file 1:
key,field1,field2,field3
001,belgium,1000,123.56
002,usa,200,345.65
003,canada,3000,675.00
file 2:
key,field1,field2,field3
001,belgium,500,0
002,usa,200,345.65
004,Brazil,2500,458.00
output (with comparison indicators)
(S = same value, C = value changed, O = value changed from nonzero to zero, D = record deleted in new file, N = record newly added in new file)
Output expected:
key,field1,field2,field3
001,S,C,O
002,S,S,S
003,D,D,D
004,N,N,N
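
To make the rules concrete, here is a minimal sketch of the comparison on the sample data above, assuming pandas and NumPy are available. The function name `compare` and the variables `OLD_CSV` and `NEW_CSV` are made up for illustration. It loads both files fully into memory, so it only shows the per-field logic; 300 million rows may well not fit in memory this way.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
OLD_CSV = """key,field1,field2,field3
001,belgium,1000,123.56
002,usa,200,345.65
003,canada,3000,675.00
"""
NEW_CSV = """key,field1,field2,field3
001,belgium,500,0
002,usa,200,345.65
004,Brazil,2500,458.00
"""


def compare(old, new, key="key"):
    """Return one row per key with an S/C/O/D/N flag for every field (hypothetical helper)."""
    # The outer join keeps keys found in either file; _merge records which side had them.
    merged = old.merge(new, on=key, how="outer",
                       suffixes=("_old", "_new"), indicator=True)
    merged = merged.sort_values(key).reset_index(drop=True)
    deleted = merged["_merge"] == "left_only"   # key only in file 1
    added = merged["_merge"] == "right_only"    # key only in file 2

    out = pd.DataFrame({key: merged[key]})
    for col in [c for c in old.columns if c != key]:
        old_v, new_v = merged[f"{col}_old"], merged[f"{col}_new"]
        # Compare numerically where both sides parse as numbers, else as text.
        old_n = pd.to_numeric(old_v, errors="coerce")
        new_n = pd.to_numeric(new_v, errors="coerce")
        both_num = old_n.notna() & new_n.notna()
        same = np.where(both_num, old_n.eq(new_n),
                        old_v.fillna("").eq(new_v.fillna("")))
        zeroed = both_num & old_n.ne(0) & new_n.eq(0)
        # The first matching condition wins; anything left over is a plain change.
        out[col] = np.select([deleted, added, same, zeroed],
                             ["D", "N", "S", "O"], default="C")
    return out


# dtype=str keeps leading zeros in the keys (001 stays "001").
old_df = pd.read_csv(io.StringIO(OLD_CSV), dtype=str)
new_df = pd.read_csv(io.StringIO(NEW_CSV), dtype=str)
result = compare(old_df, new_df)
print(result.to_csv(index=False))
```

Reading everything as strings is a deliberate choice here: it preserves the key formatting, while the numeric conversion inside the loop means values such as `0` and `0.00` are still treated as equal.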