I am using Apache Pig to process some data, and at the end of my script I use:
store data into '/mypath/tempp2' using PigStorage('\t','-schema');
fs -getmerge /mypath/tempp2 /localpath/data.tsv;
That way I have a tsv file that is readable with read_csv(header=0) in Pandas.
The problem is that the tsv file now contains the headers on the first row (which is nice), but also the schema concatenated to the first observation in the second row, such as:
col1 col2 col3
{pigschema}0 1 2
assuming the first row is [0,1,2]. So unless I use skiprows=1 in read_csv (losing that row), I get this weird observation in my data.
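As a stopgap, the glued-on schema could be stripped before the file is loaded, so the first observation is not lost. This is only a minimal sketch: it assumes the schema is a single-line JSON object sitting at the very start of the second line (which is what the {pigschema} placeholder above stands for), and the helper name strip_pig_schema and the sample text are made up for illustration.

```python
import json


def strip_pig_schema(text):
    """Remove a JSON blob glued to the start of the second line, if any.

    Hypothetical helper: assumes line 1 is the header and line 2 starts
    with the schema as one JSON object, followed directly by the data.
    """
    lines = text.split("\n")
    if len(lines) > 1 and lines[1].startswith("{"):
        try:
            # raw_decode parses a single JSON value and reports the index
            # where it ended; everything after that is the real data row.
            _, end = json.JSONDecoder().raw_decode(lines[1])
            lines[1] = lines[1][end:]
        except ValueError:
            # Not valid JSON after all: leave the line untouched.
            pass
    return "\n".join(lines)


# Made-up sample mimicking the merged file described above.
sample = 'col1\tcol2\tcol3\n{"fields":[],"version":0}0\t1\t2\n3\t4\t5\n'
cleaned = strip_pig_schema(sample)
```

The cleaned text could then be passed to Pandas through io.StringIO(cleaned) with sep='\t' and header=0. It still feels like a workaround rather than a proper export, though.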
So I wonder if there is a better way to export my data, while getting the headers.
Many thanks!