I have two Feather files built from the same data. The only difference is how the data was obtained.
File 1 comes from a list of queries, broken out by month, each saved as an individual file. Each file is then read into a dictionary and the results are concatenated with pd.concat(dict.values()) in Python.
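For reference, the Python side looks roughly like the sketch below. The frames, keys, and column names are made-up stand-ins for the real monthly query results, and the output file name is a placeholder.

```python
import pandas as pd

# Hypothetical stand-ins for the monthly query results; in the real workflow
# each frame would come from reading one saved file.
monthly = {
    "2023-01": pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]}),
    "2023-02": pd.DataFrame({"id": [3], "amount": [5.0]}),
}

# Concatenate the dictionary's values into a single frame with a fresh index.
combined = pd.concat(list(monthly.values()), ignore_index=True)

# Writing the result needs pyarrow installed; the file name is a placeholder.
# combined.to_feather("file1.feather")
```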
File 2 comes from another list of queries, broken out by quarter, each saved as an individual file. The files are then concatenated through some process in R that I'm not familiar with.
After reading both files, I can see that the data is the same: same number of rows, same sums, etc.
But File 1 is 3 GB and File 2 is 6 GB. Why is that?
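The row-count and sum check I ran looks roughly like the sketch below, extended to also compare column dtypes and in-memory size, since matching counts and sums say nothing about how each column is typed. The two frames are made-up stand-ins for the loaded files (the real ones would come from pd.read_feather), and I don't know whether a dtype difference is actually what is happening with my files.

```python
import pandas as pd

# Hypothetical stand-ins for the two loaded files: same values,
# but the "id" column is stored with a different integer width.
df1 = pd.DataFrame({"id": pd.Series([1, 2, 3], dtype="int32"),
                    "amount": [10.0, 20.0, 5.0]})
df2 = pd.DataFrame({"id": pd.Series([1, 2, 3], dtype="int64"),
                    "amount": [10.0, 20.0, 5.0]})

# The checks described above: row counts and per-column sums.
same_rows = len(df1) == len(df2)
same_sums = bool((df1.sum(numeric_only=True) == df2.sum(numeric_only=True)).all())

# Extra check: list the columns whose dtype differs between the two frames.
dtypes = pd.DataFrame({"file1": df1.dtypes.astype(str),
                       "file2": df2.dtypes.astype(str)})
dtype_diff = dtypes[dtypes["file1"] != dtypes["file2"]]

# Extra check: total in-memory footprint of each frame, in bytes.
mem1 = int(df1.memory_usage(deep=True).sum())
mem2 = int(df2.memory_usage(deep=True).sum())

print(same_rows, same_sums)
print(dtype_diff)
print(mem1, mem2)
```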