I've been experimenting with Apache Arrow. I have used column-oriented, memory-mapped files for many years, and in the past I used a separate file for each column. Arrow seems to prefer storing everything in one file. Is there a way to add a new column without rewriting the entire file?
1 Answer
The short answer is probably no.
Arrow's in-memory format and libraries do support this: you can add a chunked array to a table by creating a new table that includes it, which should be zero-copy.
However, it sounds like you are asking about tables stored in files. None of the common file formats in use (Parquet, CSV, Feather) supports partitioning a table by column in this way.
Keep in mind that when you read a Parquet file you can specify which columns you want, and only the data for those columns is read. So if your goal is only to support retrieving or querying individual columns, you can build one large table containing all of your columns.

Pace