Does Parquet support storing data frames of different widths (numbers of columns) in a single file? In HDF5, for example, it is possible to store multiple such data frames and access each one by key. From my reading so far it looks like Parquet does not support this, so the alternative would be storing multiple Parquet files in the file system. I have a rather large number (say 10,000) of relatively small frames (~1-5 MB each) to process, so I'm not sure whether this could become a concern.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Two frames with different numbers of columns
df1 = pd.DataFrame(data={"A": [1, 2, 3], "B": [4, 5, 6]},
                   columns=["A", "B"])
df2 = pd.DataFrame(data={"X": [1, 2], "Y": [3, 4], "Z": [5, 6]},
                   columns=["X", "Y", "Z"])
dfs = [df1, df2]

# Write each frame to its own Parquet file
for i, df in enumerate(dfs):
    table = pa.Table.from_pandas(df)
    pq.write_table(table, f"my_parq_{i}.parquet")