Let's say I have a pandas DataFrame:
import pandas as pd
df = pd.DataFrame()
df
Column1 Column2
0 0.189086 -0.093137
1 0.621479 1.551653
2 1.631438 -1.635403
3 0.473935 1.941249
4 1.904851 -0.195161
5 0.236945 -0.288274
6 -0.473348 0.403882
7 0.953940 1.718043
8 -0.289416 0.790983
9 -0.884789 -1.584088
........
An example of a query is df.query('Column1 > Column2')
Let's say you wanted to limit the size of this query's result, so that the returned object isn't too large. Is there a "pandas" way to accomplish this?
My question is primarily about querying an HDF5 object with pandas. An HDF5 file could be far larger than RAM, and therefore query results could be larger than RAM as well.
# file1.h5 contains only one field_table/key/HDF5 group called 'df'
store = pd.HDFStore('file1.h5')
# the following query could be too large
df = store.select('df', columns=['column1', 'column2'], where=['column1==5'])
Is there a pandas/Pythonic way to stop users from executing queries that surpass a certain size?
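To make the goal concrete, here is a minimal sketch of the kind of guard I have in mind. It is not a built-in pandas feature: collect_limited, QueryTooLargeError and max_rows are hypothetical names made up for illustration. The idea is to consume the result in chunks and abort as soon as a row budget is exceeded, so that at most max_rows plus one chunk is ever held in memory.

```python
import pandas as pd


class QueryTooLargeError(RuntimeError):
    """Hypothetical exception raised when a result exceeds the row budget."""


def collect_limited(chunks, max_rows):
    """Concatenate DataFrame chunks, aborting once max_rows is exceeded.

    `chunks` is any iterable of DataFrames, for example the iterator that
    HDFStore.select returns when it is called with chunksize=...
    """
    collected = []
    total_rows = 0
    for chunk in chunks:
        total_rows += len(chunk)
        if total_rows > max_rows:
            # Stop before keeping the chunk that breaks the budget.
            raise QueryTooLargeError(
                f"query returned more than {max_rows} rows"
            )
        collected.append(chunk)
    if not collected:
        return pd.DataFrame()
    return pd.concat(collected)


# In-memory demonstration: a small frame split into chunks of 4 rows.
demo = pd.DataFrame({"column1": [5] * 10, "column2": range(10)})
demo_chunks = [demo.iloc[i:i + 4] for i in range(0, len(demo), 4)]
result = collect_limited(demo_chunks, max_rows=20)
```

With the store from above, the call would presumably look like collect_limited(store.select('df', where=['column1==5'], chunksize=50000), max_rows=1000000), assuming 'df' was written in table format so that where and chunksize are supported. The chunk size and row budget here are arbitrary placeholders. A row count is only a proxy for memory use; a byte-based limit would need something like DataFrame.memory_usage on each chunk.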