I am trying to run the vaex program below to perform a GraphQL query. I believe the operation is not lazily evaluated: the memory consumption of my Python program increases continuously while the query runs.
import vaex
import time
from vaex.graphql import DataFrameAccessorGraphQL
vaex.cache.off()
start = {'start' : time.ctime()}
print('reading from parquet...', time.ctime())
df = vaex.open('./File_00000.parquet')
print('trying to execute graphql', time.ctime())
result = df.graphql.execute("""
{
  df {
    min {
      col_int10
    }
    mean {
      col_float10
    }
    max {
      col_int20
    }
    groupby {
      col_str10 {
        count
        mean {
          col_float20
        }
      }
    }
  }
}
""")
print("The execution is done", time.ctime())
stop = {'stop' : time.ctime()}
print(result.data, start, stop)
The parquet file used here is around 50MB with 20,000 rows and 500 columns.
I am running vaex in a conda environment on an M1 Mac with 32GB of RAM. I killed the Python program when its RAM usage reached 16.7GB.
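In case it helps to reproduce this, here is a minimal sketch of how the memory growth could be logged from inside the script itself. It uses only the standard library (`resource` is Unix-only), and `peak_rss_mib` and `start_memory_logger` are hypothetical helper names I made up for illustration; they are not part of vaex.

```python
import resource
import sys
import threading
import time


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def start_memory_logger(interval_s: float = 5.0) -> threading.Event:
    """Print the peak RSS every `interval_s` seconds from a daemon thread.

    Returns an Event; call .set() on it to stop the logging.
    """
    stop = threading.Event()

    def _loop() -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            print(f"{time.ctime()} peak RSS: {peak_rss_mib():.1f} MiB")

    threading.Thread(target=_loop, daemon=True).start()
    return stop


# Hypothetical usage around the query:
# stop_logging = start_memory_logger(interval_s=5.0)
# result = df.graphql.execute(query)
# stop_logging.set()
```

Because the logger runs in a background thread, it should keep printing even if the `execute` call never returns, as long as the running code releases the GIL from time to time. Note that it reports the peak, not the current, resident size.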
Here is the version info:
Name          Version  Build   Channel
vaex          4.8.0    pypi_0  pypi
vaex-astro    0.9.0    pypi_0  pypi
vaex-core     4.8.0    pypi_0  pypi
vaex-graphql  0.2.0    pypi_0  pypi
vaex-hdf5     0.12.0   pypi_0  pypi
vaex-jupyter  0.7.0    pypi_0  pypi
vaex-ml       0.17.0   pypi_0  pypi
vaex-server   0.8.1    pypi_0  pypi
vaex-viz      0.5.1    pypi_0  pypi
Is there any way I could run this operation lazily as a GraphQL query, i.e. with memory usage similar to regular df operations, which don't use much RAM?
I couldn't find information about this in the documentation:
https://vaex.readthedocs.io/en/docs/api.html#graphql-operations
Thanks!! Happy to provide any more info.