
I am trying to run the vaex program below to perform a GraphQL query. I believe the operation is not lazily evaluated: the memory consumption of my Python process increases continuously while the query runs.

import vaex
import time
from vaex.graphql import DataFrameAccessorGraphQL

vaex.cache.off()
start = {'start' : time.ctime()}
print('reading from parquet...', time.ctime())
df = vaex.open('./File_00000.parquet')
print('trying to execute graphql', time.ctime())
result = df.graphql.execute("""
    {
        df {
            min {
                col_int10
            }
            mean {
                col_float10
            }
            max {
                col_int20
            }
            groupby {
                col_str10 {
                   count
                   mean {
                       col_float20
                   }
                }
            }
        }
    }
    """)
print("The execution is done", time.ctime())
stop = {'stop' : time.ctime()}
print(result.data, start, stop)

The parquet file used here is around 50MB, with 20,000 rows and 500 columns.

I am running vaex via conda on an M1 Mac with 32GB of RAM. I killed the Python process when its RAM usage reached 16.7GB.
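For anyone who wants to reproduce the memory numbers, here is a minimal sketch of how the peak memory growth of a single call could be measured using only the standard library. The helper names `peak_rss_mib` and `measure_peak_growth` are made up for illustration and are not part of vaex; the `resource` module is only available on Unix-like systems (macOS and Linux), and the list allocation is just a stand-in for the `df.graphql.execute(...)` call.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure_peak_growth(fn):
    """Run fn() and return (result, growth of the peak RSS in MiB)."""
    before = peak_rss_mib()
    result = fn()
    # The peak never decreases, so the difference is always >= 0.
    return result, peak_rss_mib() - before


if __name__ == "__main__":
    # Stand-in workload (hypothetical); replace the lambda with the
    # query call, e.g. lambda: df.graphql.execute(query).
    _, growth = measure_peak_growth(lambda: [0.0] * 10_000_000)
    print(f"peak RSS grew by about {growth:.0f} MiB")
```

Because this reads the process-wide peak, it only shows how much higher the peak went during the call, not how much memory was still held afterwards.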

This is the version info:

Name                     Version                  Build   Channel
vaex                      4.8.0                    pypi_0    pypi
vaex-astro                0.9.0                    pypi_0    pypi
vaex-core                 4.8.0                    pypi_0    pypi
vaex-graphql              0.2.0                    pypi_0    pypi
vaex-hdf5                 0.12.0                   pypi_0    pypi
vaex-jupyter              0.7.0                    pypi_0    pypi
vaex-ml                   0.17.0                   pypi_0    pypi
vaex-server               0.8.1                    pypi_0    pypi
vaex-viz                  0.5.1                    pypi_0    pypi

Is there any way I could run this operation as a GraphQL query with lazy evaluation, i.e. with low RAM usage, similar to regular DataFrame operations?

I couldn't find any information about this in the documentation:

https://vaex.readthedocs.io/en/docs/api.html#graphql-operations

Thanks! Happy to provide more info if needed.
