Loading 50 million records from an Oracle database into vaex using from_pandas

Asked Aug 20 '20 at 17:34

Active Aug 26 '20 at 09:45

Viewed 986 times

The code below is from the vaex documentation:

pandas_df = pd.read_sql_query('SELECT * FROM MYTABLE', con=engine)
df = vaex.from_pandas(pandas_df, copy_index=False)

Description

My data is larger than RAM. When I use the code above, it tries to pull all of the data into a pandas DataFrame. To work around this, I used the chunksize parameter, which makes read_sql_query return a generator of DataFrames.

Converting the generator back into a single pandas DataFrame needs enough memory for all of the data again. Below is the code I tried.

import pandas as pd
import vaex
df = pd.read_sql_query('select * from "user"."table"', conn, chunksize=1000000)
chunk_list = []
for i in df:
    chunk_list.append(i)
    data = pd.concat(chunk_list)
    df2 = vaex.from_pandas(data)
    alldat=df2.concat(df2)

Please help me with this issue.

edited Aug 26 '20 at 09:45

Vadim Kotov


asked Aug 20 '20 at 17:34

komal kakade

This is my first post here so please ignore spelling mistakes and spacing errors. Thanks. – komal kakade Aug 20 '20 at 17:38
Are you getting a MemoryError? – Bhuvan Kumar Aug 26 '20 at 10:02
I would pass each chunk to vaex and export each chunk to HDF5. Then you can lazily open and concatenate all the chunks without wasting any memory. – Joco Oct 19 '20 at 23:10
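
The approach in the comment above could be sketched roughly as follows. This is a minimal, untested-against-Oracle sketch, not a confirmed solution: the helper names (chunk_path, export_chunks, open_chunks), the output directory, and the file naming scheme are hypothetical. It assumes every chunk has the same columns and dtypes, and that vaex.from_pandas, DataFrame.export_hdf5 and vaex.open_many behave as described in the vaex documentation.

```python
from pathlib import Path


def chunk_path(out_dir, index):
    """Return the HDF5 path for chunk number `index` (zero-padded so names sort in order)."""
    return Path(out_dir) / f"chunk_{index:05d}.hdf5"


def export_chunks(chunks, out_dir):
    """Write each pandas chunk to its own HDF5 file and return the file paths."""
    import vaex  # imported here so chunk_path can be used without vaex installed

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunks):
        path = chunk_path(out_dir, index)
        # Only the current chunk is converted and written; earlier chunks are not kept.
        vaex.from_pandas(chunk, copy_index=False).export_hdf5(str(path))
        paths.append(path)
    return paths


def open_chunks(paths):
    """Open the exported files as one concatenated, memory-mapped vaex DataFrame."""
    import vaex

    return vaex.open_many([str(p) for p in sorted(paths)])


# Usage sketch (conn is the existing database connection from the question):
# import pandas as pd
# chunks = pd.read_sql_query('select * from "user"."table"', conn, chunksize=1_000_000)
# df = open_chunks(export_chunks(chunks, "chunks"))
```

The difference from the code in the question is that no list of chunks and no pd.concat is involved, so peak memory should stay close to the size of one chunk. If memory is still tight, a smaller chunksize is the first thing to try.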


0 Answers