0

If I have a table on my server and I am producing a connection string to it, how can I, using Vaex, load it to a dataframe?

Here is what I am doing but with Pandas:

from sqlalchemy import types, create_engine, text
import pandas as pd
import pymysql

def connect_to_data(driver='mysql+pymysql://', conn_string=''):
    try:
        conn = create_engine(driver + conn_string)
        print("MySQL Connection Successfull!")
    except Exception as err:
        print("MySQL Connection Failed!")
        print(err)
    return conn

# Connect to the db:
conn_string = 'xxxxxxxx'
conn = connect_to_data(conn_string=conn_string)

# Get all requests from the db:
query = '''SELECT * FROM table_name'''

result = conn.execute(text(query))

# Desired dataframe:
df = pd.read_sql_query(query, conn)

How can I do the same with Vaex (because of it's high performance)?

SteveS
  • 3,789
  • 5
  • 30
  • 64

1 Answers1

2

For now at least, you can't do it directly. But vaex can easily read a pandas dataframe so you can

# Following your example..
pandas_df = pd.read_sql_query(query, conn)

df = vaex.from_pandas(pandas_df)
Joco
  • 803
  • 4
  • 7
  • But if I have 10 billion rows in my query, pandas won't be able to do it. – SteveS Mar 18 '22 at 15:59
  • 1
    Yeah so the FAQ of the docs of vaex explains how you do it (https://vaex.io/docs/faq.html). Basically, load a chunk of the data (as much as you can carry) export to hdf5/arrow/parqet, then load another chunks and so on. When you have all the data, you can open all of those files together. The power of vaex comes from reading/working with data directly from disk and not putting it all in memory, so you need to basically find a way to "download" it locally, whatever your original source is. – Joco Mar 18 '22 at 23:22