I'm trying to load a Dask dataframe from a SQL connection. Per the read_sql_table documentation, it is necessary to pass in an index_col. What should I do if there's a possibility that there are no good columns to act as index?
Could this be a suitable replacement?
# Break SQL Query into chunks
chunks = []
num_chunks = math.ceil(num_records / chunk_size)
# Run query for each chunk on Dask workers
for i in range(num_chunks):
query = 'SELECT * FROM ' + table + ' LIMIT ' + str(i * chunk_size) + ',' + str(chunk_size)
chunk = dask.delayed(pd.read_sql)(query, sql_uri)
chunks.append(chunk)
# Aggregate chunks
df = dd.from_delayed(chunks)
dfs[table] = df