0

I've built some tools that create front-end list boxes for users that reference dynamic Redshift tables. New items in the table, they appear automatically in the list.

I want to put the list in alphabetical order in the database so the dynamic list boxes will show the data in that order.

After downloading the list from an API, I attempt to sort the list alphabetically in a Pandas dataframe before uploading. This works perfectly:

df.sort_values(['name'], inplace=True, ascending=True, kind='heapsort')

But then when I try to upload to Redshift in that order, it loses the order while it uploads. The data appears in chunks of alphabetically ordered segments.

db_conn = create_engine('<redshift connection>')

obj.to_sql('table_name', db_conn, index = False, if_exists = 'replace')

Because of the way the third party tool (Alteryx) works, I need to have this data in alphabetical order in the database.

How can I modify to_sql to properly upload the data in order?

dataflux
  • 7
  • 1
  • 4
  • No sure, but it may be because of index. Try `df.sort_values(['name'], inplace=True, ascending=True, kind='heapsort').reset_index(drop=True)` – Arthur Gouveia Oct 06 '17 at 14:40
  • 1) If you load data to redshift this way it will be very slow (that may be ok for you) 2) you cannot guarantee the order of results from a select without an order statement. I suggest you add a column on to your table to store the order in which you want them to appear. – Jon Scott Oct 06 '17 at 14:57

1 Answers1

0

While ingesting data into redshift, data gets distributed between slices on each node in your redshift cluster.
My suggestion would be to create a sort key on a column which you need to be sorted. Once you have sort key on that column, you can run Vacuum command to get your data sorted.
Sorry! I cannot be of much help on Python/Pandas

If I’ve made a bad assumption please comment and I’ll refocus my answer.

Rahul Gupta
  • 1,744
  • 2
  • 17
  • 28
  • This is a bit misleading and does not really answer the question. Also you have oversimplified the selection of sort key as "a column which you need to be sorted". – Jon Scott Oct 09 '17 at 16:05
  • What I've done is sorted the data after pulling it from the db, so it's just another way around it. But thanks for reminding me to use sort keys. – dataflux Oct 12 '17 at 14:49