
I tried to connect to Teradata using SQLAlchemy in order to use the read_sql and to_sql methods from pandas.

However, the connection is very slow. Even something as simple as pd.read_sql('select current_date') takes more than 30 seconds to complete.

I don't really understand why it is so slow. If anyone has experienced a similar issue before, please tell me why and how you solved it. Thanks!
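For reference, the timing is measured with a snippet along these lines. The connection string below is a placeholder; in-memory SQLite is used here so the snippet runs anywhere, whereas against Teradata the same read_sql call is what takes 30+ seconds:

```python
import time

import pandas as pd
from sqlalchemy import create_engine

# Placeholder engine -- for Teradata this would be something like
# create_engine("teradata://user:pass@host"). SQLite stands in here
# so the snippet is self-contained and runnable.
engine = create_engine("sqlite://")

start = time.perf_counter()
df = pd.read_sql("select 1 as current_value", engine)
elapsed = time.perf_counter() - start

print(df)
print(f"elapsed: {elapsed:.3f}s")
```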

Updates:
I tried cProfile and sqltap and realized the slowness is due to queries that the dialect generates. The has_table method runs a query against the dbc.TablesVX view, and that query takes around 100 seconds to finish even though the view only has about 55k rows. For pd.to_sql, has_table may be called more than once, and some additional queries against system tables are required.
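One lightweight way to get per-statement timings, without a full profiler, is SQLAlchemy's cursor-execute events. This is a sketch only: it attaches to an in-memory SQLite engine here so it is self-contained, but the same listeners can be attached to a Teradata engine to see which dictionary queries dominate:

```python
import time

from sqlalchemy import create_engine, event, text

# In-memory SQLite keeps the sketch runnable; for Teradata you would
# pass your real connection string to create_engine instead.
engine = create_engine("sqlite://")

timings = []  # (elapsed_seconds, statement) for every statement the engine emits

@event.listens_for(engine, "before_cursor_execute")
def _start_timer(conn, cursor, statement, parameters, context, executemany):
    # Stash the start time on the connection so the paired listener can read it.
    conn.info["query_start"] = time.perf_counter()

@event.listens_for(engine, "after_cursor_execute")
def _record_elapsed(conn, cursor, statement, parameters, context, executemany):
    timings.append((time.perf_counter() - conn.info["query_start"], statement))

with engine.connect() as conn:
    conn.execute(text("select 1"))

for elapsed, statement in timings:
    print(f"{elapsed:.4f}s  {statement}")
```

Sorting `timings` descending by elapsed time makes the 100-second dictionary queries stand out immediately.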

```sql
-- query from has_table method
SELECT tablename
FROM dbc.TablesVX
WHERE DatabaseName = ?
  AND TableName = ?

-- query from drop_table method
SELECT tablename
FROM dbc.TablesVX
WHERE DatabaseName = ?
  AND (tablekind = 'T'
       OR tablekind = 'O')
```

It seems like all I need is some tuning of the system tables to make these queries run faster. However, our SQL support people told me that those system tables already perform as well as they can. Is that possible? Has anyone done any tuning on the Teradata DBC views? Thanks.
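If the dictionary views themselves cannot be made faster, one possible workaround (a sketch under my own assumptions, not an API of sqlalchemy-teradata or pandas) is to wrap the dialect's has_table in a small cache, so the expensive dbc.TablesVX lookup runs at most once per table per process. The `cache_has_table` helper below is hypothetical; SQLite stands in for Teradata because the wrapping works on any dialect instance:

```python
from sqlalchemy import create_engine

def cache_has_table(engine):
    """Memoize engine.dialect.has_table per (schema, table_name).

    Workaround sketch: after this, existence checks (such as the ones
    pandas' to_sql triggers) hit the data dictionary at most once per
    table for the life of this process. Note the cache goes stale if
    another session creates or drops the table.
    """
    dialect = engine.dialect
    original = dialect.has_table
    cache = {}

    def cached_has_table(connection, table_name, schema=None, **kw):
        key = (schema, table_name)
        if key not in cache:
            cache[key] = original(connection, table_name, schema=schema, **kw)
        return cache[key]

    dialect.has_table = cached_has_table

# Usage (SQLite stands in for Teradata so the example is runnable):
engine = create_engine("sqlite://")
cache_has_table(engine)
with engine.begin() as conn:
    conn.exec_driver_sql("create table demo (x int)")
with engine.connect() as conn:
    print(engine.dialect.has_table(conn, "demo"))  # first call queries, then cached
```

The obvious trade-off is staleness: this only makes sense when the process is the sole creator and dropper of the tables it writes to.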

yl2526
  • It's difficult to say, since it could be your Teradata box, network, ODBC driver, SQLAlchemy, or whatever is sitting in the stack above it. Perhaps check out [this article](http://docs.sqlalchemy.org/en/latest/faq/performance.html) to dig in deeper about what is causing the problem. – JNevill Jan 09 '17 at 19:21
  • @JNevill, Thanks! I will definitely look into this after the business trip. What makes this weird is that, if I use pyodbc to set up the Teradata connection, the speed is normal. So I was totally lost about what could be wrong and didn't know where to start. This finally gives me something to start with. – yl2526 Jan 12 '17 at 00:29
  • That particular view `tablesvx` is pretty weighty. It takes 4-6 seconds for my system to respond to a query for a specific table and database against that view. Since the view takes into account the user's access there's a lot of testing in the WHERE clause. Can you just hit `dbc.tablesize` or do you need to take into account the logged in user's access? – JNevill Feb 15 '17 at 18:27
  • I also agree with your SQL help person. Don't touch the DBC tables. They are tuned fine. Changing how they are tuned may affect other things, since those tables hold metadata on the tables, databases, security and other whatnots that make the system run. – JNevill Feb 15 '17 at 18:29
  • @JNevill, Thanks for your response. Really appreciate it. If the query only took 4-6 seconds, or just under 10 seconds, I would have no problem with it, but it is taking over 100 seconds. Also, to drop the table, the dialect will try to run a query taking over 200 seconds. The queries are generated by the sqlalchemy-teradata dialect and I cannot change them. What do you think I can do to alleviate this overhead? Should I mask the dialect's methods? – yl2526 Feb 15 '17 at 20:58
  • You can't tune the system views besides collecting statistics. To check if your DBAs implemented statistics, simply run `select * from dbc.StatsV where databasename = 'dbc'`. If no rows are returned, point them to http://developer.teradata.com/blog/carrie/2010/05/if-you-re-not-collecting-statistics-on-your-dictionary-tables-do-it-now – dnoeth Feb 15 '17 at 21:28
  • @dnoeth, Thanks. I tried `select * from dbc.StatsV where databasename ='dbc' and lower(trim(tablename)) in ('indices', 'tablesvx', 'all_ri_childrenv')` before and found nothing. I thought the reason is that you can only collect stats on tables, not on views. Can you collect stats on a view? – yl2526 Feb 15 '17 at 22:16
  • No, you can't collect stats on views (you can collect on calculations, but only on a table level). Are there stats on dbc? – dnoeth Feb 16 '17 at 07:42
  • @dnoeth, Thanks. Yes, there are. However, some stats were collected one year ago, and some columns that the view joins or filters on don't have stats. So, for the underlying tables, I should collect stats on all the columns on which the system views do a calculation, join or filter, and on all columns that I want to filter the view by. Should I collect stats on some columns together or one by one? I never understood this. – yl2526 Feb 16 '17 at 14:58
  • You can't collect stats on system tables; only dbc (i.e. your DBAs) can. As I said, point them to Carrie Ballinger's blog. Of course, stats should be periodically refreshed, too. – dnoeth Feb 16 '17 at 15:43
  • @dnoeth, Sure, thanks. – yl2526 Feb 16 '17 at 15:44

0 Answers