For context, I'm attempting to use the teradataml module to leverage server-side speed and fastload in order to get the data into Pandas for further processing and visualisation.
I can run the query below just fine by creating a connection with either teradatasql or sqlalchemy and using the pandas read_sql command. However, it takes between 5 and 6 minutes to load the roughly 6 million rows into the dataframe, which is about the same as exporting it from SQL Assistant or Teradata Studio. So I'm guessing that behind the scenes it's writing the data in much the same way as the native Teradata programs do.
I've also tried running read_sql in chunks of various sizes, but it doesn't seem to make an appreciable difference.
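For reference, the chunked attempt looked roughly like this (sketched here against an in-memory sqlite3 database so it runs standalone; in practice the connection came from teradatasql, and the table/column names match the query further down):

```python
import sqlite3

import pandas as pd

# Stand-in connection; in practice this was a teradatasql/sqlalchemy connection.
conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {"REPORTING_DATE": ["2023-01-01"] * 4, "AMOUNT": [1.0, 2.0, 3.0, 4.0]}
).to_sql("TABLE_B", conn, index=False)

# Read in chunks and concatenate; varying chunksize made little difference.
chunks = pd.read_sql(
    "SELECT REPORTING_DATE, AMOUNT FROM TABLE_B", conn, chunksize=2
)
df = pd.concat(chunks, ignore_index=True)
```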
I tried using teradataml to run the query, but I get the error 'Error 3524 The user does not have CREATE VIEW access to database', which makes sense as I only have Select access, not Create access, to that particular database. There's also no way I'd be able to get Create access to it.
from teradataml import *
td_connection = create_context(host='host', user='user1', password='mypassword', database='DATABASE_A')
query = """SELECT
    REPORTING_DATE,
    COLUMN_1,
    UPPER(COLUMN_2) AS COLUMN_2,
    CASE
        WHEN COLUMN_3 NOT IN ('A', 'B') THEN 'N'
        WHEN COLUMN_3 IN ('A', 'B') THEN COLUMN_3
        ELSE 'ERROR'
    END AS COLUMN_3,
    COLUMN_4,
    COLUMN_5,
    Sum(ZeroIfNull(COLUMN_6)) AS AMOUNT
FROM DATABASE_A.TABLE_B
GROUP BY 1, 2, 3, 4, 5"""
df = DataFrame.from_query(query)
Is there a way to run it as a query when I don't have Create access to the database? Running it as a query cuts the number of rows from tens of millions down to about 6 million.
I've also tried creating a teradataml DataFrame from the table as a whole, which works and takes less than a minute. However, converting that to Pandas takes longer than the 6 minutes I'm trying to beat.
from teradataml import *
td_connection = create_context(host='host', user='user1', password='mypassword', database='DATABASE_A')
df = DataFrame('TABLE_B')  # this part takes under a minute
pandas_df = df.to_pandas()  # this conversion is where the time goes
If I can't use a SQL query to cut the data down before passing it to Pandas, is there a way to perform the same aggregations and CASE statement natively in teradataml?
I don't know enough about the teradataml command structure (which seems similar to Pandas, but not similar enough for me to lean on my Pandas knowledge) to make it work.
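To be concrete, this is the Pandas logic I'm trying to reproduce server-side, shown on a few rows of made-up data with the column names from the query above (the CASE collapses to "keep 'A'/'B', otherwise 'N'", and ZeroIfNull becomes a fillna before the grouped sum):

```python
import pandas as pd

# Toy rows standing in for DATABASE_A.TABLE_B.
df = pd.DataFrame({
    "COLUMN_3": ["A", "B", "C", "A"],
    "COLUMN_6": [10.0, None, 5.0, 2.5],
})

# CASE WHEN COLUMN_3 NOT IN ('A','B') THEN 'N' ELSE COLUMN_3 END
df["COLUMN_3"] = df["COLUMN_3"].where(df["COLUMN_3"].isin(["A", "B"]), "N")

# Sum(ZeroIfNull(COLUMN_6)) AS AMOUNT, grouped by the key column(s)
out = (df.fillna({"COLUMN_6": 0})
         .groupby("COLUMN_3", as_index=False)["COLUMN_6"]
         .sum()
         .rename(columns={"COLUMN_6": "AMOUNT"}))
```

It's this combination of the conditional recode plus the grouped, null-safe sum that I can't figure out how to express in teradataml's own syntax.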
I've tried a number of different iterations of the query, as well as specifying the database both in the connection and fully qualified within the query itself.