
I have created a Glue job that reads data from Oracle using the code below.

# Only rows from the last 4 days
WhereQuery = "select * from test where dated >= CURRENT_DATE - 4"
connection_oracle11_options = {
    "url": URL,
    "dbtable": tableName,
    "user": USERNAME,
    "password": PASSWORD,
    "query": WhereQuery,
    "hashfield": "testID",
    "hashpartitions": '100'
}
transaction_item_df = glueContext.create_dynamic_frame.from_options(
    connection_type="oracle",
    connection_options=connection_oracle11_options,
)

If I use the query option, the job takes 8 hours; if I do not pass the query, it takes 45 minutes. Is the query option correct?

My data size is 318049228. I am using worker type G.1X with 100 workers and "hashpartitions": '100', and the job takes 45 minutes. What is the relationship between hashpartitions and the number of workers?
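
To make the hashpartitions part of the question concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that 318049228 is a row count, that hashing on testID splits rows evenly, and that hashpartitions is the number of parallel JDBC reads. The helper names and the `concurrent_tasks` parameter are hypothetical and are not Glue settings; how many reads a given set of workers can actually run at once is not something this sketch derives.

```python
import math


def rows_per_partition(total_rows: int, hash_partitions: int) -> int:
    """Rows each parallel read would handle, assuming an even hash split."""
    return math.ceil(total_rows / hash_partitions)


def read_waves(hash_partitions: int, concurrent_tasks: int) -> int:
    """Rounds of reads needed if only `concurrent_tasks` reads run at once.

    `concurrent_tasks` is a hypothetical stand-in for whatever read
    concurrency the cluster really provides.
    """
    return math.ceil(hash_partitions / concurrent_tasks)


total_rows = 318_049_228  # assumption: the stated data size is a row count

print(rows_per_partition(total_rows, 100))  # rows per read with 100 partitions
print(read_waves(100, 100))                 # all reads fit in a single round
print(read_waves(100, 40))                  # fewer slots means more rounds
```

Under these assumptions, raising hashpartitions shrinks each read, while the cluster's available concurrency decides how many of those reads run at the same time.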

Sai

0 Answers