
Python shell jobs run on AWS Glue, so they use the DPUs assigned to the Glue job. I was going through some tutorials where they were running SQL queries that were executed on Redshift. My concern is that the computation happens on Redshift, which is not the case with a Glue Spark job, which does all the processing (building SQL queries, doing the computation, and so on) on its own platform.

How can I achieve the same thing if my job is not very resource intensive and I don't need Spark to process it? I want to do the processing in a Python shell job using Glue's resources rather than running the work on the database itself.

Please help me understand how I can achieve this.
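A minimal sketch of what this could look like, assuming a Glue Python shell job with `redshift_connector` and `pandas` available (e.g. via `--additional-python-modules`); the cluster endpoint, credentials, table, and column names below are placeholders, not anything from an actual setup:

```python
# Hypothetical Glue Python shell job: pull raw rows from Redshift with a
# trivial SELECT (extraction only), then do the heavy lifting in pandas
# on the Glue DPU instead of pushing aggregation SQL down to Redshift.

import pandas as pd
import redshift_connector  # assumed installed via --additional-python-modules

# Placeholder connection details -- substitute your own cluster/credentials
conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="<password>",
)

# Keep the SQL trivial so Redshift only streams rows back;
# no GROUP BY / JOIN work happens on the cluster.
cur = conn.cursor()
cur.execute("SELECT order_id, customer_id, amount FROM sales")
df = pd.DataFrame(
    cur.fetchall(),
    columns=["order_id", "customer_id", "amount"],
)
cur.close()
conn.close()

# The aggregation now runs in Python on Glue's resources,
# not on the Redshift cluster.
totals = df.groupby("customer_id", as_index=False)["amount"].sum()
print(totals.head())
```

Note that Redshift still does the work of scanning and returning the selected rows, so this only shifts the transformation step (filtering, joins, aggregation done in pandas) onto Glue, not the extraction itself.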

bigDataArtist
  • It sounds like you want to pay costs for Glue and not for Redshift. If you are pulling data from Redshift with Glue, you will incur costs for querying Redshift, and costs for Glue running those queries (and doing any other processing). There isn't a way to not pay for what you use. – jonlegend Jul 08 '21 at 15:55
  • No, I meant that, just as in Spark we process the data in Glue using the Spark framework or Spark SQL statements (underneath it uses the Spark framework), can we similarly process the data using only Python and Glue's resources, e.g. write SQL statements that are executed by Python SQL libraries rather than by the actual underlying DB like Redshift or any other? Hope that makes sense? – bigDataArtist Jul 11 '21 at 20:13

0 Answers