
I'm using AWS Glue, and I want to overwrite a Glue catalog table from within a Glue job. During the job, I call

glueContext.purge_table(glue_database, glue_table, options={"retentionPeriod": 0})

On the next line, I try to write the current DynamicFrame out to the catalog:

sink = glueContext.write_dynamic_frame_from_catalog(
    frame=master_dyf,
    database=glue_database,
    table_name=glue_table,
    additional_options=additionalOptions,
)

But this throws the following error:

An error occurred while calling o362.pyWriteDynamicFrame. No such file or directory 's3://dev.some.bucket/dev/somepath/part-00009-d324e8e6-dbd5-41e2-b216-c1933d0c120a-c000.snappy.parquet'

What's going on here? I can't figure out why purging a catalog table is also removing my Parquet files.

Any help is greatly appreciated.
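
For reference, here is a trimmed-down, self-contained skeleton of the job (database/table names are anonymized and a toy frame stands in for the real `master_dyf`; everything else matches the calls above):

    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from pyspark.context import SparkContext

    glueContext = GlueContext(SparkContext.getOrCreate())
    spark = glueContext.spark_session

    glue_database = "my_database"   # placeholder
    glue_table = "my_table"         # placeholder
    additionalOptions = {}          # placeholder

    # Stand-in for the DynamicFrame built earlier in the job.
    master_dyf = DynamicFrame.fromDF(
        spark.createDataFrame([(1, "a")], ["id", "value"]),
        glueContext, "master_dyf",
    )

    # Purge the table's S3 objects immediately (no retention window)...
    glueContext.purge_table(glue_database, glue_table,
                            options={"retentionPeriod": 0})

    # ...then write the frame back to the same catalog table.
    sink = glueContext.write_dynamic_frame_from_catalog(
        frame=master_dyf,
        database=glue_database,
        table_name=glue_table,
        additional_options=additionalOptions,
    )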

Black Dynamite
  • I haven't used that func, but the docs say that it will. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-glue-context.html#aws-glue-api-crawler-pyspark-extensions-glue-context-purge_table – Bob Haffner Apr 28 '22 at 20:27
  • @BobHaffner I'm writing out a whole new dataframe, so it shouldn't exist in the catalog yet to be available to be deleted. It's as if the purge is an async process, and the code continues on to write out WHILE the purge is ongoing and the file gets snatched mid-process. – Black Dynamite Apr 28 '22 at 20:29
  • Ok, sorry I misunderstood. Not sure what's going on. – Bob Haffner Apr 28 '22 at 21:11
  • Maybe it's not the file but the path. Does your path contain a partition by chance? Perhaps the purge deletes a partition (i.e. `somepath`) and then the write freaks out. Maybe an underlying bug? – Bob Haffner Apr 28 '22 at 21:23
  • Are you reading from the same table that you are purging and ultimately writing to? – Roman Czerwinski Aug 15 '22 at 21:46
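
If the answer to the last comment is yes (i.e., `master_dyf`'s lineage includes the same table being purged), Spark's lazy evaluation would explain the error: the read is only executed when the write runs, at which point the source Parquet files have already been deleted. A minimal sketch of a workaround under that assumption (all names hypothetical) is to force materialization before purging:

    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from pyspark.context import SparkContext

    glueContext = GlueContext(SparkContext.getOrCreate())

    glue_database = "my_database"  # hypothetical
    glue_table = "my_table"        # hypothetical

    # The read is lazy: no S3 objects are fetched yet.
    source_dyf = glueContext.create_dynamic_frame.from_catalog(
        database=glue_database, table_name=glue_table
    )

    # Materialize the data BEFORE purging, so the later write does not
    # try to re-read files that purge_table has already deleted.
    master_df = source_dyf.toDF().cache()
    master_df.count()

    # Now purge the table's storage...
    glueContext.purge_table(glue_database, glue_table,
                            options={"retentionPeriod": 0})

    # ...and write the cached data back to the same catalog table.
    master_dyf = DynamicFrame.fromDF(master_df, glueContext, "master_dyf")
    glueContext.write_dynamic_frame_from_catalog(
        frame=master_dyf, database=glue_database, table_name=glue_table
    )

Note that cached partitions can be evicted under memory pressure and recomputed from the (now-deleted) source files, so staging the data to a temporary S3 prefix and reading it back from there before purging is the more robust variant of the same idea.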

0 Answers