I was able to create a small Glue job to ingest data from one S3 bucket into another, but I am not clear about the last few lines of the code (below).
applymapping1 = ApplyMapping.apply(frame = datasource_lk, mappings = [("row_id", "bigint", "row_id", "bigint"), ("Quantity", "long", "Quantity", "long"), ("Category", "string", "Category", "string")], transformation_ctx = "applymapping1")
selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["row_id", "Quantity", "Category"], transformation_ctx = "selectfields2")
resolvechoice3 = ResolveChoice.apply(frame = selectfields2, choice = "MATCH_CATALOG", database = "mydb", table_name = "order_summary_csv", transformation_ctx = "resolvechoice3")
datasink4 = glueContext.write_dynamic_frame.from_catalog(frame = resolvechoice3, database = "mydb", table_name = "order_summary_csv", transformation_ctx = "datasink4")
job.commit()
- In the above code snippet, what is the purpose of 'ResolveChoice'? Is it mandatory?
- When I ran this job, it created a new folder and a file (with a random file name) in the destination (order_summary.csv) and ingested the data there, instead of ingesting directly into my order_summary_csv table (a CSV file) residing in the S3 folder. Is it possible for Spark (Glue) to ingest data into a specific CSV file?