
I have created a Glue job, but once I run the crawler on the transformed file it creates a duplicate column. How do I drop the duplicate column?

I know there is a DropNullFields function, but it drops null fields, not duplicate columns.

What is the way to drop the duplicate column and store the result as CSV?

Here is the code:

datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database = "sample", table_name = "test", transformation_ctx = "datasource0")

dfc = datasource0.relationalize("root", "s3://testing/")

for df_name in dfc.keys():
    m_df = dfc.select(df_name)
    dropNullfields = DropNullFields.apply(frame = m_df)
    datasink2 = glueContext.write_dynamic_frame.from_options(
        frame = dropNullfields,  # was "frame = DropNullFields", which passes the class instead of the transformed frame
        connection_type = "s3",
        connection_options = {"path": "s3://sample/" + df_name + "/"},
        format = "csv",
        transformation_ctx = "datasink2")

job.commit()
Parag Shahade

1 Answer


You can use the drop_fields() method on the DynamicFrame. Example:

droppedFields = dropNullfields.drop_fields(paths=["lname", "userid"])
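
drop_fields() needs the duplicate column names spelled out, so a small helper that finds repeated names in a schema can be useful. This is a minimal sketch; find_duplicate_columns is a hypothetical helper, not part of the AWS Glue API, and the commented lines showing how it would plug into the job are untested assumptions:

```python
def find_duplicate_columns(column_names):
    """Return the names that appear more than once, in first-seen order.
    Hypothetical helper -- not part of the AWS Glue API."""
    seen = set()
    duplicates = []
    for name in column_names:
        if name in seen:
            if name not in duplicates:
                duplicates.append(name)
        else:
            seen.add(name)
    return duplicates

# Sketch of how this could feed drop_fields() inside the Glue job
# (assumes m_df is the DynamicFrame from the loop in the question):
#   names = [f.name for f in m_df.schema().fields]
#   deduped = m_df.drop_fields(paths=find_duplicate_columns(names))
```

Note that drop_fields() removes every field matching a given path, so if the crawler produced two columns with literally the same name you may instead want to rename one with ApplyMapping before writing out.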
Robert Kossendey