
I have created a Glue job, but once I run the crawler on the transformed file it creates a duplicate column. How do I drop the duplicate column?

I know there is a DropNullFields function, but it drops null fields, not duplicate columns.

What is the way to drop the duplicate column and store the result as CSV?

Here is the code:

datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database = "sample", table_name = "test", transformation_ctx = "datasource0")

dfc = datasource0.relationalize("root", "s3://testing/")

for df_name in dfc.keys():
    m_df = dfc.select(df_name)
    dropNullfields = DropNullFields.apply(frame = m_df)
    datasink2 = glueContext.write_dynamic_frame.from_options(
        frame = dropNullfields,  # was "frame = DropNullFields", which passes the class instead of the transformed frame
        connection_type = "s3",
        connection_options = {"path": "s3://sample/" + df_name + "/"},
        format = "csv",
        transformation_ctx = "datasink2")

job.commit()
Parag Shahade

1 Answer


You can use the drop_fields() method on the DynamicFrame. Example:

droppedFields = dropNullfields.drop_fields(paths=["lname", "userid"])
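
drop_fields() needs the duplicate column names spelled out, so a small helper that finds repeated names in a schema can be useful. This is a minimal sketch; find_duplicate_columns is a hypothetical helper, not part of the AWS Glue API, and the commented lines showing how it would plug into the job are untested assumptions:

```python
def find_duplicate_columns(column_names):
    """Return the names that appear more than once, in first-seen order.
    Hypothetical helper -- not part of the AWS Glue API."""
    seen = set()
    duplicates = []
    for name in column_names:
        if name in seen:
            if name not in duplicates:
                duplicates.append(name)
        else:
            seen.add(name)
    return duplicates

# Sketch of how this could feed drop_fields() inside the Glue job
# (assumes m_df is the DynamicFrame from the loop in the question):
#   names = [f.name for f in m_df.schema().fields]
#   deduped = m_df.drop_fields(paths=find_duplicate_columns(names))
```

Note that drop_fields() removes every field matching a given path, so if the crawler produced two columns with literally the same name you may instead want to rename one with ApplyMapping before writing out.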
Robert Kossendey