AWS Glue crawler drop unused columns

Asked Nov 23 '20 at 09:01

Active Nov 23 '20 at 09:01

Viewed 971 times

On its first run, the crawler creates a table schema, but sometimes it does not detect the " and , characters correctly and breaks the data rows.

I fixed it by updating the table's SerDe serialization library.
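For illustration, a change like this could be scripted with boto3 roughly as below. This is only a sketch: the post does not say which SerDe was chosen, so `org.apache.hadoop.hive.serde2.OpenCSVSerde` and its parameters are assumptions, and the helper names (`with_serde`, `apply_serde`) are hypothetical.

```python
import copy

# Keys accepted by Glue's TableInput. get_table also returns read-only
# fields (DatabaseName, CreateTime, ...) that update_table rejects.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def with_serde(table, serde_lib, serde_params):
    """Return a TableInput dict based on `table` with its SerDe replaced."""
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    serde = table_input["StorageDescriptor"].setdefault("SerdeInfo", {})
    serde["SerializationLibrary"] = serde_lib
    serde["Parameters"] = dict(serde_params)
    return table_input


def apply_serde(database, table_name):
    """Fetch the table from Glue and write it back with the new SerDe."""
    import boto3  # imported here so with_serde can be used without AWS access

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    table_input = with_serde(
        table,
        # Assumed SerDe and parameters; adjust to the actual data format.
        "org.apache.hadoop.hive.serde2.OpenCSVSerde",
        {"separatorChar": ",", "quoteChar": '"', "escapeChar": "\\"},
    )
    glue.update_table(DatabaseName=database, TableInput=table_input)
```

Only `apply_serde` talks to AWS; `with_serde` works on a plain dict, so the change can be inspected before anything is written to the catalog.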

But now I have a problem: the additional columns created in the first crawl still remain, even after I re-run the crawler. The table has thousands of columns, which is very annoying.

Is it possible to remove the unnecessary columns (col30, col31, col32, ... col3034) on the second crawl?
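One possible workaround, not confirmed anywhere in this thread, is to prune the columns directly in the Data Catalog instead of relying on the crawler. The sketch below assumes the unwanted columns follow the `colN` naming shown above with N of 30 or more; the helper names (`prune_columns`, `build_table_input`, `drop_unused_columns`) are hypothetical.

```python
import re

# Keys accepted by Glue's TableInput; read-only fields from get_table
# (DatabaseName, CreateTime, ...) must be left out of update_table.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}

# Matches auto-generated column names such as col30 or col3034.
AUTO_COLUMN = re.compile(r"^col(\d+)$")


def prune_columns(columns, first_unwanted=30):
    """Drop columns named col<N> where N >= first_unwanted; keep the rest."""
    kept = []
    for column in columns:
        match = AUTO_COLUMN.match(column["Name"])
        if match and int(match.group(1)) >= first_unwanted:
            continue
        kept.append(column)
    return kept


def build_table_input(table, first_unwanted=30):
    """Build a TableInput dict from a get_table result with pruned columns."""
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    descriptor = dict(table_input["StorageDescriptor"])
    descriptor["Columns"] = prune_columns(descriptor["Columns"], first_unwanted)
    table_input["StorageDescriptor"] = descriptor
    return table_input


def drop_unused_columns(database, table_name, first_unwanted=30):
    """Read the table from Glue, prune its columns, and write it back."""
    import boto3  # imported here so the helpers above run without AWS access

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=build_table_input(table, first_unwanted),
    )
```

Whether a later crawl adds the columns back probably depends on the crawler's schema change policy, which this sketch does not touch.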

asked Nov 23 '20 at 09:01

Js Lim


Can you try creating a new crawler and running it with the updated serde properties? Also drop the existing table before you run the new crawler. – Prabhakar Reddy Nov 23 '20 at 13:05
@PrabhakarReddy [Cannot set serde properties in crawler](https://stackoverflow.com/questions/57498330/specify-a-serde-serialization-lib-with-aws-glue-crawler/63398790#63398790) – Js Lim Nov 24 '20 at 00:48
Have you tried a custom classifier? https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html – Prabhakar Reddy Nov 24 '20 at 00:58
Tried that already; it does not seem to work either. – Js Lim Nov 24 '20 at 01:09


0 Answers