On its first run, the crawler creates the table schema. Sometimes it does not handle double quotes (") and commas (,) correctly, which breaks the data rows.

[Image: broken schema]

I fixed it by updating the table's SerDe serialization lib.
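For reference, here is a minimal sketch of what such a SerDe update could look like with boto3. It assumes the replacement library is `org.apache.hadoop.hive.serde2.OpenCSVSerde` (the question does not say which one was used), and the helper name `with_open_csv_serde`, the `ALLOWED_KEYS` subset, and the database/table names are hypothetical. The helper only builds the `TableInput` dict; the actual AWS calls are left as comments.

```python
import copy

# Assumption: the SerDe that was swapped in is OpenCSVSerde.
OPEN_CSV_SERDE = "org.apache.hadoop.hive.serde2.OpenCSVSerde"

# Subset of get_table() keys that update_table() accepts in TableInput.
# Read-only fields such as CreateTime or DatabaseName must be dropped.
ALLOWED_KEYS = {
    "Name", "Description", "Owner", "Retention",
    "StorageDescriptor", "PartitionKeys", "TableType", "Parameters",
}


def with_open_csv_serde(table: dict) -> dict:
    """Build a TableInput from a get_table() result, switching the SerDe."""
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in ALLOWED_KEYS
    }
    serde = table_input["StorageDescriptor"].setdefault("SerdeInfo", {})
    serde["SerializationLibrary"] = OPEN_CSV_SERDE
    # OpenCSVSerde properties for comma-separated, double-quoted fields.
    serde["Parameters"] = {
        "separatorChar": ",",
        "quoteChar": '"',
        "escapeChar": "\\",
    }
    return table_input


# Usage (requires AWS credentials; "my_db" / "my_table" are placeholders):
# import boto3
# glue = boto3.client("glue")
# table = glue.get_table(DatabaseName="my_db", Name="my_table")["Table"]
# glue.update_table(DatabaseName="my_db", TableInput=with_open_csv_serde(table))
```

Note that this only changes how rows are parsed; it leaves the column list in `StorageDescriptor` exactly as the crawler created it.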

But now I have a problem: the additional columns created by the first crawl still remain, even after I re-run the crawler. The table has thousands of columns, which is very annoying.

Is it possible to remove the unnecessary columns (col30, col31, col32, ... col3034) on the second crawl?

Js Lim
  • Can you try creating a new crawler and running it with the updated SerDe properties? Also, drop the existing table before you run the new crawler. – Prabhakar Reddy Nov 23 '20 at 13:05
  • @PrabhakarReddy [Cannot set serde properties in crawler](https://stackoverflow.com/questions/57498330/specify-a-serde-serialization-lib-with-aws-glue-crawler/63398790#63398790) – Js Lim Nov 24 '20 at 00:48
  • Have you tried a custom classifier? https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html – Prabhakar Reddy Nov 24 '20 at 00:58
  • I tried that already; it doesn't seem to work either. – Js Lim Nov 24 '20 at 01:09
