
I'm setting up a new crawler that runs on a schedule, but it fails on double-quoted fields that contain commas.

I searched and found that the OpenCSVSerDe library can be used to edit the details of an existing table, but I'm creating new tables, and I want to know what configuration lets the crawler generate the Data Catalog correctly.

If the CSV file has values like `"$3.62","4,406"`, the Data Catalog should be:

col0     col1
"$3.62"  "4,406"

but I'm getting:

col0     col1  col2
"$3.62"  "4    406"
Daniel I. Cruz

1 Answer


Try creating a classifier (Crawlers → Classifiers) and assigning it to the specific crawler (Crawler info → Tags, description, security configuration, and classifiers).

I've tried the following settings and it works perfectly: [screenshot of the classifier settings]
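The same setup can also be done programmatically. A minimal boto3 sketch, where the classifier and crawler names are placeholders rather than anything from the question:

```python
# Hypothetical names, for illustration only.
CLASSIFIER_NAME = "quoted-csv"
CRAWLER_NAME = "my-scheduled-crawler"

# A custom CSV classifier: comma delimiter with double quote as the
# quote symbol, so quoted fields containing commas stay in one column.
CSV_CLASSIFIER = {
    "Name": CLASSIFIER_NAME,
    "Delimiter": ",",
    "QuoteSymbol": '"',
    "ContainsHeader": "UNKNOWN",  # let the crawler detect a header row
}

def register_classifier():
    """Create the classifier and attach it to the crawler."""
    import boto3  # requires AWS credentials and a configured region

    glue = boto3.client("glue")
    glue.create_classifier(CsvClassifier=CSV_CLASSIFIER)
    # Custom classifiers attached to a crawler are tried before the
    # built-in classifiers on the crawler's next run.
    glue.update_crawler(Name=CRAWLER_NAME, Classifiers=[CLASSIFIER_NAME])
```

The classifier only takes effect on the crawler's next run, so tables crawled before it was attached need to be re-crawled.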

pdanchenko
  • That was my first idea, but it doesn't work: the crawler keeps using `LazySimpleSerDe` as the serialization library, which has no support for double-quoted strings with commas inside. – Daniel I. Cruz Aug 28 '19 at 13:37
  • Any luck solving this issue? I have the same problem and it is absolutely annoying... – thijsvdp Jan 20 '21 at 22:25
  • @thijsvdp Any luck? I changed the Serde serialization lib to OpenCSVSerde, and set the appropriate Serde parameters but Glue does not seem to notice these changes – Angelo Bovino Apr 29 '22 at 17:29
  • This has solved my issue: https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html – Amit Jun 12 '23 at 13:15
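Following up on the comments about switching the SerDe by hand: an existing table's serialization library can be changed with `update_table`. A sketch with placeholder database/table names; note that, as the best-practices page linked in the last comment discusses, a later crawler run may overwrite manual SerDe edits unless the crawler is configured to leave existing schemas alone:

```python
# The OpenCSVSerDe settings to apply; quoteChar is what makes
# "4,406" parse as a single column.
SERDE_INFO = {
    "SerializationLibrary": "org.apache.hadoop.hive.serde2.OpenCSVSerde",
    "Parameters": {"separatorChar": ",", "quoteChar": '"'},
}

# Keys accepted by Glue's TableInput; get_table returns extra
# read-only fields (CreateTime, etc.) that update_table rejects.
TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
)

def set_opencsv_serde(database, table):
    """Point an existing Glue table at OpenCSVSerDe."""
    import boto3  # requires AWS credentials and a configured region

    glue = boto3.client("glue")
    current = glue.get_table(DatabaseName=database, Name=table)["Table"]
    table_input = {k: v for k, v in current.items() if k in TABLE_INPUT_KEYS}
    table_input["StorageDescriptor"]["SerdeInfo"] = SERDE_INFO
    glue.update_table(DatabaseName=database, TableInput=table_input)
```

One caveat with OpenCSVSerDe worth knowing before committing to it: it reads every column as STRING, so numeric columns then need casting at query time.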