AWS Glue cannot detect correct schema from CSV

Question

I have a CSV with the following structure:

name, path, date
aurora, Chicago, 20200130
mark, "Syracuse, 2365", 2020131

The result table in Glue looks like:

name, path, date
aurora, Chicago, 20200130
mark, Syracuse, 2365

I tried building a custom CSV classifier and adding it to the crawler but, because quotes appear in only some rows, Glue still doesn't infer the right schema, which would be:

name | path | date
aurora | Chicago | 20200130
mark | Syracuse, 2365 | 2020131

Any idea?

Does this answer your question? [AWS Glue issue with double quote and commas](https://stackoverflow.com/questions/50354123/aws-glue-issue-with-double-quote-and-commas) — Shailesh, May 14 '20 at 17:58

score 1 · Answer 1 · answered May 14 '20 at 17:55

You should use the OpenCSV SerDe for this.

Your CREATE TABLE query will look like this:

CREATE EXTERNAL TABLE IF NOT EXISTS testtimestamp1(
 `profile_id` string,
 `creationdate` date,
 `creationdatetime` timestamp
 )
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
 LOCATION 's3://<location>'

Create the table in Athena using the above query (instead of using Glue).

Once the table is created, run MSCK REPAIR TABLE <table_name> to load the partitions. This step only applies if the table is partitioned.
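To see why a quote-aware parser matters here, the following is a minimal sketch in Python using the sample rows from the question. It is only an illustration of the parsing behaviour, not the SerDe itself: the helper names `naive_split` and `quote_aware_split` are hypothetical, and `skipinitialspace=True` is an assumption made because the sample has a space after each comma.

```python
import csv
import io

# Sample rows copied from the question.
SAMPLE = '''name, path, date
aurora, Chicago, 20200130
mark, "Syracuse, 2365", 2020131
'''


def naive_split(text):
    """Split every line on commas and ignore quoting entirely."""
    return [[field.strip() for field in line.split(",")]
            for line in text.splitlines()]


def quote_aware_split(text):
    """Parse with the csv module so commas inside double quotes stay in one field."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        quotechar='"',
                        skipinitialspace=True)
    return list(reader)


naive_rows = naive_split(SAMPLE)
parsed_rows = quote_aware_split(SAMPLE)

# The naive split breaks the quoted value into two fields;
# the quote-aware parse keeps three columns per row.
for row in parsed_rows:
    print(" | ".join(row))
```

A comma-only split yields four fields for the third line, which is consistent with the shifted columns seen in the Glue table, while the quote-aware parse keeps "Syracuse, 2365" as a single value.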
