
I have a CSV with the following structure:

  • name, path, date
  • aurora, Chicago, 20200130
  • mark, "Syracuse, 2365", 2020131

The resulting table in Glue looks like this:

  • name, path, date
  • aurora, Chicago, 20200130
  • mark, Syracuse, 2365

I tried building a custom CSV classifier and adding it to the crawler. However, because quotes appear only in some rows, the classifier doesn't help Glue infer the right schema, which would be:

  • name, path, date
  • aurora | Chicago | 20200130
  • mark | Syracuse, 2365 | 2020131

Any ideas?

    Does this answer your question? [AWS Glue issue with double quote and commas](https://stackoverflow.com/questions/50354123/aws-glue-issue-with-double-quote-and-commas) – Shailesh May 14 '20 at 17:58



You should use the OpenCSV SerDe for this.
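To see why a quote-aware parser matters here, the sketch below compares a plain comma split (roughly what a SerDe that ignores quotes does) with Python's `csv` module, which treats a quoted field as a single value, as OpenCSVSerde does. It is only an illustration under assumptions: the sample text drops the spaces after the commas (assuming they are a display artifact), and the helper names `naive_split` and `quote_aware_split` are hypothetical.

```python
import csv
import io

# Sample rows from the question. Assumption: the spaces after commas in the
# post are display artifacts, so they are omitted here.
SAMPLE = 'name,path,date\naurora,Chicago,20200130\nmark,"Syracuse, 2365",2020131\n'


def naive_split(text):
    """Hypothetical helper: split each line on commas and ignore quotes."""
    return [line.split(",") for line in text.splitlines()]


def quote_aware_split(text):
    """Hypothetical helper: parse with csv.reader, honouring double quotes."""
    return list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))


if __name__ == "__main__":
    print("naive:")
    for row in naive_split(SAMPLE):
        print(row)
    print("quote-aware:")
    for row in quote_aware_split(SAMPLE):
        print(row)
```

The naive split turns the third row into four fields (`mark`, `"Syracuse`, ` 2365"`, `2020131`), which is why the `date` column ends up holding `2365`. `csv.reader` keeps `Syracuse, 2365` together as the `path` value.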

Your CREATE TABLE query will look like this:

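-- Columns follow the CSV header; OpenCSVSerde keeps "Syracuse, 2365" as one value
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> (
 `name` string,
 `path` string,
 `date` string
 )
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
 WITH SERDEPROPERTIES (
 'separatorChar' = ',',
 'quoteChar' = '"',
 'escapeChar' = '\\'
 )
 LOCATION 's3://<location>'
 TBLPROPERTIES ('skip.header.line.count' = '1');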

Run the query above in Athena to create the table, instead of letting the Glue crawler create it.

If the table is partitioned, run MSCK REPAIR TABLE <table_name> once it is created to load the partitions.

Shailesh