While loading a file from the mainframe into Hadoop in ORC format, some of the data is loaded with single quotes (') and the rest with double quotes ("), even though the entire source file uses single quotes ('). I used the Hive Cobol SerDe to specify custom delimiters.

Example:

Source data:

First_Name Last_name Address

Rev 'Har' O'Amy 4031 'B' Ave

After loading into Hadoop, some of the data has the correct single quotes (') and some has double quotes ("), as below:

First_Name Last_name Address

Rev "Har" O"Amy 4031 "B" Ave

What could be the issue, and how can I solve it?

dtolnay
Revathi

1 Answer

One possible cause is the delimiter specified when the table was created.

So try ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ("serialization.encoding"='UTF-8'); when creating the Hive table, and then load the data.
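As a rough sketch of that suggestion, the statements below create a text-format staging table with an explicit encoding and then copy it into ORC. The table names, column names, and the '|' field delimiter are assumptions made for illustration, not details from the question, so adjust them to match the actual file layout. A staging table is used here because LazySimpleSerDe reads delimited text, while an ORC table uses its own SerDe.

```sql
-- Hypothetical staging table: names and the '|' delimiter are assumptions.
CREATE TABLE customer_staging (
  first_name STRING,
  last_name  STRING,
  address    STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '|',                 -- assumed field delimiter
  'serialization.encoding' = 'UTF-8'   -- encoding suggested above
)
STORED AS TEXTFILE;

-- Copy the staged rows into an ORC table (hypothetical name).
CREATE TABLE customer_orc
STORED AS ORC
AS SELECT first_name, last_name, address FROM customer_staging;
```

If the quotes already look wrong when querying the staging table, the change is probably happening before Hive, for example during the transfer or character-set conversion from the mainframe, rather than in the ORC step.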

Also, if you want your data clean, try the CSV SerDe at this link, which lets you set the separator, quote, and escape characters explicitly: https://github.com/ogrodnek/csv-serde

Rijul