Questions tagged [hive-serde]

SerDe is short for Serializer/Deserializer, the interface Hive uses to serialize and deserialize records during I/O and to interpret the serialized data as individual fields. A SerDe lets Hive read data from a table and write it back out to HDFS in any custom format. Anyone can write a SerDe for their own data format.
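
The read/write round trip described above can be sketched outside Hive. The following is a minimal Python illustration, not Hive's Java SerDe interface: the class name `ToyRegexSerDe`, its methods, the column names, and the sample log line are all hypothetical. It mimics how a regex-based SerDe might map capture groups to columns, yield NULLs for a line that does not match, and write a row back out in a different delimited format.

```python
import re
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


class ToyRegexSerDe:
    """Hypothetical stand-in for a SerDe; this is not Hive's Java API."""

    def __init__(self, columns: List[str], input_regex: str, delimiter: str = "\t") -> None:
        self.columns = columns
        self.pattern = re.compile(input_regex)
        self.delimiter = delimiter

    def deserialize(self, line: str) -> Row:
        """Turn one raw line into named fields (the 'De' half)."""
        match = self.pattern.fullmatch(line)
        if match is None:
            # Assumption: a line that does not match yields NULL for every column.
            return {name: None for name in self.columns}
        return dict(zip(self.columns, match.groups()))

    def serialize(self, row: Row) -> str:
        """Turn named fields back into one output line (the 'Ser' half)."""
        # Assumption: NULLs are written as \N, as in Hive's default text format.
        values = [row[name] if row[name] is not None else r"\N" for name in self.columns]
        return self.delimiter.join(values)


# Hypothetical three-column, whitespace-delimited log format.
serde = ToyRegexSerDe(["host", "request", "status"], r"(\S+)\s+(\S+)\s+(.*)")
row = serde.deserialize("10.0.0.1 /index.html 200")
print(row)
print(serde.serialize(row))
print(serde.deserialize("malformed"))
```

The point of the sketch is the separation of concerns: the table only sees named fields, while the on-disk layout is decided entirely by the two methods, so the input and output formats can differ.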

Official documentation page: SerDe

Hive bundles many SerDes, and third-party SerDes are also available. Examples include:

  • LazySimpleSerDe
  • OpenCSVSerDe
  • RegexSerDe
  • JsonSerDe
  • AvroSerDe
  • ParquetHiveSerDe
  • OrcSerDe
  • MultiDelimitSerDe
164 questions
1
vote
1 answer

Optional fields when matching log file rows using regex

I'm trying to parse a web log with regular expressions using RegexSerDe. It works by matching each regex group with a column in a table, and if the regex group is empty, it assigns a null to that column. I'm having trouble matching log rows with…
Mete Kural
  • 11
  • 1
0
votes
0 answers

Is there a way I can load accented letters into Hive using OpenCSVSerde?

I am trying to load a table in Hive which has accented letters. I was initially using OpenCSVSerde to parse the CSV file and load it into the table. However, when it comes to the accented letters, it prints a � in place of the accented letters. I have…
thecuriouscat
  • 59
  • 2
  • 8
0
votes
0 answers

Why was the "missing fields" warning triggered by Hive Serde2 lazy binary struct?

I'm seeing this in my Hive log: Missing fields! Expected 8 fields but only got 8! Last field end 2876176 and serialize buffer end 2876175. "Expected 8 but got 8" clearly does not make sense, so I checked the source code: // Missing fields? …
dz902
  • 4,782
  • 38
  • 41
0
votes
0 answers

Reading output produced by Pig in Hive properly

I have a Pig script outputting a map[float] using PigStorage. When I try to read this output in Hive, the square brackets surrounding the map are not read properly (or maybe an extra pair of brackets is added when reading it as a map in Hive). The…
beygel
  • 11
  • 3
0
votes
0 answers

Hive table shows all fields as NULL after adding new columns

I have a Hive table with three columns, delimited by spaces: hive (database)> describe formatted my_table; # col_name data_type comment field1 string field2 string field3 string ... ... Storage Desc Params: input.regex (\\S+)\\s+(\\S+)\\s+(.*) //…
0
votes
0 answers

ALTER TABLE in Hive is not working for SerDe 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' in Apache Hive (version 2.1.1-cdh6.3.4)

Environment: Apache Hive (version 1.1.0-cdh5.14.2) I tried creating a table with the DDL below. create external table test1 (v_src_code string,d_extraction_date date) partitioned by (d_mis_date date) row format serde…
0
votes
1 answer

How to preserve the case of JSON keys in a Glue table that uses a SerDe?

I have created a Glue table which converts the JSON to Parquet files. One of the columns, defined as a Map, holds nested JSON. I see the nested JSON key is always converted to lowercase irrespective of input…
Rajesh Kumar Dash
  • 2,203
  • 6
  • 28
  • 57
0
votes
1 answer

How to handle multiline records in a Hive table

JSON file: { "buyer": { "legalBusinessName": "test1 Company","organisationIdentifications": [{ "type": "abcd", "identification": "test.bb@tesr" }, { "type": "TXID","identification": "12345678" } ] }, "supplier": { "legalBusinessName": "test…
Sonu
  • 77
  • 11
0
votes
1 answer

How to configure a SerDe with different value data formats?

I have the following code: private Properties getStreamProperties(String suffix) { Properties streamsConfiguration = new Properties(); streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, groupId + "-" + suffix); …
0
votes
1 answer

Hive 3.x causing error for compressed (bz2) JSON in external table

I have some JSON data (about 60GB) that I have to load into a Hive external table. I am using Hive 3.x with Hadoop 3.x. The schema of the table is as follows: CREATE TABLE people(a string, liid string, link string, n string, t string, e string) ROW…
Hafiz Muhammad Shafiq
  • 8,168
  • 12
  • 63
  • 121
0
votes
1 answer

How to row format a column with a TSV file with SQLite in Athena

So I want to add these to a table in Athena from a TSV file, which I can do except for the last column, genres. I mean I can add it, but I want it to look like, for example, ["Comedy", "Mystery"], but it comes out as [Comedy,Mystery] which makes it…
0
votes
0 answers

How to get better performance when loading ElasticSearch data into Hive?

We created the Hive external table using ElasticSearch StorageHandler as shown below: CREATE EXTERNAL TABLE DEFAULT.ES_TEST ( REG_DATE STRING , STR1 STRING , STR2 STRING , STR3 STRING , STR4 STRING , STR5 STRING ) ROW FORMAT…
SeungCheol Han
  • 113
  • 1
  • 7
0
votes
1 answer

AWS Glue crawler able to parse the struct definition but Athena fails to read correctly

So we have CSV files in an S3 bucket, and when the AWS Glue crawler crawls through all the files it's able to identify the schema of a struct field correctly as follows: struct and the CSV file contents are as…
Shubham
  • 352
  • 3
  • 14
0
votes
1 answer

Presto - using SerDe on lists?

I have a JSON file with contents like this: { "key1": [ "value1" ], "key2": [ { "key3": "value3", "key4": "value4 } ], "key5": "value5" } To create a SerDe table in Presto for this file (without "key1") I would…
Bjørn
  • 3
  • 2
0
votes
1 answer

Load pipe-delimited CSV data having " (double quote) in one of the columns into Hive

I have data as below: Rollno|Name|height|department 101|Aman|5"2|C.S.E Taking all the columns as string. When I load the above data into Hive, I get an extra quote at the start and end as…
Meena
  • 11
  • 1