Questions tagged [hive-serde]

SerDe is short for Serializer/Deserializer, an interface used by Hive for both serialization and deserialization during IO and also interpreting the results of serialization as individual fields. A SerDe allows Hive to read in data from a table, and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats.
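
To make the serialize/deserialize split concrete, here is a minimal Java sketch. ToySerDe and DelimitedToySerDe are hypothetical names invented for illustration. Hive's real SerDes extend classes in org.apache.hadoop.hive.serde2 and work with Writable objects and ObjectInspectors, not plain strings. The sketch only borrows LazySimpleSerDe's default conventions: Ctrl-A as the field delimiter and \N as the NULL marker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, simplified contract for illustration only; not Hive's API.
interface ToySerDe {
    // Deserialize: one stored record -> one value per table column.
    List<String> deserialize(String record);

    // Serialize: one value per table column -> one stored record.
    String serialize(List<String> columns);
}

// Delimited-text sketch using LazySimpleSerDe-style defaults:
// Ctrl-A between fields and the literal token \N for NULL.
class DelimitedToySerDe implements ToySerDe {
    private final String delimiter;
    private final String nullToken;
    private final int numColumns;

    DelimitedToySerDe(String delimiter, String nullToken, int numColumns) {
        this.delimiter = delimiter;
        this.nullToken = nullToken;
        this.numColumns = numColumns;
    }

    @Override
    public List<String> deserialize(String record) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        String[] parts = record.split(Pattern.quote(delimiter), -1);
        List<String> columns = new ArrayList<>(numColumns);
        for (int i = 0; i < numColumns; i++) {
            // In this sketch, missing trailing fields and the null token both
            // become NULL, and fields beyond numColumns are ignored.
            boolean isNull = i >= parts.length || parts[i].equals(nullToken);
            columns.add(isNull ? null : parts[i]);
        }
        return columns;
    }

    @Override
    public String serialize(List<String> columns) {
        List<String> fields = new ArrayList<>(columns.size());
        for (String value : columns) {
            // NULL is written out as the null token so it survives a round trip.
            fields.add(value == null ? nullToken : value);
        }
        return String.join(delimiter, fields);
    }
}

public class Main {
    public static void main(String[] args) {
        ToySerDe serde = new DelimitedToySerDe("\u0001", "\\N", 3);
        // Round trip: serialize a row, then read it back as individual columns.
        String stored = serde.serialize(Arrays.asList("alice", null, "42"));
        System.out.println(serde.deserialize(stored)); // prints [alice, null, 42]
    }
}
```

The interface keeps the two directions separate because Hive calls them at different times: deserialize when reading table data, serialize when writing query results back out.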

Official documentation page: SerDe

There are many SerDes bundled with Hive, as well as third-party SerDes, such as:

LazySimpleSerDe
OpenCSVSerDe
RegexSerDe
JsonSerDe
AvroSerDe
ParquetHiveSerDe
OrcSerDe
MultiDelimitSerDe

164 questions

1 answer

Optional fields when matching log file rows using regex

I'm trying to parse a web log with regular expressions using RegexSerDe. It works by matching each regex group with a column in a table and if the regex group is empty it assigns a null to that column. I'm having trouble matching log rows with…

asked Oct 28 '16 at 01:01

Mete Kural
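
The RegexSerDe behavior described in the question above (each capture group fills one column, and a group that matched nothing becomes NULL) can be sketched in plain Java with java.util.regex. RegexRowParser is a hypothetical class, not Hive's implementation, and the three-field log format (host, optional user, quoted request) is an assumed example. Wrapping a field in (?:...)? is one common way to make it optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of RegexSerDe-style row parsing, not Hive's actual code.
class RegexRowParser {
    private final Pattern pattern;
    private final int numColumns;

    RegexRowParser(String regex, int numColumns) {
        this.pattern = Pattern.compile(regex);
        this.numColumns = numColumns;
    }

    // Capture group i fills column i. A group that did not take part in the
    // match (for example, a skipped optional group) yields null.
    List<String> parse(String line) {
        Matcher m = pattern.matcher(line);
        // The whole line must match; in this sketch a non-matching line
        // yields NULL for every column.
        boolean matched = m.matches();
        List<String> columns = new ArrayList<>(numColumns);
        for (int i = 1; i <= numColumns; i++) {
            columns.add(matched && i <= m.groupCount() ? m.group(i) : null);
        }
        return columns;
    }
}

public class Main {
    public static void main(String[] args) {
        // Assumed format: host, optional user, quoted request.
        RegexRowParser parser =
            new RegexRowParser("(\\S+) (?:(\\S+) )?\"([^\"]*)\"", 3);
        System.out.println(parser.parse("10.0.0.1 alice \"GET /\"")); // prints [10.0.0.1, alice, GET /]
        System.out.println(parser.parse("10.0.0.1 \"GET /\""));       // prints [10.0.0.1, null, GET /]
        System.out.println(parser.parse("not a log line"));           // prints [null, null, null]
    }
}
```

Making the optional field a non-capturing wrapper around a capturing group keeps the group numbering fixed, so each column still lines up with the same group whether or not the field is present.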

0 answers

Is there a way I can load accented letters to hive using openCSVSerde?

I am trying to load a table in Hive that contains accented letters. I was initially using openCSVSerde to parse the CSV file and load it into the table. However, the accented letters are printed as � instead. I have…

hadoop hive hue hive-serde lazysimpleserde

asked Jul 14 '23 at 19:46

thecuriouscat

0 answers

Why "missing fields" warning was triggered by Hive Serde2 lazy binary struct?

I'm seeing this in my Hive log: Missing fields! Expected 8 fields but only got 8! Last field end 2876176 and serialize buffer end 2876175. "Expected 8, got 8" clearly makes no sense, so I checked the source code: // Missing fields? …

hive hive-serde

asked Feb 27 '23 at 02:28

dz902

0 answers

Reading output produced by Pig in Hive properly

I have a Pig script outputting a map[float] using PigStorage. When I try to read this output in Hive, the square brackets surrounding the map are not read properly (or maybe an extra pair of brackets is added when reading it as a map in Hive). The…

hive apache-pig hive-serde

asked Jan 23 '23 at 17:10

beygel

0 answers

Hive table show all fields NULL after add new columns

I have a hive table with three columns, delimited by spaces hive (database)> describe formatted my_table; # col_name data_type comment field1 string field2 string field3 string ... ... Storage Desc Params: input.regex (\\S+)\\s+(\\S+)\\s+(.*) //…

regex hadoop hive hive-serde

asked Oct 12 '22 at 06:35

Nickswaggy

0 answers

Alter table in hive is not working for serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' in Hive "Apache Hive (version 2.1.1-cdh6.3.4)"

Environment: Apache Hive (version 1.1.0-cdh5.14.2) I tried creating a table with the DDL below. create external table test1 (v_src_code string,d_extraction_date date) partitioned by (d_mis_date date) row format serde…

hadoop hive alter-table hive-serde

asked Jun 23 '22 at 06:50

Satya Nayak

1 answer

How to preserve case of JSON keys inside a Glue table that uses a SerDe?

I have created a Glue table that converts JSON to Parquet files. One of the columns is defined as a map holding nested JSON, and I see the nested JSON keys always getting converted to lowercase, irrespective of the input…

apache-spark hive aws-glue hive-serde

asked Apr 18 '22 at 13:19

Rajesh Kumar Dash

1 answer

How to Handle Multiline record in Hive table

Json File : { "buyer": { "legalBusinessName": "test1 Company","organisationIdentifications": [{ "type": "abcd", "identification": "test.bb@tesr" }, { "type": "TXID","identification": "12345678" } ] }, "supplier": { "legalBusinessName": "test…

json hive hiveql hive-serde

asked Apr 01 '22 at 11:12

Sonu

1 answer

how to configure serde with different value data formats?

I have the following code: private Properties getStreamProperties(String suffix) { Properties streamsConfiguration = new Properties(); streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, groupId + "-" + suffix); …

serialization apache-kafka apache-kafka-streams hive-serde

asked Sep 10 '21 at 09:38

st123

1 answer

Hive 3.x causing error for compressed (bz2) json in external table

I have some JSON data (about 60 GB) that I have to load into a Hive external table. I am using Hive 3.x with Hadoop 3.x. The schema of the table is as follows: CREATE TABLE people(a string, liid string, link string, n string, t string, e string) ROW…

json hadoop hive hive-serde

asked Feb 16 '21 at 06:58

Hafiz Muhammad Shafiq

1 answer

How to row format a column with tsv file with sqlite in athena

I want to add these to a table in Athena from a TSV file, which works except for the last column, genres. I can add it, but I want it to look like ["Comedy", "Mystery"], and instead it comes out as [Comedy,Mystery], which makes it…

sql amazon-web-services sqlite amazon-athena hive-serde

asked Feb 05 '21 at 18:56

saji

0 answers

How to get a better performance to load ElasticSearch data into Hive?

We created the Hive external table using ElasticSearch StorageHandler as shown below: CREATE EXTERNAL TABLE DEFAULT.ES_TEST ( REG_DATE STRING , STR1 STRING , STR2 STRING , STR3 STRING , STR4 STRING , STR5 STRING ) ROW FORMAT…

performance elasticsearch hive hive-serde

asked Jan 22 '21 at 01:47

SeungCheol Han

1 answer

AWS Glue crawler able to parse the struct definition but Athena fails to read correctly

We have CSV files in an S3 bucket, and when the AWS Glue crawler crawls through all the files it's able to identify the schema of a struct field correctly as follows: struct and the CSV file contents are as…

aws-glue amazon-athena hive-serde

asked Nov 07 '20 at 06:41

Shubham

1 answer

Presto - using serde on lists?

I have a JSON file with contents like this: { "key1": [ "value1" ], "key2": [ { "key3": "value3", "key4": "value4" } ], "key5": "value5" } To create a serde-table in Presto for this file (without "key1") I would…

arrays presto hive-serde

asked Oct 19 '20 at 11:40

Bjørn

1 answer

Load pipe-delimited CSV data having " (double quote) in one of the columns in Hive

csv hadoop hive apache-spark-sql hive-serde

asked Aug 17 '20 at 12:34

Meena