My data is in this format:

[{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}]

[{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}]

[{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}]

[{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}]

(Each line is an array containing exactly one object.) I want to create a Hive table over this data.

If there were no [] around the JSON, I could simply use the default JSON SerDe: ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'.

The problem with the regex SerDe is that the order of the fields can change, which makes it really hard to extract the exact values.

How can I create a Hive table for this data format?

OneCricketeer
Abhijeet Ahuja

1 Answer

You should be able to use an ARRAY<STRUCT<...>> column type; see the Hive documentation on complex types:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes

I would only suggest using regex if there is always exactly one JSON object in each row.
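If each line really does hold exactly one object, another option is to strip the enclosing array before loading, so that the default JsonSerDe mentioned in the question can read the file as-is. Below is a minimal Python sketch of that preprocessing step, not a tested pipeline. The helper names unwrap_line and unwrap_lines are hypothetical, and the code assumes every non-blank line is a valid JSON array with a single element.

```python
import json


def unwrap_line(line):
    """Turn '[{...}]' into '{...}'. Hypothetical helper, not part of Hive."""
    records = json.loads(line)
    # Assumption: every line is a JSON array holding exactly one object.
    if not isinstance(records, list) or len(records) != 1:
        raise ValueError("expected a one-element JSON array: %r" % line)
    # Compact separators keep the output in the same style as the input.
    return json.dumps(records[0], separators=(",", ":"))


def unwrap_lines(lines):
    """Yield one plain JSON object per non-blank input line."""
    for line in lines:
        line = line.strip()
        if line:  # skip the blank lines between records
            yield unwrap_line(line)
```

Parsing with a JSON library rather than a regex means the field order does not matter. The rewritten file then has one plain object per line, which is the shape the question says works with ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'.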

OneCricketeer