18

I have many json arrays stored in a table (jt) that looks like this:

[{"ts":1403781896,"id":14,"log":"show"},{"ts":1403781896,"id":14,"log":"start"}]
[{"ts":1403781911,"id":14,"log":"press"},{"ts":1403781911,"id":14,"log":"press"}]

Each array is a record.

I would like to parse this table in order to get a new table (logs) with 3 fields: ts, id, log. I tried to use the get_json_object method, but it seems that method is not compatible with json arrays because I only get null values.

This is the code I have tested:

CREATE TABLE logs AS 
SELECT get_json_object(jt.value, '$.ts') AS ts, 
get_json_object(jt.value, '$.id') AS id,
get_json_object(jt.value, '$.log') AS log
FROM jt;

I tried to use other functions but they seem really complicated. Thank you! :)

Update! I solved my issue by performing a regexp:

CREATE TABLE jt_reg AS
select regexp_replace(regexp_replace(value,'\\}\\,\\{','\\}\\\n\\{'),'\\[|\\]','') as valuereg  from jt;


CREATE TABLE logs AS 
SELECT get_json_object(jt_reg.valuereg, '$.ts') AS ts, 
get_json_object(jt_reg.valuereg, '$.id') AS id,
get_json_object(jt_reg.valuereg, '$.log') AS log
FROM ams_json_reg;
Nathan Hanna
  • 4,643
  • 3
  • 28
  • 32
marlieg
  • 449
  • 2
  • 5
  • 16

3 Answers3

11

I just ran into this problem, with the JSON array stored as a string in the hive table.

The solution is a bit hacky and ugly, but it works and doesn't require serdes or external UDFs

SELECT 
       get_json_object(single_json_table.single_json, '$.ts') AS ts,
       get_json_object(single_json_table.single_json, '$.id') AS id,
       get_json_object(single_json_table.single_json, '$.log') AS log
FROM ( SELECT explode (
  split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2),
            '"}","', '"}",,,,"'), ',,,,')
      ) FROM src_table) single_json_table;

I broke the lines up so that it would be a little easier to read. I'm using substr() to strip the first and last characters, removing [ and ] . I'm then using regex_replace to match the separator between records in the json array and adding or changing the separator to be something unique that can then be used easily with split() to turn the string into a hive array of json objects which can then be used with explode() as described in the previous solution.

Note, the separator regex used here ( "}"," ) wouldn't work with the original data set...the regex would have to be ( "},\{" ) and the replacement would then need to be "},,,,{" eg..

  split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2),
            '"},\\{"', '"},,,,{"'), ',,,,')
ryan
  • 111
  • 1
  • 3
5

Use explode() function

 hive (default)> CREATE TABLE logs AS
                  >   SELECT get_json_object(single_json_table.single_json, '$.ts') AS ts,
                  >   get_json_object(single_json_table.single_json, '$.id') AS id,
                  >   get_json_object(single_json_table.single_json, '$.log') AS log
                  >   FROM
                  >     (SELECT explode(json_array_col) as single_json FROM jt) single_json_table ;

Automatically selecting local only mode for query
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator

hive (default)> select * from logs;
OK
ts      id      log
1403781896      14      show
1403781896      14      start
1403781911      14      press
1403781911      14      press
Time taken: 0.118 seconds, Fetched: 4 row(s)
hive (default)>

where json_array_col is column in jt which holds your array of jsons.

hive (default)> select json_array_col from jt;
json_array_col
["{"ts":1403781896,"id":14,"log":"show"}","{"ts":1403781896,"id":14,"log":"start"}"]
["{"ts":1403781911,"id":14,"log":"press"}","{"ts":1403781911,"id":14,"log":"press"}"]
vijay kumar
  • 2,049
  • 1
  • 15
  • 18
  • 1
    Thank you for your reply! It doesn't work for me. I get this error: Your query has the following error(s): Error while compiling statement: FAILED: UDFArgumentException explode() takes an array or a map as a parameter. Instead of "json_array_col" I have put "value". – marlieg Jun 30 '14 at 09:19
  • what is the datatype of value . can u paste the following: desc jt; – vijay kumar Jun 30 '14 at 09:25
  • col_name:value,"data_type":"string" – marlieg Jun 30 '14 at 10:48
  • as you mentioned in your question you are storing arrays of json , i gave solution with explode(), explode() takes Arrays/Map as arg that's y its giving err – vijay kumar Jun 30 '14 at 11:06
  • if your value type is string , then u have to write UDF to convert comma separated json strings to arrays and use explode. the simplest way is while storing the data into hive table make your value data type as Array – vijay kumar Jun 30 '14 at 11:13
4

because get_json_object doesn't support json array string, so you can concat to a json object, like this:

SELECT 
    get_json_object(concat(concat('{"root":', jt.value), '}'), '$.root')
FROM jt;