
Is there any way to convert the Hive query result to JSON format?

divinedragon
  • http://code.google.com/p/hive-json-serde/ – Steve Severance Apr 03 '12 at 15:44
  • Seems like I am not the only one looking for this solution. Thanks for sharing the project. Will see how it can be useful. – divinedragon Apr 05 '12 at 14:24
  • @Steve I checked out hive-json-serde. It seems to be for reading and processing JSON data. I was looking for something where my data is tab-delimited, but when I query it from Hive, the query output is in JSON format. – divinedragon Apr 06 '12 at 10:13
  • Hi divinedragon, have you got a solution already? I am looking for one as well; can you share it if you do, please? Thanks! – Ming Jul 11 '12 at 10:55
  • Actually, I have put this on hold due to some other higher-priority work. I will start looking into it again. – divinedragon Jul 11 '12 at 17:45

7 Answers


This seems to come up quite often. Use the to_json UDF from Brickhouse (http://github.com/klout/brickhouse). If you convert your results to a named_struct, it will interpret it as a JSON map and output it accordingly.

SELECT to_json( named_struct( "field1", field1,
                              "field2", field2,
                              "field3", field3 ) )
FROM mytable;

to_json will also interpret arrays and maps accordingly, emitting JSON arrays and JSON objects.
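A quick sketch of that behavior (the table events and its array/map columns are hypothetical), showing how nested collections come out:

```sql
-- Assumes a hypothetical table: events(tags array<string>, attrs map<string,string>)
SELECT to_json( named_struct( "tags",  tags,
                              "attrs", attrs ) )
FROM events;
-- Produces rows like: {"tags":["a","b"],"attrs":{"k":"v"}}
```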

Jerome Banks
  • To register the function you can use ADD JAR /filepath/brickhouse-0.7.1-SNAPSHOT.jar; CREATE FUNCTION to_json AS 'brickhouse.udf.json.ToJsonUDF'; – 1EnemyLeft Nov 13 '18 at 20:14
  • it generates individual json objects properly, but when I group_by something and I use collect function, it crashes!: collect(to_json(named_struct("locale", cap))) any idea? – Reihan_amn Feb 19 '20 at 00:33

I was using a tool called Apache NiFi. It has an AvroToJSON processor, so Hive output in Avro format can easily be converted to JSON. The link below may be helpful: https://nifi.apache.org/

ForeverLearner

My approach would be to use the Jackson library (http://jackson.codehaus.org/): you create a POJO that maps to the JSON format. Then, once you get the ResultSet from your Hive query, you iterate through it, create instances of the POJO, and let Jackson serialize them.

/**-- Jackson-annotated POJO --**/
import org.codehaus.jackson.annotate.JsonProperty;

public class Item {
    @JsonProperty
    private String att1;
    @JsonProperty
    private String att2;

    public Item(String att1, String att2) {
        this.att1 = att1;
        this.att2 = att2;
    }
}

/**-- the class where you run your query --**/
List<Item> list = new ArrayList<Item>();
ResultSet rs = executeQuery(queryStr); // run your Hive query here
while (rs.next()) {
    String att1 = rs.getString("att1");
    String att2 = rs.getString("att2");
    list.add(new Item(att1, att2));
}

Then you can return the List of Item as the result; Jackson makes it very easy to write it out in JSON format:

  1. Create an ObjectMapper: ObjectMapper mapper = new ObjectMapper();. The mapper gives you many options for writing an object as JSON to different targets, e.g. an OutputStream, a byte buffer, etc.

  2. Iterate through the list.

  3. Use the mapper to write each value as JSON, e.g. mapper.writeValue(out, item). In this example, out is an OutputStream.

Shengjie

The easiest way would be to first store the result as TSV and then use a library like spray-json to convert it to JSON.

Hive works fine as long as you don't try to customize too much. If you have access to Spark on your cluster, use Spark SQL instead of Hive. The results of your query will be stored in a SchemaRDD, and from there you can easily map to JSON with .map(_.toJson), assuming you have spray-json imported in scope.

Gianmario Spacagna

List of JSON objects:

If you want to convert the output to JSON format and then build a collection out of it, i.e. [{json1},{json2},{json3}], you need to know that the collect() function treats its input as a plain string, which is wrong here.

So to get a collection of JSON-formatted output, you need to wrap the values with the from_json function as well:

create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
create temporary function to_json as 'brickhouse.udf.json.ToJsonUDF';
create temporary function from_json as 'brickhouse.udf.json.FromJsonUDF';
collect(from_json(to_json(named_struct("locale", locale, "createdtime", created_time)), 'map<string,string>')) as list_json_object,
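Put together, a complete query might look like the sketch below (the table user_events and the grouping column user_id are hypothetical; the three Brickhouse functions are registered as above):

```sql
SELECT user_id,
       collect(from_json(to_json(named_struct("locale", locale,
                                              "createdtime", created_time)),
                         'map<string,string>')) AS list_json_object
FROM user_events
GROUP BY user_id;
```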
Reihan_amn
  • Is it possible to use to_json without knowing the column names? I want to include all column names in the JSON and don't know them in advance. – Bosco Feb 13 '21 at 00:56

The top answer works. However, I'm a bit too lazy to explicitly convert to a named_struct with all the column names, and the Brickhouse UDF crashes when I use CREATE TABLE AS SELECT. To solve this, I developed a similar UDF which supports select to_json(*) from tbl;.

Usage:

ADD JAR ivy://org.jsonex:HiveUDF:0.1.22?transitive=true;
CREATE TEMPORARY FUNCTION to_json AS 'org.jsonex.hiveudf.ToJsonUDF';
SELECT to_json(*) FROM tbl;
Jianwu Chen

'SELECT to_json' will work, but all of the columns have to be listed explicitly in the named_struct.