
I created an external table in Hive (Hive 2.1.1-mapr-1703) from an XML file using the XML SerDe. The file is the XML example from the W3C.

This is my code to create the table:

add jar /mapr/localpath/hivexmlserde-1.0.5.3.jar;
USE my_db;
CREATE EXTERNAL TABLE frank_books (
category STRING,
title STRING,
language STRING,
year BIGINT
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.category" = "/book/@category",
"column.xpath.title"    = "/book/title/text()",
"column.xpath.language" = "/book/title/@lang",
"column.xpath.year"     = "/book/year/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/mapr/localpath/database_files/xml_example'
TBLPROPERTIES (
"xmlinput.start" = "<book category",
"xmlinput.stop" = "</book>"
);
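Here is a rough illustration of what the two TBLPROPERTIES are for. It is a simplified Python model, not the SerDe's actual Java code, of a reader that cuts the file into one record per `<book>` element by scanning for a start marker and then the next end marker. The model assumes the reader looks up exactly `xmlinput.start` and `xmlinput.end`, which are the key names used in the answer's working table. The `split_records` function and the way it handles a missing key are hypothetical.

```python
from typing import Dict, List

# Hypothetical, simplified model of a start/end-marker XML record reader.
# Assumption: only "xmlinput.start" and "xmlinput.end" are consulted;
# any other key, such as "xmlinput.stop", is silently ignored.
START_KEY = "xmlinput.start"
END_KEY = "xmlinput.end"


def split_records(text: str, props: Dict[str, str]) -> List[str]:
    """Return every substring that begins with the start marker and ends with the end marker."""
    start = props.get(START_KEY)
    end = props.get(END_KEY)
    if start is None or end is None:
        # A Java reader that called a method on this missing (null) value would
        # throw a NullPointerException here; this sketch raises KeyError instead.
        missing = START_KEY if start is None else END_KEY
        raise KeyError(f"table property {missing!r} is not set")
    records: List[str] = []
    pos = 0
    while True:
        s = text.find(start, pos)
        if s == -1:
            break
        e = text.find(end, s)
        if e == -1:
            break  # unterminated record: stop scanning
        records.append(text[s:e + len(end)])
        pos = e + len(end)
    return records


if __name__ == "__main__":
    sample = ('<bookstore>'
              '<book category="cooking"><title lang="en">Everyday Italian</title><year>2005</year></book>'
              '<book category="web"><title lang="en">Learning XML</title><year>2003</year></book>'
              '</bookstore>')
    good = {"xmlinput.start": "<book category", "xmlinput.end": "</book>"}
    print(len(split_records(sample, good)))  # 2
    bad = {"xmlinput.start": "<book category", "xmlinput.stop": "</book>"}
    try:
        split_records(sample, bad)
    except KeyError as err:
        print("failed:", err)
```

In this model, using the start marker `<book category` instead of just `<book` matters. It prevents the scan from matching the enclosing `<bookstore>` element.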

The table itself exists, since the describe statement runs without an error:

describe frank_books;

However, a simple select statement like the following fails with a NullPointerException:

select * from my_db.frank_books;

This is the output:

OK
Failed with exception java.io.IOException:java.lang.NullPointerException
Time taken: 1.117 seconds

Can anyone help and explain what causes this error?

Thanks, Frank

dtolnay
Frank

1 Answer


Could this be something MapR-specific?

hive> DROP TABLE IF EXISTS xml_45158949;
OK
Time taken: 0.977 seconds
hive> 
    > CREATE  TABLE xml_45158949(
    > category STRING,
    > title STRING,
    > language STRING,
    > year BIGINT
    > )
    > ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    > WITH SERDEPROPERTIES(
    > "column.xpath.category" = "/book/@category",
    > "column.xpath.title"    = "/book/title/text()",
    > "column.xpath.language" = "/book/title/@lang",
    > "column.xpath.year"     = "/book/year/text()"
    >   )
    > STORED AS 
    > INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
    > OUTPUTFORMAT    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
    > TBLPROPERTIES (
    > "xmlinput.start"="<book category",
    > "xmlinput.end"="</book>"
    > );
OK
Time taken: 0.243 seconds
hive> 
    > load data local inpath '/Users/dvasilen/Misc/XML/45158949.xml' OVERWRITE into table xml_45158949;
Loading data to table default.xml_45158949
OK
Time taken: 0.153 seconds
hive> 
    > select * from xml_45158949;
OK
cooking    Everyday Italian    en    2005
children   Harry Potter        en    2005
web        XQuery Kick Start   en    2003
web        Learning XML        en    2003
Time taken: 0.08 seconds, Fetched: 4 row(s)
hive> 

Seems to work for me.
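To see how the four `column.xpath.*` properties turn each record into a row like the ones above, here is a hand-written Python equivalent built on `xml.etree.ElementTree`. ElementTree's `find`, `get` and `findtext` stand in for real XPath evaluation, so this approximates the SerDe's output and is not its implementation. The sample record is a hypothetical minimal stand-in built from the `children / Harry Potter / en / 2005` row. The actual contents of `45158949.xml` are not shown in the thread.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

Row = Tuple[Optional[str], Optional[str], Optional[str], Optional[int]]


def to_row(record: str) -> Row:
    """Evaluate the equivalent of the four column.xpath expressions against one <book> record."""
    book = ET.fromstring(record)  # the record's root element is <book>
    title = book.find("title")
    year = book.findtext("year")
    return (
        book.get("category"),                              # /book/@category
        title.text if title is not None else None,         # /book/title/text()
        title.get("lang") if title is not None else None,  # /book/title/@lang
        int(year) if year is not None else None,           # /book/year/text(), cast for BIGINT
    )


if __name__ == "__main__":
    rec = '<book category="children"><title lang="en">Harry Potter</title><year>2005</year></book>'
    print(to_row(rec))  # ('children', 'Harry Potter', 'en', 2005)
```

A missing element yields `None` here, which loosely mirrors a NULL column value in Hive.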

dvasilen
  • My original example described the creation of an **external table**. I also reproduced the creation of the internal table described by _dvasilen_ and get the same error, although _dvasilen_ writes that it works in his setup. This means it could be something MapR-specific or related to Hive 2 rather than Hive 1.x. Does anyone have experience with the **XML SerDe together with Hive 2**? – Frank Jul 19 '17 at 14:39
  • FWIW, it was Hive 2 in my setup: `hive --version` reports Hive 2.1.1. – dvasilen Sep 09 '17 at 22:36
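Both setups report Hive 2.1.1, so comparing the two DDL statements directly is another option. This small sketch diffs the TBLPROPERTIES maps copied from the question's and the answer's CREATE TABLE statements. The function name `diff_props` is hypothetical. The only difference it reports is `xmlinput.stop` in the question versus `xmlinput.end` in the answer. The thread does not confirm whether this explains the NullPointerException. The two tables also differ in other ways: the question's table is EXTERNAL with a LOCATION clause, and the answer's table is managed.

```python
from typing import Dict, List, Tuple


def diff_props(first: Dict[str, str], second: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (status, key) pairs for keys missing from one map or holding different values."""
    out: List[Tuple[str, str]] = []
    for key in sorted(set(first) | set(second)):
        if key not in second:
            out.append(("only_in_first", key))
        elif key not in first:
            out.append(("only_in_second", key))
        elif first[key] != second[key]:
            out.append(("different_value", key))
    return out


# TBLPROPERTIES copied from the two CREATE TABLE statements in this thread.
question_props = {"xmlinput.start": "<book category", "xmlinput.stop": "</book>"}
answer_props = {"xmlinput.start": "<book category", "xmlinput.end": "</book>"}

if __name__ == "__main__":
    for status, key in diff_props(question_props, answer_props):
        print(status, key)
```

Keys that match in both maps, such as `xmlinput.start`, are not listed.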