I am using HDP 2.4 and HAWQ 2.0.
I want to read JSON data stored in an HDFS path through a HAWQ external table.
I followed the steps below to add the new JSON plugin to PXF and read the data.
Downloaded the plugin "json-pxf-ext-3.0.1.0-1.jar" from https://bintray.com/big-data/maven/pxf-plugins/view#
Copied the plugin into /usr/lib/pxf.
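For reference, the copy step can be scripted roughly like this. It is only a minimal sketch, not an official installer: it assumes the jar has already been downloaded to a local path and that the caller can write to the target directory (for /usr/lib/pxf that normally means root). The function name install_pxf_plugin is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper sketching the copy step; not an official PXF installer.
# Assumes the jar is already downloaded and the caller can write to the
# target directory (normally root for /usr/lib/pxf).
install_pxf_plugin() {
    jar="$1"                   # e.g. json-pxf-ext-3.0.1.0-1.jar
    dest="${2:-/usr/lib/pxf}"  # PXF library directory used in this post
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    # Copy the jar, then list the copied file so the result is visible.
    cp "$jar" "$dest/" && ls -l "$dest/$(basename "$jar")"
}
```

Calling install_pxf_plugin json-pxf-ext-3.0.1.0-1.jar copies the jar into /usr/lib/pxf and prints the resulting file listing.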
Created the external table:
CREATE EXTERNAL TABLE ext_json_mytestfile (
    created_at TEXT,
    id_str TEXT,
    text TEXT,
    source TEXT,
    "user.id" INTEGER,
    "user.location" TEXT,
    "coordinates.type" TEXT,
    "coordinates.coordinates[0]" DOUBLE PRECISION,
    "coordinates.coordinates[1]" DOUBLE PRECISION
)
LOCATION ('pxf://localhost:51200/tmp/hawq_test.json'
          '?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
          '&ACCESSOR=org.apache.hawq.pxf.plugins.json.JsonAccessor'
          '&RESOLVER=org.apache.hawq.pxf.plugins.json.JsonResolver'
          '&ANALYZER=org.apache.hawq.pxf.plugins.hdfs.HdfsAnalyzer')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import')
LOG ERRORS INTO err_json_mytestfile SEGMENT REJECT LIMIT 10 ROWS;
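To make the column mapping concrete, here is a sketch of one input record shaped to match the DDL: quoted names such as "user.id" point at nested fields, and "coordinates.coordinates[0]" points at the first element of an array. The record and all of its values are made up for illustration, the one-record-per-line layout is an assumption, and the helper names sample_record and upload_sample are hypothetical. The upload uses the standard hdfs dfs -put command and assumes the hdfs client is on PATH and that /tmp/hawq_test.json does not exist yet.

```shell
#!/bin/sh
# Hypothetical record matching the DDL columns (all values are made up):
#   "user.id"                    -> user.id
#   "coordinates.coordinates[0]" -> first element of coordinates.coordinates
sample_record() {
    printf '%s\n' '{"created_at":"Mon Jan 01 00:00:00 +0000 2016","id_str":"1","text":"hello","source":"web","user":{"id":42,"location":"somewhere"},"coordinates":{"type":"Point","coordinates":[-122.4,37.8]}}'
}

# Write the record to a local file and copy it to the HDFS path in LOCATION.
# Assumes the hdfs client is on PATH and the target file does not exist yet.
upload_sample() {
    local_file="${1:-hawq_test.json}"
    sample_record > "$local_file"
    hdfs dfs -put "$local_file" /tmp/hawq_test.json
}
```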
The CREATE EXTERNAL TABLE statement runs and the table is created successfully. After that, I tried to run a SELECT query:
select * from ext_json_mytestfile;
However, it fails with this error:
ERROR: remote component error (500) from 'localhost:51200': type Exception report message java.lang.ClassNotFoundException: org.apache.hawq.pxf.plugins.json.JsonAccessor description The server encountered an internal error that prevented it from fulfilling this request. exception javax.servlet.ServletException: java.lang.ClassNotFoundException: org.apache.hawq.pxf.plugins.json.JsonAccessor (libchurl.c:878) (seg4 sandbox.hortonworks.com:40000 pid=117710) (dispatcher.c:1801) DETAIL: External table ext_json_mytestfile
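A ClassNotFoundException means the JVM running PXF could not load org.apache.hawq.pxf.plugins.json.JsonAccessor from its classpath. One way to narrow this down is to check whether any jar in the plugin directory actually contains that class. The sketch below does this with unzip -l. It assumes unzip is installed and uses /usr/lib/pxf only because that is where the jar was copied; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical diagnostic: find which jars in a directory contain a class.
# Assumes `unzip` is installed; /usr/lib/pxf is the directory from this post.

# Convert a fully qualified Java class name to its entry path inside a jar,
# e.g. org.example.Foo -> org/example/Foo.class
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar in DIR whose listing contains CLASS (fixed-string match).
find_class_in_jars() {
    class="$1"
    dir="${2:-/usr/lib/pxf}"
    entry=$(class_to_entry "$class")
    for jar in "$dir"/*.jar; do
        [ -f "$jar" ] || continue   # skip the literal pattern if no jars match
        if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
            echo "$jar"
        fi
    done
}
```

Running find_class_in_jars org.apache.hawq.pxf.plugins.json.JsonAccessor prints nothing if no jar in /usr/lib/pxf contains the class. If the copied jar is listed, the class exists on disk, and the remaining question is whether that jar is on the classpath the PXF service actually loads.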
Any help would be much appreciated.