
Using HDP 2.4 and HAWQ 2.0

I want to read JSON data stored in an HDFS path into a HAWQ external table.

I followed the steps below to add the new JSON plugin to PXF and read the data.

  1. Download the plugin "json-pxf-ext-3.0.1.0-1.jar" from https://bintray.com/big-data/maven/pxf-plugins/view#

  2. Copy the plugin to /usr/lib/pxf.

  3. Create the external table:

    CREATE EXTERNAL TABLE ext_json_mytestfile (
        created_at TEXT,
        id_str TEXT,
        text TEXT,
        source TEXT,
        "user.id" INTEGER,
        "user.location" TEXT,
        "coordinates.type" TEXT,
        "coordinates.coordinates[0]" DOUBLE PRECISION,
        "coordinates.coordinates[1]" DOUBLE PRECISION
    )
    LOCATION ('pxf://localhost:51200/tmp/hawq_test.json'
              '?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
              '&ACCESSOR=org.apache.hawq.pxf.plugins.json.JsonAccessor'
              '&RESOLVER=org.apache.hawq.pxf.plugins.json.JsonResolver'
              '&ANALYZER=org.apache.hawq.pxf.plugins.hdfs.HdfsAnalyzer')
    FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import')
    LOG ERRORS INTO err_json_mytestfile
    SEGMENT REJECT LIMIT 10 ROWS;
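For context, here is a minimal sketch of what one record in /tmp/hawq_test.json could look like, assuming the dotted column names above map onto nested JSON members; the actual file is not shown in the question, and all values below are made up:

    {"created_at": "Mon Sep 30 04:04:53 +0000 2013",
     "id_str": "384529256681725952",
     "text": "sample tweet text",
     "source": "web",
     "user": {"id": 31424214, "location": "COLUMBUS"},
     "coordinates": {"type": "Point", "coordinates": [-6.1, 50.1]}}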

The DDL executes and the table is created successfully. After that, I try to run a select query:

select * from ext_json_mytestfile;

But I get the following error:

ERROR: remote component error (500) from 'localhost:51200': type Exception report message java.lang.ClassNotFoundException: org.apache.hawq.pxf.plugins.json.JsonAccessor description The server encountered an internal error that prevented it from fulfilling this request. exception javax.servlet.ServletException: java.lang.ClassNotFoundException: org.apache.hawq.pxf.plugins.json.JsonAccessor (libchurl.c:878) (seg4 sandbox.hortonworks.com:40000 pid=117710) (dispatcher.c:1801) DETAIL: External table ext_json_mytestfile

Any help would be much appreciated.

Pra

5 Answers


It seems that the referenced jar file has the old com.pivotal.* package name. The JSON PXF extension is still incubating; the jar pxf-json-3.0.0.jar was built for JDK 1.7 (the single-node HDB VM uses JDK 1.7) and uploaded to Dropbox:

https://www.dropbox.com/s/9ljnv7jiin866mp/pxf-json-3.0.0.jar?dl=0
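A quick way to check which package names a jar actually ships, and therefore whether it matches the org.apache.hawq class names used in the DDL, is to list its contents:

    jar tf /usr/lib/pxf/pxf-json-3.0.0.jar | grep JsonAccessor
    # a matching jar shows org/apache/hawq/pxf/plugins/json/JsonAccessor.class;
    # the old Bintray jar ships the class under com/pivotal/... instead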

Echoing the details from the comments above so that the steps are performed correctly and the PXF service recognizes the jar file. The steps below assume that HAWQ/HDB is managed by Ambari; if not, the manual steps mentioned in the previous updates should work.

  1. Copy pxf-json-3.0.0.jar to /usr/lib/pxf/ on all your HAWQ nodes (master and segments).

  2. In Ambari-managed PXF, add the line below via Ambari Admin -> PXF -> Advanced pxf-public-classpath:

/usr/lib/pxf/pxf-json-3.0.0.jar

  3. In Ambari-managed PXF, append the snippet below to the end of your PXF profiles XML via Ambari Admin -> PXF -> Advanced pxf-profiles (a DDL sketch that uses this profile follows the list):

<profile>
  <name>Json</name>
  <description>JSON Accessor</description>
  <plugins>
    <fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
    <accessor>org.apache.hawq.pxf.plugins.json.JsonAccessor</accessor>
    <resolver>org.apache.hawq.pxf.plugins.json.JsonResolver</resolver>
  </plugins>
</profile>
  4. Restart the PXF service via Ambari.
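Once the Json profile is registered, the external table can reference it by name instead of spelling out the fragmenter, accessor, and resolver. A sketch, with the column list abbreviated from the question's DDL:

    CREATE EXTERNAL TABLE ext_json_mytestfile (
        created_at TEXT,
        id_str TEXT
        -- ...remaining columns as in the question's DDL...
    )
    LOCATION ('pxf://localhost:51200/tmp/hawq_test.json?PROFILE=Json')
    FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');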
Niranjan Sarvi

Did you add the jar file location to /etc/pxf/conf/pxf-public.classpath?

Sung Yu-wei

Did you try the following? (A shell sketch of these steps follows the list.)

  • copying the PXF JSON jar file to /usr/lib/pxf
  • updating /etc/pxf/conf/pxf-profiles.xml to include the Json plug-in profile, if not already present
  • (per the comment above) updating /etc/pxf/conf/pxf-public.classpath
  • restarting the PXF service, either via Ambari or from the command line (sudo service pxf-service restart)
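A sketch of those steps as shell commands, assuming the pxf-json-3.0.0.jar name from the answer above; run them on the master and every segment host:

    # copy the JSON plugin jar into the PXF plugin directory
    sudo cp pxf-json-3.0.0.jar /usr/lib/pxf/

    # add the jar to the public classpath so the PXF webapp can load it
    echo '/usr/lib/pxf/pxf-json-3.0.0.jar' | sudo tee -a /etc/pxf/conf/pxf-public.classpath

    # add the Json profile snippet (see the answer above) to /etc/pxf/conf/pxf-profiles.xml, then restart
    sudo service pxf-service restart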

You likely didn't add the JSON jar to the classpath. The CREATE EXTERNAL TABLE DDL will always succeed because it is just a definition; HAWQ checks the runtime jar dependencies only when you run queries.

Goden Yao

Yes, the jar json-pxf-ext-3.0.1.0-1.jar from https://bintray.com/big-data/maven/pxf-plugins/view# has the old com.pivotal.* package name. The previous answer has been edited with details for downloading the correct jar from Dropbox.

Niranjan Sarvi