
Our data warehouse is based on Hive, and now we need to move data from Hive to Greenplum. We want to use an external table with gphdfs, but it looks like something is going wrong. The table creation script is:

CREATE EXTERNAL TABLE flow.http_flow_data(like flow.zb_d_gsdwal21001)
LOCATION ('gphdfs://mdw:8081/user/hive/warehouse/flow.db/d_gsdwal21001/prov_id=018/day_id=22/month_id=201202/data.txt')
FORMAT 'TEXT' (DELIMITER '      ');

When we run:

bitest=# select * from flow.http_flow_data limit 1;
ERROR:  external table http_flow_data command ended with error. sh: java: command not found  (seg12 slice1 sdw3:40000 pid=17778)
DETAIL:  Command: gphdfs://mdw:8081/user/hive/warehouse/flow.db/d_gsdwal21001/prov_id=018/day_id=22/month_id=201202/data.txt

Our Hadoop version is 1.0 and Greenplum is 4.1.2.1.

I want to know whether we need to configure something to let Greenplum access Hadoop.

– Levon
– moxpeter
3 Answers


Have you opened port 8081 so that the month_id=201202 directory can be reached?
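A quick way to check this from a segment host. The host and port (`mdw:8081`) come from the question's LOCATION URL; the availability of `nc` and the `hadoop` client on the segment host are assumptions:

```shell
# Run on a segment host (e.g. sdw3) as gpadmin.
# First check whether the host named in the LOCATION URL accepts TCP
# connections on port 8081 at all:
nc -z -w 5 mdw 8081 && echo "port reachable" || echo "port NOT reachable"

# A reachable port is not enough; the HDFS NameNode must also answer
# filesystem requests. Listing the path confirms that:
hadoop fs -ls hdfs://mdw:8081/user/hive/warehouse/flow.db/d_gsdwal21001
```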

– osuthorpe

I would double-check the admin guide; I think you can use gphdfs, but not until Greenplum 4.2.

– del
  • gphdfs was added in 4.1, but that is a very old version. I think the problem is that the URL says "mdw:8081". That should be the NameNode of the Hadoop cluster; mdw is typically the master host name for Greenplum. You also need to make sure the segment hosts can connect to the Hadoop DataNodes. – Jon Roberts Jan 22 '16 at 20:54
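Following that comment, a sketch of a corrected table definition. The NameNode host `hdfs-namenode` and port 8020 are placeholders for this cluster's actual NameNode address, and the tab delimiter is an assumption about the Hive file's field separator; substitute your own values:

```shell
# Re-create the external table pointing at the HDFS NameNode instead of
# the Greenplum master (mdw). "hdfs-namenode:8020" is a placeholder.
psql -d bitest <<'SQL'
DROP EXTERNAL TABLE IF EXISTS flow.http_flow_data;
CREATE EXTERNAL TABLE flow.http_flow_data (LIKE flow.zb_d_gsdwal21001)
LOCATION ('gphdfs://hdfs-namenode:8020/user/hive/warehouse/flow.db/d_gsdwal21001/prov_id=018/day_id=22/month_id=201202/data.txt')
FORMAT 'TEXT' (DELIMITER E'\t');  -- must match the Hive file's separator
SQL
```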

Have you checked that Java is installed on your Greenplum system? It is required for gphdfs to work.
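The error "sh: java: command not found" on seg12 suggests exactly this. A way to check across the whole cluster, assuming the standard Greenplum `gpssh` utility; the host file name `seg_hosts` and the JAVA_HOME path are placeholders:

```shell
# Check that the java binary is on gpadmin's PATH on every segment host;
# "seg_hosts" is a placeholder file listing one segment host per line.
gpssh -f seg_hosts 'which java || echo "java missing on $(hostname)"'

# gphdfs launches java from the segment environment, so JAVA_HOME should
# be set there too, e.g. in gpadmin's shell profile on each host
# ("/usr/java/default" is a placeholder path):
gpssh -f seg_hosts 'echo "export JAVA_HOME=/usr/java/default" >> ~/.bashrc'

# Restart Greenplum so the segments pick up the new environment.
gpstop -r
```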

– bern