0

We have small gpdb cluster. When i am trying to create and read on my first External table using gpfdist protocol.

Getting error at reading of external table : that is

prod=# select * from ext_table
prod-# ;
ERROR:  connection with gpfdist failed for gpfdist://gpmasterhost:8080/demo/gp_RevenueReport_stg0.txt. effective url: http://gpmasterhost:8080/demo/gp_RevenueReport_stg0.txt. error code = 111 (Connection refused)  (seg0 slice1 datanode2 40000 pid=5402)
prod=#

We tried DDL command for external table :

CREATE EXTERNAL TABLE ext_table
(
  "ID" bigint,
  "time" timestamp without time zone,
   )
 LOCATION (
    'gpfdist://gpmasterhost:8080/demo/gp_RevenueReport_stg0.txt'
)
 FORMAT 'TEXT' (delimiter ';' null '' escape '~' )
ENCODING 'UTF8';

Any help on it would be much appreciated !

NEO
  • 389
  • 8
  • 31

2 Answers2

1

You have to create the gpfdist process on "gpmasterhost" listening on port 8080 and serving files that include the directory demo which contains gp_RevenueReport_stg0.txt.

gpfdist -p 8080 -d path_to_demo & 
Jon Roberts
  • 2,068
  • 1
  • 9
  • 11
  • I ran command it says and still executing :[gpadmin@mdw ~]$ 2016-06-16 14:49:05 32581 INFO Before opening listening sockets - following listening sockets are available: 2016-06-16 14:49:05 32581 INFO IPV6 socket: [::]:8080 2016-06-16 14:49:05 32581 INFO IPV4 socket: 0.0.0.0:8080 2016-06-16 14:49:05 32581 INFO Trying to open listening socket: 2016-06-16 14:49:05 32581 INFO Opening listening socket succeeded 2016-06-16 14:49:05 32581 INFO Trying to open listening socket: 2016-06-16 14:49:05 32581 INFO IPV4 socket: 0.0.0.0:8080 Serving HTTP on port 8080, directory /home/gpadmin/demo – NEO Jun 16 '16 at 14:54
  • [root@mdw ~]# ps aux |grep gpfdist root 9417 0.0 0.0 103244 868 pts/1 R+ 14:57 0:00 grep gpfdist gpadmin 32581 0.0 0.0 27148 1692 pts/0 S 14:49 0:00 gpfdist -p 8080 -d /home/gpadmin/demo [root@mdw ~]# – NEO Jun 16 '16 at 14:58
  • when i am reading the file that is select * from ext_table getting this :ERROR: http response code 404 from gpfdist (gpfdist://gphostname:8080/demo/gp_RevenueReport_stg0.txt): HTTP/1.0 404 file not found (url.c:352) (seg0 slice1 datanode1:40000 pid=6185) (cdbdisp.c:1326) – NEO Jun 16 '16 at 15:14
  • The directory specified in the `-d` option will be the root of the gpfdist server. If you data dir is: `/my/data/dir/demo/gp_RevenueReport_stg0.txt` Then the gpfdist command line should be: `gpfdist -p 8080 -d /my/data/dir &` – Stephen Carter Jun 16 '16 at 15:19
  • got it . It means we need to only put /home/gpadmin and again i have to run this command again .gpfdist -p 8080 -d /home/gpadmin & . should i kill exsting gpfdist process which is running on dir /home/gpadmin/demo – NEO Jun 16 '16 at 15:26
  • prod=# select * from schema.ext_table ; ID | time ----+------ (0 rows) – NEO Jun 16 '16 at 15:33
  • Thanks alot to both of you. when i have to take load from hdfs into greenplum . this protocol will not work ? – NEO Jun 16 '16 at 15:34
  • When loading from hdfs, you have to use the "gphdfs" protocol instead of "gpfdist". You won't use gpfdist at all when reading from hdfs. – Jon Roberts Jun 16 '16 at 16:13
  • checked with gphdfs protocol and have created readable external file in gpdb.Thanks Jon. – NEO Jun 16 '16 at 17:45
1

As Jon had said, you will need to run gpfdist on your "gpmasterhost" system.

However, based on your notes gpfdist is running in your demo directory:

ps aux |grep gpfdist root 9417 0.0 0.0 103244 868 pts/1 R+ 14:57 0:00 grep gpfdist gpadmin 32581 0.0 0.0 27148 1692 pts/0 S 14:49 0:00 gpfdist -p 8080 -d /home/gpadmin/demo

So you will either need to change your EXTERNAL definition to (note I am not using the demo directory):

CREATE EXTERNAL TABLE ext_table ( "ID" bigint, "time" timestamp without time zone, ) LOCATION ( 'gpfdist://gpmasterhost:8080/gp_RevenueReport_stg0.txt' ) FORMAT 'TEXT' (delimiter ';' null '' escape '~' ) ENCODING 'UTF8';

Or run gpfdist up one level (/home/gpadmin) with no modification to your external table.

CanuckAFK
  • 101
  • 2
  • Also gpfdist is just lighthttp daemon, is my understanding, so you can test out with wget as well to ensure you can connect and pull the file(s) you want outside of greenplum. – CanuckAFK Jun 16 '16 at 15:20
  • Thanks i am going to use /home/gpadmin up level . – NEO Jun 16 '16 at 15:27