
I am trying to create an external table in a Greenplum database on an Amazon EC2 cluster. My source file is in Parquet format and is stored in S3. My question is:

What protocol should I use to read the data from the parquet file?

If I use the "s3://" protocol with the "Parquet" file format, as below:

CREATE EXTERNAL TABLE rp2 (id text, fname text, lname text, mname text) LOCATION ('s3://location.parquet config=./s3/s3.config')

I get the following error:

ERROR:  unexpected end of file  (seg0 slice1 IP:port pid=xxx)

If I use the gphdfs:// protocol instead:

CREATE EXTERNAL TABLE rp2 (id text, fname text, lname text, mname text) LOCATION ('gphdfs:location.parquet config=./s3/s3.config') FORMAT 'PARQUET';

I get the following error:

ERROR:  external table gphdfs protocol command ended with error. Exception in thread "main" java.lang.IllegalArgumentException: Illegal input uri: gphdfs://locs.parquet config=./s3/s3.config  (seg0 slice1 IP:Port pid=pid)

Any help in this regard will be highly appreciated.

  • We are working on this too. I am considering using PXF, e.g. `pxf://S3_BUCKET/pxf_examples/my_file?PROFILE=s3:parquet&SERVER=s3srvcfg` (see https://gpdb.docs.pivotal.io/5170/pxf/access_objstore.html), because the s3 protocol does not support the Parquet format. – Clxy Mar 04 '19 at 08:53
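
For comparison, the s3 protocol reads delimited text (TEXT or CSV) rather than Parquet, and its URL names an S3 endpoint, a bucket and a prefix before the `config=` option. A minimal sketch of a definition the s3 protocol can read, assuming the data were exported to CSV; the endpoint, bucket (`my-bucket`), prefix and config path are hypothetical placeholders:

```sql
-- Hypothetical endpoint, bucket, prefix and config path; replace with your own.
-- URL form: s3://<S3 endpoint>/<bucket>/<prefix> config=<path to s3 config file>
CREATE EXTERNAL TABLE rp2_csv (id text, fname text, lname text, mname text)
  LOCATION ('s3://s3-us-west-2.amazonaws.com/my-bucket/data/ config=/home/gpadmin/s3/s3.config')
  FORMAT 'CSV';
```

For Parquet files, this route does not apply, which is why PXF is suggested.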

1 Answer


You can read a Parquet file on S3 by using PXF (the Greenplum Platform Extension Framework).

Example:

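-- S3_BUCKET, dir/file.parquet and s3srvcfg are placeholders for your bucket, file path and PXF server configuration.
CREATE EXTERNAL TABLE pxf_ext_tbl(name text, orders int)
  LOCATION ('pxf://S3_BUCKET/dir/file.parquet?PROFILE=s3:parquet&SERVER=s3srvcfg')
 FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');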
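
A minimal sketch of the surrounding steps, assuming PXF is installed and running on the cluster and that an S3 server configuration named `s3srvcfg` exists (in PXF 5.x, a directory under `$PXF_CONF/servers/` holding an `s3-site.xml` with the access keys, as described in the PXF docs linked in the comment). The role name `analyst` is hypothetical:

```sql
-- One-time setup per database (run as a superuser):
-- registers the pxf protocol so that pxf:// locations can be used.
CREATE EXTENSION pxf;

-- Allow a non-superuser role to read through the pxf protocol
-- (the role name is hypothetical).
GRANT SELECT ON PROTOCOL pxf TO analyst;

-- After pxf_ext_tbl has been created, query it like a regular table;
-- the Parquet data is fetched from S3 at query time.
SELECT name, orders
FROM pxf_ext_tbl
LIMIT 10;
```

The column names and types in the table definition must match the fields in the Parquet file's schema.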