0

I am trying to create a table on a RHEL5 machine running GreenPlum (Postgres) server.

The table I am trying to create needs to have a "DISTRIBUTE BY" clause with a few of the columns as its predicate.

Without the "distribute by" the whole command works well creating the table, but when I add it I get the following syntax error:

ERROR: syntax error at or near "DISTRIBUTED"

I looked at this post (DISTRIBUTE BY notices in Greenplum) as well but could not find a solution.

The query structure is:

Create table (columns) DISTRIBUTED BY (columns);

Thanks!

Community
  • 1
  • 1
Ronnie
  • 1
  • 1
  • What is the exact version you're running? – Jakub Kania Oct 14 '15 at 08:16
  • @JosipIvicpostgres=# CREATE TABLE testdb (column1 int, column2 text) DISTRIBUTED BY (column1, column2); ERROR: syntax error at or near "DISTRIBUTED" LINE 1: CREATE TABLE testdb (column1 int, column2 text) DISTRIBUTED ... – Ronnie Oct 14 '15 at 08:26
  • @JakubKania According to the installation file it is: 4.3.6.1-build-2 – Ronnie Oct 14 '15 at 08:27
  • @JakubKania Even better: postgres=# select version(); version -------------------------------------------------------------------------------- ------------------------------ PostgreSQL 9.2.13 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.3 20140 911 (Red Hat 4.8.3-9), 64-bit (1 row) – Ronnie Oct 14 '15 at 08:28
  • 1
    @Ronnie It seems like you're using PostgreSQL not Greenplum. – Jakub Kania Oct 14 '15 at 08:51
  • @JakubKania Isn't GP just a layer on top of a PostgreSQL DB? I don't think there is a "select version" result that will provide a 'GreenPlum' version. I installed the instance using the Pivotal GreenPlum-DB-4.3.6.1 binary. – Ronnie Oct 14 '15 at 09:55
  • @Ronnie It may just be so, but still it's based on 8.2 not 9.2 – Jakub Kania Oct 14 '15 at 10:07
  • Please post the output of the following commands: "echo $MASTER_DATA_DIRECTORY" and "echo $PATH" – A. Scherbaum Oct 16 '15 at 16:25

2 Answers2

1

Here you're just connected to Postgres, not Greenplum, this is why you are getting this error.

When running psql, make sure you've specified the right host and the right port you want to connect to, like psql -h greenplum_master -p 5432.

When you run select version(); connected to Greenplum database, you'd see

[gpadmin@gpdb greenplum-db]$ psql
psql (8.2.15)
Type "help" for help.

test=# select version();
                                                                       version                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.2.15 (Greenplum Database 4.3.6.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Oct  1 2015 15:14:22
(1 row)

Which means you're running Greenplum. If you see only PostgreSQL, make sure Greenplum is started with running gpstate on the GPDB master host and make sure you're connecting to the right host and port

0x0FFF
  • 4,948
  • 3
  • 20
  • 26
0

When you specify the distributed clause, and there is a primary key in your table ,your distributed key should be part of the primary key and it should be left part of the primary key.

For example, your primary key is column1, column2 and column 3, then your distributed key should be column1 or column1&column2 or the same with your primary key.

buqing
  • 925
  • 8
  • 25
  • This is incorrect. You do not need a PK to specify the distribution key. You should also avoid PK indexes as it increases storage, slows data loading, and the query optimizer rarely will use the index. – Jon Roberts Dec 29 '15 at 19:32
  • Thanks Jon. I correct my answer. It doesn't need a primary key when you use distributed key. – buqing Dec 29 '15 at 19:48
  • Jon, according to 3NF, every table should have a primary key. – buqing Dec 29 '15 at 19:53
  • Robinson, go right ahead and create a PK if you want. It isn't required for Greenplum to have one on a table and using a hash distribution on a table doesn't require a PK. – Jon Roberts Dec 29 '15 at 20:12
  • You are right. PK is not mandatory for table definition, but in the real world, and for the best practice, every table should have a primary key. – buqing Dec 29 '15 at 21:25
  • Robinson, that is not correct with Greenplum. A PK is a burden which slows down the database. The PK is rarely used in queries especially with the new optimizer. We don't recommend using a PK on any table in Greenplum unless you have some compelling business requirement and are willing to take the performance penalty. This is true with any analytics database I've ever worked with too. In an OLTP database, use PK and FK constraints to make sure data quality but for analytics and in specifically in Greenplum, avoid using PK constraints. – Jon Roberts Dec 29 '15 at 23:10
  • Another point is we recommend using Append Optimized tables for better performance. These "AO" tables don't support PK constraints. – Jon Roberts Dec 29 '15 at 23:11