Questions tagged [greenplum]

Greenplum is the world’s first open-source massively parallel processing database based on PostgreSQL. It provides powerful and rapid analytics on petabyte-scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer, delivering high analytical query performance on large data volumes.

Greenplum is a massively parallel processing database based on PostgreSQL and is designed for analytic data warehouses to manage, store and analyze terabytes to petabytes of data. Greenplum is developed by Pivotal.

797 questions
3
votes
1 answer

Spatial Join Query Optimization on Large Data Set

I have a use case where two sets of data are joined with an expensive spatial predicate. To parallelize the query, I partitioned the spatial universe into tiles (on the order of thousands) such that only records belonging to the same tile need to be…
ablimit
  • 2,301
  • 6
  • 27
  • 41
3
votes
6 answers

Find the last weekday for a given month in PostgreSQL

Find the last weekday for a given month in PostgreSQL. Usage: if the month end falls on a Saturday or a Sunday, return the previous Friday; otherwise use the month end. Examples: 3/31/2013 falls on a Sunday, so return 3/29/2013; 11/30/2013 falls on a Saturday, so…
Jon Jaussi
  • 1,298
  • 3
  • 18
  • 36
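
The rule stated in the last-weekday question above (use the month end unless it falls on a Saturday or Sunday, in which case step back to the Friday) can be sketched in Python, which Greenplum can also run through PL/Python. This is only an illustrative sketch, not the answer posted on the question; the function name `last_weekday_of_month` is hypothetical.

```python
import calendar
from datetime import date, timedelta


def last_weekday_of_month(year: int, month: int) -> date:
    """Return the month-end date, or the preceding Friday if it falls on a weekend."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    month_end = date(year, month, last_day)
    # date.weekday(): Monday is 0, Friday is 4, Saturday is 5, Sunday is 6
    days_past_friday = max(0, month_end.weekday() - 4)
    return month_end - timedelta(days=days_past_friday)


print(last_weekday_of_month(2013, 3))   # month end 3/31/2013 is a Sunday
print(last_weekday_of_month(2013, 11))  # month end 11/30/2013 is a Saturday
```

The same arithmetic carries over to SQL by computing the month end and subtracting one or two days depending on its day of week.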
3
votes
2 answers

Filter on window function according to hour

I want to use two different (yet similar) window functions to calculate two values, SUM and COUNT, on is_active over user_id+item, only up to the time of the row minus 1 hour. My intuition was to use ROWS UNBOUNDED PRECEDING, but that way I can't filter…
gilibi
  • 343
  • 2
  • 9
  • 18
2
votes
0 answers

Greenplum Operator on kubernetes zapr error

I am trying to deploy Greenplum Operator on kubernetes and I get the following error: kubectl describe pod greenplum-operator-87d989b4d-ldft6: Name: greenplum-operator-87d989b4d-ldft6 Namespace: greenplum Priority: 0 Node: …
Oktay Alizada
  • 290
  • 1
  • 6
  • 19
2
votes
1 answer

count(*) and count(*) over () performance difference in greenplum/postgreSQL

I want to query the detail data and the total number of detail rows. In general, this requires two SQL statements. For example, one is select col1, col2, col3 from tb limit 50 offset 0, and the other is select count(*) from tb. In order to reduce the…
Ryon
  • 21
  • 2
2
votes
1 answer

Import file failed to greenplum because of one line of data on navicate

When importing a file into Greenplum, one line fails, and the whole file is not imported. Is there a way to skip the bad line and import the remaining data into Greenplum successfully? Here are my SQL statement and error messages: copy…
James
  • 21
  • 3
2
votes
1 answer

Greenplum-Spark-Connector java.util.NoSuchElementException: None.get

My work environment is as below: Hadoop 2.7.2, Spark 2.3.0, Greenplum 6.8.1 (as far as I know, this is the latest version). I have to create a DataFrame (RDD) from a GPDB table, so I found the "Greenplum-spark-connector". The architecture sounds good, but…
coolsk
  • 21
  • 2
2
votes
0 answers

How to mark an airflow task failed if a warning comes?

I came across a problem in Airflow when executing the command 'vacuum table' on a Greenplum database in situations where Airflow does not own the table. If the vacuum is executed inside the PythonOperator, via cursor.execute…
2
votes
2 answers

Fastest way to see what unique dates are in a table's timestamp field?

I have a table with billions of rows. There are daily partitions on the "recorded" field, which is a "timestamp without time zone." I want to know which days are currently in the table. I know I could do something like: SELECT recorded::date FROM…
A Question Asker
  • 3,339
  • 7
  • 31
  • 39
2
votes
1 answer

How to decide number of segments per host/node in Greenplum

I am setting up a prod cluster and want to choose the number of segments per host/node. How do I decide this? And what are the benefits of having multiple primary segments (excluding mirrors) per node?
ankitbeohar90
  • 109
  • 13
2
votes
3 answers

PostgreSQL - Combining multiple rows with several attributes into one row?

I have a table like this: DATE ID ScoreA ScoreB ScoreC 20180101 001 91 92 25 20180101 002 81 82 35 20180101 003 71 52 45 20180102 001 82 15 66 20180102 …
Taurus Dang
  • 551
  • 1
  • 4
  • 19
2
votes
2 answers

Handling backslashes in plpython

CREATE OR REPLACE FUNCTION CLEAN_STRING(in_str varchar) returns varchar AS $$ def strip_slashes(in_str): while in_str.endswith("\\") or in_str.endswith("/"): in_str = in_str[:-1] in_str = in_str.replace("\\", "/") return…
Deepak K M
  • 521
  • 1
  • 5
  • 13
2
votes
1 answer

Insert values to new array column based on conditions in PostgreSQL

What i have id test_1 test_2 test_3 Indicator_column 1 651 40 0.4 {test_1,test_2,test_3} 1 625 80 0.6 {test_1,test_2,test_3} 1 510 60 0.78 {test_1,test_2,test_3} 1 710…
user8545255
  • 761
  • 3
  • 9
  • 21
2
votes
3 answers

How to preview functions code on pgAdmin 4?

When I logged into a PostgreSQL database in pgAdmin 3 and expanded a schema, I would see a tab for functions. When you click on the functions and select a function, you can see the function's code. In pgAdmin 4 I don't see the tab for…
user2631587
  • 61
  • 1
  • 2
  • 3
2
votes
1 answer

How to get DB username in PL/Python function

I have a PL/Python function like the one below: create function pl_python (database character) returns character varying as $BODY$ import subprocess import getpass import os return getpass.getuser() $BODY$ language plpythonu volatile I am executing…