Understanding Distribution

Question

I have a couple of questions.

I would like to know whether we need to worry about distribution in Netezza when using only SELECT statements (not creating tables). I am creating a dataset in SAS by connecting to Netezza and selecting from a view that has a couple of joins. I am wondering how this will affect the performance of Netezza if I create the table directly in SAS.
I am creating a table by joining two other tables on customer_id. However, the output dataset does not include customer_id as a column. Can I distribute this table on customer_id?

Thanks.

Accepted Answer · answered Oct 17 '14 at 18:14

For your first question, you typically don't need to worry about distribution if you aren't creating a table. It does help to understand the distribution methods of the tables you are selecting from, but it's not a requirement. A distribution that matches the joins you are doing can improve performance during the select: if both tables are distributed on the same columns and your join columns are a superset of those distribution columns, you'll get co-located joins, where each data slice joins its own rows without first redistributing or broadcasting data across the appliance. However, since the target of the output is SAS, distribution has no effect on writing the dataset to SAS.
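To make the co-located join idea concrete, here is a toy Python model. It is a sketch under stated assumptions, not a description of how Netezza works internally: it assumes a hypothetical appliance with four data slices, uses `zlib.crc32` as a stand-in for Netezza's internal hash function, and assumes both tables use the same name for the join column. The names `slice_for`, `distribute`, `is_colocated` and `colocated_join` are illustrative only. Rows with the same `customer_id` hash to the same slice, so each slice can join its own rows without moving any data.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical slice count; a real appliance has many more data slices.
NUM_SLICES = 4


def slice_for(values: tuple) -> int:
    """Toy stand-in for the internal hash: map distribution-key values to a slice."""
    return crc32(repr(values).encode()) % NUM_SLICES


def distribute(rows, dist_cols):
    """Hash-distribute rows on the given columns (a model of DISTRIBUTE ON (...))."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(tuple(row[c] for c in dist_cols))].append(row)
    return slices


def is_colocated(dist_a, dist_b, join_cols):
    """Co-located when both sides share the same distribution columns and
    the equi-join columns include all of those distribution columns."""
    return list(dist_a) == list(dist_b) and set(dist_a) <= set(join_cols)


def colocated_join(left, right, join_cols):
    """Equi-join slice by slice; no row ever leaves the slice it lives on."""
    out = []
    for s in range(NUM_SLICES):
        index = defaultdict(list)
        for r in right.get(s, []):
            index[tuple(r[c] for c in join_cols)].append(r)
        for row in left.get(s, []):
            for r in index[tuple(row[c] for c in join_cols)]:
                out.append({**row, **r})
    return out


orders = [{"order_id": i, "customer_id": i % 5, "amount": 10 * i} for i in range(20)]
customers = [{"customer_id": c, "name": f"cust{c}"} for c in range(5)]

o = distribute(orders, ["customer_id"])
c = distribute(customers, ["customer_id"])
joined = colocated_join(o, c, ["customer_id"])
print(len(joined))  # 20: every order meets its customer on the same slice
```

If the two tables were distributed on different columns (for example orders on `order_id`), `is_colocated` returns False in this model, and a real engine would have to redistribute or broadcast rows before it could join them.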

For your second question, a table is distributed either on one or more columns of the table itself, or via the RANDOM (aka round robin) distribution method. In your case, if you store your data set as a table on Netezza, you cannot distribute it on customer_id, because that column is not included in the data set. To distribute on it, you would have to include customer_id in the output columns.
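The two distribution methods can be modeled the same way. In Netezza they correspond to the `DISTRIBUTE ON (column, ...)` and `DISTRIBUTE ON RANDOM` clauses of `CREATE TABLE`; the Python below is only a toy illustration, with a hypothetical slice count, `zlib.crc32` standing in for the real hash, and illustrative function names. It shows why a hash distribution key must be a column of the table: the slice is computed from each row's own stored values, so a column such as `customer_id` that was used in the join but not selected is simply not available.

```python
from itertools import cycle
from zlib import crc32

NUM_SLICES = 4  # hypothetical slice count for illustration


def hash_distribute(rows, dist_cols):
    """Model of DISTRIBUTE ON (cols): the slice comes from hashing the row's own values."""
    slices = [[] for _ in range(NUM_SLICES)]
    for row in rows:
        missing = [c for c in dist_cols if c not in row]
        if missing:
            # The hash can only see columns that are stored in the table itself.
            raise ValueError(f"distribution column(s) {missing} not in table")
        key = repr(tuple(row[c] for c in dist_cols)).encode()
        slices[crc32(key) % NUM_SLICES].append(row)
    return slices


def random_distribute(rows):
    """Model of DISTRIBUTE ON RANDOM as round robin: rows are dealt out in turn."""
    slices = [[] for _ in range(NUM_SLICES)]
    for row, s in zip(rows, cycle(range(NUM_SLICES))):
        slices[s].append(row)
    return slices


# Output of the asker's join: customer_id was used to join but not selected.
result = [{"order_id": i, "amount": 10 * i} for i in range(10)]

print([len(s) for s in random_distribute(result)])  # [3, 3, 2, 2]
try:
    hash_distribute(result, ["customer_id"])
except ValueError as err:
    print("rejected:", err)
```

Round robin needs no key at all, which is why RANDOM remains an option for a result set like this one.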
