In case it matters for the following will use a Netezza backend + SPSS Modeler and/or Advanced Query Tool for the query itself. I have no access to the CLI. I'm trying to understand if cursors and walking through a sorted table is necessary to handle the following:
Imagine a table with 2 columns, the first is a non-unique ID and the second is a date. Any given ID may occur in the table multiple times with one or more dates.
My goal is to select from this table rows for which the dates are spaced out by no less than a fixed number of days, say 90. For example:
| ID | DATE |
===================
X 2014-01-01
X 2014-02-01
X 2014-07-01
Y 2014-02-01
Y 2014-06-01
Y 2014-07-01
In the above example, the rows I want to select for X would be Jan 1 and Jul 1 (exclude Feb 1 because it is less than 90 days from Jan 1) and the rows for Y would be Feb 1 and Jun 1 (exclude Jul 1 because it's within 90 days of a prior case.
In practice there could be well over 100M rows in the table. Is it possible to do this without cursors? What would the optimum method be?
Thanks in advance for any advice!
EDIT: Expanded the test table data here. SQL Fiddle
In the above edited example, the desired output would be
| ID | DATE |
===================
X 2014-01-01
X 2014-04-01
X 2014-10-01
Y 2014-01-15
Y 2014-04-15
Y 2014-10-15
Z 2014-01-01
Z 2014-04-01
Z 2014-10-01