The title may be a little confusing, so let me explain the problem. I have a pipeline that loads new records daily. The records contain sales; the key is <date, location, ticket, line>. The data are loaded into a Redshift table and then exposed through a view that is read by a downstream system. That system has a limit: its ticket column is a varchar(10), but the ticket is a 30-character string. If the system takes only the first 10 characters, it will generate duplicates. The ticket number can be a "fake" number; it doesn't matter whether it equals the real one. So I'm thinking of adding a new column to the Redshift table that contains a progressive number. The problem is that I cannot use an identity column, because rows belonging to the same ticket must share the same progressive number. I would then expose this new column (ticket_id) instead of the original one.
This is what I want:
day | location | ticket | line | amount | ticket_id |
---|---|---|---|---|---|
12/12/2020 | 67 | 123...GH | 1 | 10 | 1 |
12/12/2020 | 67 | 123...GH | 2 | 5 | 1 |
12/12/2020 | 67 | 123...GH | 3 | 23 | 1 |
12/12/2020 | 23 | 123...GB | 1 | 13 | 2 |
12/12/2020 | 23 | 123...GB | 2 | 45 | 2 |
... | ... | ... | ... | ... | ... |
12/12/2020 | 78 | 123...AG | 5 | 100 | 153 |
The next day, when new data are loaded, I want to start from ticket_id 154, and so on.
Each row has a column that records the instant at which it was inserted; rows inserted on the same day share the same insert_time.
My solution is:
- insert the records with ticket_id computed as a dense_rank. But on each load (i.e. each day) the ticket_id starts from 1, so...
- ... update the rows just inserted, setting ticket_id = ticket_id + the maximum ticket_id found among the rows where insert_time != max(insert_time).
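The two steps above could also be collapsed into a single INSERT by adding the current maximum ticket_id as an offset to the dense_rank at insert time. A minimal sketch, assuming a target table `sales` and a staging table `sales_stage` (both names are illustrative, not from the original setup):

```sql
-- Insert the day's rows with ticket_id = dense_rank over the ticket key,
-- shifted by the highest ticket_id already in the target table.
-- COALESCE(..., 0) handles the very first load, when the target is empty.
INSERT INTO sales (day, location, ticket, line, amount, insert_time, ticket_id)
SELECT s.day,
       s.location,
       s.ticket,
       s.line,
       s.amount,
       GETDATE(),
       DENSE_RANK() OVER (ORDER BY s.day, s.location, s.ticket)
         + (SELECT COALESCE(MAX(ticket_id), 0) FROM sales)
FROM sales_stage s;
```

This avoids the follow-up UPDATE entirely, since all rows of one ticket get the same dense_rank and therefore the same final ticket_id.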
Do you think there is a better solution? It would be very nice if a hash function existed that takes <day, location, ticket> as input and returns a number of at most 10 characters.
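For the hash idea, one option is to hash the key and reduce it modulo 10^10 so it always fits in 10 digits. A sketch in Python (the function name and key formatting are my own choices, and note the caveat in the comments: unlike the progressive number, a truncated hash can collide):

```python
import hashlib

def ticket_hash(day: str, location: int, ticket: str) -> int:
    """Map <day, location, ticket> to a number of at most 10 digits.

    All rows of the same ticket get the same value, with no state to
    keep between loads. Caveat: truncating a hash to 10 digits means
    collisions are possible over a large history of tickets, so this
    is only a sketch, not a guaranteed-unique ID.
    """
    key = f"{day}|{location}|{ticket}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()   # deterministic 128-bit hash
    return int(digest, 16) % 10_000_000_000  # keep at most 10 digits

# Rows belonging to the same ticket map to the same id:
print(ticket_hash("2020-12-12", 67, "123...GH"))
print(ticket_hash("2020-12-12", 23, "123...GB"))
```

In Redshift itself a similar effect could be had with its built-in hash functions applied to the concatenated key, but whether the collision risk is acceptable depends on your data volume.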