In this answer: https://stackoverflow.com/a/28039556/1473460 Pentium10 suggests ordering by date to fetch the newest user row. Are there performance implications for this versus using an increasing version number with each user update?
Asked
Active
Viewed 553 times
0
-
What would be the technical difference between ordering by time vs ordering by version number? There are alternative ways to get the same results, but this question seems to focus on time vs id? – Felipe Hoffa Jan 27 '15 at 22:13
-
The difference would be a timestamp type for the column versus an int type. Would that impact query performance? I suppose I could also use a epoch timestamp and store that as an int too. I suppose I should have broadened my question and asked for the best practices around streaming new rows to BQ and the best way to select the newest row (for a given user). – Andrew Jan 27 '15 at 22:28
-
@FelipeHoffa I'd love to hear alternative ways to get the same results. Maybe edit/append on that answer, or a self answering wiki post. – Pentium10 Jan 28 '15 at 07:24
-
@Pentium10 I was thinking of a SELECT record FROM [table] a JOIN (SELECT MAX(ts) ts, id FROM [table] GROUP EACH BY id) b ON a.id=b.id AND a.ts=b.ts. Might be more scalable since OVER() has its own limitations. (btw, thanks for all your answers around here - I learn a lot from them!) – Felipe Hoffa Jan 29 '15 at 00:33
-
@FelipeHoffa - the PARTITION BY user_id query that Pentium10 wrote will scale better than self JOIN query, and it will perform better too, because it will cause only single scan over data. I would use FIRST(email) instead of ROW_NUMBER() and then filtering on seqnum = 1, for slightly more efficient execution, but that's a small difference. – Mosha Pasumansky Jan 29 '15 at 05:57
-
@MoshaPasumansky yes! Unless there are too many elements for one PARTITION to handle – Felipe Hoffa Jan 29 '15 at 21:28
1 Answers
0
Timestamps are stored internally as decimal numbers, so, they are operated the same way as an increasing version number. The only performance implication is the conversion between the internal decimal number and the formatted string representing it at presentation layer. See "Data types" section at link [1] for more details.
[1] https://cloud.google.com/bigquery/preparing-data-for-bigquery

Antxon
- 1,855
- 15
- 21