0

I am working with HBase DB through Phoenix, and I need to compare two Strings of epoch timestamp. the method to_number("ts") takes too long to compute for big amount of data.

What is the best practice for comparing two string of epoch timestamps without converting them into number or any other conversion.

Using string comparison works well for number of the same length e.g 456 < 654. But for number of different length it might fail, e.g 456 < 65.

I considered planting zeroes before each number in the insertion progress so that all strings will be of the same length, but there must be a better way to do it.

Yuval
  • 371
  • 4
  • 13

1 Answers1

1

Strings are compared lexicographically, you'll have to rely either on TO_NUMBER() at query time or LPAD() at insert/query time, lpadding numerical strings is a common practice, specially in Hadoop or HBase (for row keys).

Anyway, you should consider having that timestamp stored as a 4B integer or a 8B long instead of as a +10B string. You will be able to compare them directly and you will also save space, specially if it's part of your rowkey (in hbase the rowkey goes on every stored cell).

  • For a standard posix timestamp (in seconds) you can store it as an integer, or if you want to be ready for 2038, go for a long http://en.wikipedia.org/wiki/Year_2038_problem :)
  • For a timestamp in milliseconds you need to store them as a long.
Rubén Moraleda
  • 3,017
  • 1
  • 18
  • 20
  • Thanks, can you please explain why I should change the timestamp field to 4B integer or 8B long? which is preferred for epoch representation? – Yuval Jan 11 '15 at 21:03