0

I'd like to save 3 letter codes to SQL database, and would like to use them later to execute many queries against that codes (probably 3-4 combinations of these codes to identify some a list of records).

I think if I use String representation for this could be slow as I will be having >100.000 entries.

I'm now thinking if I could just represent these 3 letter codes (english alphabet) with a unique digit, and just query for this digit?

Therefore I could transform each character to ascii and map A=01, Z=26 (if I subtract 64 from the ascii value). Thus AAA=010101, ZZZ=262626.

Would this make sense? Could I do better?

membersound
  • 81,582
  • 193
  • 585
  • 1,120
  • How would you differentiate between `AAK`, `AKA`, `KAA` for instance? All would get _translated_ to `1111`. – devnull Jan 14 '14 at 10:45
  • 1
    100k is nothing in database terms. I think you're optimizing too early. I'd go for readability and optimize only when really needed. – NeplatnyUdaj Jan 14 '14 at 10:45
  • "better" in what sense? – Maroun Jan 14 '14 at 10:46
  • @NeplatnyUdaj I could get as will factor 10 to 1million lateron. – membersound Jan 14 '14 at 10:46
  • @devnull good point, I'd probably have to map each char to a 2digit value 01-26 – membersound Jan 14 '14 at 10:47
  • 6
    Indexing by integer is just marginally faster than by string. This is a bad case of premature optimization, which may even fail to be optimization at all. – Marko Topolnik Jan 14 '14 at 10:48
  • From the question it is not obvious if you want to search for the whole codes or for parts of it. If it will be whole codes, then I doubt there will be performence difference on integer/varchar indexed column. If every letter is a standalone identifier, then it should be split to three columns. – NeplatnyUdaj Jan 14 '14 at 10:50
  • no I'm only interested in looking up the whole 3 letter code, not part of it – membersound Jan 14 '14 at 10:54
  • 1
    If you really had to do this (please don't) you'd convert by taking the ascii offset from 'a', then up-shifting each value in base-26 and summing the place values. E.g. `SELECT sum((ascii(substring(lower('abc') from x for 1)) - ascii('a')) * (26^x)) FROM generate_series(1, length('abc')) x;` . Do not actually do this, I wrote it mostly to show just how ugly it will be to do what you propose. – Craig Ringer Jan 15 '14 at 03:25

2 Answers2

3

The database will hash and quickly compare the String values, using indexes designed for the process.

Implement it the clean way first and only start to worry if it is too slow. The database will do a good job of accelerating this using thoroughly optimized and tested routines.

Tim B
  • 40,716
  • 16
  • 83
  • 128
3

Add an index to the field with the 3 letter code:

CREATE INDEX index_name ON table_name ( column_name_3_letter_code ) ;

See How does database indexing work? for information about DB indexes.

Community
  • 1
  • 1