Setup:
I need to store feature vectors associated with string-string pairs. The string-string pairs encode an input-output relationship. There will be a relatively small number of inputs X (e.g. 5), and for each input x, there will be a relatively small number of outputs Y|x (e.g. 10).
The question is, what data structure is fastest?
Additional relevant information:
- The outputs are generally different for each input, and it cannot be assumed that each X has the same number of outputs.
- Lookup will be done "many" times (perhaps 1000).
- Inputs will be sampled equally frequently, but for each input, usually one or two outputs will be accessed frequently, and the remainder will be accessed infrequently or not at all.
At present, I am considering three possibilities:
- list-of-lists: access the outer list with an index (representing input X[i]), then access the inner list with an index (representing output Y[i][j]).
- hash-of-hashes: same as above, but with string keys instead of integer indices.
- flat hash: key = (input, output).
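
For concreteness, here is a minimal Python sketch of the three candidates. The language, the toy data, and every name in it (`features`, `nested`, `flat`, `lol`, `time_lookups`) are assumptions made for illustration only; nothing above fixes them. The list-of-lists variant assumes the integer positions are already known at lookup time; if they have to be derived from the strings first, that adds a hash lookup anyway.

```python
import timeit

# Hypothetical toy data: two inputs with different numbers of outputs.
features = {
    "in_a": {"out_1": [0.1, 0.2], "out_2": [0.3, 0.4], "out_3": [0.5, 0.6]},
    "in_b": {"out_4": [0.7, 0.8], "out_5": [0.9, 1.0]},
}

# Candidate 1: hash-of-hashes, keyed by input string, then output string.
nested = features

# Candidate 2: flat hash keyed by the (input, output) tuple.
flat = {(x, y): vec for x, outs in features.items() for y, vec in outs.items()}

# Candidate 3: list-of-lists, indexed by position.
# Dicts preserve insertion order, so position j matches the j-th output.
inputs = list(features)
lol = [list(features[x].values()) for x in inputs]


def lookup_nested(x, y):
    return nested[x][y]


def lookup_flat(x, y):
    return flat[(x, y)]


def lookup_lol(i, j):
    return lol[i][j]


def time_lookups(n=1000):
    """Time n lookups of the same pair in each structure (seconds)."""
    return {
        "list-of-lists": timeit.timeit(lambda: lookup_lol(0, 1), number=n),
        "hash-of-hashes": timeit.timeit(
            lambda: lookup_nested("in_a", "out_2"), number=n
        ),
        "flat hash": timeit.timeit(
            lambda: lookup_flat("in_a", "out_2"), number=n
        ),
    }


if __name__ == "__main__":
    for name, seconds in time_lookups().items():
        print(f"{name}: {seconds:.6f} s")
```

With only about 5 inputs, 10 outputs each, and roughly 1000 lookups, any difference measured this way is likely to be very small, so it is worth timing on the real data before committing to one layout.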