
By Random Access I do not mean selecting a random record.
Random Access is the ability to fetch any record in equal time,
the same way values are fetched from an array.
From Wikipedia: http://en.wikipedia.org/wiki/Random_access

My intention is to store a very large array of strings, one that is too big for memory,
but still have the benefit of random access to the array.

I usually use MySQL, but it seems it has only B-Tree and Hash index types.

I don't see a reason why it isn't possible to implement such a thing.
The indexes will be like in an array: starting from zero and incrementing by 1.

I want to simply fetch a string by its index, not get the index for a given string. The goal is to improve performance. I also cannot control the order in which the strings will be accessed; it'll be a remote DB server which constantly receives indexes from clients and returns the string for each index.

Is there a solution for this?

P.S. I don't think this is a duplicate of "Random-access container that does not fit in memory?"
because that question has other requirements besides random access.

fiftyeight
  • You are doing it wrong, if you are storing strings that are too long to fit into memory in a database. Incredibly wrong. – Layke Aug 22 '11 at 15:55
  • are you wanting to be able to select a record from the database randomly? like how it may be done to display a random testimonial? – Ben Aug 22 '11 at 15:56
  • How exactly does providing a hash index not allow for random access? Also btrees allow for random access, it's just a question of performance. So what exactly is your question? – Janick Bernet Aug 22 '11 at 16:03
  • @inflagranti From what I know, random access means getting the address of the string directly from its index; maybe the Hash index type does exactly that when you have a unique incrementing index. I'll be glad if you can confirm that. As for B-Tree, I think it's like binary search. – fiftyeight Aug 22 '11 at 16:09

2 Answers


Given your definition, if you just use an SSD for storing your data, it will allow for what you call random access (i.e. uniform access speed across the data set). The fact that sequential access is less expensive comes from the fact that sequential access to disk is much faster than random access (and any database tries its best to make up for this, btw).

That said, even RAM access is not uniform, as sequential access is faster due to caching and NUMA. So uniform access is an illusion anyway, which raises the question of why you insist on having it in the first place; i.e., what do you think will go wrong with slow random access? It might still be fast enough for your use case.

Janick Bernet
  • The intention is just to make fetching the string by its index as fast as possible. I thought that random access is the fastest access possible; can you point me to a source where I can read about random access vs. sequential access where sequential access is said to be faster? On Wikipedia, for example, it sounds like random access is faster in this scenario: http://en.wikipedia.org/wiki/Random_access – fiftyeight Aug 22 '11 at 16:16
  • You're mixing up concepts here. What the article talks about is a data structure providing random access vs. sequential access. With the example of a linked list, this means that to get the n'th element without random access, the previous n-1 nodes have to be traversed, meaning access is O(n) vs. O(1) for arrays. The other thing is that, on a lower level, accessing array element 3 and then 34343 is slower than accessing array element 3, then 4, then 5. – Janick Bernet Aug 22 '11 at 16:21
  • Okay I understand, but I can't really control the order in which the elements will be accessed, what I want is the O(1) efficiency. – fiftyeight Aug 22 '11 at 16:24
  • If you used a hash index you would theoretically get this O(1) access. However, I doubt there would be much difference between the hash index and the B-tree, as performance will be dictated by the bottleneck that is the hard drive. – Janick Bernet Aug 22 '11 at 16:41
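The O(1) vs. O(n) distinction discussed in these comments can be sketched in a few lines of Python (a toy singly linked list stands in for any sequential-access structure; the names are made up for illustration):

```python
# Toy singly linked list: reaching element n requires walking n links.
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def nth_linked(head, n):
    # Sequential access: n pointer hops, O(n)
    for _ in range(n):
        head = head.next
    return head.value

arr = list(range(10))            # array: random access, O(1) per lookup
head = None
for v in reversed(arr):          # build linked list 0 -> 1 -> ... -> 9
    head = Node(v, head)

assert arr[7] == 7               # one index computation
assert nth_linked(head, 7) == 7  # seven pointer hops to reach the same value
```

Both lookups return the same value; the difference is purely in how much work each one does to get there.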

You are talking about constant time, but you mention a unique incrementing primary key.

Unless such a key is gapless, you cannot use it as an offset, so you still need some kind of structure to look up the actual offset.

Finding a record by offset isn't usually particularly useful, since you will usually want to find it by some more friendly method, which will invariably involve an index. Searching a B-Tree index is worst case O(log n), which is pretty good.

Assuming you just have an array of strings - store it in a disk file of fixed length records and use the file system to seek to your desired offset.

Then benchmark against a database lookup.
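A minimal sketch of that approach in Python (assumptions: the strings are padded to a fixed record width, and `RECORD_LEN` and the file name are made-up values):

```python
import os

RECORD_LEN = 32        # fixed record width in bytes (assumption)
PATH = "strings.dat"   # hypothetical data file

def write_records(strings, path=PATH):
    # Pad every string to RECORD_LEN bytes so record i starts at byte i * RECORD_LEN.
    with open(path, "wb") as f:
        for s in strings:
            data = s.encode("utf-8")
            if len(data) > RECORD_LEN:
                raise ValueError("record too long")
            f.write(data.ljust(RECORD_LEN, b"\x00"))

def read_record(index, path=PATH):
    # O(1) lookup: seek straight to the record's offset and read one record.
    with open(path, "rb") as f:
        f.seek(index * RECORD_LEN)
        return f.read(RECORD_LEN).rstrip(b"\x00").decode("utf-8")
```

The fixed width is what makes the offset computable from the index alone; the trade-off is wasted space for short strings and a hard cap on string length.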

Cade Roux
  • Yes, it's just an array of strings, and the indices are gapless. Sorry if I wasn't clear; I updated the question. – fiftyeight Aug 22 '11 at 16:37
  • Both answers are very good, but the idea to use the file-system and benchmark against the DB lookup is what I plan to do. – fiftyeight Aug 24 '11 at 13:21