2

I have an application in which I have to store a couple of millions of integers, I have to store them in a Look up table, obviously I cannot store such amount of data in memory and in my requirements I am very limited I have to store the data in an embebedded system so I am very limited in the space, so I would like to ask you about recommended methods that I can use for the reduction of the look up table. I cannot use function approximation such as neural networks, the values needs to be in a table. The range of the integers is not known at the moment. When I say integers I mean a 32 bit value.

Basically the idea is use some copmpression method to reduce the amount of memory but without losing many precision. This thing needs to run in hardware so the computation overhead cannot be very high.

In my algorithm I have to access to one value of the table do some operations with it and after update the value. In the end what I should have is a function which I pass an index to it and then I get a value, and after I have to use another function to write a value in the table.

I found one called tile coding , this one is based on several look up tables, does anyone know any other method?.

Thanks.

Promila Ghosh
  • 389
  • 4
  • 12
Ryan
  • 131
  • 1
  • 2
  • 8
  • Can you provide a little more information about how you are using these integers? Why do you need to store them in a lookup table, and how are they accessed? – e.James Dec 02 '08 at 21:33
  • What are the ranges of values? How densely packed across the potential range of values? Is it 1-100, 102-199 or is it 1, 3, 5, 7, 11, 13, 17, 19, 23, ... – S.Lott Dec 02 '08 at 21:42
  • You're really going to have to give more information here - to be honest, this sounds a bit like a homework problem. – Patrick Harrington Dec 02 '08 at 21:42
  • I agree with all of the above; what function are you using a LUT for? Trig? Square root? Division? – strager Dec 02 '08 at 22:41
  • Strager what do you mean with function?, I do not understand your question. – Ryan Dec 02 '08 at 22:47
  • My idea of a "look up table" is a speed optimization. Precalculate the results of a function, store them in memory (or .text space), and reference them with the function's parameter. Are you doing something else with your LUT? – strager Dec 02 '08 at 23:01
  • I am not using the LUT as a function approximator, in fact I have to write values in the table. – Ryan Dec 02 '08 at 23:06

5 Answers5

1

I'd look at the types of numbers you need to store and pull out the information that's common for many of them. For example, if they're tightly clustered, you can take the mean, store it, and store the offsets. The offsets will have fewer bits than the original numbers. Or, if they're more or less uniformly distributed, you can store the first number and then store the offset to the next number.

It would help to know what your key is to look up the numbers.

John
  • 15,990
  • 10
  • 70
  • 110
0

If your set of integers is homongenous, then you could try a hash table, because there is a trick you can use to cut the size of the stored integers, in your case, in half. Assume the integer, n, because its set is homogenous can be the hash. Assume you have 0x10000 (16k) buckets. Each bucket index, iBucket = n&FFFF. Each item in a bucket need only store 16 bits, since the first 16 bits are the bucket index. The other thing you have to do to keep the data small is to put the count of items in the bucket, and use an array to hold the items in the bucket. Using a linked list will be too large and slow. When you iterate the array looking for a match, remember you only need to compare the 16 bits that are stored.

So assuming a bucket is a pointer to the array and a count. On a 32 bit system, this is 64 bits max. If the number of ints was small enough we might be able to do some fancy things and use 32 bits for a bucket. 16k * 8 bytes = 524k, 2 million shorts = 4mb. So this gets you a method to lookup the ints and about 40% compression.

johnnycrash
  • 5,184
  • 5
  • 34
  • 58
0

I need more detail on the problem. If you cannot store the real value of the integers but instead an approximation, that means you are going to reduce (throw away) some of the data (detail), correct? I think you are looking for a hash, which can be an artform in itself. For example say you have 32 bit values, one hash would be to take the 4 bytes and xor them together, this would result in a single 8 bit value, reducing your storage by a factor of 4 but also reducing the real value of original data. Typically you could/would go further and perhaps and only use a few of those 8 bits , say the lower 4 and reduce the value further.

I think my real problem is either you need the data or you dont, if you need the data you need to compress it or find more memory to store it. If you dont, then use a hash of some sort to reduce the number of bits until you reach the amount of memory you have for storage.

old_timer
  • 69,149
  • 8
  • 89
  • 168
  • Yes you are right I have to use some method to throw away some data but withot losing many precision. I have to use a compression method but that is fast in execution time because it needs to be executed in real time. Has the algorithm that you said based on XOR operations a name?. – Ryan Dec 02 '08 at 22:17
0

Read http://www.cs.ualberta.ca/~sutton/RL-FAQ.html

"Function approximation" refers to the use of a parameterized functional form to represent the value function (and/or the policy), as opposed to a simple table."

Perhaps that applies. Also, update your question with additional facts -- don't merely answer in the comments.


Edit.

A bit array can easily store a bit for each of your millions of numbers. Let's say you have numbers in the range of 1 to 8 million. In a single megabyte of storage you can have a 1 bit for each number in your set and a 0 for each number not in your set.

If you have numbers in the range of 1 to 32 million, you'll require 4Mb of memory for a big table of all 32M distinct numbers.

See my answer to Modern, high performance bloom filter in Python? for a Python implementation of a bit array of unlimited size.

Community
  • 1
  • 1
S.Lott
  • 384,516
  • 81
  • 508
  • 779
  • Function approximation is a very common solution, and the normal solution in this problems, sadly I need to run this in an embebedded system so I cannot use the common function approximation due to the computation overhead. – Ryan Dec 02 '08 at 22:31
  • "computation overhead" usually false. If you don't have benchmark data for your function, you'll waste a lot of time on a lookup that turns out to be slower. – S.Lott Dec 02 '08 at 23:49
0

If you are merely looking for the presence of the number in question a bloom filter, might be what you are looking for. Honestly though your question is fairly vague and confusing. It would help to explain what Q values are, and what you do with them once you find them in the table.

grieve
  • 13,220
  • 10
  • 49
  • 61