2

I have some data that needs to be stored and looked up efficiently, preferably using C. Each line of the data file is in the following format:

key1 key2 key3 data  

where key1, key2, and key3 are integers and data is an array of floats.

I am thinking about converting key1, key2, and key3 into a string, then using a C++ std::map to map the string to a float pointer:

std::map<string, float*>

Are there better ways of doing it?

Note: each of the integers key1, key2, and key3 has a range of 0-4000, but the range is very sparsely populated. In other words, if you go through all the values of key1, you will find fewer than 100 unique ints within the range of 0-4000.

chrisaycock
elgnoh
  • Are you going to use C or C++? – AusCBloke Aug 11 '12 at 01:29
  • The solutions available for C, and those available for C++, are very different. You say "Preferably using C", but go on to suggest a C++ implementation using `std::map`. Which language do you actually want? – Greg Hewgill Aug 11 '12 at 01:32
  • And if you are going with C++, a class that contains all three keys, with a comparison operator that contains logic to compare the keys in their least common order, would be good – Adrian Cornish Aug 11 '12 at 01:34
  • Could I have solutions for C as well as C++, just for the learning purpose, Thanks – elgnoh Aug 11 '12 at 01:35
  • You could - but you cannot use std::map in a C answer - which increases the complexity of the answer by a large factor since you would have to write the container – Adrian Cornish Aug 11 '12 at 01:57
  • Are the keys related? For example are you wanting to look the data up by one of any of the keys, or must all three keys be supplied in order? – Josh Petitt Aug 11 '12 at 03:39

4 Answers

5

You can use std::tuple to combine the three values into one:

std::map<std::tuple<int, int, int>, float *>
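As a minimal sketch of how such a map might be used, assuming a C++11 compiler: the names `Key`, `Table`, `insert_record`, and `find_record` are made up for illustration, and `std::vector<float>` is swapped in for the raw `float *` so the map owns the data. `std::tuple` compares element by element in order, so no custom comparison is needed.

```cpp
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical names for illustration; not part of any library.
using Key = std::tuple<int, int, int>;
// Assumption: the float array is held in a std::vector instead of a float*.
using Table = std::map<Key, std::vector<float>>;

// Store (or overwrite) the float array for one (key1, key2, key3) triple.
void insert_record(Table& table, int k1, int k2, int k3, std::vector<float> data) {
    table[std::make_tuple(k1, k2, k3)] = std::move(data);
}

// Return a pointer to the stored array, or nullptr if the triple is absent.
const std::vector<float>* find_record(const Table& table, int k1, int k2, int k3) {
    auto it = table.find(std::make_tuple(k1, k2, k3));
    return it == table.end() ? nullptr : &it->second;
}
```

Using `find` rather than `operator[]` for lookups avoids inserting an empty entry when a triple is missing.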
Greg Hewgill
  • I would think using a hand-rolled class as the key could aid in optimizing the comparison order if it has known limits. Also, isn't tuple Boost-only (practically) at the moment? I do not know Windows-based compilers much anymore – Adrian Cornish Aug 11 '12 at 02:04
2

You do not have to use strings if each key is limited to the range 0 to 4000.

First, generate the combined key as follows (the parentheses matter, because `+` binds more tightly than `<<`):

unsigned long ulCombinedKey = (unsigned long)key1 + ((unsigned long)key2 << 12) + ((unsigned long)key3 << 24);

After that, you can use the map as you already described in your question.
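A hedged sketch of the same idea with an explicit 64-bit type: three keys in the range 0-4000 need 12 bits each, 36 bits in total, which does not fit in a 32-bit `unsigned long`. The helper names `pack_key` and `unpack_key` are hypothetical, and the code assumes every key is already known to be in 0..4095.

```cpp
#include <cstdint>

// Hypothetical helper: pack three keys, each assumed to be in 0..4095
// (12 bits), into a single 64-bit value. Uses 36 bits in total.
std::uint64_t pack_key(int key1, int key2, int key3) {
    return  static_cast<std::uint64_t>(key1)
         | (static_cast<std::uint64_t>(key2) << 12)
         | (static_cast<std::uint64_t>(key3) << 24);
}

// Hypothetical helper: recover one of the three keys (field = 0, 1, or 2).
int unpack_key(std::uint64_t packed, int field) {
    return static_cast<int>((packed >> (12 * field)) & 0xFFFu);
}
```

The packed value can then serve directly as the key of a `std::map<std::uint64_t, float*>`. Because the three fields do not overlap, distinct triples always produce distinct packed keys.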

Mahmoud Fayez
  • Unfortunately, there aren't *quite* enough bits to do this on a 32-bit machine, but it would work fine with a 64-bit data type. – Greg Hewgill Aug 11 '12 at 01:42
  • @GregHewgill: Note that he's using `+` instead of `|`, so while it's not necessarily (quite) unique, what he's doing is basically a hash code that should still be nearly unique (i.e., very few collisions). At the same time, to ensure uniqueness you'd still need the original key to compare against. – Jerry Coffin Aug 11 '12 at 01:47
  • Thanks guys it seems he had a better solution but I was planning to come up with a C solution not C++ but it seems he is fine with C++ too. – Mahmoud Fayez Aug 11 '12 at 02:05
  • Thank you @MahmoudFayez for the solution. Is there a C solution as well? I guess I can use the same unique key concept and build a binary tree for lookup? – elgnoh Aug 11 '12 at 23:45
  • I think B-Tree is better. Please check this question http://stackoverflow.com/questions/32376/what-is-a-good-open-source-b-tree-implementation-in-c – Mahmoud Fayez Aug 11 '12 at 23:49
1

A hierarchical map would do it:

map<int, map<int , map<int, list<float> > > > records;

and the access time would be good (logarithmic at each level). This approach is efficient if the key range is very wide. Otherwise, for a range of 4000, the shift-based key suggested in the previous answer is faster and more efficient.
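A minimal sketch of a lookup in such a nested map. The alias `Records` and the function `find_nested` are illustrative names only. Note that `records[k1][k2][k3]` would silently create empty intermediate maps for missing keys, so the lookup below uses `find` at each level instead.

```cpp
#include <list>
#include <map>

// Same shape as the nested map above; the alias name is hypothetical.
using Records = std::map<int, std::map<int, std::map<int, std::list<float>>>>;

// Walk the three levels with find(), which never inserts.
// Returns nullptr as soon as any level is missing its key.
const std::list<float>* find_nested(const Records& r, int k1, int k2, int k3) {
    auto i1 = r.find(k1);
    if (i1 == r.end()) return nullptr;
    auto i2 = i1->second.find(k2);
    if (i2 == i1->second.end()) return nullptr;
    auto i3 = i2->second.find(k3);
    if (i3 == i2->second.end()) return nullptr;
    return &i3->second;
}
```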

hashtpaa
0

A hash provides very fast access to data, so you might want to use hashes to look up values from each of the three integers. This approach can be used in either C or C++.

For each line of data:
1. allocate space for the array of floats
2. store a pointer to the array of floats in an array of pointers
3. store the index into the pointer array in a hash based on int1
4. store the index into the pointer array in a hash based on int2
5. store the index into the pointer array in a hash based on int3

This way, given int1, int2, or int3, one could look up a pointer array index, retrieve the pointer, then follow the pointer to the array of floats. This approach uses some memory, but not too much, given the problem said there are < 100 unique values for each of int1, int2, and int3.
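The steps above could be sketched in C++ roughly as follows. `RecordStore`, `add`, and `lookup` are hypothetical names, and two things here are assumptions rather than part of the answer: the float arrays are held in `std::vector` instead of raw pointers, and `std::unordered_multimap` is used because several lines can share the same value of int1, int2, or int3, so one key value may map to many row indexes.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical type: one store of float arrays plus one hash index per key column.
struct RecordStore {
    std::vector<std::vector<float>> rows;                 // steps 1-2: the arrays themselves
    std::unordered_multimap<int, std::size_t> by_key[3];  // steps 3-5: key value -> row index

    // Add one line of data and register its row index under each of its three keys.
    std::size_t add(int k1, int k2, int k3, std::vector<float> data) {
        const std::size_t index = rows.size();
        rows.push_back(std::move(data));
        const int keys[3] = {k1, k2, k3};
        for (int i = 0; i < 3; ++i) by_key[i].emplace(keys[i], index);
        return index;
    }

    // Row indexes whose key in the given column (0, 1, or 2) equals value.
    std::vector<std::size_t> lookup(int column, int value) const {
        std::vector<std::size_t> out;
        auto range = by_key[column].equal_range(value);
        for (auto it = range.first; it != range.second; ++it) out.push_back(it->second);
        return out;
    }
};
```

To find a record by all three keys with this layout, one would intersect the index lists returned for each column, which costs more than a single lookup on a combined key.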