
I've searched for other threads with a similar problem, but I couldn't find any that apply to me. If I have a variable which has some value, and an array that has a list of values... is it possible for me to efficiently (time efficient, space isn't a constraint) find out the index of the array when the variable matches an element in the array?

I'm getting the variable from reading out of a massive file, and brute force iterating over every possibility will mean several million iterations. I'm willing to do that as a last resort, but I'd rather not. :)

I'm programming in C, if the algorithm depends on that. I don't have an option to program in C++/Python. Thanks!

Edit: The values that I want to match against the array come in pairs (x,y). If the array contains x or y, I process (x,y) further. But it's vital that the ordering of the array not change, for example if I have to sort it.

Kitchi
  • Sorted or unsorted list? – cdarke Oct 22 '12 at 15:40
  • Does it have to be an array? If not, you can replace it with a hash (or any dictionary data type). If it does have to be an array, the hash can contain pointers into the array. By using a decent hash function and hash table ratio, you can significantly reduce lookup time. – StoryTeller - Unslander Monica Oct 22 '12 at 15:43
  • The complexity of what you call "brute force" is O(N), you can't make it better. Anything with pre-sorting will end up in O(NlogN) and hash-based -- with slower O(N) compared to "brute force". If the nature of your "array" is such that you can afford having it pre-indexed then go for it and get yourself O(1). – bobah Oct 22 '12 at 15:44
  • @bobah With all due respect, how does a decent hash constitute a slowdown? – StoryTeller - Unslander Monica Oct 22 '12 at 15:47
  • @cdarke - Unsorted list. I've edited the question to reflect more details. DimaRudnik - Are there standard library functions/something like that to create a hash table for a given array in C? – Kitchi Oct 22 '12 at 15:47
  • @bobah Unless you are going to be looking up more than one value, in which case creating the hash table becomes more efficient – asbumste Oct 22 '12 at 15:47
  • If you're only doing few lookups, a linear search in the array is the best you can do. If you're doing many such lookups, and have the space, build a structure that maps values to indices and has fast lookup. Whether the best way would be a hash map or a balanced tree or a sorted array of `(value, first index)` pairs depends, but in most cases, a good hash map should be the best (as bobah suggested). – Daniel Fischer Oct 22 '12 at 15:54
  • @DimaRudnik - comparing two strings is in the worst case _MIN(len(a),len(b))_ operations, comparing two strings via hash function (considering for _a_ it is precalculated) is _len(b) + 1_ operations. That "+1" is the difference. – bobah Oct 22 '12 at 16:12
  • I wasn't going to suggest you sort it. Volume of searches must be a factor. If you are going to do a lot of searches then you might consider loading it into a simple database, e.g. SQLite. Not worth it for a small number of searches. – cdarke Oct 22 '12 at 16:26
  • Okay, so it's more like ~ 10 million iterations of a loop, with a few conditional statements in the loop. So it's taking me a considerable amount of time 'brute-forcing' this. Would creating a hash table help speed things up in this case? – Kitchi Oct 22 '12 at 17:14
  • @Kitchi: Are x and y integers, doubles, strings, structures, or what? – Nominal Animal Oct 22 '12 at 17:43
  • @bobah I'll grant you that the act of comparison may be longer (a hash only gets you halfway, considering you still have to check for equality), but you neglect the fact that a decent hash function will cut the overall number of comparisons by a substantial factor. – StoryTeller - Unslander Monica Oct 22 '12 at 21:02
  • @DimaRudnik - the author of the question did not say how many times he has to match a variable against the value set. It could be that the value set is what changes and the variable is a configuration parameter. With no additional implicit assumptions my comment stays valid - a hashtable-based approach will be _algorithmically_ slower. Comparing two strings, as I said, has the same complexity as generating the hash for a single string. – bobah Oct 23 '12 at 06:51

1 Answer


If space isn't a concern, and you want to know whether a value is contained in the array, you could do something like this:

  • First, create a new array. Let's call the old one v[ ], the new one w[ ], and let i be your iterator through v[ ].

  • Now, set w[v[i]] = 1 for every i, and leave the rest of w[ ] at 0. This basically says "if x is a value in array v[ ], then w[x] = 1". (Note: if you declare w[ ] globally, all of its positions are initialized to 0 by default.)

  • Whenever you want to check for a value contained in v[ ], check w[value] instead. If it equals 1, then the answer is yes.

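If you do many checks per array, this should work pretty well, since each check is a single array access. Take note, though, that the values must be non-negative integers so they can be used as indices, and that w[ ] needs one entry for every possible value, so it could get pretty big.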

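Edit: if you also want to keep the index, you could replace the 1's in w[ ] with the actual positions. Since position 0 would then look the same as "absent", store i + 1 instead (or fill w[ ] with -1 beforehand). As long as the values do not repeat, this works well; if they do repeat, you have to decide which occurrence to keep.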
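A minimal sketch of the lookup table described above, in C. It assumes the values are integers in the range [0, MAX_VALUE), which the question does not confirm; `MAX_VALUE`, `build_index_table` and `find_index` are hypothetical names chosen for illustration. It uses -1 as the "absent" marker and keeps the first occurrence when a value repeats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: every value is an integer in [0, MAX_VALUE). */
#define MAX_VALUE 1000000

/* Build w[] so that w[x] is the first index of x in v[], or -1 if x
 * does not occur. v[] is only read, so its ordering is untouched.
 * Returns NULL if allocation fails; the caller frees the result. */
static int *build_index_table(const int *v, size_t n)
{
    assert(v != NULL || n == 0);

    int *w = malloc((size_t)MAX_VALUE * sizeof *w);
    if (w == NULL)
        return NULL;

    for (size_t x = 0; x < (size_t)MAX_VALUE; x++)
        w[x] = -1;                      /* -1 means "not in v[]" */

    for (size_t i = 0; i < n; i++) {
        /* Skip values outside the assumed range; keep the first index. */
        if (v[i] >= 0 && v[i] < MAX_VALUE && w[v[i]] == -1)
            w[v[i]] = (int)i;
    }
    return w;
}

/* Constant-time lookup: index of value in v[], or -1 if absent. */
static int find_index(const int *w, int value)
{
    if (value < 0 || value >= MAX_VALUE)
        return -1;
    return w[value];
}
```

For the pairs in the question, you would build the table once and then, for each (x,y) read from the file, process the pair when `find_index(w, x) >= 0 || find_index(w, y) >= 0`. If the values are not small non-negative integers, this table does not apply directly and a hash table, as suggested in the comments, is the more general choice.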

Zamfi