
Given a large unsorted array, I need to find the number of occurrences of a given number within a given index range. There can be many queries.

For example, if arr[] = {6, 7, 8, 3, 4, 1, 2, 4, 6, 7, 8, 9}, left_range = 3, right_range = 7 and number = 4, then the output will be 2 (0-indexed array, both ends of the range inclusive).
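As a baseline, a direct scan of the inclusive index range reproduces this example. The sketch below uses Python for brevity, and `count_in_range` is a hypothetical helper name. Each call costs O(right_range - left_range + 1), which is the per-query cost that preprocessing is meant to avoid when there are many queries.

```python
def count_in_range(arr, left, right, number):
    """Count occurrences of `number` in arr[left..right] (0-indexed, inclusive).

    Brute force: scans the slice directly, O(right - left + 1) per query.
    """
    return arr[left:right + 1].count(number)


arr = [6, 7, 8, 3, 4, 1, 2, 4, 6, 7, 8, 9]
# arr[3..7] is [3, 4, 1, 2, 4], which contains two 4s.
print(count_in_range(arr, 3, 7, 4))  # 2
```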

arr[i] can be in the range of 1 to 100000. The array can have up to 100000 numbers.

Can you guide me on which data structure or algorithm I should use here?

PS: Pre-processing the array is allowed.

user3080029
  • Likely simply taking an array slice and brute-forcing it will wind up being the most efficient for an unsorted array and if you're only doing one-off queries. – aruisdante Mar 16 '14 at 04:46
  • Ah, if there can be many queries on the same input array, think about turning the array into a structure where you can query a number in **O(log n)** time, and expand that structure to then let you narrow the count by a range. In other words, your record at each number's node is going to be the indexes that contained that number. – aruisdante Mar 16 '14 at 04:49
  • @Sudipta - I created a 2D matrix. For each number 1<=i<=100000, I stored its number of occurrences up to index j in the original array. But I get a segmentation fault because of the huge memory requirement. – user3080029 Mar 16 '14 at 04:52
  • @aruisdante - Are you suggesting something like a segment tree? Or should I create an adjacency list for all numbers and store only the indices in which they occur? – user3080029 Mar 16 '14 at 04:54
  • Mm, you can definitely make this problem boil down to **O(1)**, but it would have to take up **n^3** memory (it would have to be a 3D array to allow number, lower-bound, upper-bound preprocessing). You're also making the pre-processing step *much* more complicated. – aruisdante Mar 16 '14 at 04:55
  • @user3080029: A segment tree is just a binary search tree that takes up too much space. Use a real binary search tree for the positions of each number. Space usage is O(n) and queries are O(log n). You can use `set` in C++ and `TreeSet` in Java – Niklas B. Mar 16 '14 at 04:57
  • And yes, this can definitely be turned into a relatively efficient tree problem, though I might not necessarily use a segmented one. – aruisdante Mar 16 '14 at 04:57
  • @NiklasB. You'd probably want to use a `map` so that you can store the record of indices with the number, but I was trying not to solve their homework for them completely. – aruisdante Mar 16 '14 at 04:59
  • @aruisdante well you can have a separate BST for every number, but that doesn't really matter. Doesn't look like homework to me – Niklas B. Mar 16 '14 at 05:00
  • @aruisdante - Thanks a lot for the help. Using an adjacency list for all the 100000 numbers would have been overkill. Using a map is better. :) – user3080029 Mar 16 '14 at 05:03
  • @NiklasB. My point is that since you need to bound this by index range, you want to be able to query in the 'number space' in **O(log n)** and then in the 'index space' in **O(log m)**, where **m** is the number of indices that the record shows up in. If you do this cleverly, using `map` should make it trivial. – aruisdante Mar 16 '14 at 05:04
  • @aruisdante: Since you only have numbers from 0 to 100000, you can easily have an array of sets or something like that, so you can query in one dimension in O(1). Would consider that pretty trivial as well. But as I said, it doesn't matter at all – Niklas B. Mar 16 '14 at 05:05
  • Well, using a proper *hashmap* would get both of them down to **O(1)** at the cost of slower insertion and no guaranteed order. Which, in this case, is actually probably not such a bad thing. I actually feel dumb that that wasn't the first thing I thought of when order wasn't part of the spec :p – aruisdante Mar 16 '14 at 05:08
  • @aruisdante You can't do range queries in an unordered data structure, so I don't see how that would be possible. – Niklas B. Mar 16 '14 at 05:14
  • By the way OP, you can just use arrays to store the positions, no need for search trees if nothing is changing. If only one number is ever queried, you also need only one array. – Niklas B. Mar 16 '14 at 05:15
  • @NiklasB. Again, this is why I said you have to be clever about what you store at each node in the *number* tree, and that it should be records relating to the indices where the number occurs. Now I've thought about it, this problem boils down to a single **O(n)** pre-processing step, and then an **O(1)** number query and two **O(1)** index queries based on the result of the number query. If the OP says it's not homework I'll happily post code demonstrating. – aruisdante Mar 16 '14 at 05:17
  • Create a kth min tree (a binary tree keeping track of the number of elements less than each node) for the inputs. For the left and right range, search for the node, then check the kth min value at the node. – vaibhav kumar Mar 16 '14 at 05:19
  • @NiklasB. Oh wait, you're right, the index space one will have to be an ordered map, so **O(m)**, unless you happened to be lucky and the number occurred at both the start and end values since there is no `lower_bound` equivalent for an unordered structure. Doh. And stupid timeout on editing comments. – aruisdante Mar 16 '14 at 05:20
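A quick estimate shows why the 2D prefix-count table described in the comments (one counter per possible value per array index) runs out of memory. Assuming 4-byte integers and the stated limits of 100000 possible values and 100000 indices:

```latex
100\,000 \times 100\,000 = 10^{10}\ \text{counters}, \qquad
10^{10} \times 4\ \text{bytes} = 4 \times 10^{10}\ \text{bytes} \approx 40\ \text{GB}
```

By contrast, storing only the positions where each value occurs keeps N indices in total, since every array index is stored exactly once.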

1 Answer

Here's a solution that doesn't require a segment tree.

Preprocessing:

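  1. For each index i, push i to the 2D vector (or ArrayList) at index arr[i], i.e. to vector[arr[i]]. Because i increases as you scan the array, every vector[v] ends up sorted, which is what makes the binary search below possible.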
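A minimal Python sketch of this preprocessing step, with a list of lists standing in for the 2D vector or ArrayList. The names `build_positions` and `MAX_VALUE` are hypothetical; `MAX_VALUE` assumes the stated bound 1 <= arr[i] <= 100000.

```python
MAX_VALUE = 100_000  # assumed from the problem statement: 1 <= arr[i] <= 100000


def build_positions(arr, max_value=MAX_VALUE):
    """Return positions where positions[v] lists every i with arr[i] == v.

    Indices are appended while scanning left to right, so each list is
    already sorted and no extra sort is needed.
    """
    positions = [[] for _ in range(max_value + 1)]
    for i, value in enumerate(arr):
        positions[value].append(i)
    return positions


arr = [6, 7, 8, 3, 4, 1, 2, 4, 6, 7, 8, 9]
positions = build_positions(arr)
print(positions[4])  # [4, 7]
print(positions[7])  # [1, 9]
```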

Answering Queries:

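For a query (left_range, right_range, num), do a binary search on vector[num] to find the position of the largest stored index that is less than or equal to right_range; call this position R. Then find the position of the smallest stored index that is greater than or equal to left_range; call it L. Print R - L + 1. If no stored index falls in the range, this gives 0 (take R = -1 when every stored index exceeds right_range, and L = vector[num].size() when every stored index is below left_range).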
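The query step maps onto Python's `bisect` module: `bisect_right(occ, right) - 1` is the answer's R and `bisect_left(occ, left)` is its L, so R - L + 1 reduces to the difference of the two calls. This is a minimal sketch; `count_occurrences` is a hypothetical name, and the position lists are rebuilt inline (sized to the small example rather than 100001) so the snippet runs on its own.

```python
from bisect import bisect_left, bisect_right


def count_occurrences(positions, left, right, num):
    """Count indices i in [left, right] with arr[i] == num.

    positions[num] must be the sorted list of indices where num occurs.
    bisect_right(occ, right) is R + 1 (one past the last index <= right) and
    bisect_left(occ, left) is L (the first index >= left), so their
    difference equals R - L + 1, and it is 0 when nothing is in range.
    """
    occ = positions[num]
    return bisect_right(occ, right) - bisect_left(occ, left)


# Preprocessing as described above, sized to this small example.
arr = [6, 7, 8, 3, 4, 1, 2, 4, 6, 7, 8, 9]
positions = [[] for _ in range(max(arr) + 1)]
for i, value in enumerate(arr):
    positions[value].append(i)

print(count_occurrences(positions, 3, 7, 4))  # 2
print(count_occurrences(positions, 5, 6, 4))  # 0
```

In C++, the same two searches are `std::upper_bound` and `std::lower_bound` over `vector[num]`, and the count is the distance between the two returned iterators.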

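Runtime: preprocessing is (amortized) O(1) per item, O(N) in total. Each query does two binary searches over vector[num], so it takes O(lg m), where m ≤ N is the number of occurrences of num; that is O(lg N) in the worst case.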

Space: quite linear, assuming vector or ArrayList: the N stored indices plus one (possibly empty) list per possible value, i.e. O(N + 100000).

Fallen
  • Swift and simple. And indeed it seems to be "quite linear" :D – Niklas B. Mar 16 '14 at 05:23
  • That was basically what I was suggesting, although without actually explicitly using `map` since you're doing it in-place with an array. Upvote for you :) – aruisdante Mar 16 '14 at 05:27