search unique URL

Question

Given a very large list of URLs (say, 1 million), find the "first" "unique" URL in the list, i.e. the earliest URL that occurs exactly once.

My approach: build a hash table using a perfect hash function; that could help. But hashing such a large amount of data is not possible, so how can I solve this problem?
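For comparison, the hashing idea does not strictly need a perfect hash function. An ordinary hash map is enough: one pass counts each URL, and a second pass returns the first URL whose count is 1. This is a minimal sketch that assumes the distinct URLs fit in memory. The name `first_unique_by_count` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def first_unique_by_count(urls: Sequence[str]) -> Optional[str]:
    """Return the first URL that occurs exactly once, or None if there is none."""
    # First pass: count occurrences of each URL in an ordinary hash map.
    counts = Counter(urls)
    # Second pass: scan in input order so the earliest unique URL wins.
    for url in urls:
        if counts[url] == 1:
            return url
    return None


print(first_unique_by_count(["c", "a", "b", "a", "c"]))  # prints b
```

This runs in expected O(n) time. Its memory grows with the number of distinct URLs, which is exactly the cost the question is worried about.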

Is there any way to do this in place? Please help. Thanks in advance.
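A strictly in-place solution is hard to get without falling back to quadratic pairwise comparisons. If the problem is only that one hash table over all URLs is too big, a common workaround is to partition the input. Hash each URL into one of k buckets (in practice, files on disk), solve each bucket on its own, and keep the smallest position among the per-bucket answers. Equal URLs always land in the same bucket, so a URL is unique within its bucket exactly when it is unique overall.

The sketch below makes some assumptions. The buckets are in-memory lists standing in for files, and `first_unique_partitioned` and `num_buckets` are hypothetical names. `zlib.crc32` is used because it is stable across runs, unlike Python's randomized `hash()` for strings.

```python
import zlib
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def first_unique_partitioned(urls: Sequence[str], num_buckets: int = 4) -> Optional[str]:
    """Find the first unique URL while only counting one bucket at a time."""
    # Pass 1: route each (position, url) pair to a bucket chosen by a stable hash.
    # In a real out-of-core version each bucket would be a file on disk.
    buckets: List[List[Tuple[int, str]]] = [[] for _ in range(num_buckets)]
    for pos, url in enumerate(urls):
        buckets[zlib.crc32(url.encode("utf-8")) % num_buckets].append((pos, url))

    best: Optional[Tuple[int, str]] = None
    # Pass 2: each bucket is small enough to count on its own.
    for bucket in buckets:
        counts = Counter(url for _, url in bucket)
        for pos, url in bucket:  # entries were appended in input order
            if counts[url] == 1:
                if best is None or pos < best[0]:
                    best = (pos, url)
                break  # later entries in this bucket have larger positions
    return best[1] if best is not None else None
```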

No, Frerich. Actually, I am preparing for an interview; that's why I asked this question. — devsda, Aug 31 '12 at 07:06
Yes, Airza, but a conversation with Frerich is still going on. After that, I accepted his answer. — devsda, Sep 02 '12 at 05:51

score 1 · Accepted Answer · answered Aug 31 '12 at 07:09


Given an input list of ["c","a","b","a","c"], my first approach would be:

1. Convert the list of URLs into a list of tuples that associates each element with its position in the list. Now you have [(0,"c"),(1,"a"),(2,"b"),(3,"a"),(4,"c")].
2. Sort the list lexicographically by the second tuple element (the URL). Now you have [(1,"a"),(3,"a"),(2,"b"),(0,"c"),(4,"c")].
3. Group runs of adjacent tuples with equal URLs (equal second elements) into sub-lists. Now you have [[(1,"a"),(3,"a")],[(2,"b")],[(0,"c"),(4,"c")]].
4. Filter the list so that only sub-lists of length 1 remain. Now you have [[(2,"b")]].
5. If the resulting list is empty, there is no unique URL in the list. If it is non-empty, sort it by the first tuple element (the position in the input list). In this case you get the same list back: [[(2,"b")]].
6. Take the first element of the list. Now you have [(2,"b")].
7. The (only) tuple in this list gives the first unique URL and its position in the input list: it's the URL b at position 2.
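The steps above translate fairly directly into Python. This is a sketch, not the answerer's own code (he had Haskell in mind). `sorted` and `itertools.groupby` stand in for the sort and group steps, and the function name `first_unique_url` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Optional, Sequence, Tuple


def first_unique_url(urls: Sequence[str]) -> Optional[Tuple[int, str]]:
    """Return (position, url) of the first URL occurring exactly once, else None."""
    # Step 1: pair every URL with its position in the input list.
    indexed = list(enumerate(urls))
    # Step 2: sort by the URL (second tuple element) so equal URLs become adjacent.
    indexed.sort(key=itemgetter(1))
    # Step 3: group runs of tuples that share the same URL.
    groups = [list(g) for _, g in groupby(indexed, key=itemgetter(1))]
    # Step 4: keep only groups of length 1, i.e. URLs that occur once.
    singles: List[Tuple[int, str]] = [g[0] for g in groups if len(g) == 1]
    # Step 5: no singles means no unique URL; otherwise sort by position.
    if not singles:
        return None
    singles.sort(key=itemgetter(0))
    # Step 6: the first remaining tuple is the answer.
    return singles[0]


print(first_unique_url(["c", "a", "b", "a", "c"]))  # prints (2, 'b')
```

Steps 5 and 6 could be collapsed into `min(singles)`, since the tuples compare by position first and positions are distinct. The full sort just mirrors the answer. The overall cost is dominated by the O(n log n) sort in step 2.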


Frerich Raabe



But Frerich, I think you are applying the MapReduce concept: first map, then sort, then pair up. – devsda Aug 31 '12 at 07:14
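To make the MapReduce reading concrete, a map step can emit `(url, (1, position))` for every URL. A reduce step then combines values for the same key by adding the counts and keeping the smaller position. This is a different technique from the sort-based answer, sketched here in plain single-process Python. `map_phase`, `reduce_phase` and `first_unique_mapreduce` are hypothetical names, not the API of any MapReduce framework.

```python
from typing import Dict, Iterator, Optional, Sequence, Tuple

Value = Tuple[int, int]  # (count, earliest position)


def map_phase(urls: Sequence[str]) -> Iterator[Tuple[str, Value]]:
    # Emit one key/value pair per URL occurrence.
    for pos, url in enumerate(urls):
        yield url, (1, pos)


def reduce_phase(pairs: Iterator[Tuple[str, Value]]) -> Dict[str, Value]:
    # Combine values per key: add counts, keep the earliest position.
    reduced: Dict[str, Value] = {}
    for url, (count, pos) in pairs:
        if url in reduced:
            old_count, old_pos = reduced[url]
            reduced[url] = (old_count + count, min(old_pos, pos))
        else:
            reduced[url] = (count, pos)
    return reduced


def first_unique_mapreduce(urls: Sequence[str]) -> Optional[str]:
    reduced = reduce_phase(map_phase(urls))
    # Among keys seen exactly once, pick the one with the smallest position.
    singles = [(pos, url) for url, (count, pos) in reduced.items() if count == 1]
    return min(singles)[1] if singles else None
```

The combine step is associative and commutative, so it could in principle run on separate partitions whose partial results are merged afterwards.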
But this can't work: Z might come first and A last, with both occurring only once. After sorting, A comes before Z, so the answer would be A, but the actual answer is Z. – devsda Aug 31 '12 at 07:17
@jhamb: So? I was thinking of Haskell while writing this algorithm (that's why the list/tuple notation looks like that). I think it would be rather straightforward to implement, and I suspect Haskell's laziness could make it reasonably efficient, too. – Frerich Raabe Aug 31 '12 at 07:18
@jhamb: No, that's why in step (5) you sort the list of unique characters again, this time by their position (the first tuple element), so e.g. `[(2,"a"),(0,"z")]` becomes `[(0,"z"),(2,"a")]`. – Frerich Raabe Aug 31 '12 at 07:19
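A small check of the objection and the reply: in the hypothetical input below, "z" comes first and "a" comes last, and both occur once. After step 4 the survivors are in alphabetical order, and the step 5 re-sort by position puts "z" back in front.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: "z" comes first, "a" comes last, both occur once.
urls = ["z", "m", "m", "a"]

# Steps 1-4: index, sort by URL, group equal URLs, keep groups of length 1.
by_url = sorted(enumerate(urls), key=itemgetter(1))
groups = [list(grp) for _, grp in groupby(by_url, key=itemgetter(1))]
singles = [g[0] for g in groups if len(g) == 1]
print(singles)  # alphabetical order: [(3, 'a'), (0, 'z')]

# Step 5: re-sort the survivors by position, so "z" is first again.
singles.sort(key=itemgetter(0))
print(singles[0])  # (0, 'z')
```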
So, which data structure would you use to implement this? Is it necessary to parse the URLs to compare them? – devsda Aug 31 '12 at 07:25
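On the last question: the sort-based steps only need a list of (position, URL) pairs and plain string comparison. No parsing is required if "same URL" means "same string". If URLs that differ only in the letter case of the scheme or host should count as equal, one option is to normalize each URL with the standard library's `urllib.parse` before comparing. That equivalence is an assumption, not part of the original question, and `normalize_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case the scheme and network location; leave path, query, fragment as-is."""
    parts = urlsplit(url)
    # Scheme and host names are case-insensitive; paths generally are not.
    # Note: netloc.lower() would also lower-case any user:password part.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )
```

The normalized strings can then be fed to whichever first-unique routine is in use, while positions are still taken from the original list.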

