1

I have an array of strings, each one with a different length. e.g:

s[0] = "sSWXk"
s[1] = "qCk"
s[2] = "sOQQXPbk"
.
.
.
s[x] = "KVfdQk";

I also am given that

n = s[0].length() + s[1].length() + ... + s[x].length()

I need a sorting algorithm with time complexity O(n) for sorting these strings lexicographically, so that (for example)

a < ab < b < bbc < c < ca

Any suggestions? The time complexity is the essential requirement in the algorithm.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
Ehsan Khodarahmi
  • 4,772
  • 10
  • 60
  • 87
  • 8
    Is this homework? – Louis Wasserman Jun 12 '12 at 19:37
  • 2
    The person who gives you an answer to this in general will get an accepted answer and a few million dollars. You can't do sorting (in general) in O(n), without making some serious CS innovations. – Oleksi Jun 12 '12 at 19:38
  • Not exactly! you can suppose it as an IQ test! – Ehsan Khodarahmi Jun 12 '12 at 19:38
  • 2
    @Oleksi: You can do it in this instance. The OP isn't exactly asking about the general case. – Louis Wasserman Jun 12 '12 at 19:40
  • 1
    take care about definition of "n". here "n" is not number of elements, but number of total characters of all strings! – Ehsan Khodarahmi Jun 12 '12 at 19:40
  • 1
    @EhsanKhodarahmi: So you're saying that if you have `m` strings each 1000 characters long, you expect to be able to sort them in `O(m)` time? (since `O(1000*m)` is the same as `O(m)`) – NPE Jun 12 '12 at 19:44
  • @aix- Not quite. There are m strings whose lengths are completely arbitrary. However, you know that the total number of characters is n. The goal is to sort in time proportional to the total number of characters, not the total number of strings. – templatetypedef Jun 12 '12 at 19:49
  • The standard sort of java on objects guaranties O(n*log(n)), which in a lot of practical cases will be similar to O(n): http://stackoverflow.com/questions/962545/on-log-n-complexity-similar-to-linear. And of course remember knuth, this kind of optimization will introduce a lot of extra work for little gain in preformance (I'm guessing) – Emile Vrijdags Jun 12 '12 at 19:51

3 Answers3

11

There is a data structure called a trie that is optimally suited for this. If you insert all the words into the trie and then do a DFS over the trie, you will get the words back in sorted order. Doing so takes time O(n) as well, where n is the total number of characters in all the strings.

Since I assume that this is homework, I'll leave the details of how to implement the trie as an exercise. :-)

Hope this helps!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
1

I've had a similar test question, which I got incorrect, but it has bugged me since then. So I kept looking into it, and I think there are two other methods that might produce results in linear time. One is to treat the strings as a series of integers with a base of 26 and use radix sort on the array after padding the strings to be of equal length (in some way - there's probably a slick way to do this without increasing storage space drastically, I just haven't worked out the details). I haven't built an example or tested it, so I can't say with certainty this would work, but the principle of it seems sound. Another method would be bucket sort - using a 26 element array that contain pointers to 26 lists (the buckets). Sort each string into the appropriate the bucket-linked lists (those starting with 'a' into the list pointed at by the first array element, etc.) Then sort each list using a standard O(n log n) method. I don't fully understand the math, but according to Cormen's textbook "Introduction to Algorithms", using bucket sort in this way winds up having a linear time complexity. It seems the space would be larger than the Radix-style method, though, so long as the right-padding requirement could be met without actually allocating a bunch of storage for it.

Bill Gage
  • 11
  • 2
0

Because this is a homework problem I can only give a hint. Hint: Use a modified version of counting sort. It's practical if we assume a char is 8 bits.

Stefan Savev
  • 839
  • 1
  • 6
  • 7