Space complexity of finding non-repeating character in string

Question

Here is a simple algorithm exercise. The problem is to return the first non-repeating character. For example, given the string 'abbbcdd', the answer is 'a' because 'a' appears before 'c'. If there is no non-repeating character, it should return '_'.

My solution works correctly, but my question is about the performance. The problem statement says: "Write a solution that only iterates over the string once and uses O(1) additional memory."

Here is my code:

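```javascript
console.log(solution('abbbcdd')) // a

function solution(str) {
  const chars = buildCharMap(str)
  // A Map iterates in insertion order, i.e. in order of first appearance in str
  for (const [ch, count] of chars) {
    if (count === 1) {
      return ch
    }
  }
  return '_'
}

function buildCharMap(str) {
  // A plain object with for...in would list integer-like keys ('0'..'9')
  // before all other keys, so it could return a digit that is not the
  // first unique character; a Map keeps insertion order for every key
  const charMap = new Map()
  for (let i = 0; i < str.length; i++) {
    charMap.set(str[i], (charMap.get(str[i]) || 0) + 1)
  }
  return charMap
}
```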

Does my answer meet the requirement for space complexity?

Building the character map is `O(N)` because you are looping over your string, which can be of length `N`. It is not `O(1)` unless you know the string will always have a constant length — Nick Parsons, Dec 14 '19 at 03:28
`O(1)` is impossible, because you *must* iterate over each character of the string *somehow*, so in the worst case you *must* perform at least `n` operations, where `n` is the length of the string — CertainPerformance, Dec 14 '19 at 03:36
The problem statement was written this way: "Write a solution that only iterates over the string once and uses O(1) additional memory." — claudiopb, Dec 14 '19 at 03:40
It says O(1) additional memory. The space constraint must be O(1), not the running time. That means you cannot store a copy of the string in a variable, etc. Space complexity is different from time complexity — sinanspd, Dec 14 '19 at 03:42
Maybe you could convert the string to a number by having each character correspond to a prime and multiplying them all together, but calculating a unique prime for a character might take more than `O(1)`, not sure — CertainPerformance, Dec 14 '19 at 03:55
I think there is still something missing in this question. Either there is ambiguity around "iterate once", or there is a condition that the string is sorted or that repetitions are grouped. The obvious counterexample is a symmetric string such as "abcdefedcba" (there is a name for these that starts with "p", but I forgot it): there is no way you can do this in a SINGLE iteration with no extra memory. The most efficient solution is to sort and use a sliding window, which is O(n log n) — sinanspd, Dec 14 '19 at 04:05
Yes! Thank you, haha. Anyway, the point is that palindromes will require more than exactly one iteration, as far as I can see — sinanspd, Dec 14 '19 at 04:09
This actually depends on more details about the problem which you didn't include in the question. Is the string formed from an alphabet of a fixed size (e.g. only lowercase letters)? If so, the auxiliary space for the object is O(1), not O(*n*). — kaya3, Dec 14 '19 at 05:26
@NickParsons Yes, the code does meet the requirements. The character count map takes `O(1)` memory; it is limited by the size of the alphabet, not proportional to the input size. — Bergi, Dec 14 '19 at 16:30
@Bergi I've never seen an infinite string, either, but that doesn't mean that the string length is O(1). If the alphabet size is not fixed in the problem and we want to be rigorous, then we usually assume it has a variable size *a* rather than a constant size. — kaya3, Dec 14 '19 at 16:35
@kaya3 Ah, yes, the size might not be known, but it is usually assumed to be fixed unless it's explicitly mentioned that it is part of the input to be analysed. Any size `a` is essentially `O(1)` in terms of the string input length `n`. — Bergi, Dec 14 '19 at 16:48
@Bergi Yes, you're right about that; thanks for pointing that out — Nick Parsons, Dec 14 '19 at 23:10
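kaya3's point about a fixed alphabet can be made concrete. The sketch below assumes, hypothetically (the problem text quoted here does not say so), that the input contains only the lowercase letters 'a' to 'z'. Under that assumption it reads the string exactly once and keeps two arrays of 26 integers, so its extra memory does not depend on the string length. The function name `firstNonRepeatingLowercase` is illustrative, not from the original.

```javascript
// Sketch: ASSUMES the input contains only 'a'..'z' (a hypothetical fixed
// alphabet of size 26). Extra memory is two 26-slot arrays, whatever str.length is.
const ALPHABET_SIZE = 26
const CODE_A = 'a'.charCodeAt(0)

function firstNonRepeatingLowercase(str) {
  const counts = new Int32Array(ALPHABET_SIZE)
  const firstIndex = new Int32Array(ALPHABET_SIZE).fill(-1)

  // The only pass over the string: count letters and remember first positions.
  for (let i = 0; i < str.length; i++) {
    const k = str.charCodeAt(i) - CODE_A
    if (k < 0 || k >= ALPHABET_SIZE) {
      throw new RangeError(`unexpected character at index ${i}`)
    }
    if (counts[k] === 0) firstIndex[k] = i
    counts[k]++
  }

  // Scan the 26 slots (not the string) for the earliest letter seen once.
  let best = -1
  for (let k = 0; k < ALPHABET_SIZE; k++) {
    if (counts[k] === 1 && (best === -1 || firstIndex[k] < best)) {
      best = firstIndex[k]
    }
  }
  return best === -1 ? '_' : str[best]
}

console.log(firstNonRepeatingLowercase('abbbcdd')) // a
```

Storing the first index per letter is what lets the second loop run over the alphabet instead of re-reading the string; for the palindrome 'abcdefedcba' it returns 'f' after one pass.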

kaya3 · Accepted Answer · 2019-12-14T17:01:05.007

The time complexity is straightforward: you have a loop over a string of length n, and another loop over an object with at most n keys. The operations inside the loops take O(1) time, and the loops are consecutive (not nested), so the running time is O(n).

The space complexity is slightly more subtle. If the input were a list of numbers instead of a string, for example, then we could straightforwardly say that charMap takes O(n) space in the worst case, because all of the numbers in the list might be different. However, for problems on strings we have to be aware that there is a limited alphabet of characters which those strings could be formed of. If that alphabet has size a, then your charMap object can have at most a keys, so the space complexity is O(min(a, n)).
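One way to see the O(min(a, n)) bound is to count how many keys the map ends up with as the string grows. This sketch reuses the question's counting loop (with a Map) and a hypothetical helper, `cyclingLowercase`, that builds strings of length n over the 26 lowercase letters. The key count tracks n until it reaches 26, then stays there.

```javascript
// Hypothetical helper: a lowercase string of length n that cycles a..z.
function cyclingLowercase(n) {
  let s = ''
  for (let i = 0; i < n; i++) s += String.fromCharCode(97 + (i % 26))
  return s
}

// Same counting loop as buildCharMap, returning only the number of keys.
function charMapSize(str) {
  const charMap = new Map()
  for (let i = 0; i < str.length; i++) {
    charMap.set(str[i], (charMap.get(str[i]) || 0) + 1)
  }
  return charMap.size
}

for (const n of [5, 26, 1000, 100000]) {
  console.log(n, charMapSize(cyclingLowercase(n)))
}
// prints: 5 5, then 26 26, then 1000 26, then 100000 26
```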

That alphabet is often explicit in the problem - for example, if the input is guaranteed to contain only lowercase letters, or only letters and digits. Otherwise, it may be implicit in the fact that strings are formed of Unicode characters (or in older languages, ASCII characters). In the former case, a = 26 or 62. In the latter case, a = 65,536 or 1,112,064 depending on whether we're counting code units or code points, because JavaScript strings are encoded as UTF-16. Either way, if a is a constant, then O(a) space is O(1) space - although it could be quite a large constant.
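To make the code-unit versus code-point distinction concrete: the question's loop indexes with `str[i]`, so it counts UTF-16 code units (at most 65,536 distinct keys), while a `for...of` loop over a string yields whole code points. The sketch below compares the two on two emoji outside the Basic Multilingual Plane; the function names are hypothetical.

```javascript
// U+1F600 is stored as the surrogate pair D83D DE00, U+1F601 as D83D DE01.
const s = '\u{1F600}\u{1F601}'

// Counts UTF-16 code units, like the question's str[i] loop.
function firstUniqueCodeUnit(str) {
  const counts = new Map()
  for (let i = 0; i < str.length; i++) {
    counts.set(str[i], (counts.get(str[i]) || 0) + 1)
  }
  for (const [unit, count] of counts) {
    if (count === 1) return unit
  }
  return '_'
}

// Counts code points: for...of over a string yields whole characters.
function firstUniqueCodePoint(str) {
  const counts = new Map()
  for (const ch of str) {
    counts.set(ch, (counts.get(ch) || 0) + 1)
  }
  for (const [ch, count] of counts) {
    if (count === 1) return ch
  }
  return '_'
}

console.log(s.length, [...s].length)                  // 4 2
console.log(firstUniqueCodeUnit(s) === '\uDE00')      // true: a lone low surrogate
console.log(firstUniqueCodePoint(s) === '\u{1F600}')  // true
```

Both variants keep the map bounded by an alphabet size rather than by n; they differ only in which alphabet they count.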

That means that in practice, your algorithm does use O(1) space. In theory, it uses O(1) space if the problem statement specifies a fixed alphabet, and O(min(a, n)) space otherwise; not O(n) space. Assuming the former, then your solution does meet the space-complexity requirement of the problem.

This raises the question of why, when analysing algorithms on lists of numbers, we don't likewise say that JavaScript numbers have a finite "alphabet" defined by the IEEE 754 specification for floating point numbers. The answer is a bit philosophical; we analyse running time and auxiliary space using abstract models of computation which generally assume numbers, lists and other data structures don't have a fixed limit on their size. But even in those models, we assume strings are formed from some alphabet, and if the alphabet isn't fixed in the problem then we let the alphabet size be a variable a which we assume is independent of n. This is a sensible way to analyse algorithms on strings, because alphabet size and string length are independent in the problems we're usually interested in.

"*using abstract models of computation which generally assume numbers could be arbitrarily large*" - actually, in most models of computation numbers (integers to be precise) are limited in size to be able to index into the largest input. In other words, the number of bits of each individual value is limited to `O(log n)` for input size `n`. — Bergi, Dec 14 '19 at 16:43
I accept that, but log *n* is still "arbitrarily large", because *n* is arbitrarily large. For number-theoretic algorithms where there aren't any lists involved, we tend to assume integers can be arbitrarily large, but that they don't take constant space, and that arithmetic operations like `+` and `*` don't take constant time. — kaya3, Dec 14 '19 at 16:48
I've changed "could be arbitrarily large" to "don't have a fixed limit on their size", which is a less misleading way to say what I meant. — kaya3, Dec 14 '19 at 17:03
Thanks. I think the common terminology is "arbitrary but fixed". — Bergi, Dec 14 '19 at 17:10
