58

Many Java framework classes implement Iterable; however, String does not. It makes sense to iterate over the characters in a String, just as one can iterate over the items in a regular array.

Is there a reason why String does not implement Iterable?

polygenelubricants
user333335
  • 1
    Where's the problem with iterating through the string's char array? (strInput.toCharArray()) – Tim Schmelter May 05 '10 at 11:02
  • 6
    Tim: String#toCharArray creates an array with a copy of the String's characters. Even if it works, it imposes unnecessary overhead just to iterate over the characters. – jarnbjo May 05 '10 at 11:15
  • 1
    @jarnbjo `Iterator<Character>` would be less overhead??? – Tom Hawtin - tackline May 05 '10 at 11:55
  • 2
    @Tom: Depending on the situation Iterator could have a MUCH smaller overhead than toCharArray – Foxfire May 05 '10 at 12:22
  • @Foxfire No, I don't think that is reasonable. An iterator optimised for generating long sequences of the same character? – Tom Hawtin - tackline May 05 '10 at 13:17
  • @Tom: If the iterator used autoboxing (e.g. Character c = 'c'), the resulting code would call Character.valueOf('c'), which according to the Java docs should use a cache instead of creating new instances for all characters. In Sun's VM, Character instances are cached for all chars with a value <= 127. – jarnbjo May 05 '10 at 13:39
  • 2
    @Tom: As I said, it depends on the situation: if you have a long string and use the iterator to get only a few entries, it would be MUCH better. Extreme sample: if you had a 1GB string and used an iterator to get the first 100 chars 100 times, then you would have basically 10,000 accesses in the iterator case, but when using toCharArray you would have 100 copies of the string, which alone result in 5,000,000,000 accesses, and you still need the iteration, so it would be 10,000 vs 5,000,010,000. Pretty clear which is better, isn't it? (And yes, this is a constructed extreme case.) – Foxfire May 06 '10 at 10:07

8 Answers

31

There really isn't a good answer. An iterator in Java specifically applies to a collection of discrete items (objects). You would think that a String, which implements CharSequence, should be a "collection" of discrete characters. Instead, it is treated as a single entity that happens to consist of characters.

In Java, it seems that iterators are only really applied to collections and not to a string. There is no reason why it is this way (near as I can tell - you would probably have to talk to Gosling or the API writers); it appears to be convention or a design decision. Indeed, there is nothing preventing CharSequence from implementing Iterable.

That said, you can iterate over the characters in a string like so:

for (int i = 0; i < str.length(); i++) {
  System.out.println(str.charAt(i));
}

Or:

for(char c : str.toCharArray()) {
  System.out.println(c);
}

Or:

"Java 8".chars().forEach(c -> System.out.println((char) c)); // chars() yields ints, so cast back to char

Also note that you cannot modify a character of a String in place because Strings are immutable. The mutable companion to a String is StringBuilder (or the older StringBuffer).
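As a minimal sketch of the point about mutability: the class and method names below (MutableCharsDemo, maskLetterA) are made up for illustration; only StringBuilder and its setCharAt method come from the JDK.

```java
class MutableCharsDemo {
    // Returns a copy of the input with every 'a' replaced by '*'.
    static String maskLetterA(String input) {
        StringBuilder sb = new StringBuilder(input); // mutable copy of the characters
        for (int i = 0; i < sb.length(); i++) {
            if (sb.charAt(i) == 'a') {
                sb.setCharAt(i, '*'); // in-place edit; String has no equivalent
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "banana";
        System.out.println(maskLetterA(original)); // b*n*n*
        System.out.println(original);              // banana (the String itself is unchanged)
    }
}
```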

EDIT

To clarify, based on the comments on this answer: I'm trying to explain a possible rationale for why there is no Iterator on a String. I'm not trying to say that it's not possible; indeed I think it would make sense for CharSequence to implement Iterable.

String provides CharSequence, which, if only conceptually, is different from a String. A String is usually thought of as a single entity, whereas CharSequence is exactly that: a sequence of characters. It would make sense to have an iterator on a sequence of characters (i.e., on CharSequence), but not simply on a String itself.

As Foxfire has rightly pointed out in the comments, String implements the CharSequence interface, so type-wise, a String is a CharSequence. Semantically, it seems to me that they are two separate things - I'm probably being pedantic here, but when I think of a String I usually think of it as a single entity that happens to consist of characters. Consider the difference between the sequence of digits 1, 2, 3, 4 and the number 1234. Now consider the difference between the string abcd and the sequence of characters a, b, c, d. I'm trying to point out this difference.

In my opinion, asking why String doesn't have an iterator is like asking why Integer doesn't have an iterator so that you can iterate over the individual digits.

Martin Andersson
Vivin Paliath
  • 4
    Surely treating a string as a collection of letters isn't entirely without precedent, and to argue it on a "makes sense" case seems a little spurious. – Svend May 05 '10 at 11:09
  • @Svend that's true - I was actually at a loss for words - I think I wanted to say "it doesn't make sense in some cases" or even "it doesn't make sense in most cases" considering what iterators really are. I will edit my answer. – Vivin Paliath May 05 '10 at 11:10
  • 9
    "A String is not really a "collection" of discrete characters.". Well it is. In fact it even implements CharSequence, which is exactly that: An orderd collection of discrete characters! – Foxfire May 05 '10 at 11:21
  • @Vivin: there is no specific implication that `Iterator` must act on a collection. Infinite iterators seem to be acceptable in the right context. – polygenelubricants May 05 '10 at 11:22
  • @Foxfire, agreed - but a `String` by itself is not a `CharSequence`. A `CharSequence` is a sequence of characters that is created from a String. It would make sense to have an iterator on a `CharSequence` but not on just the `String` itself. – Vivin Paliath May 05 '10 at 11:32
  • @polygenelubricants I am not saying that you _can't_ have an Iterator on a string. I'm only trying to explain why. You can have an iterator on anything you want. The question is if it makes sense. – Vivin Paliath May 05 '10 at 11:33
  • 7
    You can do `foreach (char c in s)` in C#, just beautiful! – fredoverflow May 05 '10 at 11:33
  • 1
    @Vivin: CharSequence is an INTERFACE (exactly as Iterable). So it is the String itself implementing the interface. It is not created from the String. – Foxfire May 05 '10 at 11:35
  • @Foxfire, point noted. I realize I may be pedantic here, but to me a `String` and a `CharSequence` are two separate things. – Vivin Paliath May 05 '10 at 11:38
  • 2
    @Vivin: Then imho you should just try to answer the original question as: "Why does CharSequence not implement Iterable". (Which of course technically still means "Why does String not implement Iterable") – Foxfire May 05 '10 at 11:48
  • @Foxfire, indeed. It would make sense to have it on `CharSequence` imho (I've alluded to that in my answer). If the `CharSequence` interface specified an iterator, I think it would make more sense rather than `String` having it. Thanks for the fruitful discussion :) – Vivin Paliath May 05 '10 at 12:15
  • I don't think this was pointed out so far, but using `Iterable<Character>` would not be efficient. Since generics only exist at compile time, `Iterable<Character>` gets compiled down to the raw `Iterable` (in effect, `Iterable<Object>`). Creating a new `Character` for each item in a large string would get quite ridiculous (`O(n)`) – Dylanthepiguy Mar 17 '17 at 04:23
  • Don't quite agree with the assertion that a String should only be thought of as a single entity in the same sense as the number 1234. If it were so, there wouldn't be a method charAt(). There is no such method in class Integer for instance. charAt() shows that the String is indeed, even conceptually, a CharSequence. So except for efficiency reasons, it should implement Iterable. – asbxl Feb 24 '18 at 11:10
  • @asbxl Since I've written this, I've come to change my mind regarding that assertion as well. – Vivin Paliath Mar 05 '18 at 19:39
13

The reason is simple: The string class is much older than Iterable.

And obviously nobody ever wanted to add the interface to String (which is somewhat strange because it does implement CharSequence which is based on exactly the same idea).

However, it would be somewhat inefficient, because an Iterator returns objects, so every char returned would have to be wrapped (boxed) in a Character.

Edit: Just as a comparison: .NET does support enumerating a String. However, in .NET the corresponding interface (IEnumerable<T>) also works with primitive (value) types, so no wrapping is required there, as it would be in Java.
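To make the wrapping concrete, here is a rough sketch (the class BoxingDemo and its methods are invented for this illustration). It walks a string once through boxed Character objects, which is what an Iterator<Character> would require, and once with primitives only. It also shows the cache mentioned in the comments on the question: Character.valueOf is documented to cache values from '\u0000' to '\u007F', so boxing ASCII characters does not necessarily allocate.

```java
class BoxingDemo {
    // Sums the char values via boxed Characters, as an Iterator<Character> would force.
    static long sumBoxed(String s) {
        long sum = 0;
        for (int i = 0; i < s.length(); i++) {
            Character boxed = s.charAt(i); // autoboxing: compiled to Character.valueOf(char)
            sum += boxed.charValue();      // explicit unboxing
        }
        return sum;
    }

    // Same sum with primitives only; no wrapper objects involved.
    static long sumPrimitive(String s) {
        long sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed("abc"));     // 294
        System.out.println(sumPrimitive("abc")); // 294

        // Boxes for values up to \u007F come from a cache, so both refer to the same object.
        Character first = Character.valueOf('a');
        Character second = Character.valueOf('a');
        System.out.println(first == second);     // true
    }
}
```

Outside the cached range, each boxing operation may allocate a new object, which is where the overhead discussed in this answer would come from.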

Foxfire
  • "adding Iterable to String class makes it imperformant" makes sense; but that nobody added Iterable to the String class just because it was old seems a bit odd. Can you please explain some more? – phoenix24 May 05 '10 at 11:54
  • String existed long before Iterable. So you would have to add the interface later. While that is possible it may - in some corner cases - be a breaking change. And taking into consideration how often String is used this *might* have been something considered risky. This is just guessing. I have no knowledge if these considerations were really affecting that decision. But it seems most likely. – Foxfire May 05 '10 at 11:59
  • 3
    I can't see adding `Iterable` (or any type) to `String` as being a breaking change. It's not like you can subclass `String` (thank god). – Tom Hawtin - tackline May 05 '10 at 13:19
  • @Tom: Surely in 99.9% of the cases it won't be. But it is easy enough to construct cases (e.g. reflecting on the interfaces) where it could break. Taking into account that basically EVERY application uses String somewhere that still might be a reason. – Foxfire May 06 '10 at 10:02
  • 3
    Any code like that which gets broken, deserves to be broken. I think I am safe in saying it is not a reason brought into consideration. – Tom Hawtin - tackline May 06 '10 at 12:34
  • 1
    Your main reason "string class is much older than Iterable" is not correct. Prior to Java 1.2 there was a Vector class, almost the same as ArrayList. Java 1.2 introduced the Collections framework, and Vector was backfitted into this framework (it was made to implement List). They added methods to it to implement the interface, without breaking its legacy API. – noamtm Oct 09 '17 at 07:24
  • Rather, I guess it's a design decision: in Java, String is treated almost like a primitive type -- not a collection (unlike Python, for example). – noamtm Oct 09 '17 at 07:25
13

For what it's worth, my coworker Josh Bloch strongly wishes to add this feature to Java 7:

for (char c : aString) { ... }

and

for (int codePoint : aString) { ... }

This would be the easiest way to loop over chars and over logical characters (code points) ever. It wouldn't require making String implement Iterable, which would force boxing to happen.

Without that language feature, there's not going to be a really good answer to this problem. And he seems very optimistic that he can get this to happen, but I'm not sure.
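Until such syntax exists, the distinction between chars and code points can be sketched with existing String methods. The class name CodePointLoopDemo and the sample string are illustrative; codePointAt and Character.charCount are standard JDK calls.

```java
class CodePointLoopDemo {
    // Counts logical characters (code points) by stepping over surrogate pairs.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            i += Character.charCount(codePoint); // 1 for BMP characters, 2 for surrogate pairs
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (encoded as the surrogate pair D83D DE00), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());         // 4 char values
        System.out.println(countCodePoints(s)); // 3 code points
    }
}
```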

Kevin Bourrillion
  • Too bad that did not make it into Java 7’s project coin. – akuhn Nov 03 '12 at 19:52
  • If they someday plan to do so, they should make sure it works for any object that implements `CharSequence` rather than for `String` only. – Earth Engine Jul 09 '13 at 02:15
  • 1
    @akuhn Neither into Java 8 or 9... RIP. – João Vitor Verona Biazibetti Nov 22 '15 at 00:49
  • @JoaaoVerona Not directly, as in having CharSequence or String implement Iterable. But Java 8 extended the CharSequence interface with (default) methods `chars()` and `codePoints()` which return an IntStream. That interface has a `forEach(IntConsumer action)` method which is the next best thing. You can write `"test".chars().forEach(c -> ...)` and it wouldn't be very different from a for loop. I suspect one reason for not having String or CharSequence implement Iterable is that you can iterate over its characters or its code points. An important distinction. – G_H Oct 13 '21 at 16:03
  • You worked with Josh Bloch?!?!!?! – HydroPage Feb 25 '22 at 21:42
2

One of the main reasons for making String implement Iterable is to enable the simple for(each) loop, as mentioned above. So, a reason for not making String implement Iterable could be the inherent inefficiency of a naïve implementation, since it requires boxing the result. However, if the implementation of the resulting Iterator (as returned by String.iterator()) is final, the compiler could special-case it and generate byte-code free from boxing/unboxing.
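This is not the compiler special-casing described above, but since Java 8 the library does offer a boxing-free iterator: CharSequence.chars() returns an IntStream, and its iterator() is a PrimitiveIterator.OfInt whose nextInt() returns a primitive. A small sketch, with the class and method names (PrimitiveIterationDemo, countUpperCase) invented for the example:

```java
import java.util.PrimitiveIterator;

class PrimitiveIterationDemo {
    // Counts upper-case chars without creating any Character objects.
    static int countUpperCase(CharSequence text) {
        int count = 0;
        PrimitiveIterator.OfInt it = text.chars().iterator();
        while (it.hasNext()) {
            char c = (char) it.nextInt(); // nextInt() returns a primitive int
            if (Character.isUpperCase(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countUpperCase("StackOverflow")); // 2
    }
}
```

The price is that PrimitiveIterator.OfInt is not usable in a for-each loop over the string itself, so the convenience that motivates the question is still missing.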

1

If you are really interested in iterating, here you go:

String str = "StackOverflow";

for (char c: str.toCharArray()){
     //here you go
}
mohdajami
  • 2
    -1 Sorry, but I don't see what this answer has to do with the question asked. – jarnbjo May 05 '10 at 11:17
  • 4
    A problem might be that toCharArray creates a new array. So this is VERY inefficient. – Foxfire May 05 '10 at 11:18
  • 1
    @Helper: String is immutable. However, the returned array is not, and changing the array must not affect the String. So it DOES make a complete copy. – Foxfire May 05 '10 at 12:04
  • +1 - For small strings, creating a char[] is roughly as expensive as creating an Iterator - it's an object allocation (and a small amount of memory initialization and copying). As the strings become longer, the memory initialization/copy overhead becomes significant, but still nowhere near as significant as boxing each character. – mdma Jun 02 '10 at 00:51
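The copying behaviour discussed in these comments is easy to observe. A minimal sketch (ToCharArrayCopyDemo and mutateCopy are illustrative names): toCharArray() is documented to return a newly allocated array, so edits to it never reach the String.

```java
class ToCharArrayCopyDemo {
    // Overwrites the first element of the array copy and returns the copy as a String.
    static String mutateCopy(String s) {
        char[] copy = s.toCharArray(); // newly allocated array holding the same chars
        if (copy.length > 0) {
            copy[0] = '#';
        }
        return new String(copy);
    }

    public static void main(String[] args) {
        String original = "hello";
        System.out.println(mutateCopy(original)); // #ello
        System.out.println(original);             // hello (untouched)

        // Every call allocates a fresh array, so the two references differ.
        System.out.println(original.toCharArray() != original.toCharArray()); // true
    }
}
```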
1

They simply forgot to do so.

akuhn
  • Do you have any evidence for this assertion? It seems more likely to me that it is because String predates the Iterable interface (Strings presumably date back to Java 1.0, Iterable dates back to Java 1.5), and once the language specifiers had gotten used to not treating String as one of the collections, they continued to treat it that way. – Anomaly Jan 16 '18 at 15:36
0

I'm not sure why this is still not implemented in 2020. My guess is that Strings are given so much special treatment in Java (the compiler overloading the + operator for string concatenation, string literals, string constants stored in a common pool, etc.) that this feature might be harder to implement than it looks (or it might interfere with too many things to be worth the effort from the implementers' point of view).

On the other hand, implementing something close to this is not too much work. I wanted this in one of my projects, so I wrote these simple classes:

import java.util.Iterator;

public class CharIterable implements Iterable<Character> {
  public CharIterable(CharSequence seq) {
    this.seq = seq;
  }

  @Override
  public Iterator<Character> iterator() {
    // A fresh iterator per call, so the same CharIterable can be looped over repeatedly.
    return new CharIterator(seq);
  }

  private final CharSequence seq;
}

import java.util.Iterator;
import java.util.NoSuchElementException;

public class CharIterator implements Iterator<Character> {
  public CharIterator(CharSequence sequence) {
    this.sequence = sequence;
  }

  @Override
  public synchronized boolean hasNext() {
    return position < sequence.length();
  }

  @Override
  public synchronized Character next() {
    // The Iterator contract calls for NoSuchElementException once exhausted.
    if (position >= sequence.length()) {
      throw new NoSuchElementException();
    }
    return sequence.charAt(position++); // autoboxed to Character
  }

  /**
   * Character sequence to iterate over
   */
  private final CharSequence sequence;

  /**
   * Current position of iterator which is the position of the item that
   * will be returned by {@link #next()}.
   */
  private int position = 0;
}

With these I can do this:

for (Character c: new CharIterable("This is a test")) {
  // do something with c
}

Now this looks like a lot for such a simple thing but it then allows strings to be treated like an iterable array of characters and work transparently with methods designed to work on collection of things (lists, sets, etc.).
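On Java 8 or later, a shorter variant is possible because Iterable has a single abstract method and can therefore be written as a lambda. The following is only a sketch under that assumption; charsOf and toList are hypothetical helper names, and each character is still boxed, exactly as with the classes above.

```java
import java.util.ArrayList;
import java.util.List;

class LambdaIterableDemo {
    // Hypothetical helper: adapts any CharSequence to Iterable<Character>.
    static Iterable<Character> charsOf(CharSequence seq) {
        // Each call to iterator() builds a new stream, so the Iterable is reusable.
        return () -> seq.chars().mapToObj(c -> (char) c).iterator();
    }

    // A generic method that knows nothing about strings.
    static <T> List<T> toList(Iterable<T> items) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        for (Character c : charsOf("abc")) {
            System.out.println(c);
        }
        System.out.println(toList(charsOf("abc"))); // [a, b, c]
    }
}
```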

Vikash Madhow
-2

Iterable of what? Iterable<Integer> would make most sense, where each element represents a Unicode codepoint. Even Iterable<Character> would be slow and pointless when we have toCharArray.

Tom Hawtin - tackline
  • I know it's late, but toCharArray always copies the whole string. If you only ever iterate over a small part of a long string, toCharArray is a greater overhead than autoboxing (which might get optimized away anyway). – RedCrafter LP Jun 16 '22 at 10:33
  • @RedCrafterLP I would be very surprised if there were an interesting collection of cases where you are iterating over a very small head of a large string. And if there were, you would do something other than creating an `Iterator<Character>` or `Iterator<Integer>`. – Tom Hawtin - tackline Jun 17 '22 at 11:19
  • @Tom Hawtin For example, you want to interpret the first few characters of a string as a command. Using String.split is pretty wasteful. It would arguably be better to provide a way to split off slices of strings without copying, something that can be done in C#, for example, with ReadOnlySpan. – RedCrafter LP Jun 17 '22 at 13:03
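For what it's worth, the JDK has something in this direction: java.nio.CharBuffer.wrap(CharSequence, int, int) returns a read-only buffer that wraps the given sequence rather than copying it, and CharBuffer itself implements CharSequence. A rough sketch, where SliceViewDemo, slice and the sample command line are made up for illustration:

```java
import java.nio.CharBuffer;

class SliceViewDemo {
    // Returns a read-only view of text[start, end) backed by the original sequence.
    static CharSequence slice(CharSequence text, int start, int end) {
        return CharBuffer.wrap(text, start, end);
    }

    public static void main(String[] args) {
        String line = "PUT key value";
        CharSequence command = slice(line, 0, 3);
        System.out.println(command);           // PUT
        System.out.println(command.length());  // 3
        System.out.println(command.charAt(0)); // P
    }
}
```

Calling toString() on the view does create a new String, so the benefit only holds as long as the consumer works with CharSequence.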