
What's behind the NumericRange Int size restriction on the Scala for-loop comprehension? Is it possible (without too much headache) to extend "for/Seqs" NumericRange to make use of Long (or anything bigger than Int.MaxValue)?

scala> for (i: Long <- 0L to 10000000000L) {}

java.lang.IllegalArgumentException: 0 to 10000000000L by 1: "seqs cannot contain more than Int.MaxValue elements."
    at scala.collection.immutable.NumericRange$.count(NumericRange.scala:227)
    at scala.collection.immutable.NumericRange.numRangeElements(NumericRange.scala:53)
    at scala.collection.immutable.NumericRange.length(NumericRange.scala:55)
    at scala.collection.immutable.NumericRange.foreach(NumericRange.scala:73)
    at .<init>(<console>:19)
    at .<clinit>(<console>)
    at .<init>(<console>:11)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:704)
    at scala.tools.nsc.interpreter.IMain$Request$$anonfun$14.apply(IMain.scala:920)
    at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
    at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
    at java.lang.Thread.run(Thread.java:680)

--
Thanks in advance!

IODEV

4 Answers


In Scala there is no for-loop, but a for-comprehension. It works differently from a loop. Your for-comprehension actually gets translated to:

(0L to 10000000000L).foreach { i => // (0L to 10000000000L) == collection.immutable.NumericRange.inclusive(0L, 10000000000L, 1L)
  // block
}
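A small runnable sketch of that translation, using a tiny range so it actually finishes: a `for` without `yield` is rewritten into a `foreach` call (matching the `NumericRange.foreach` frame in the stack trace), while a `for ... yield` is rewritten into `map`. The object and method names are made up for illustration.

```scala
import scala.collection.mutable.ListBuffer

object Desugar {
  // A plain `for` over the range...
  def withFor(r: Seq[Long]): List[Long] = {
    val buf = ListBuffer.empty[Long]
    for (i <- r) buf += i // desugars to r.foreach(i => buf += i)
    buf.toList
  }

  // ...and the foreach call the compiler generates for it.
  def withForeach(r: Seq[Long]): List[Long] = {
    val buf = ListBuffer.empty[Long]
    r.foreach(i => buf += i)
    buf.toList
  }

  // With `yield`, the comprehension becomes a `map` call instead.
  def squaresFor(r: Seq[Long]): Seq[Long] = for (i <- r) yield i * i
  def squaresMap(r: Seq[Long]): Seq[Long] = r.map(i => i * i)

  def main(args: Array[String]): Unit = {
    val small = 0L to 4L // a NumericRange.Inclusive[Long], like in the question
    println(withFor(small) == withForeach(small))   // true
    println(squaresFor(small) == squaresMap(small)) // true
  }
}
```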

The limitation is not in the for-comprehension, but in the Seq type, which cannot contain more than Int.MaxValue elements. If you really need a loop with 10000000000 iterations, you can still use

var i = 0L
while (i < 10000000000L) {
  // do stuff
  i += 1
}
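If the `while` loop above is needed in several places, it can be wrapped in a small higher-order helper. This is only a sketch: `loopLong` is a hypothetical name, not a standard-library method, and its upper bound is exclusive like the `while` condition.

```scala
object LongLoop {
  // Hypothetical helper, not a standard-library method: calls `body` for every
  // i with from <= i < until. Nothing is materialized, so the Int.MaxValue
  // element limit of Seq never comes into play.
  def loopLong(from: Long, until: Long)(body: Long => Unit): Unit = {
    var i = from
    while (i < until) {
      body(i)
      i += 1
    }
  }

  def main(args: Array[String]): Unit = {
    // Cross the Int.MaxValue boundary to show the counter really is a Long.
    val start = Int.MaxValue.toLong - 2
    var count = 0L
    var last = 0L
    loopLong(start, start + 5) { i => count += 1; last = i }
    println(s"$count iterations, last value $last") // 5 iterations, last value 2147483649
  }
}
```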
drexin
  • Thanks for the reply, though I'm quite aware of the comprehension syntactic sugar. But the point was "why" Seq is restricted (as already stated in the question) and whether Seq is easily extendable. – IODEV Mar 27 '12 at 11:53
  • So what you basically mean is that the for-comprehension is not mature enough to be used in production? Is this a library design flaw or a language issue? – IODEV Mar 27 '12 at 12:05
  • No, it only means that a `for` construction requires a sequence of elements, and building such a construction cannot be done with the size you expect, due to memory limitations (see the latest version of my answer). – Nicolas Mar 27 '12 at 12:11
  • No, for-comprehensions are just fine for non-performance-critical sections of your application. They are slower than a loop, but also more powerful. If you work with monads, for-comprehensions rock, but not for iterating over large collections. – drexin Mar 27 '12 at 12:11
  • @Nicolas for does not require a sequence, but a monad. – drexin Mar 27 '12 at 12:13
  • Neither a Seq nor a Monad, as monad is not a `trait` in Scala. Actually, I think it requires something that can be transformed into an `Iterator` (to be checked). – Nicolas Mar 27 '12 at 12:18
  • @drexin: That quote goes straight into our design guidelines. Thanks! IMO this should also be addressed in some kind of official "Scala Design Guide". – IODEV Mar 27 '12 at 12:29
  • @Nicolas You are right, Monad is not a type in Scala, but a `for(...) yield` takes anything that has `map` and `flatMap` implemented, and a simple `for` without `yield` takes anything that has `foreach` implemented. You might want to read this article on Monads in Scala: http://james-iry.blogspot.de/2007/09/monads-are-elephants-part-1.html – drexin Mar 27 '12 at 12:35
  • Hmm, yes, it works. I was pretty sure it wasn't implemented thanks to structural types. – Nicolas Mar 27 '12 at 12:45
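drexin's point that a plain `for` only needs `foreach` also answers the "can it be extended?" part of the question: any class with a suitable `foreach` method can be used in a `for` without `yield`, with no `Seq` involved and therefore no `Int`-sized `length`. A minimal sketch, where `LongSpan` is a hypothetical class rather than anything in the Scala library, and only positive steps are handled.

```scala
// Hypothetical helper (not a library class): Longs from `start` (inclusive)
// to `end` (exclusive) in increments of `step`. It is NOT a Seq; it only
// defines foreach, which is all a `for` without `yield` needs.
final class LongSpan(val start: Long, val end: Long, val step: Long = 1L) {
  require(step > 0L, "this sketch only supports positive steps")

  def foreach[U](f: Long => U): Unit = {
    var i = start
    var done = i >= end
    while (!done) {
      f(i)
      // Stop if the next value would reach `end` or overflow past Long.MaxValue.
      if (i > Long.MaxValue - step || i + step >= end) done = true
      else i += step
    }
  }
}

object LongSpanDemo {
  def main(args: Array[String]): Unit = {
    var sum = 0L
    for (i <- new LongSpan(1L, 11L)) sum += i // 1 + 2 + ... + 10
    println(sum) // 55

    // The span reaches well past Int.MaxValue, yet no Int-sized length is computed.
    var count = 0L
    for (_ <- new LongSpan(0L, 10000000000L, 1000000000L)) count += 1
    println(count) // 10
  }
}
```

Because only `foreach` is defined, a guard such as `for (i <- span if i % 2 == 0)` would not compile; that would additionally need a `withFilter` method, and `yield` would need `map`.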

Short answer: it appears to be a "feature"; at least, it's working as designed.

As @drexin pointed out, the implementation of "to" is limited to ranges whose element count fits in an Int. However...

The problem is that NumericRange[T].count(), .numRangeElements, and .length() return an Int, regardless of what T is. In this case, it's a NumericRange[Long], where it seems a bit wrong to have count() limited to 31 bits, IMHO.
However...

From browsing Jira issues, this appears to be working as designed. See, e.g., SI-4370. But just to be sure it's been thought out from this perspective, I entered SI-5619.
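The limit described here is easy to observe with the standard library itself. A minimal sketch, assuming a current Scala 2.13/3 standard library; since the exception message may differ between Scala versions, only the exception type is inspected. `RangeLimit` and `libraryLength` are illustrative names: a `NumericRange[Long]` works as long as its element count fits in an `Int`, and asking for `length` fails once it does not.

```scala
import scala.util.{Failure, Success, Try}

object RangeLimit {
  // Asks the standard library for the element count of `start to end by step`.
  def libraryLength(start: Long, end: Long, step: Long): Try[Int] =
    Try((start to end by step).length)

  def main(args: Array[String]): Unit = {
    // 100,001 elements: Long endpoints, but the count fits in an Int.
    println(libraryLength(0L, 100000000000L, 1000000L)) // Success(100001)

    // 10,000,000,001 elements: more than Int.MaxValue, so asking for length fails.
    libraryLength(0L, 10000000000L, 1L) match {
      case Failure(e: IllegalArgumentException) => println(s"rejected: ${e.getMessage}")
      case Failure(e)                           => println(s"other failure: $e")
      case Success(n)                           => println(s"unexpectedly got $n")
    }
  }
}
```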

Ed Staub
  • It is not limited to a Range, that is just what you create. You can pass any monad to a for-comprehension. – drexin Mar 27 '12 at 12:17
  • @Ed Staub: that is exactly my point, i.e. why is range count() limited to Int! IMHO it's quite odd, but the big question remains: is it a bug or a feature ;-) – IODEV Mar 27 '12 at 12:18
  • @drexin: any idea why the range count defaults to Int? – IODEV Mar 27 '12 at 12:21
  • Maybe because of compatibility with the 32-bit JVM. But that is just a guess. – drexin Mar 27 '12 at 12:29
  • @IODEV your update is also not entirely true, because you can have a Range of Longs, as long as you don't have more than Int.MaxValue elements in it. For example, `0L to 100000000000L by 1000000` works just fine. – drexin Mar 27 '12 at 12:43
  • @drexin - I'm referring to the specific error IODEV ran into. It is possible to use a monad with a for loop with more than 2 gigaiterations - it's just this particular one (NumericRange) that has the limitation - not to say there aren't others with the same limit. – Ed Staub Mar 27 '12 at 12:46
  • Any general ideas/thoughts: is the Int.MaxValue limitation of NumericRange a design "flaw" that ought to be improved? – IODEV Mar 27 '12 at 13:04
  • It's definitely a flaw, since you can build such a Range but it will fail as soon as you call a method that requires checking its length. – Nicolas Mar 27 '12 at 13:13
  • @Nicolas It is definitely not a flaw. `0L to 100000000000L` is invalid, but `0L to 100000000000L by 1000000` is valid, so no exception can be thrown in the first case, or the second case would not be possible. The `Int` limit itself is related to performance, `size`, `apply` on `Seq` and Java's own limitations, in particular the maximum size of an `Array`. – Daniel C. Sobral Mar 27 '12 at 18:15
  • @Daniel An exception can be thrown if we do not lazily evaluate `count`. Otherwise, we could try to limit the use of `length` (especially in the foreach case). – Nicolas Mar 27 '12 at 18:36
  • @Nicolas That's the point. An exception WILL be thrown if `count` is not lazily evaluated, but that's WRONG for `0L to 100000000000L by 1000000`. And while `length` can behave somewhat differently (and does, for infinite collections), `apply` cannot. Though, to be sure, one could theoretically exceed `Int.MaxValue` elements with an infinite `Stream`, so there's certainly some contradiction there. – Daniel C. Sobral Mar 27 '12 at 18:47
  • @DanielC.Sobral No, it's not the case: `(0L to 100000000000L by 1000000).length` works like a charm and it calls `NumericRange.count`. – Nicolas Mar 27 '12 at 18:53
  • Actually, it really seems that failing in `NumericRange.count` during initialization would do exactly what we expect. But for sure, it doesn't solve the consistency issue we have with infinite `Stream`. – Nicolas Mar 27 '12 at 18:57
  • @Nicolas Well, go ahead and write the patch. – Daniel C. Sobral Mar 27 '12 at 18:57
  • @DanielC.Sobral Do you think the performance penalty of evaluating the length during initialization is acceptable? It's the last doubt I have about it. – Nicolas Mar 27 '12 at 19:00
  • @Nicolas - I'm afraid I don't understand what the proposed patch is. But re the performance penalty: I wouldn't worry about it, because count()/length() is used so ubiquitously inside the class. equals(), toString, foreach()... it's hard to imagine someone wanting to use an object that is so fragile, just waiting for the wrong method to be called to break down in hysterics ;-). – Ed Staub Mar 27 '12 at 19:41
  • @Nicolas The performance penalty of a `lazy val` far _surpasses_ the performance penalty of precomputing the size, so it would be _faster_ not to make it a `lazy val`. – Daniel C. Sobral Mar 27 '12 at 19:52
  • I described the issue as I see it, along with some alternatives, here: https://issues.scala-lang.org/browse/SI-5622 – Nicolas Mar 27 '12 at 19:56
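Nicolas's proposal (compute the count when the range is built, so an oversized range fails immediately instead of on the first `length` or `foreach`) can be sketched outside the library. `CheckedLongRange` is a hypothetical class, not part of Scala, and it only handles positive steps. It also reflects Daniel's objection: the check has to depend on the step, since `0L to 100000000000L by 1000000` has only 100,001 elements and must stay valid.

```scala
// Hypothetical fail-fast range (not part of the standard library): the
// element count is computed eagerly in the constructor, so an oversized
// range is rejected when it is created rather than on first use.
final class CheckedLongRange(val start: Long, val end: Long, val step: Long) {
  require(step > 0L, "this sketch only supports positive steps")

  // Number of elements in start, start + step, ..., up to and including end.
  val length: Int =
    if (end < start) 0
    else {
      // BigInt avoids overflow in (end - start) for extreme inputs.
      val n = (BigInt(end) - BigInt(start)) / BigInt(step) + BigInt(1)
      require(n <= BigInt(Int.MaxValue), s"$n elements is more than Int.MaxValue")
      n.toInt
    }

  def apply(i: Int): Long = {
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    start + i.toLong * step
  }
}

object CheckedDemo {
  def main(args: Array[String]): Unit = {
    // The count depends on the step: this Long range has only 100,001 elements.
    val sparse = new CheckedLongRange(0L, 100000000000L, 1000000L)
    println(sparse.length)  // 100001
    println(sparse(100000)) // 100000000000

    // Step 1 gives 10,000,000,001 elements, so construction itself fails.
    println(scala.util.Try(new CheckedLongRange(0L, 10000000000L, 1L)).isFailure) // true
  }
}
```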

You cannot count elements whose count doesn't fit in an Int, because length is declared to return Int. But here is a shortcut: you can create iterators of any actual size, as long as you don't try to count them.

scala> def longRange(first: Long, last: Long) = new Iterator[Long] {
    private var i = first
    def hasNext = i < last
    def next() = {val r = i; i += 1; r}
}
longRange: (first: Long, last: Long)java.lang.Object with Iterator[Long]

scala> val lol = longRange(0, Long.MaxValue) map (x => x * x)
lol: Iterator[Long] = non-empty iterator

scala> lol drop 5 take 5 foreach println
25
36
49
64
81
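On newer Scala versions the same lazy iterator can also be assembled from standard-library pieces instead of a hand-written `Iterator`: `Iterator.iterate` produces an unbounded sequence and `takeWhile` cuts it off. This is a sketch of an alternative, not the answer's original code; `longRangeStd` is an illustrative name and, as in the answer, `last` is exclusive. Since `Iterator` has `foreach`, the result can be used directly in a `for`.

```scala
object LongIterators {
  // Lazily yields first, first + 1, ..., last - 1 (last is exclusive).
  // Nothing is ever counted, so the Int.MaxValue limit does not apply.
  def longRangeStd(first: Long, last: Long): Iterator[Long] =
    Iterator.iterate(first)(_ + 1L).takeWhile(_ < last)

  def main(args: Array[String]): Unit = {
    // Same query as in the answer: squares, skip 5, take 5.
    val squares = longRangeStd(0L, Long.MaxValue).map(x => x * x)
    println(squares.drop(5).take(5).toList) // List(25, 36, 49, 64, 81)

    // Iterator has foreach, so a plain `for` works too, even past Int.MaxValue.
    for (i <- longRangeStd(Int.MaxValue.toLong, Int.MaxValue.toLong + 3)) println(i)
  }
}
```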
om-nom-nom
Display Name
  • Works well on Scala 2.9, Scala 2.10 shouts the following: scala> def longRange(from: Long, to: Long) = new Iterator[Long] { | private var i: Long = from | def hasNext = i < to | def next = {val r = i; i += 1; r} | } :10: error: overloaded method value < with alternatives: ... (x: Long)Boolean (x: Int)Boolean (x: Char)Boolean (x: Short)Boolean (x: Byte)Boolean cannot be applied to (scala.collection.immutable.IndexedSeq[Long]) def hasNext = i < to ^ – Ochoto Feb 22 '13 at 14:01
  • It seems the Scala compiler is confused by the word `to`, because there is a method with that name. Maybe something went wrong in their syntax definition. BTW, I've started to think that this part of Scala syntax (allowing "space style" method calls instead of "dot style" for all names, not only for "special symbols" like `+`, `++`, `::`, etc.) is evil. Here is a working example: http://ideone.com/Z5989Q, and if we rename `end` to `to`, it doesn't compile: http://ideone.com/6FADFE. Maybe this is worth reporting as a bug. – Display Name Feb 23 '13 at 06:30

The methods size and length return an Int, so it would not be possible for them to return a value greater than Int.MaxValue. On Seq, as well, the apply method takes an Int, suffering from the same problem. The Scala collections, like the Java collections, are therefore limited to Int.MaxValue elements.

Daniel C. Sobral
  • So in order to cope with Seq collections larger than Int.MaxValue we need a BigSeq? ;-) – IODEV Mar 28 '12 at 13:15
  • @IODEV There's a practical matter of usefulness of collections that large. I think the real problem is that, perhaps, ranges shouldn't be collections, just have conversions _into_ collections. – Daniel C. Sobral Mar 28 '12 at 14:12
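Daniel's suggestion of a range that is not itself a collection but can be converted into one might look roughly like this. `LongProgression` is a hypothetical name, not a library type. Its size and indexing use `Long`, and only the conversion into a real `Seq` is subject to the `Int.MaxValue` limit; overflow of `start + i * step` is deliberately not checked in this sketch.

```scala
// Hypothetical sketch, not a library type: an arithmetic progression that is
// not a collection. Its size and indices are Long; converting it into a real
// Seq is the only step that is subject to the Int.MaxValue limit.
// Assumes start + (size - 1) * step fits in a Long (no overflow checks).
final case class LongProgression(start: Long, step: Long, size: Long) {
  require(step != 0L, "step must not be zero")
  require(size >= 0L, "size must be non-negative")

  // Random access with a Long index.
  def apply(i: Long): Long = {
    if (i < 0L || i >= size) throw new IndexOutOfBoundsException(i.toString)
    start + i * step
  }

  // Lazy conversion: fine for any size, because nothing is counted up front.
  def iterator: Iterator[Long] =
    Iterator.iterate(0L)(_ + 1L).takeWhile(_ < size).map(i => apply(i))

  // Strict conversion: only possible when the element count fits in an Int.
  def toSeq: Option[IndexedSeq[Long]] =
    if (size <= Int.MaxValue) Some(IndexedSeq.tabulate(size.toInt)(i => apply(i.toLong)))
    else None
}

object ProgressionDemo {
  def main(args: Array[String]): Unit = {
    // Same elements as 0L to 10000000000L, which NumericRange rejects.
    val big = LongProgression(start = 0L, step = 1L, size = 10000000001L)
    println(big(9999999999L))            // 9999999999
    println(big.iterator.take(3).toList) // List(0, 1, 2)
    println(big.toSeq.isDefined)         // false
  }
}
```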