This is a somewhat general question, but I was wondering if anybody could advise me on the advantages of working with Array vs ArraySeq. From what I have seen, Array is Scala's representation of a Java array and has few members in its API, whereas ArraySeq seems to offer a much richer API.

Mario Galic
spydadome

4 Answers

There are actually four different classes you could choose from to get mutable array-like functionality.

Array + ArrayOps
WrappedArray
ArraySeq
ArrayBuffer

Array is a plain old Java array. It is by far the best way to go for low-level access to arrays of primitives. There's no overhead. Also it can act like the Scala collections thanks to implicit conversion to ArrayOps, which grabs the underlying array, applies the appropriate method, and, if appropriate, returns a new array. But since ArrayOps is not specialized for primitives, it's slow (as slow as boxing/unboxing always is).
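To make the Array + ArrayOps point concrete, here is a minimal sketch. The value names are mine, chosen only for illustration.

```scala
// On the JVM, Array[Int] is a plain int[] with unboxed elements.
val xs: Array[Int] = Array(1, 2, 3)

// Indexing is a direct array read.
val first: Int = xs(0)

// `map` is not a member of Array itself. The compiler inserts an implicit
// conversion to ArrayOps, which applies the function and builds a new Array.
val doubled: Array[Int] = xs.map(_ * 2)
```

The result of `map` is again an `Array[Int]`, so the next collection-style call goes through the same conversion again.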

WrappedArray is a plain old Java array, but wrapped in all of Scala's collection goodies. The difference between it and ArrayOps is that WrappedArray returns another WrappedArray--so at least you don't have the overhead of having to re-ArrayOps your Java primitive array over and over again for each operation. It's good to use when you are doing a lot of interop with Java and you need to pass in plain old Java arrays, but on the Scala side you need to manipulate them conveniently.
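A rough sketch of the wrapping behaviour, assuming the standard implicit conversions from Predef are in scope. I have typed the wrapper as `mutable.IndexedSeq` rather than naming the concrete wrapper class, because which class you actually get depends on the Scala version; the value names are illustrative.

```scala
import scala.collection.mutable

val raw: Array[Int] = Array(1, 2, 3)

// Asking for a Scala sequence type triggers the implicit wrapping conversion.
// The wrapper does not copy: it keeps a reference to `raw`.
val wrapped: mutable.IndexedSeq[Int] = raw

// Collection methods called on the wrapper return a sequence, not an Array.
val evens: mutable.IndexedSeq[Int] = wrapped.filter(_ % 2 == 0)

// A write through the wrapper is visible in the original array.
val firstAfterUpdate: Int = { wrapped(0) = 99; raw(0) }
```

Because the backing array is shared, you can still hand `raw` to a Java API after working with `wrapped` on the Scala side.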

ArraySeq stores its data in a plain old Java array, but it no longer stores arrays of primitives; everything is an array of objects. This means that primitives get boxed on the way in. That's actually convenient if you want to use the primitives many times; since you've got boxed copies stored, you only have to unbox them, not box and unbox them on every generic operation.

ArrayBuffer acts like an array, but you can add and remove elements from it. If you're going to go all the way to ArraySeq, why not have the added flexibility of changing length while you're at it?
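For comparison, a small sketch of the "changing length" part with ArrayBuffer. The helper `demo` is a made-up name, not anything from the library.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: grows and shrinks a buffer, then snapshots it.
def demo(): (Int, Array[Int]) = {
  val buf = ArrayBuffer(1, 2, 3)
  buf += 4                    // append at the end; the buffer grows as needed
  val first = buf.remove(0)   // remove by index; returns the removed element
  (first, buf.toArray)        // copy the current contents into a fresh Array
}
```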

Rex Kerr

Array is a direct representation of Java's Array, and uses the exact same bytecode on the JVM.

The advantage of Array is that it's the only collection type on the JVM that does not undergo type erasure. Arrays can also hold primitives directly without boxing, which can make them very fast under some circumstances.

Plus, you get Java's messed up array covariance behaviour. (If you pass e.g. an Array[String] to some Java class, it can be assigned to a variable of type Object[], which will then throw an ArrayStoreException on trying to store anything that isn't a String.)
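The run-time check behind ArrayStoreException can be reproduced from Scala with a cast. This is only a sketch: it uses an array of String as the example type, and the cast stands in for the assignment that Java would accept without one.

```scala
val strings: Array[String] = Array("a", "b")

// Scala's Array is invariant, so this needs an explicit cast. In Java, the
// equivalent assignment of a String[] to an Object[] variable just compiles.
val objects: Array[AnyRef] = strings.asInstanceOf[Array[AnyRef]]

// The JVM checks the element type on every store into a reference array.
val failed: Boolean =
  try { objects(0) = Integer.valueOf(1); false }
  catch { case _: ArrayStoreException => true }
```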

ArraySeq is rarely used nowadays; it's more of a historical artifact from older versions of Scala that treated arrays differently. Since you have to deal with boxing anyway, you're almost certain to find that another collection type is a better fit for your requirements.

Otherwise... Arrays have exactly the same API as ArraySeq, thanks to an implicit conversion from Array to ArrayOps.

Unless you have a specific need for the unique properties of arrays, try to avoid them too. See This Talk at around 19:30 or This Article for an idea of the sort of problems that Arrays can introduce.

After watching that video, it's interesting to note that Scala uses Seq for varargs :)
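A quick way to see the varargs point for yourself (the method name `total` is just an example):

```scala
// Inside the method, the repeated parameter `xs` is typed as Seq[Int],
// so it can be assigned to a Seq without any conversion.
def total(xs: Int*): Int = {
  val asSeq: Seq[Int] = xs
  asSeq.sum
}

val six: Int = total(1, 2, 3)
```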

Kevin Wright

From the scala-lang.org forum:

Array[T]
 - Benefits: Native, fast
 - Limitations: Few methods (only apply, update, length); T must be known at compile time, because Java bytecode represents each element type differently (char[] is different from int[], which is different from Object[])

ArraySeq[T] (the class formerly known as GenericArray[T])
 - Benefits: Still backed by a native Array; nothing needs to be known about T at compile time (new ArraySeq[T] "just works", even if nothing is known about T); full suite of SeqLike methods; subtype of Seq[T]
 - Limitations: It's backed by an Array[AnyRef], regardless of what T is (if T is a primitive, elements will be boxed/unboxed on their way in or out of the backing Array)
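The "need to know T at compile-time" limitation can be sketched as follows. In current Scala versions the requirement shows up as a ClassTag context bound; that detail, and the helper name `filled`, are my additions rather than part of the quoted forum post.

```scala
import scala.reflect.ClassTag

// Without the ClassTag bound this does not compile: for an unknown T the
// compiler cannot choose between int[], char[], Object[] and so on.
def filled[T: ClassTag](n: Int, elem: T): Array[T] = Array.fill(n)(elem)

val ints: Array[Int] = filled(3, 0)        // backed by an int[]
val strs: Array[String] = filled(2, "x")   // backed by a String[]
```

A collection that always stores boxed elements in an object array does not need this information, which is the trade-off the list above describes.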


ArraySeq[Any] is much faster than Array[Any] when handling primitives. In any code you have Array[T], where T isn't <: AnyRef, you'll get faster performance out of ArraySeq.

Vasil Remeniuk
  • You surely have that backwards -- as your citation says, ArraySeq boxes and unboxes primitives. – Jim Balter Feb 17 '11 at 12:28
  • @Jim - So does `Array[Any]`, it just does it more slowly. To get the performance out of `Array` it must be typed as `Array[Int]`, or whatever other primitive you're actually using. – Kevin Wright Feb 17 '11 at 12:45
  • @Kevin But he said "In any code you have Array[T], where T isn't <: AnyRef, you'll get faster performance out of ArraySeq" ... which isn't true if T is Int. – Jim Balter Feb 17 '11 at 12:47
  • @Jim - No, Vasil has it right (for most cases). `ArraySeq` boxes primitives on the way in, and stores the boxed primitives. `Array` has to do it on each element access. Thus, `Array` (via implicit conversion through `ArrayOps`) is like a view of boxed primitives, while `ArraySeq` is forced. If you're going through the array less than once, `Array` is better. If you're going through more than once, `ArraySeq` is better. – Rex Kerr Feb 17 '11 at 12:50
  • @Rex "Array[T] - Benefits: Native, fast" ... "To get the performance out of Array it must be typed as Array[Int], or whatever other primitive you're actually using." – Jim Balter Feb 17 '11 at 12:54
  • @Daniel I'm not talking about Array[Any], I'm talking about Array[Int]. C'mon, guys. Array[Int] stores without boxing and can be accessed without unboxing. – Jim Balter Feb 17 '11 at 12:57
  • @Jim - `isn't <: AnyRef` is just a more formal way of stating "is a primitive". This could also have been written `is <: AnyVal` – Kevin Wright Feb 17 '11 at 12:58
  • @Kevin So `is <: AnyVal` means "*not* a primitive"? What an odd notion. – Jim Balter Feb 17 '11 at 13:03
  • @Kevin Hey, I see you edited your misstatement -- but gee, that inverts the meaning, y'know. Vasil said that in code with Array[T] where T "is a primitive" you'll get faster performance with ArraySeq. Is that your claim? – Jim Balter Feb 17 '11 at 13:05
  • @Jim - If you're writing generic code, `def slow[T](a: Array[T])`, and you pass in a primitive, the method is, well, slow, due to boxing/unboxing. – Rex Kerr Feb 17 '11 at 13:06
  • @Rex I see you have written a partial comment -- but I know when you fix it, it won't address what I've actually written. Yup, there it is. It's been fun chatting, guys. – Jim Balter Feb 17 '11 at 13:08
  • @Jim - Crikey, it was a typo, and it exactly addresses what you wrote. It's just, apparently, a difference in interpretation of what `Array[T]` means. I know you know (and you know Kevin and I know) the details of what happens in various cases. We're just hung up on semantics. – Rex Kerr Feb 17 '11 at 13:10
  • @Rex "It's just, apparently, a difference in interpretation of what Array[T] means" -- right, but you guys are intent on quibbling. Bye. – Jim Balter Feb 17 '11 at 13:11
  • P.S. Because of the multiple interpretations, Vasil should clarify his answer, as it is incorrect for, say, `type T = Int; ... Array[T] ...` and could easily be misunderstood. – Jim Balter Feb 17 '11 at 13:14
  • @Jim - The correct interpretation is: "when using a concrete instance of `Array[Any]` to hold variables of some type `T` where `T <: AnyVal` then an `ArraySeq` would be faster" – Kevin Wright Feb 17 '11 at 14:40
  • @Kevin That may be correct language, but it's ludicrous to say that it's "the correct interpretation" of what was written. By that view, it would never be necessary to correct anything. – Jim Balter Feb 18 '11 at 22:05
  • @Jim - It's often required, most spoken languages are vague and open to interpretation, a vital characteristic for politicians and stand-up comedians, but less useful for technical specifications. If you don't believe me, look up how many lojban speakers it takes to change a broken light bulb. – Kevin Wright Feb 20 '11 at 15:31
  • @Kevin Indeed vagueness is not useful for technical specifications which is exactly why I said the answer here should be clarified. I'm so glad that we agree. Now let's put this nonsense to rest. – Jim Balter Feb 20 '11 at 20:23

As you observed correctly, ArraySeq has a richer API as it is derived from IndexedSeq (and so on) whereas Array is a direct representation of Java arrays.

The relationship between the two could be roughly compared to that between ArrayList and arrays in Java.

Due to its API, I would recommend using ArraySeq unless there is a specific reason not to do so. Using toArray, you can convert to an Array at any time.
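A minimal sketch of that round trip, assuming the mutable ArraySeq from the standard library (value names are illustrative):

```scala
import scala.collection.mutable.ArraySeq

val seq: ArraySeq[Int] = ArraySeq(1, 2, 3)

// Work with the sequence API, then copy the result out into a plain Array
// once something (e.g. a Java API) actually needs one.
val arr: Array[Int] = seq.map(_ + 1).toArray
```

Note that `toArray` produces a copy, so later changes to `seq` are not reflected in `arr`.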

Mathias Weyel