Getting ArrayIndexOutOfBoundsException while splitting record from a file in Scala

Question

My file contains records like this:

11001^1^100^2015-06-05 22:35:21.543^<d><nv n="ExtStationID" v="Station/FYI Television, Inc./25102" /><nv n="MediaDesc" v="19b8f4c0-92ce-44a7-a403-df4ee413aca9" /><nv n="ChannelNumber" v="1366" /><nv n="Duration" v="24375" /><nv n="IsTunedToService" v="True" /><nv n="StreamSelection" v="FULLSCREEN_PRIMARY" /><nv n="ChannelType" v="LiveTVMediaChannel" /><nv n="TuneID" v="636007629215440000" /></d>^0122648d-4352-4eec-9327-effae0c34ef2^2016060601

I am supposed to split each record on the ^ character, but I am getting an ArrayIndexOutOfBoundsException.

Here is my program:

val spark = SparkSession.builder().appName("KPI 1").master("local").getOrCreate()
val data = spark.read.textFile("/some/path/to/Set_Top_Box_Data.txt").rdd

val raw = data.map{ record =>
  val rec = record.trim().toString.split("^")
  (rec(0),rec(2))
}
raw.collect().foreach(println)
spark.stop

And here is the associated error trace:

18/03/15 13:38:33 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ArrayIndexOutOfBoundsException: 2
    at KPI1.FilterForId1001$$anonfun$1.apply(FilterForId1001.scala:12)
    at KPI1.FilterForId1001$$anonfun$1.apply(FilterForId1001.scala:11)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    ...

score 3 · Accepted Answer · answered Mar 15 '18 at 08:17

You should split this way:

.split("\\^")

instead of

split("^")
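If you would rather not hand-escape the delimiter, there are two alternatives that should behave the same for this data: Scala's `split(Char)` overload, which takes the separator as a literal character, and `java.util.regex.Pattern.quote`, which escapes an arbitrary delimiter string. A minimal sketch (the object and method names here are made up for illustration):

```scala
import java.util.regex.Pattern

// Hypothetical helper object; the names are illustrative, not from the question.
object SplitAlternatives {
  // Escaped regex, as in the answer above.
  def byEscapedRegex(record: String): Array[String] = record.split("\\^")

  // Char overload from Scala's string ops: the separator is taken literally.
  def byChar(record: String): Array[String] = record.split('^')

  // Pattern.quote builds a regex that matches the delimiter text literally.
  def byQuotedPattern(record: String): Array[String] = record.split(Pattern.quote("^"))
}
```

All three should return the same fields for a record such as `"11001^1^100"`.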

Xavier Guihot


score 2 · Answer 2 · answered Mar 15 '18 at 08:17

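^ is a regex metacharacter that anchors a match at the start of the input, and split treats its String argument as a regular expression. Unescaped, it never matches a literal caret, so the record is not split into fields and rec(2) is out of bounds.

You need to escape it so that it is treated as a literal character: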

val raw = data.map { record =>
  val rec = record.trim.split("\\^")
  (rec(0), rec(2))
}
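To see the difference without running a Spark job, the two calls can be compared on a shortened record. The sketch below also adds a guard so that a malformed line yields `None` instead of throwing. `SplitCheck` and `idAndValue` are hypothetical names, and the sample record is cut down from the one in the question.

```scala
// Hypothetical helper object for illustration only.
object SplitCheck {
  // Returns (field 0, field 2), or None when the record has fewer than three fields.
  def idAndValue(record: String): Option[(String, String)] = {
    // limit = -1 keeps trailing empty fields instead of dropping them
    val fields = record.trim.split("\\^", -1)
    if (fields.length > 2) Some((fields(0), fields(2))) else None
  }

  def main(args: Array[String]): Unit = {
    val record = "11001^1^100^2015-06-05 22:35:21.543"

    // Unescaped: "^" only matches (zero-width) at position 0, so nothing is split.
    // On Java 8 and later this prints 1.
    println(record.split("^").length)

    // Escaped: splits on the literal caret and prints 4.
    println(record.split("\\^").length)

    println(idAndValue(record))           // Some((11001,100))
    println(idAndValue("malformed line")) // None
  }
}
```

In the Spark job, something like `data.flatMap(r => SplitCheck.idAndValue(r))` should then skip short records instead of failing the task, assuming dropping them silently is acceptable for your use case.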

philantrovert

