
We have a massive legacy SQL table that we need to extract data out of and push to S3. Below is how I'm querying a portion of the data at a time and writing the output.

  def writeTableInParts(tableName: String,
                        numIdsPerParquet: Long,
                        numPartitionsAtATime: Int,
                        startFrom: Long = -1,
                        endTo: Long = -1,
                        filePrefix: String = s3Prefix): Unit = {
    val minId: Long = if (startFrom > 0) startFrom else findMinCol(tableName, "id")
    val maxId: Long = if (endTo > 0) endTo else findMaxCol(tableName, "id")

    // Split [minId, maxId] into windows of numIdsPerParquet ids and write
    // numPartitionsAtATime windows per parquet directory. The bound is maxId + 1
    // so the row with id == maxId isn't dropped by the `id < end` filter.
    (minId until maxId + 1 by numIdsPerParquet).grouped(numPartitionsAtATime).foreach { group =>
      group.map { start =>
        val end = math.min(start + numIdsPerParquet, maxId + 1)

        // Push the id-range predicate down to MySQL so only this window is scanned.
        sqlContext.read.jdbc(
          mysqlConStr,
          s"(SELECT * FROM $tableName WHERE id >= $start AND id < $end) as tmpTable",
          Map[String, String]())
      }
      .reduce((left, right) => left.unionAll(right))
      .write
      .parquet(s"$filePrefix/$tableName/${group.head}-${group.last + numIdsPerParquet}")
    }
  }
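
For context, I call it roughly like this (the table name and sizes here are just illustrative):

  // Example call: export `orders` in windows of 500,000 ids,
  // unioning 4 windows into each parquet output directory.
  writeTableInParts("orders", numIdsPerParquet = 500000L, numPartitionsAtATime = 4)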

This has worked well for many different tables, but for whatever reason one table keeps failing with java.nio.channels.ClosedChannelException no matter how much I reduce the scanning window or the size.

Based on this answer I'm guessing there is an exception somewhere in my code, but I'm not sure where it would be, since the code is rather simple. How can I debug this exception further? The logs didn't have anything helpful and don't reveal the cause.
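
One idea I'm considering, sketched below, is to read a single small id window outside the union/write loop and materialize it without writing anything, so that whatever fails inside the stage surfaces with a full stack trace on the driver. The id range is just a placeholder; mysqlConStr and tableName are the same values used above:

  // Probe one narrow window and materialize it, so the root-cause
  // exception (if any) shows up directly instead of being buried in a stage failure.
  val probe = sqlContext.read.jdbc(
    mysqlConStr,
    s"(SELECT * FROM $tableName WHERE id >= 0 AND id < 1000) as tmpTable",  // placeholder range
    Map[String, String]())

  try {
    probe.collect()  // small window, so collecting to the driver is fine here
  } catch {
    // Check the "Caused by" entries in the printed chain for the real JDBC error.
    case e: Exception => e.printStackTrace()
  }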


1 Answer


The problem was due to the error below and wasn't Spark-related... It was very cumbersome to chase down, as Spark isn't very good at surfacing the underlying error. Darn...

'0000-00-00 00:00:00' can not be represented as java.sql.Timestamp error
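
In case it helps someone else: assuming the source is MySQL accessed through Connector/J (which matches the mysqlConStr in the question), one way to work around the zero dates is to have the driver return them as NULL via the zeroDateTimeBehavior connection option, roughly like this:

  // Assumption: mysqlConStr is the JDBC URL from the question. Appending
  // zeroDateTimeBehavior=convertToNull tells Connector/J to return NULL
  // instead of throwing on '0000-00-00 00:00:00' values.
  // Use '&' instead of '?' if the URL already has query parameters.
  val mysqlConStrSafe = mysqlConStr + "?zeroDateTimeBehavior=convertToNull"

Alternatively, the zero timestamps could be cleaned up or filtered out on the MySQL side before the export.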
