We have a massive legacy SQL table from which we need to extract data and push it to S3. Below is how I'm querying a portion of the data and writing the output.
def writeTableInParts(tableName: String, numIdsPerParquet: Long, numPartitionsAtATime: Int,
                      startFrom: Long = -1, endTo: Long = -1, filePrefix: String = s3Prefix) = {
  val minId: Long = if (startFrom > 0) startFrom else findMinCol(tableName, "id")
  val maxId: Long = if (endTo > 0) endTo else findMaxCol(tableName, "id")

  (minId until maxId by numIdsPerParquet).toList
    .sliding(numPartitionsAtATime, numPartitionsAtATime)
    .toList
    .foreach { list =>
      list.map { start =>
        val end = math.min(start + numIdsPerParquet, maxId)
        sqlContext.read.jdbc(
          mysqlConStr,
          s"(SELECT * FROM $tableName WHERE id >= ${start} AND id < ${end}) as tmpTable",
          Map[String, String]())
      }
      .reduce((left, right) => left.unionAll(right))
      .write
      .parquet(s"${filePrefix}/$tableName/${list.head}-${list.last + numIdsPerParquet}")
    }
}
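For context, I've also considered letting Spark handle the range splitting itself through the `DataFrameReader.jdbc` overload that takes a partition column and bounds, instead of building the ranges by hand. This is only a sketch using the same `mysqlConStr`, `minId`, and `maxId` as above, and I haven't verified it against this particular table:

```scala
import java.util.Properties

// Sketch: Spark partitions the id range into numPartitionsAtATime
// parallel JDBC reads; "id" must be a numeric column.
val df = sqlContext.read.jdbc(
  mysqlConStr,          // JDBC URL
  tableName,            // table to read
  "id",                 // partitionColumn
  minId,                // lowerBound
  maxId,                // upperBound
  numPartitionsAtATime, // numPartitions (parallel connections)
  new Properties())
```

Each partition becomes one `WHERE id >= ... AND id < ...` query under the hood, so it should behave similarly to my manual slicing.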
This has worked well for many different tables, but for whatever reason one table keeps failing with java.nio.channels.ClosedChannelException, no matter how much I reduce the scanning window or size.
Based on this answer I assume there's an exception somewhere in my code, but I'm not sure where it would be, as this is rather simple code. How can I debug this exception further? The logs didn't have anything very helpful and don't reveal the cause.
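One thing I've thought of trying is wrapping each per-range read in a `Try` so that the failing id range gets logged before the job dies. A rough sketch of that idea (the `count()` is just there to force the read eagerly, so it is slower than the real job and only meant for narrowing down the bad range):

```scala
import scala.util.{Failure, Success, Try}

// Probe each id range separately to find the one that triggers the exception.
list.foreach { start =>
  val end = math.min(start + numIdsPerParquet, maxId)
  Try {
    sqlContext.read.jdbc(
      mysqlConStr,
      s"(SELECT * FROM $tableName WHERE id >= $start AND id < $end) as tmpTable",
      Map[String, String]()).count() // force the JDBC read to actually run
  } match {
    case Success(n)  => println(s"range [$start, $end) ok, $n rows")
    case Failure(ex) => println(s"range [$start, $end) FAILED: $ex")
  }
}
```

Would something like this be a reasonable way to isolate it, or is there a better way to get at the underlying cause?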