I have a Spark (v1.2.1) job that inserts the contents of an RDD into Postgres from Scala, using the org.postgresql.Driver JDBC driver. In Postgres I have some functions, such as CDB_LatLng, that need to be evaluated during the inserts (that particular function computes the 'the_geom' column for CartoDB tables).
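For context, the per-row INSERT path that currently works looks roughly like this (a simplified sketch, not my exact code: the table layout, the lat/lng columns, and the insertRow helper are made up). The point is that CDB_LatLng appears inside the SQL text, so Postgres parses and evaluates it:

import java.sql.Connection

// Hypothetical sketch of the working per-row INSERT: the CDB_LatLng call is part
// of the statement itself, so Postgres evaluates it while executing the INSERT.
def insertRow(connection: Connection, table: String, lat: Double, lng: Double): Unit = {
  val stmt = connection.createStatement()
  try {
    stmt.executeUpdate(
      s"INSERT INTO $table (lat, lng, the_geom) VALUES ($lat, $lng, CDB_LatLng($lat, $lng))"
    )
  } finally {
    stmt.close()
  }
}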
I want to use COPY to insert the data into Postgres:
import java.io.{Reader, StringReader}
import java.sql.{Connection, DriverManager, SQLException}
import org.postgresql.PGConnection

def copyIn(url: String, username: String, password: String, tableName: String, reader: Reader, columnStmt: String = "") = {
  // connect to the Postgres database on the localhost
  Class.forName("org.postgresql.Driver")
  val connection: Connection = DriverManager.getConnection(url, username, password)
  try {
    // unwrap the JDBC connection to reach the Postgres-specific COPY API
    connection.unwrap(classOf[PGConnection]).getCopyAPI
      .copyIn(s"COPY $tableName ($columnStmt) FROM STDIN (FORMAT CSV, DELIMITER ';')", reader)
  } catch {
    case se: SQLException => println(se.getMessage)
    case t: Throwable => println(t.getMessage)
  } finally {
    connection.close()
  }
}
squares_rdd.foreachPartition(iter => {
  val sb = new StringBuilder()
  var keys: String = ""
  iter.foreach(row => {
    val mapRequest = Utils.getSquaresInsertMap(geoSelectMap, row)
    // join the values with the exact delimiter the COPY statement declares (';', no trailing space)
    val mapValues = mapRequest.values.mkString(";")
    sb.append(mapValues).append("\n")
    if (keys == "") {
      // take the column list from the first row of the partition
      keys = mapRequest.keySet.mkString(", ")
    }
  })
  // skip empty partitions, which would otherwise issue an invalid COPY statement
  if (sb.nonEmpty) {
    copyIn(url, username, password, squares_table, new StringReader(sb.toString), keys)
  }
})
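To make the failure concrete: with made-up values, one line appended to the StringBuilder looks like

12.5;55.7;CDB_LatLng(55.7,12.5)

so the third field handed to COPY is the literal text of the function call rather than a geometry value.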
When I try to use it I get an error saying that column 'the_geom' cannot receive the string data "CDB_LatLng(x,y)"... According to the documentation, the COPY command should fire triggers just like the INSERT command does, so is it possible to evaluate functions on columns inside a COPY, and if so, how?