2

I have a spark (1.2.1 v) job that inserts a content of an rdd to postgres using postgresql.Driver for scala. In postgres I have some functions, such as "CDB_LatLng", that need to be triggered by the inserts (that particular function calculate 'the_geom' column for CartoDB tables).

I want to use COPY to insert the data to postgres:

def copyIn(url: String, username: String, password: String, tableName: String, reader: java.io.Reader, columnStmt: String = "") = {
        //connect to postgres database on the localhost
        val driver = "org.postgresql.Driver"
        var connection:Connection = null
        Class.forName(driver)
        connection = DriverManager.getConnection(url, username, password)

        try {
            connection.unwrap(classOf[PGConnection]).getCopyAPI.copyIn(s"COPY $tableName ($columnStmt) FROM STDIN (FORMAT CSV, DELIMITER ';'))", reader)
        } catch {
            case se: SQLException => println(se.getMessage)
            case t: Throwable => println(t.getMessage)
        } finally {
            connection.close()
        }
}

squares_rdd.foreachPartition(iter => {
        val sb = new StringBuilder()
        var keys : String = ""

        iter.foreach(row => {
            val mapRequest = Utils.getSquaresInsertMap(geoSelectMap, row)
            val mapValues = mapRequest.values.mkString("; ")
            sb.append(mapValues).append("\n")

            if(keys == ""){
                keys = mapRequest.keySet.mkString(", ")
            }

        })

        copyIn(url, username, password, squares_table,  new StringReader(sb.toString), keys)
        sb.clear
  })

When I trying to use it I get an error that column 'the_geom' cannot receive String data "CDB_LatLng(x,y)"... The COPY command should fire triggers like the INSERT command as described in the documentation, so is it possible to use functions on columns inside a COPY and how?

  • 1
    @zero323 - but if I use trigger I wouldn't get any benefits out of COPY... I could use trigger with INSERT to, but it faster as a function, no? –  Jan 05 '16 at 17:49
  • 1
    Can you copy to a temp table, and then insert the contents of the temp table, transforming with the function at the same time? – BillRobertson42 Jan 05 '16 at 18:35
  • You right. Scratch that. – zero323 Jan 05 '16 at 20:11
  • 1
    @Bill - you mean that I should use COPY to insert the "CDB_LatLng(x,y)" column as String to temp table and then perform another insertion from the temp table to the target table? I guess it would solve my issue but is it faster then just use regular insertion with the function row by row? –  Jan 05 '16 at 21:58
  • 1
    @Bill she could do that but then again this process is time consuming and won't help – itai Jan 06 '16 at 09:23
  • @Rada You would have to test to see if it's faster, but in general issuing a single insert for for N rows is faster than issuing N insert statements. – BillRobertson42 Jan 06 '16 at 14:16

0 Answers0