
I have a problem: I have a Spark RDD that I have to store inside an HBase table. We use the Apache Phoenix layer to talk to the database. One column of the table is defined as an UNSIGNED_SMALLINT ARRAY:

CREATE TABLE EXAMPLE (..., Col10 UNSIGNED_SMALLINT ARRAY, ...);

As stated in the Phoenix documentation, which you can find here, the ARRAY data type is backed by java.sql.Array.
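For context, this is how a java.sql.Array is normally built through plain JDBC (a sketch; the connection URL is a placeholder). It is exactly this kind of Connection that is not available inside an RDD transformation:

// Rename java.sql.Array on import so it does not shadow scala.Array
import java.sql.{Array => SqlArray, Connection, DriverManager}

val conn: Connection = DriverManager.getConnection("jdbc:phoenix:localhost")
val values: Array[AnyRef] = Array(Short.box(1), Short.box(2), Short.box(3))
val sqlArray: SqlArray = conn.createArrayOf("UNSIGNED_SMALLINT", values)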

I'm using the phoenix-spark plugin to save the data of the RDD into the table. The problem is that I don't know how to create an instance of java.sql.Array without having any kind of Connection object. An extract of the code follows (the code is in Scala):

// Map the RDD into an RDD of tuples
rdd.map {
  value =>
    (/* ... */
     value.getArray(),   // Array of Int that should be converted into a java.sql.Array
     /* ... */
    )
}.saveToPhoenix("EXAMPLE", Seq(/* ... */, "Col10", /* ... */), conf, zkUrl)

What is the correct way to proceed? Is there a way to do what I need?


1 Answer


The guys at Phoenix answered the above question via email. I report the answer here to preserve the wisdom for the people who will come after.

For saving arrays, you can use the plain old Scala Array type. You can see the tests for an example: https://github.com/apache/phoenix/blob/master/phoenix-spark/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala#L408-L427
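For completeness, here is a minimal sketch modeled on that test. The table layout, column names, the SparkContext sc, and the ZooKeeper URL are assumptions for illustration; the point is that the array column is passed as a plain Scala Array and the plugin handles the conversion to java.sql.Array internally:

import org.apache.hadoop.conf.Configuration
import org.apache.phoenix.spark._ // enables saveToPhoenix on RDDs of tuples

// Hypothetical table for illustration:
//   CREATE TABLE EXAMPLE (ID BIGINT NOT NULL PRIMARY KEY, COL10 UNSIGNED_SMALLINT ARRAY);
val dataSet = List(
  (1L, Array(1, 2, 3)), // the ARRAY column is a plain Scala Array
  (2L, Array(4, 5))     // no java.sql.Array is needed
)

sc.parallelize(dataSet)
  .saveToPhoenix(
    "EXAMPLE",
    Seq("ID", "COL10"),
    conf = new Configuration(),
    zkUrl = Some("localhost:2181") // placeholder ZooKeeper quorum
  )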

Note that saving arrays is only supported in Phoenix 4.5.0, although the patch is quite small if you need to apply it yourself: https://issues.apache.org/jira/browse/PHOENIX-1968

Nice answer. Thanks to the guys at Phoenix.
