
I am using Spark 1.3, HBase 1.1 and Phoenix 4.4. I have this in my code:

val dataframe = sqlContext.createDataFrame(rdd, schema)
dataframe.save("org.apache.phoenix.spark", SaveMode.Overwrite,
    Map("table" -> "TEST_SCHEMA.TEST_HTABLE", "zkUrl" -> zkQuorum))

CREATED_DATE is always set to DateTime.now() in the dataframe.

I don't want CREATED_DATE to be overwritten when the row already exists in HBase, even though the other fields are being updated.

I can achieve this with HBase's checkAndPut: put all the fields, and use checkAndPut on the CREATED_DATE field.
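
For reference, a minimal sketch of that checkAndPut approach with the raw HBase 1.1 client (the row key, the "OTHER_FIELD" column, and the "0" column family are placeholder assumptions):

import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

val conn  = ConnectionFactory.createConnection()
val table = conn.getTable(TableName.valueOf("TEST_SCHEMA.TEST_HTABLE"))

val cf          = Bytes.toBytes("0")            // Phoenix's default column family
val createdDate = Bytes.toBytes("CREATED_DATE")
val row         = Bytes.toBytes("some-row-key") // placeholder row key

// Full put: CREATED_DATE plus the other fields.
val fullPut = new Put(row)
  .addColumn(cf, createdDate, Bytes.toBytes(System.currentTimeMillis()))
  .addColumn(cf, Bytes.toBytes("OTHER_FIELD"), Bytes.toBytes("value"))

// With a null expected value, checkAndPut succeeds only when the
// CREATED_DATE cell is absent, i.e. only for brand-new rows.
val inserted = table.checkAndPut(row, cf, createdDate, null, fullPut)

if (!inserted) {
  // Row already exists: write everything except CREATED_DATE.
  table.put(new Put(row)
    .addColumn(cf, Bytes.toBytes("OTHER_FIELD"), Bytes.toBytes("value")))
}

table.close()
conn.close()

One caveat with this route: raw puts bypass Phoenix's type encoding, so the bytes you write must match how Phoenix serializes each column type, or Phoenix queries will misread the values.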

But how do I do that using the Phoenix-Spark API? Should I use the HBase API instead?

sophie
  • If you have advanced questions about the Phoenix-Spark integration, I would suggest joining the Phoenix mailing list. There are many more people there who can answer this kind of question. – Anil Gupta Jul 10 '15 at 05:25
  • When I tried to post my question there, it opened my mail client (To: user@phoenix.apache.org) and I got a failure reply from the Apache mailer-daemon. – sophie Jul 11 '15 at 08:41
  • Did you subscribe to the mailing list? You need to subscribe before you can send email to it. – Anil Gupta Jul 11 '15 at 21:14

1 Answer


Approach 1: check whether the row already exists. If it does, remove the CREATED_DATE column from your dataframe before saving; a sketch follows.
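
A minimal sketch of Approach 1 on the Spark 1.3 DataFrame API, assuming a single primary-key column named ID (adjust to your schema). Phoenix upserts only write the columns they are given, so rows saved without CREATED_DATE keep their stored value:

import org.apache.spark.sql.SaveMode

val opts = Map("table" -> "TEST_SCHEMA.TEST_HTABLE", "zkUrl" -> zkQuorum)

// Primary keys already present in the Phoenix table.
val existing = sqlContext.load("org.apache.phoenix.spark", opts)
  .select("ID").withColumnRenamed("ID", "EXISTING_ID")

val joined = dataframe.join(existing,
  dataframe("ID") === existing("EXISTING_ID"), "left_outer")

// New rows keep CREATED_DATE.
val newRows = joined.filter(joined("EXISTING_ID").isNull)
  .select(dataframe.columns.map(dataframe(_)): _*)

// Existing rows are upserted without CREATED_DATE, leaving it untouched.
val updatedRows = joined.filter(joined("EXISTING_ID").isNotNull)
  .select(dataframe.columns.filter(_ != "CREATED_DATE").map(dataframe(_)): _*)

newRows.save("org.apache.phoenix.spark", SaveMode.Overwrite, opts)
updatedRows.save("org.apache.phoenix.spark", SaveMode.Overwrite, opts)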
Approach 2: if you can't remove the CREATED_DATE column from the dataframe, you will need to write a prePut coprocessor that runs on the region server before each put is applied; see the sketch below. This approach is slightly harder, so I would suggest the first one.
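
A rough sketch of what such a prePut observer could look like on HBase 1.1 (the class, the "0" column family, and the per-Put existence check are all assumptions; you would attach it to the table through its descriptor):

import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.client.{Durability, Get, Put}
import org.apache.hadoop.hbase.coprocessor.{BaseRegionObserver, ObserverContext, RegionCoprocessorEnvironment}
import org.apache.hadoop.hbase.regionserver.wal.WALEdit
import org.apache.hadoop.hbase.util.Bytes

class PreserveCreatedDate extends BaseRegionObserver {
  private val family    = Bytes.toBytes("0")  // Phoenix's default column family
  private val qualifier = Bytes.toBytes("CREATED_DATE")

  override def prePut(ctx: ObserverContext[RegionCoprocessorEnvironment],
                      put: Put, edit: WALEdit, durability: Durability): Unit = {
    val stored = ctx.getEnvironment.getRegion
      .get(new Get(put.getRow).addColumn(family, qualifier))
    if (!stored.isEmpty) {
      // The row already has a CREATED_DATE: drop that cell from the
      // incoming Put so the stored value survives the upsert.
      val cells = put.getFamilyCellMap.get(family)
      if (cells != null) {
        val it = cells.iterator()
        while (it.hasNext) {
          if (CellUtil.matchingQualifier(it.next(), qualifier)) it.remove()
        }
      }
    }
  }
}

Note that the Get inside prePut adds a read to every write path, which is part of why this approach is costlier than Approach 1.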

Anil Gupta
  • I can't do the first approach since I am applying only one schema to the dataframe before saving – sophie Jul 10 '15 at 05:40
  • @sophie: if you think my reply answered your question, please acknowledge it by accepting the answer. – Anil Gupta Jul 10 '15 at 06:33
  • Hi Anil, I'm not sure how coprocessors work, but I am able to achieve this using HBase's checkAndPut. I'm just not sure how to implement it using the Phoenix-Spark API. – sophie Jul 11 '15 at 15:40