
I have a dataset of about 10 petabytes. It currently lives in HBase, and I access it from Spark through HBaseContext, but performance is poor.

Would it help to move from HBaseContext to HiveContext on Spark?

Amit khandelwal
  • Where did you get `HbaseContext` from? It's part of a HBase connector, isn't it? If so, you won't be able to _just_ switch between the contexts as they are "incompatible". – Jacek Laskowski Mar 05 '18 at 18:43

2 Answers


HiveContext is used to read data from Hive, so if you switch to HiveContext, your data has to be stored in Hive first. I don't think what you are trying to do will work.
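To illustrate the point, here is a minimal sketch of how HiveContext is used (Spark 1.x API; the table and column names are hypothetical, and the table must already be registered in the Hive metastore):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("hive-read"))
val hiveContext = new HiveContext(sc)

// This only works against tables the Hive metastore knows about;
// HiveContext does not read HBase tables directly.
val df = hiveContext.sql("SELECT * FROM events WHERE dt = '2018-03-05'") // hypothetical table
df.show()
```

(In Spark 2.x, `SparkSession` with `enableHiveSupport()` replaces `HiveContext`, but the same constraint applies.)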

Prashant

In my use case, I use mapPartitions with an HBase connection opened inside each partition. The key is knowing how to split your data.

For scans, you can create your own scanner with a row-key prefix, etc. Gets are even easier. For puts, you can build a list of puts and do a batch insertion.

I don't use HBaseContext at all, and I get quite good performance on a table of 1.2 billion rows.
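The pattern above can be sketched roughly like this (HBase 1.x client API; the table name, column family, and column qualifier are hypothetical, and `rdd` is assumed to be an `RDD[String]` of row keys that has already been partitioned to match the split strategy):

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes

val results = rdd.mapPartitions { keys =>
  // One connection per partition, not per record --
  // this is where the performance comes from.
  val conf = HBaseConfiguration.create()
  val connection = ConnectionFactory.createConnection(conf)
  val table = connection.getTable(TableName.valueOf("my_table")) // hypothetical

  val out = keys.map { key =>
    val result = table.get(new Get(Bytes.toBytes(key)))
    Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col")))
  }.toList // materialize before closing the connection

  table.close()
  connection.close()
  out.iterator
}
```

The same shape works for puts: accumulate a `List[Put]` inside the partition and call `table.put(puts)` once for a batch insertion.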

kulssaka