
I'm using UR and I'm wondering why, during training, it uses only 4 cores most of the time when 8 are available.

Most of the time training is stuck in Job 17 "take at EsSpark.scala:60", Stage 45 "flatMap at AtB.scala:234".

Can someone explain what exactly it does at this stage, and whether it can use all 8 cores (not only 4)?

I suspected it was a lack of RAM, because it uses all of the available 64 GB, but when I ran this on a Spark cluster (2 instances) the situation did not change: it still used only 4 cores in total.

(screenshot: CPU usage)

Igor
  • It can be because the table is split into 4 parts:
    `16/11/11 11:37:19 INFO TableInputFormatBase: Input split length: 8.5 G bytes.`
    `16/11/11 11:37:19 INFO TableInputFormatBase: Input split length: 4.2 G bytes.`
    `16/11/11 11:37:19 INFO TableInputFormatBase: Input split length: 1.3 G bytes.`
    `16/11/11 11:37:19 INFO TableInputFormatBase: Input split length: 7.2 G bytes.`
    But how can I change this behaviour? – Igor Nov 11 '16 at 18:01

1 Answer


The problem was that PredictionIO uses TableInputFormatBase, which by default creates an RDD with a number of partitions equal to the number of regions in HBase.

So to increase the number of partitions of the RDD, it is possible either to split the regions in HBase or to call `RDD.repartition(numPartitions)`.
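A minimal sketch of the second option, assuming you can modify the code that reads the HBase table (e.g. in a fork of PredictionIO); the table name and target partition count below are placeholders:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.SparkContext

def loadAndRepartition(sc: SparkContext, table: String, numPartitions: Int) = {
  val conf = HBaseConfiguration.create()
  conf.set(TableInputFormat.INPUT_TABLE, table)

  // One partition per HBase region, same as the default behaviour described above
  val rdd = sc.newAPIHadoopRDD(
    conf,
    classOf[TableInputFormat],
    classOf[ImmutableBytesWritable],
    classOf[Result])
  println(s"partitions before: ${rdd.getNumPartitions}")

  // Spread the rows over more partitions so more cores can work in parallel
  val repartitioned = rdd.repartition(numPartitions)
  println(s"partitions after: ${repartitioned.getNumPartitions}")
  repartitioned
}
```

If repartitioning in code is not an option, splitting regions from the HBase shell (the `split` command) increases the region count, which raises the default partition count the next time the table is read.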

Igor