
I am trying to create a cached table in shark-0.8.0. As per the documentation (https://github.com/amplab/shark/wiki/Shark-User-Guide), I created the table as follows:

CREATE TABLE mydata_cached (
  artist string,
  title string,
  track_id string,
  similars array<array<string>>,
  tags array<array<string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
TBLPROPERTIES('shark.cache' = 'MEMORY');

The table is created and I am able to load the data using the LOAD DATA command. But when I try to query the table, even a SELECT COUNT(1) statement fails.
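
For reference, the load step looked roughly like the following; the input path here is only illustrative, not the actual location of my data:

-- NOTE: the path below is only an example
LOAD DATA INPATH '/path/to/mydata.json' INTO TABLE mydata_cached;

The failing query and the full error are below: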

shark> select count(1) from mydata_cached;                                                
shark.memstore2.CacheType$InvalidCacheTypeException: Invalid string representation of cache type MEMORY
    at shark.memstore2.CacheType$.fromString(CacheType.scala:48)
    at shark.execution.TableScanOperator.execute(TableScanOperator.scala:119)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
    at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:47)
    at shark.execution.Operator.executeParents(Operator.scala:115)
    at shark.execution.UnaryOperator.execute(Operator.scala:187)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
    at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:47)
    at shark.execution.Operator.executeParents(Operator.scala:115)
    at shark.execution.UnaryOperator.execute(Operator.scala:187)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
    at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:47)
    at shark.execution.Operator.executeParents(Operator.scala:115)
    at shark.execution.UnaryOperator.execute(Operator.scala:187)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
    at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:47)
    at shark.execution.Operator.executeParents(Operator.scala:115)
    at org.apache.hadoop.hive.ql.exec.GroupByPostShuffleOperator.execute(GroupByPostShuffleOperator.scala:194)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
    at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:47)
    at shark.execution.Operator.executeParents(Operator.scala:115)
    at shark.execution.UnaryOperator.execute(Operator.scala:187)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at shark.execution.Operator$$anonfun$executeParents$1.apply(Operator.scala:115)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
    at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:47)
    at shark.execution.Operator.executeParents(Operator.scala:115)
    at shark.execution.FileSinkOperator.execute(FileSinkOperator.scala:120)
    at shark.execution.SparkTask.execute(SparkTask.scala:101)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1312)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1104)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937)
    at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:294)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
    at shark.SharkCliDriver$.main(SharkCliDriver.scala:203)
    at shark.SharkCliDriver.main(SharkCliDriver.scala)
FAILED: Execution Error, return code -101 from shark.execution.SparkTask

As per the code at GitHub (https://github.com/amplab/shark/blob/master/src/main/scala/shark/memstore2/CacheType.scala), MEMORY should be a valid option. I also tried the MEMORY_ONLY option, and it gives the same error. Any suggestions or thoughts about what's going wrong here?
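
For reference, the MEMORY_ONLY attempt was simply the same DDL with the cache property changed; nothing else differs from the statement above:

CREATE TABLE mydata_cached (
  artist string,
  title string,
  track_id string,
  similars array<array<string>>,
  tags array<array<string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
TBLPROPERTIES('shark.cache' = 'MEMORY_ONLY');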

Thanks, TM

visakh

1 Answer


Needs to be:

TBLPROPERTIES('shark.cache' = 'MEMORY_ONLY')
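
If the table already exists, changing the property in place with standard Hive DDL may also be worth a try, though I have not verified that Shark picks up the new value without the table being recreated:

ALTER TABLE mydata_cached SET TBLPROPERTIES('shark.cache' = 'MEMORY_ONLY');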
WestCoastProjects
  • Hi...I tried `MEMORY` as well as `MEMORY_ONLY` and it was the same error for both... – visakh May 13 '14 at 07:02
  • ah, I see that now in your OP as well. Sorry, no further suggestions. – WestCoastProjects May 13 '14 at 07:09
  • what's the best way to see whether caching is working fine? Say I create a table with `_cached` in the table name. If I see the Hive directory (somewhere in `/user/hive/warehouse`) created for the table and if it's empty, then can I conclude that the table has been cached successfully? – visakh May 13 '14 at 07:18
  • That sounds reasonable. Honestly, I have always done both, so I do not know precisely what artifacts are generated for memory-only. – WestCoastProjects May 13 '14 at 07:34