3

I have a Spark Thrift Server. I connect to it and query a Hive table. If I query the same table again, it loads the files into memory again and re-executes the query.

Is there any way to cache the table data using the Spark Thrift Server? If so, how do I do it?

T. Gawęda
Aditya Calangutkar

2 Answers

2

Two things:

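Remember that when caching is lazy (for example, with CACHE LAZY TABLE), the data is only cached during the first computation that reads it. A plain CACHE TABLE statement is eager by default.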
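As a rough sketch of the difference, the statements below can be sent over the same JDBC/beeline connection used for queries. The table name my_hive_table is a placeholder, not something from the question. The two options are alternatives, not steps to run in sequence.

```sql
-- my_hive_table is a hypothetical name; substitute your own Hive table.

-- Option A, eager (the default): the table is scanned and cached right away.
CACHE TABLE my_hive_table;

-- Option B, lazy: the table is only marked for caching here...
CACHE LAZY TABLE my_hive_table;
-- ...and the cache is filled by the first query that scans it.
SELECT COUNT(*) FROM my_hive_table;

-- Either way, release the memory once the cached data is no longer needed.
UNCACHE TABLE my_hive_table;
```

With the lazy form, the first query pays the cost of reading the files; later queries on the same table should be served from memory.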

T. Gawęda
0

Note that the memory may be consumed by the driver rather than the executors (depending on your settings: local mode, cluster mode, and so on), so remember to allocate more memory to your driver.

To load data into the cache:

CACHE TABLE today AS
SELECT * FROM datahub WHERE year=2017 AND fullname IN ("api.search.search") LIMIT 40000

Start by limiting the data, then check how much memory is consumed (for example, in the Spark history web UI) before caching more, to avoid an OOM exception.
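One possible way to grow the cached sample step by step is sketched below. It assumes that CACHE TABLE ... AS registers today as a temporary view, so the old view has to be dropped before the name can be reused. The larger limit is an arbitrary illustrative value, not a recommendation.

```sql
-- Drop the first, small cached sample (names follow the example above).
UNCACHE TABLE today;
DROP VIEW today;

-- Re-create it with a larger limit once memory use looks safe.
-- 400000 is an arbitrary value chosen for illustration only.
CACHE TABLE today AS
SELECT * FROM datahub WHERE year = 2017 AND fullname IN ('api.search.search') LIMIT 400000;
```

Checking memory use after each step makes it easier to stop before the cache outgrows the available memory.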

Thomas Decaux