
I have two Hive tables named omniture3 and omniture4. Each table is 50*20 in size.

I am running a JOIN query on these tables, but it is taking far too long to execute. The Hadoop cluster consists of a single node.

I know Hive queries take a long time on big tables, so I reduced the table size, but the query still does not finish (it has been running for more than half an hour and is still stuck).

Following is the query:

SELECT omniture3.*, omniture4.* FROM omniture3 JOIN omniture4 on (omniture3.id = omniture4.id) LIMIT 10;

How long do such queries generally take for tables of this size, and if this duration is normal, how can I speed up the process?

Edit: I have a cluster with 2 nodes (1 datanode and 1 namenode), and each node is allocated 4GB of RAM.
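
One way to narrow down where the time goes is sketched below. It assumes Hive is running on the MapReduce engine over YARN, which the question does not confirm. The two `SET` properties are standard Hive settings, but whether they help on this particular cluster is untested here.

```sql
-- Inspect the plan first: shows whether Hive plans a map-side join
-- or a full shuffle (reduce-side) join for this query.
EXPLAIN
SELECT omniture3.*, omniture4.*
FROM omniture3 JOIN omniture4 ON (omniture3.id = omniture4.id)
LIMIT 10;

-- Assumption: the inputs are small enough for local mode, so Hive can
-- run the job in-process instead of submitting it to the cluster.
SET hive.exec.mode.local.auto=true;

-- Assumption: one table fits in memory, so Hive may convert the join
-- to a map-side join and skip the shuffle phase.
SET hive.auto.convert.join=true;

-- Re-run the original query with the settings above in effect.
SELECT omniture3.*, omniture4.*
FROM omniture3 JOIN omniture4 ON (omniture3.id = omniture4.id)
LIMIT 10;
```

If the query still hangs with these settings, the job may be waiting for cluster resources rather than doing work, which would fit the 4GB-per-node setup described in the edit.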

Shubham Gupta
  • If the table is partitioned, try adding a filter condition on the partition column. – vishnu sable May 31 '19 at 09:42
  • @vishnusable But should this command take this long for such a small table? – Shubham Gupta May 31 '19 at 09:45
  • Possible duplicate of [Why is hadoop slow for a simple hello world job](https://stackoverflow.com/questions/54953620/why-is-hadoop-slow-for-a-simple-hello-world-job) – tk421 May 31 '19 at 16:13
  • @tk421 I have a cluster with two nodes, one name node and one data node, and each of them has 4GB of RAM allocated. Is that too little? If so, how much memory or how many nodes should I add? – Shubham Gupta Jun 03 '19 at 05:10
  • 4GB of RAM is pretty small. Refer to https://stackoverflow.com/questions/49793791/spark-working-faster-in-standalone-rather-than-yarn to get a sense of a cluster you can run performance tests against. – tk421 Jun 03 '19 at 16:28
