
I am using Hive with MapReduce.

I have tried several configurations (always the same settings, but with different values). The job creates some mappers, but no reducers.

The settings I have used are the following (I have tried the byte values for 64 MB, 128 MB and 256 MB):

SET hive.exec.reducers.bytes.per.reducer=134217728;
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.size.per.task=134217728;
SET hive.merge.smallfiles.avgsize=67108864;
SET mapred.max.split.size=134217728;
SET parquet.block.size=134217728;
SET dfs.blocksize=134217728;
SET hive.exec.reducers.bytes.per.reducer=134217728;
SET hive.exec.dynamic.partition=true; 
SET hive.exec.dynamic.partition.mode=nonstrict;

The main objective is to run this query as efficiently as possible:

INSERT OVERWRITE TABLE my_table2 PARTITION(partition) SELECT * FROM mytable1;

This is one of the INFO messages on running the Hive query: INFO : Hadoop job information for Stage-1: number of mappers: 675; number of reducers: 0

I have run this query on tables of four different sizes: <100.000 rows, <10.000.000 rows, <100.000.000 rows and >100.000.000 rows (all with more than 20 and fewer than 30 columns).
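
For context, here is a rough sketch of why changing `hive.exec.reducers.bytes.per.reducer` would not be expected to change the "number of reducers: 0" message above. It assumes that, when a stage does have a reduce phase, Hive estimates the reducer count as the input size divided by the bytes-per-reducer value, capped by `hive.exec.reducers.max`. The function name `estimate_reducers`, the `has_reduce_stage` flag and the default cap of 1009 are illustrative assumptions, not Hive's actual code. A plain `SELECT *` insert needs no shuffle, so the stage is likely planned as map-only and the estimate is never applied.

```python
import math

MB = 1024 * 1024  # bytes in one mebibyte


def estimate_reducers(input_bytes, bytes_per_reducer,
                      max_reducers=1009, has_reduce_stage=True):
    """Hypothetical helper: approximate how a reducer count is estimated.

    max_reducers stands in for hive.exec.reducers.max (the value 1009 is
    an assumed default). has_reduce_stage is an illustrative flag for
    whether the query plan contains a shuffle/reduce phase at all.
    """
    if not has_reduce_stage:
        # Map-only plan: the bytes-per-reducer setting is never consulted.
        return 0
    estimate = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, estimate))


# Byte values for the three sizes tried in the settings above.
for size_mb in (64, 128, 256):
    print(size_mb, "MB =", size_mb * MB, "bytes")

ten_gib = 10 * 1024 * MB
# With a reduce stage, 10 GiB at 128 MB per reducer gives 80 reducers.
print(estimate_reducers(ten_gib, 128 * MB))
# A map-only stage reports 0 reducers whatever the setting is.
print(estimate_reducers(ten_gib, 128 * MB, has_reduce_stage=False))
```

Under these assumptions, the reducer-related settings only matter if the query itself introduces a reduce phase (for example through an aggregation or a `DISTRIBUTE BY` clause).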

  • Normally you get that info message quite often, with changing numbers, because first you need a few mappers to transform your data and then you need reducers to bring everything together. Stage-1 normally reads the data from HDFS. See [here](https://community.hortonworks.com/questions/141606/hive-queries-use-only-mappers-or-only-reducers.html) for an example. What is the problem with the resulting table after your query is done? Is the table not there? Does it have the wrong data? Is it too slow? – Secespitus Jul 29 '19 at 10:02
  • It is too slow. It is inserting the right data, but too slowly. – Antonio Barroso Jul 29 '19 at 10:33
  • How have you determined that the reason for being too slow is that there are no reducers? What would those reducers do according to your research? Right now it looks like you are stating that your query doesn't use reducers, which by itself is fine. If your problem is that it's taking too long you should analyze which steps are taking long and provide information about how long the different table sizes take and how long they should take. Otherwise people would solve a problem that is not your problem. See also [What is the XY problem?](https://meta.stackexchange.com/q/66377/352819). – Secespitus Jul 29 '19 at 10:41
  • I was reading a little more, and I have understood that my problem is due to my hardware and not to Hive itself. It is doing everything properly. Thank you @Secespitus. :) – Antonio Barroso Jul 30 '19 at 10:15
