Hadoop partitioning covers questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
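As a rough illustration of the idea in the tag description, the sketch below models how a hash-based partitioner picks a reducer: hash the key, clear the sign bit, and take the remainder modulo the number of reducers. This mirrors the formula used by Hadoop's default `HashPartitioner`, but the helper names here (`java_string_hash`, `hash_partition`) are hypothetical, and the hash is assumed to behave like Java's `String.hashCode()` for plain ASCII keys. Hadoop's own key types, such as `Text`, compute their hash differently, so actual partition numbers may not match.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() as a signed 32-bit integer.

    Assumption: keys are plain ASCII strings, so ord(ch) matches the
    UTF-16 code unit that Java would use.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    if h >= 0x80000000:
        h -= 0x100000000  # reinterpret as a signed value
    return h


def hash_partition(key: str, num_reducers: int) -> int:
    """Model of hash partitioning: (hash & Integer.MAX_VALUE) % numReduceTasks."""
    # Masking with 0x7FFFFFFF clears the sign bit so the result is never negative.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


if __name__ == "__main__":
    for key in ["a", "ab", "hadoop", "partition"]:
        print(key, "->", hash_partition(key, 4))
```

Every record with the same key lands in the same partition, which is why a handful of very frequent keys can overload a single reducer.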
Questions tagged [hadoop-partitioning]
339 questions
1
vote
0 answers
How to find list of all Hive tables in a database that are missing compute stats?
As part of my current project, we deployed 100+ Hive tables. I am trying to find a list of all Hive tables in a particular database that are missing compute stats. For an individual table, I used SHOW PARTITIONS table_name. Is there any way I can find…

vvazza
1
vote
1 answer
How to merge existing hourly partitions into daily partitions in Hive
I need to merge existing hourly partitions into daily partitions for all days.
My partition column is like:
2019_06_22_00, 2019_06_22_01, 2019_06_22_02, 2019_06_22_03..., 2019_06_22_23 => 2019_06_22
2019_06_23_00, 2019_06_23_01,…

bala chandar
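A minimal sketch of the hourly-to-daily mapping described in the question above, assuming partition values always follow the `YYYY_MM_DD_HH` pattern shown there. The function names are hypothetical, and the sketch only computes which hourly partitions belong to which daily partition; moving the data in Hive is a separate step that it does not attempt.

```python
from collections import defaultdict


def daily_partition(hourly: str) -> str:
    """Drop the trailing _HH from a YYYY_MM_DD_HH partition value."""
    day, _hour = hourly.rsplit("_", 1)
    return day


def group_by_day(hourly_partitions):
    """Group hourly partition values under their daily partition value."""
    groups = defaultdict(list)
    for value in hourly_partitions:
        groups[daily_partition(value)].append(value)
    return dict(groups)


if __name__ == "__main__":
    hours = ["2019_06_22_00", "2019_06_22_01", "2019_06_23_00"]
    for day, parts in group_by_day(hours).items():
        print(day, "<-", parts)
```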
1
vote
1 answer
Unable to access Hive table in Impala
I am unable to access a Hive table in Impala that is partitioned on a date column. The data is inserted using the dynamic partition column option.
The date data type is not supported in Impala. What should I do to access this table in Impala? Is there…

Umer
1
vote
0 answers
Hive Partition By dynamic value in s3 file name
Assuming an S3 location with required data is of the form:
s3://stack-overflow-example/v1/
where each file title in v1/ is of the form
francesco_{YYY_DD_MM_HH}_totti.csv
and each csv file contains a unix timestamp as a column in each row.
Is it…

pippa dupree
1
vote
0 answers
Generate unique id in MapReduce
I'm comparing two files, A and B, extracting columns from A that don't exist in B, and adding them to B. When a new record is added to B, it should be given a unique ID. I'm looking for logic where I can get the total count from B, which is the max …

user2316771
1
vote
0 answers
No hash partitioning when using repartition in Spark
The Spark documentation says that .repartition() returns a new DataFrame, which is hash-partitioned by default. But in the example I am running, shown below, that's not the case.
rdd=sc.parallelize([('a',22),('b',1),('c',4),('b',1),('d',2),
…

cph_sto
1
vote
1 answer
How do you add partitions to a partitioned table in Presto running in Amazon EMR?
I'm running Presto 0.212 in EMR 5.19.0, because AWS Athena doesn't support the user-defined functions that Presto supports. I'm using EMR configured to use the Glue schema. I have pre-existing Parquet files in the correct…

Eddie
1
vote
1 answer
How does Hive partitioning work?
Let's assume the table below,
with the schema:
ID, NAME, Country, where my partition key is Country.
If my query is like:
select * from table where id between 155555756 and 10000000000;
The partition will not work in that case, right?
On a simple note…

Varshini
1
vote
1 answer
Received the following error while running a hive query. What could be the possible reasons for it?
java.sql.SQLException: Error while processing statement: FAILED:
Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
vertexName=Map 1, vertexId=vertex_1538324912862_7122_1_00,
diagnostics=[Task…

Ankit
1
vote
1 answer
Spark sortMergeJoin running continuously
I am joining two dataframes, but the join does not complete and has been running for many hours. One task runs continuously, although the other 199 tasks complete within seconds.
I tried repartitioning and swapping the left and right dataframes as well.…

Varun
1
vote
1 answer
Alternative to the default HashPartitioner provided with Hadoop
I have a Hadoop MapReduce program that distributes keys unevenly.
Some reducers end up with two keys, some with one key, and some with none.
How do I force Hadoop to send each partition with a certain key to a separate reducer? I have nine…

zaranaid
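One common way to approach the problem in the question above is a custom partitioner that assigns each known key a fixed reducer index instead of hashing. The Python sketch below only models that lookup logic; the key names are made up for illustration and `build_key_partitioner` is a hypothetical helper. In an actual Hadoop job the same logic would live in a Java class extending `Partitioner` and overriding `getPartition`, registered with `job.setPartitionerClass(...)`, with the number of reduce tasks set to at least the number of distinct keys.

```python
def build_key_partitioner(known_keys):
    """Return a get_partition function that gives each known key its own slot.

    Assumption: the full set of keys is known before the job runs.
    """
    # Sort so the key-to-index assignment is deterministic across runs.
    index = {key: i for i, key in enumerate(sorted(set(known_keys)))}

    def get_partition(key, num_reducers):
        # With num_reducers >= len(index), every key gets a distinct reducer.
        # With fewer reducers, keys wrap around and share reducers.
        return index[key] % num_reducers

    return get_partition


if __name__ == "__main__":
    keys = ["k1", "k2", "k3"]  # hypothetical keys for illustration
    get_partition = build_key_partitioner(keys)
    for key in keys:
        print(key, "->", get_partition(key, 3))
```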
1
vote
1 answer
Inserting partitioned data into an external table in Hive
I need a few clarifications regarding inserting data into an external table.
I have created an external Parquet table, partitioned by week, pointing to a Hadoop location. After this, I moved the data (a .csv file) to that location.
My doubt…

av abhishiek
1
vote
1 answer
Hadoop-Installation-Multinode
Hi all, I am trying to set up a multinode Hadoop installation. Everything works fine, but my YARN NodeManager is not working. When I looked at the log file for the YARN NodeManager, I got the following…

buildengineer
1
vote
1 answer
Name clash of getPartition of type Partitioner has the same erasure of type main class in MapReduce, Hadoop
I was trying to write code that controls which input goes to which reducer according to the length of the character, by implementing a Partitioner with the default Mapper and Reducer, but I get the following error. I will be thankful…
user8331236
1
vote
2 answers
How is data split into part files in Sqoop?
I have a question about how the data is partitioned into part files if the data is skewed. If possible, please help me clarify this.
Let's say this is my departments table, with department_id as the primary key.
mysql> select * from departments;
2 Fitness
3…

iamteja