Hadoop partitioning covers questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
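As a rough illustration of that decision, the sketch below mirrors the formula used by Hadoop's default hash partitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, in plain Java with no Hadoop dependency. The class name `HashPartitionSketch`, the sample keys, and the reducer count are hypothetical. Real jobs use `Writable` keys such as `Text`, whose `hashCode()` differs from `String`'s, so the indices printed here will not match a real cluster.

```java
public class HashPartitionSketch {

    // Maps a key to a partition index in [0, numReduceTasks).
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hashCode() cannot produce a negative index.
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // hypothetical reducer count
        String[] keys = {"apple", "banana", "apple", "cherry"}; // hypothetical keys
        for (String key : keys) {
            // Equal keys always land in the same partition, which is what
            // lets a single reducer see every value for a given key.
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the index depends only on the key's hash and the number of reducers, changing the reducer count reshuffles which keys go where, and a skewed key distribution produces skewed reducers.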
Questions tagged [hadoop-partitioning]
339 questions
1
vote
1 answer
Hive explain plan not showing partition
I have a table that contains 251M records and is 2.5 GB in size.
I partitioned it on the two columns that I filter on in the predicate.
But the explain plan does not show that it is reading a partition, even though the table is partitioned. With…

D Mishra
1
vote
0 answers
How to arrange multiple partitions in Hive?
Say I have an order table that contains multiple time columns (spend_time, expire_time, withdraw_time).
Usually I query the table on each of these columns independently, so how should I create the partitions?
order_no | spend_time | expire_time |…

lei yu
1
vote
1 answer
_temporary directory is not deleted from the output location when a MapReduce job completes
I am parsing data with a MapReduce job to make sense of it. The parsed data arrives in batches and is then loaded into a Hive external table by a Spark Streaming job. This is a real-time process. Now an unusual event was…

Mohit Sudhera
1
vote
1 answer
MAX(COUNT) function in Apache Pig Latin
I am trying to do the following in Apache Pig on unstructured data:
i) I have a dataset that contains street name, city and state.
ii) Group by state.
iii) I take COUNT(*) of states in the dataset. Now my output will be like…

sivaraj
1
vote
0 answers
What changes are needed in the Driver class of a MapReduce program when using StringTokenizer instead of split(), and why?
I am new to Java and Hadoop. I was practicing the MapReduce word count example, where I came across two ways of splitting the line in the mapper class.
The first one:
public class WordCountMapper extends
Mapper…

Vidya
1
vote
1 answer
How to rename all partition columns in Hive
When I try to rename all partition columns in an existing partitioned table covering a date range of one year, this is what I get:
hive> ALTER TABLE test.usage PARTITION ('date') RENAME TO PARTITION (partition_date);
FAILED:…

hadoop
1
vote
1 answer
How to overwrite a column's values by selecting other columns in a partitioned table in Hive
How can I overwrite column values by selecting from the same partitioned table in Hive?
I created the table by executing the query below:
CREATE TABLE user (fname string,lname string) partitioned By (day int);
And I inserted the data. After inserting data into…

Sai
1
vote
1 answer
How to merge small files from existing partitions in Hive?
How can I merge the small files of an existing partition into one large file within that partition?
For example, I have a table user1 that contains the columns fname and lname, and the partition column is day.
I created the table using the script below:
CREATE TABLE…

Sai
1
vote
1 answer
Who creates the block IDs for blocks in Hadoop?
I want to know who creates the block IDs for blocks in Hadoop: the HDFS client or the NameNode. Please let me know.

sidhartha pani
1
vote
1 answer
Who updates the metadata in the NameNode in Hadoop?
In the case of HDFS writes, how is the metadata updated in the NameNode once the client has written the data to the DataNodes? Do the DataNodes or the HDFS client update the metadata in the NameNode?

sidhartha pani
1
vote
1 answer
Hadoop INFO ipc.Client: Retrying connect to server localhost/127.0.0.1:9000
I read other posts about the HDFS configuration problem with Hadoop, but none of them was helpful, so I am posting my question. I followed this tutorial for Hadoop v1.2.1. When I run the hadoop fs -ls command, I get this error:
16/08/29…

Hamid_UMB
1
vote
3 answers
How to reduce the number of mappers when running a Hive query?
I am using Hive.
I have 24 JSON files with a total size of 300 MB (in one folder), so I created one external table (table1) and loaded the data (the 24 files) into it.
When I run a select query on top of that external…

Sai
1
vote
1 answer
How to properly import a CSV dataset using a kite-dataset partitioned schema?
I'm working with the publicly available CSV dataset from MovieLens.
I have created a partitioned dataset for the ratings.csv:
kite-dataset create ratings --schema rating.avsc --partition-by year-month.json --format parquet
Here is my…

Eugene Goldberg
1
vote
0 answers
How to distribute Java pair RDD data based on key to different partitions of an RDD
JavaRDD input = xyz.sc.textFile("/home/spark/Documents/XYZ");
JavaRDD infoRDD = input.mapToPair(new
PairFunction(){
public Tuple2 call(String x) {
return new…

gaurav
1
vote
2 answers
DELETE FROM table_name Cloudera Impala
I'm new to Impala, and I'm trying to understand how to delete records from a table...
I've tried looking for delete commands, but couldn't find understandable instructions...
This is my table structure:
create table Installs (BrandID INT,…

Bramat