Questions tagged [hadoop-partitioning]

Hadoop partitioning deals with questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
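For illustration, here is a minimal standalone sketch of the rule used by Hadoop's default `HashPartitioner`: a key is sent to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The class name `PartitionSketch`, the method `partitionFor` and the sample keys are hypothetical. No Hadoop classes are used, so it runs on a plain JDK (9+ for `List.of`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Standalone sketch (hypothetical class, no Hadoop dependency) of the rule
 * Hadoop's default HashPartitioner uses to pick a reducer for a map output key.
 */
public class PartitionSketch {

    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hashCode() still yields a non-negative number; the modulo then maps
    // it onto a reducer index in 0..numReducers-1.
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 3;
        List<String> keys = List.of("apple", "banana", "cherry", "apple", "date");

        // Group the keys by the partition (reducer) each would be sent to.
        // Identical keys always land in the same partition.
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(partitionFor(key, numReducers), p -> new ArrayList<>()).add(key);
        }
        byPartition.forEach((p, ks) -> System.out.println("reducer " + p + " <- " + ks));
    }
}
```

Because the partition depends only on the key's hash and the number of reducers, every value for a given key reaches the same reducer; a custom `Partitioner` is the usual way to change that mapping.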

339 questions
1 vote · 1 answer

Hive explain plan not showing partition

I have a table that contains 251M records and is 2.5 GB in size. I partitioned it on the two columns I filter on in the predicate, but the explain plan does not show that it is reading a partition, even though the table is partitioned. With…
D Mishra
1 vote · 0 answers

How to arrange multiple partitions in Hive?

Say I have an order table that contains multiple time columns (spend_time, expire_time, withdraw_time). Usually I query the table on each of these columns independently, so how do I create the partitions? order_no | spend_time | expire_time |…
lei yu
1 vote · 1 answer

_temporary directory is not deleted from the output location when a MapReduce job completes

I am parsing data with a MapReduce job in order to make sense of it. The parsed data comes in the form of batches and is then loaded into a Hive external table by a Spark Streaming job. This is a real-time process. Now an unusual event was…
Mohit Sudhera
1 vote · 1 answer

MAX(Count) function in Apache Pig Latin

I am trying to write the program below in Apache Pig, as the data is unstructured: i) I have a dataset that contains street name, city and state; ii) group by state; iii) take COUNT(*) of the states in the dataset. Now my output will be like…
1 vote · 0 answers

What changes should be made to the Driver class in a MapReduce program when using StringTokenizer instead of split(), and why?

I am new to Java and Hadoop. I was practicing the MapReduce WordCount example, where I came across two ways of splitting the line in the mapper class. The first one: public class WordCountMapper extends Mapper
Vidya
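The two line-splitting approaches mentioned in that question can be compared with a small plain-Java sketch. It tokenizes the same line with `java.util.StringTokenizer` and with `String.split("\\s+")`; the class `SplitSketch` and its methods are hypothetical, and no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical sketch comparing two ways a WordCount mapper might split a line. */
public class SplitSketch {

    // StringTokenizer's default delimiters are space, tab, newline,
    // carriage return and form feed; runs of delimiters are skipped,
    // so no empty tokens are returned.
    static List<String> withTokenizer(String line) {
        List<String> words = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            words.add(itr.nextToken());
        }
        return words;
    }

    // String.split takes a regular expression; "\\s+" treats any run of
    // whitespace as one separator, but leading whitespace still produces
    // an empty first element.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split("\\s+"));
    }

    public static void main(String[] args) {
        String line = "  hello hadoop\thello";
        System.out.println(withTokenizer(line)); // [hello, hadoop, hello]
        System.out.println(withSplit(line));     // [, hello, hadoop, hello]
    }
}
```

For lines without leading whitespace the two usually give the same words; with `split`, an empty first element would be counted as a word unless it is filtered out.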
1 vote · 1 answer

How to rename all partition columns in Hive

When I try to rename all the partition columns in an existing table partitioned over a one-year date range, this is what I get: hive> ALTER TABLE test.usage PARTITION ('date') RENAME TO PARTITION (partition_date); FAILED:…
hadoop
1 vote · 1 answer

How to overwrite a column's value by selecting other columns in a partitioned table in Hive

How do I overwrite a column's value by selecting from the same partitioned table in Hive? I created the table by executing the query below: CREATE TABLE user (fname string,lname string) partitioned By (day int); and I inserted the data. After inserting data into…
Sai
1 vote · 1 answer

How to merge small files from existing partitions in Hive?

How do I merge the existing small files in a partition into one large file? For example, I have a table user1 that contains the columns fname and lname, and the partition column is day. I created the table using the script below: CREATE TABLE…
Sai
1 vote · 1 answer

Who creates the block IDs for blocks in Hadoop?

I want to know who creates the block IDs for blocks in Hadoop: the HDFS client or the NameNode. Please let me know.
sidhartha pani
1 vote · 1 answer

Who updates the metadata in the NameNode in Hadoop?

In the case of HDFS writes, how is the metadata updated in the NameNode once the client has written the data to the DataNodes? Do the DataNodes or the HDFS client update the metadata in the NameNode?
sidhartha pani
1 vote · 1 answer

Hadoop INFO ipc.Client: Retrying connect to server localhost/127.0.0.1:9000

I read other posts about HDFS configuration problems with Hadoop, but none of them helped, so I am posting my question. I followed this tutorial for Hadoop v1.2.1. When I run the hadoop fs -ls command, I get this error: 16/08/29…
Hamid_UMB
1 vote · 3 answers

How to reduce the number of mappers when running a Hive query?

I am using Hive. I have 24 JSON files with a total size of 300 MB in one folder, so I created one external table (table1) and loaded the data (the 24 files) into it. When I run a select query on top of that external…
Sai
1 vote · 1 answer

How to properly import a CSV dataset using a kite-dataset partitioned schema?

I'm working with the publicly available CSV dataset from MovieLens. I have created a partitioned dataset for ratings.csv: kite-dataset create ratings --schema rating.avsc --partition-by year-month.json --format parquet Here is my…
Eugene Goldberg
1 vote · 0 answers

How to distribute Java pair RDD data to different partitions of the RDD based on key

JavaRDD input = xyz.sc.textFile("/home/spark/Documents/XYZ"); JavaRDD infoRDD = input.mapToPair(new PairFunction(){ public Tuple2 call(String x) { return new…
gaurav
1 vote · 2 answers

DELETE FROM table_name Cloudera Impala

I'm new to Impala, and I'm trying to understand how to delete records from a table. I've looked for delete commands but didn't find understandable instructions. This is my table structure: create table Installs (BrandID INT,…
Bramat