Questions tagged [hadoop-partitioning]

Hadoop partitioning covers questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
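As a rough illustration of that decision: Hadoop's default HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below mimics that rule in plain Python. The function names are hypothetical, and `zlib.crc32` is only a stand-in for Java's `hashCode()`, so the actual reducer indices will differ from a real job.

```python
import zlib

def hash_partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index for a key: non-negative hash modulo reducer count."""
    # zlib.crc32 stands in for the key's hashCode() (an assumption of this sketch).
    h = zlib.crc32(key.encode("utf-8"))
    # Mask to a non-negative 31-bit value, then take the remainder.
    return (h & 0x7FFFFFFF) % num_reducers

def partition_pairs(pairs, num_reducers):
    """Group (key, value) pairs into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[hash_partition(key, num_reducers)].append((key, value))
    return buckets

pairs = [("apple", 1), ("pear", 1), ("apple", 1), ("fig", 1)]
buckets = partition_pairs(pairs, 3)
```

Every pair with the same key lands in the same bucket, which is the property the reduce phase relies on.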

339 questions
0
votes
1 answer

Hadoop Total order Partitioning

Why do we need total order partitioning in Hadoop? In which scenarios should we use it? My understanding is that with multiple reducers, each reducer's result will be sorted by key. Then why do we need total order partitioning? Would…
Learn Hadoop
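One way to picture the difference the question asks about: with range-based (total order) partitioning, reducer outputs concatenated in reducer order form one globally sorted sequence, whereas hash partitioning only gives sorted output within each reducer. The sketch below is plain Python rather than Hadoop code; the split points and function names are hypothetical.

```python
import bisect

def range_partition(key, split_points):
    """Return the partition index for key given sorted split points."""
    # N sorted split points define N + 1 key ranges, one per reducer.
    return bisect.bisect_right(split_points, key)

def run_reducers(keys, split_points):
    """Route keys to partitions by range, then sort each partition on its own."""
    partitions = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        partitions[range_partition(key, split_points)].append(key)
    # Each reducer sorts only the keys it received.
    return [sorted(p) for p in partitions]

keys = ["pear", "apple", "zebra", "kiwi", "mango", "banana"]
outputs = run_reducers(keys, ["f", "n"])  # hypothetical split points
# Reading the reducer outputs in order yields a globally sorted list.
concatenated = [k for part in outputs for k in part]
```

In a real job the split points are usually chosen by sampling the input so that the ranges hold similar amounts of data.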
0
votes
1 answer

custom partitioner in Hadoop error java.lang.NoSuchMethodException:- ()

I am trying to write a custom partitioner that allocates each unique key to a single reducer. This was after the default HashPartitioner failed (see "Alternative to the default HashPartitioner provided with Hadoop"). I keep getting the following error. It has…
zaranaid
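A minimal sketch of the "one unique key per reducer" idea, assuming the set of keys is known before the job starts and the number of reducers equals the number of unique keys. This is plain Python with hypothetical names, not the asker's code and not the Hadoop Partitioner API.

```python
def build_key_index(known_keys):
    """Assign each distinct key its own reducer index (0, 1, 2, ...)."""
    return {key: i for i, key in enumerate(sorted(set(known_keys)))}

def one_key_per_reducer(key, key_index):
    """Look up the reducer reserved for this key."""
    # Raises KeyError for a key that was not known in advance.
    return key_index[key]

key_index = build_key_index(["red", "green", "blue", "green"])
num_reducers = len(key_index)  # one reducer per unique key
assignments = {k: one_key_per_reducer(k, key_index) for k in key_index}
```

Unlike hashing, a lookup table cannot send two different keys to the same reducer, at the cost of needing the key set up front.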
0
votes
0 answers

Isn't the term shuffling in MapReduce misleading?

I think the term shuffle refers to randomly reordering elements in a sequence [1]. Therefore, the first time I saw shuffling in MapReduce, I thought it was trying to uniformly distribute workload to nodes for load-balancing purposes. However, after…
Lingxi
0
votes
0 answers

How to handle uneven partitions in Spark

I have a DataFrame that writes 300 GB of records to S3. All the data is split into 2K partitions using .partitionBy("DataPartition", "PartitionYear", "PartitionStatement"). The issue is that some of the partitions hold very large amounts of data (40 GB) and…
0
votes
1 answer

How to constrain Hive query output to always be in a single file

I have created a Hive table using the query below, and I insert data into this table daily using the second query shown below: create EXTERNAL table IF NOT EXISTS DB.efficacy ( product string, TP_Silent INT, TP_Active INT, server_date…
Ashish Mittal
0
votes
2 answers

Multiple reducers without running a partitioner in MapReduce

I am trying to understand the concept of running multiple reducers in an MR job, and learned that it is the partitioner that decides which (key, value) pairs go to which reducer. My question is: can we run multiple reducers without running…
CuriousMind
0
votes
1 answer

One file per partition (coalesce per partition) while inserting data into a Hive table

I have a table created in Hive, stored in an S3 location. It has about 10 columns and is partitioned on 3 columns: month, year and city, in that order. I am running a Spark job that creates a DataFrame (2 billion rows) and writes into this table. val…
dreddy
0
votes
0 answers

How to get recently created partitions in a Hive table?

I have a table called EMPLOYEE with columns ID, NAME, DESIGNATION, CITY, COUNTRY, CONTINENT, and 3-level partitioning on CONTINENT, COUNTRY, CITY. Now I need to find recently created partitions, say those created after a specific timestamp. Note: Assuming access…
Shashank V C
0
votes
1 answer

Can I get access to a full block in the mapper?

Usually the record reader passes data to the mapper line by line, or n lines at a time. Can a full block be accessed in the mapper? That is, the record reader would give the full block to the mapper instead of going line by line. Does this approach make sense? Thanks
shujaat
0
votes
1 answer

How to merge Hive partitioned data into one large file?

I have a Hive table partitioned on date and hour columns. When I load the data, 24 files are created. I want to merge these 24 files into one file. Can anyone suggest a solution?
user3890017
0
votes
0 answers

Handle Hive partitioned table running beyond physical memory limits

I have to create a Hive partitioned table, but the job is running beyond physical memory limits. Is there a way to handle this issue? I can't change the container memory or use less data from the source. Will temporary tables help here, i.e. creating 12…
DrSD
0
votes
1 answer

What happens at the backend when we alter a table in Hive

When we alter a table in Hive, for example by changing the partition, what happens to the table? Does it reformat the table, or does it create new data for the new partition?
0
votes
0 answers

How to partition a Hive table on a column every thousand rows

I have a Hive table with millions of rows, and I would like to partition it on one column, 'id', which is unique. It is not good practice to partition on that unique column, because it would create a very large number of files, and…
Nomad
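One possible reading of "every thousand rows" is to derive a coarser partition value from the unique id with integer division, so that each partition covers a range of 1,000 ids. The sketch below shows only that arithmetic in plain Python. It assumes ids are non-negative integers, and the function name is hypothetical.

```python
from collections import Counter

def id_bucket(row_id: int, rows_per_partition: int = 1000) -> int:
    """Map a unique id to a coarse partition value covering a fixed id range."""
    # ids 0..999 -> 0, 1000..1999 -> 1, and so on.
    return row_id // rows_per_partition

ids = [0, 999, 1000, 1500, 2500]
rows_per_bucket = Counter(id_bucket(i) for i in ids)
```

If the ids are dense, each derived value groups roughly a thousand rows, so the number of partitions is about the row count divided by 1,000 rather than one per row.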
0
votes
1 answer

File Storage in Hadoop

In Hadoop, suppose I have a file A.txt that contains some sample data, say: Hello how are you? I am studying hadoop partitioning. Hadoop is interesting to learn and has good opportunities etc... How does this data get stored in blocks? As…
jack0989
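For context, HDFS stores a file as a sequence of fixed-size blocks cut by byte offset (128 MB by default in recent Hadoop versions), without regard to line or record boundaries. The sketch below imitates that with a tiny 16-byte block size, chosen only so the effect is visible on a short string. It is plain Python with a hypothetical function name, not HDFS code.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut data into consecutive fixed-size chunks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

text = b"Hello how are you? I am studying hadoop partitioning."
blocks = split_into_blocks(text, 16)  # 16 bytes is an illustrative block size
for number, block in enumerate(blocks):
    print(number, block)
```

A boundary can fall in the middle of a word or sentence, which is why record readers handle records that span two blocks.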
0
votes
0 answers

Join - Pig scripts

I am new to Pig scripts. I need help joining 'B' and 'E'. Below is my script. A = LOAD .... PAPS_1 = FILTER A BY (dataMap#'corr_id_' is NOT null); B = FOREACH PAPS_1 GENERATE dataMap#'corr_id_' as id, dataMap#'response' as resp,…
Anjanaa