Highest Voted 'hadoop-partitioning' Questions

0 votes, 1 answer

Hadoop Total order Partitioning

Why use total order partitioning in Hadoop? In which scenarios do we need total order partitioning? My understanding is that with multiple reducers, each reducer's output will be sorted by key, so why do we need total order partitioning? Would…

apache hadoop hadoop-partitioning

asked Apr 29 '18 at 03:47 by Learn Hadoop

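A small simulation makes the question's premise concrete. Each reducer's output is sorted by key under either scheme, but the part files only form one globally sorted sequence when concatenated if keys are routed to reducers by key range. That is the idea behind Hadoop's TotalOrderPartitioner, which reads sorted split points from a partition file. The sketch below is plain Python that models this, not Hadoop code; the split points [30, 60] and the helper names are hypothetical.

```python
import bisect

def hash_partition(key, num_reducers):
    # Models Hadoop's default HashPartitioner for int keys, whose hashCode()
    # is the value itself: (hashCode & Integer.MAX_VALUE) % numReduceTasks.
    return (key & 0x7FFFFFFF) % num_reducers

def range_partition(key, split_points):
    # Models the idea behind TotalOrderPartitioner: N-1 sorted split points
    # define N contiguous key ranges, one range per reducer.
    return bisect.bisect_right(split_points, key)

def run_reducers(keys, num_reducers, partition):
    # Route each key to a reducer, then sort within each reducer,
    # mimicking the per-reducer sort that MapReduce always performs.
    buckets = [[] for _ in range(num_reducers)]
    for k in keys:
        buckets[partition(k)].append(k)
    return [sorted(b) for b in buckets]

keys = [42, 7, 93, 15, 61, 3, 78, 29, 50, 88]
by_hash = run_reducers(keys, 3, lambda k: hash_partition(k, 3))
by_range = run_reducers(keys, 3, lambda k: range_partition(k, [30, 60]))

# Concatenate the reducer outputs in order (part-r-00000, part-r-00001, ...).
concat_hash = [k for part in by_hash for k in part]
concat_range = [k for part in by_range for k in part]
print(concat_hash == sorted(keys))   # False: sorted per reducer only
print(concat_range == sorted(keys))  # True: globally sorted
```

With hash partitioning each part looks sorted, yet reducer 0 holds both 3 and 93 here; with range partitioning every key in part 0 is smaller than every key in part 1, so a plain concatenation already is the total order.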

0 votes, 1 answer

custom partitioner in Hadoop error java.lang.NoSuchMethodException:- ()

I am trying to make a custom partitioner to allocate each unique key to a single reducer. This was after the default HashPartitioner failed (see "Alternative to the default hashpartioner provided with hadoop"). I keep getting the following error. It has…

java hadoop hadoop-partitioning

asked Apr 20 '18 at 14:59 by zaranaid

0 votes, 0 answers

Isn't the term shuffling in MapReduce misleading?

I think the term shuffle refers to randomly reordering elements in a sequence [1]. Therefore, the first time I saw shuffling in MapReduce, I thought it was trying to distribute the workload uniformly across nodes for load-balancing purposes. However, after…

hadoop group-by mapreduce shuffle hadoop-partitioning

asked Apr 13 '18 at 16:42 by Lingxi

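For reference, the MapReduce shuffle is a deterministic routing step rather than a random reordering: each map-output record goes to the reducer selected by its key's partition, and each reducer receives its records sorted and grouped by key. A minimal plain-Python model of that data flow follows; the function name and sample records are hypothetical, and partitioning uses the same arithmetic as Hadoop's default HashPartitioner applied to integer keys.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def shuffle(map_output, num_reducers):
    # Route each (key, value) pair to a reducer chosen deterministically from its key.
    partitions = defaultdict(list)
    for key, value in map_output:
        partitions[(key & 0x7FFFFFFF) % num_reducers].append((key, value))
    # Each reducer sees its pairs sorted by key and grouped as key -> [values].
    # Values keep their input order here; Hadoop gives no such guarantee
    # unless a secondary sort is configured.
    result = {}
    for r in range(num_reducers):
        pairs = sorted(partitions[r], key=itemgetter(0))
        result[r] = [(k, [v for _, v in grp]) for k, grp in groupby(pairs, key=itemgetter(0))]
    return result

map_output = [(3, "a"), (1, "b"), (3, "c"), (2, "d"), (1, "e")]
print(shuffle(map_output, 2))
# {0: [(2, ['d'])], 1: [(1, ['b', 'e']), (3, ['a', 'c'])]}
```

Any load balancing comes from how evenly the partition function spreads keys, not from the shuffle itself; a single hot key still lands on exactly one reducer.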

0 votes, 0 answers

How to handle uneven partitions in Spark

I have a DataFrame that writes 300 GB of records to S3. All the data is partitioned into 2K partitions with .partitionBy("DataPartition", "PartitionYear", "PartitionStatement"). But the issue is that some of the partitions hold very large amounts of data (40 GB) and…

apache-spark apache-spark-sql hadoop2 hadoop-partitioning

asked Mar 21 '18 at 10:56 by Sudarshan kumar

0 votes, 1 answer

How to constrain Hive query output to always be a single file

I have created a Hive table using the first query below, and I insert data into this table on a daily basis using the second query below: create EXTERNAL table IF NOT EXISTS DB.efficacy ( product string, TP_Silent INT, TP_Active INT, server_date…

hadoop hive hiveql hadoop-partitioning

asked Mar 19 '18 at 02:29 by Ashish Mittal

0 votes, 2 answers

Multiple reducers without running a partitioner in MapReduce

I am trying to understand the concept of running multiple reducers in an MR job, and I learned that it is the partitioner that decides which (key, value) pairs go to which reducer. My question is: can we run multiple reducers without running…

hadoop mapreduce hadoop2 hadoop-partitioning

asked Feb 27 '18 at 06:22 by CuriousMind

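When a job has more than one reducer and no custom partitioner is set, Hadoop uses HashPartitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The sketch below reproduces that arithmetic in Python, assuming keys whose hashCode() is java.lang.String.hashCode(); Hadoop's Text type has its own byte-based hash, so real Text keys may land in different partitions. Both function names are hypothetical.

```python
def java_string_hash(s):
    # Emulates java.lang.String.hashCode() for BMP text:
    # h = 31 * h + c over the characters, with 32-bit wraparound.
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the result as a signed 32-bit int, as Java would.
    return h - (1 << 32) if h >= (1 << 31) else h

def hash_partition(key_hash, num_reduce_tasks):
    # Same arithmetic as HashPartitioner.getPartition:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (key_hash & 0x7FFFFFFF) % num_reduce_tasks

for word in ["hello", "hadoop", "reducer", "hello"]:
    print(word, hash_partition(java_string_hash(word), 3))
# "hello" maps to partition 1 both times
```

The mask with 0x7FFFFFFF clears the sign bit, so a negative hashCode() still yields a partition index between 0 and numReduceTasks - 1; equal keys always reach the same reducer.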

0 votes, 1 answer

One file per partition (coalesce per partition) while inserting data into a Hive table

I have a Hive table stored in an S3 location. It has about 10 columns and is partitioned on 3 columns, month, year and city, in that order. I am running a Spark job that creates a DataFrame (2 billion rows) and writes it into this table. val…

hive apache-spark-sql hadoop-partitioning partition-by

asked Feb 16 '18 at 15:27 by dreddy

0 votes, 0 answers

How to get recently created partitions in a Hive table?

I have a table called EMPLOYEE with columns ID, NAME, DESIGNATION, CITY, COUNTRY, CONTINENT, with 3-level partitioning on CONTINENT, COUNTRY, CITY. Now I need to know the recently created partitions, say those created after a specific timestamp. Note: Assuming access…

hadoop hive hadoop-partitioning pyhive hive-metastore

asked Feb 07 '18 at 14:39 by Shashank V C

0 votes, 1 answer

Can I get access to a full block in the mapper?

Usually the record reader passes the input to the mapper one line (or n lines) at a time. Can a full block be accessed in the mapper, meaning the record reader would give the full block to the mapper instead of going line by line? Does this approach make sense? Thanks

hadoop mapreduce hadoop-partitioning

asked Jan 22 '18 at 09:30 by shujaat

0 votes, 1 answer

How to merge Hive partitioned data into one large file?

I have a Hive table partitioned on date and hour columns. When I load the data, it creates 24 files. I want to merge these 24 files into one file. Can anyone suggest a solution?

hive hadoop-partitioning merging-data

asked Dec 22 '17 at 10:59 by user3890017

0 votes, 0 answers

Handling a Hive partitioned table running beyond physical memory limits

I have to create a Hive partitioned table, but it is running beyond physical memory limits. Is there a way to handle this issue? I can't change the container memory, nor can I use less data from the source. Will temporary tables help here, i.e. creating 12…

hive hadoop-partitioning

asked Dec 13 '17 at 15:51 by DrSD

0 votes, 1 answer

What happens in the backend when we alter a table in Hive?

When we alter a table in Hive, for example by changing the partition, what happens to the table? Does it reformat the table, or does it create new data for the new partition?

hive bigdata partitioning hadoop-partitioning

asked Dec 12 '17 at 06:48 by Monika Samant

0 votes, 0 answers

How to partition a Hive table column by every thousand rows

In a Hive table I have millions of rows, and I would like to partition on one column, 'id', which is unique. It is not good practice to create a partition on that unique column because it would create a very large number of files, and…

hive grouping hadoop-partitioning

asked Oct 20 '17 at 15:04 by Nomad

0 votes, 1 answer

File Storage in Hadoop

In Hadoop, suppose I have a file A.txt containing some sample data, say: Hello how are you? I am studying hadoop partitioning. Hadoop is interesting to learn and has good opportunities etc... How does this data get stored in blocks? As…

hadoop hdfs hadoop-partitioning

asked Oct 09 '17 at 11:12 by jack0989

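HDFS stores a file as a sequence of fixed-size blocks (dfs.blocksize, 128 MB by default in Hadoop 2.x), cut purely by byte offset without regard to line or record boundaries; each block is then replicated across DataNodes. The sketch below applies a hypothetical 16-byte block size to the question's sample text so the effect is visible; the function name is also hypothetical.

```python
def split_into_blocks(data, block_size):
    # HDFS-style split: cut purely by byte offset; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

text = (b"Hello how are you?\n"
        b"I am studying hadoop partitioning.\n")
# Illustrative only: the real default block size is 128 MB, not 16 bytes.
BLOCK_SIZE = 16
blocks = split_into_blocks(text, BLOCK_SIZE)
for i, block in enumerate(blocks):
    print(i, block)
```

Here the first line is cut after "Hello how are yo", so its end lands in the second block. Hadoop's line-oriented record reader compensates by reading past the end of its split to finish the last line, while the reader for the next split skips the partial first line.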

0 votes, 0 answers

Join - Pig scripts

I am new to Pig scripts. I need help joining 'B' and 'E'. Below is my script. A = LOAD .... PAPS_1 = FILTER A BY (dataMap#'corr_id_' is NOT null); B = FOREACH PAPS_1 GENERATE dataMap#'corr_id_' as id, dataMap#'response' as resp,…

hadoop hdfs apache-pig hadoop-partitioning

asked Oct 04 '17 at 04:23 by Anjanaa
