Hadoop partitioning covers questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
Questions tagged [hadoop-partitioning]
339 questions
0
votes
1 answer
How to use value of IntWritable as condition to partition data?
I want to use the value of an IntWritable as the condition to partition data. But it seems the Partitioner cannot get the value.
public static class GroupMapper extends Mapper
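In fact, a custom Hadoop Partitioner is passed the map output value: `getPartition(key, value, numPartitions)` receives all three arguments, so the value can be used directly. A minimal sketch of such routing logic in plain Python (the threshold of 100 and the hash fallback are illustrative assumptions, not part of the question):

```python
# Sketch of what a value-based Partitioner.getPartition could do:
# route records with small values to partition 0, hash the rest
# across the remaining partitions.
def get_partition(key, value, num_partitions):
    if num_partitions == 1:
        return 0
    if value < 100:          # illustrative threshold on the IntWritable value
        return 0
    # Spread everything else over partitions 1..num_partitions-1 by key.
    return 1 + hash(key) % (num_partitions - 1)
```

The same branching would go inside the Java `getPartition` body of a `Partitioner<Text, IntWritable>` subclass.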

yyyyyyrc
- 21
- 3
0
votes
1 answer
Is it possible to virtually divide a Hadoop cluster into smaller clusters
We are working to build a big cluster of 100 nodes with 300 TB of storage. We then have to serve it to different users (clients) with restricted resource limits, i.e., we do not want to expose the complete cluster to each user. Is it possible? If it is not…
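One common approach keeps a single physical cluster and carves out logical shares with the YARN Capacity Scheduler, so each tenant only gets its queue's resources. A minimal sketch of `capacity-scheduler.xml`; the queue names `dev`/`prod` and the 30/70 split are illustrative assumptions:

```xml
<!-- Sketch: two tenant queues under root with capped capacities. -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>dev,prod</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>70</value>
  </property>
  <property>
    <!-- Cap dev so it cannot take over the whole cluster. -->
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>40</value>
  </property>
</configuration>
```

HDFS-side isolation (quotas, permissions) would be configured separately; the scheduler only limits compute.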

Hafiz Muhammad Shafiq
- 8,168
- 12
- 63
- 121
0
votes
0 answers
Dynamic partitioning inserting null value for the second column of partition
I'm trying to create dynamic partitioning based on two columns, and load data from a file present in the HDFS location.
But while loading data into the dynamically partitioned table from the staging table, the second column of the partitioning is…
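When a dynamic-partition INSERT leaves the second partition column NULL, the usual cause is column ordering: Hive assigns partition values positionally from the last columns of the SELECT, in the order the partition columns are declared. A sketch with hypothetical table and column names:

```sql
-- Hypothetical target table partitioned by (country, state).
-- The SELECT list must END with the partition columns, in declaration order.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE target PARTITION (country, state)
SELECT id, name, country, state   -- country then state, matching PARTITION(...)
FROM staging;
```

If the last two SELECT columns are swapped or missing, the second partition column silently gets the wrong value or NULL.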

learner
- 155
- 3
- 18
0
votes
0 answers
Broadcast join to join two dataframes in SPARK efficiently
I have a DataFrame df1 with about 2 million rows. I have already repartitioned it on the basis of a key called ID, since the data is ID-based -
df=df.repartition(num_of_partitions,'ID')
Now, I wish to join this df to a relatively small…
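For a join between a large and a small DataFrame, Spark can replicate the small side to every executor instead of shuffling the large one; in the DataFrame API this is `broadcast(df2)` from `pyspark.sql.functions`. The equivalent Spark SQL hint, with hypothetical view names, looks like:

```sql
-- "big" and "small" are hypothetical registered temp views;
-- the hint asks Spark to ship "small" to all executors.
SELECT /*+ BROADCAST(small) */ big.ID, big.payload, small.attr
FROM big
JOIN small
  ON big.ID = small.ID;
```

This avoids re-shuffling the already-partitioned large side, which is the point of a broadcast (map-side) join.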

cph_sto
- 7,189
- 12
- 42
- 78
0
votes
1 answer
Hive query not reading partition field
I created a partitioned Hive table using the following query
CREATE EXTERNAL TABLE `customer`(
`cid` string COMMENT '',
`member` string COMMENT '',
`account` string COMMENT '')
PARTITIONED BY…
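With an external partitioned table, partitions whose directories were created directly in HDFS are invisible until registered in the metastore, which makes queries on the partition column return nothing. A sketch of the two usual fixes; the partition column name `ds` and the path are illustrative, since the question's PARTITIONED BY clause is truncated:

```sql
-- Register all partition directories found under the table location.
MSCK REPAIR TABLE customer;

-- Or register one partition explicitly (column name and path hypothetical).
ALTER TABLE customer ADD IF NOT EXISTS PARTITION (ds='2018-05-05')
  LOCATION '/data/customer/ds=2018-05-05';

SHOW PARTITIONS customer;
```

If SHOW PARTITIONS comes back empty before the repair, unregistered partitions are the likely culprit.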

user2316771
- 111
- 1
- 1
- 11
0
votes
2 answers
How to run a Spark program in Java in parallel
So I have a Java application with Spark Maven dependencies, and on running it, it launches a Spark server on the host where it's run. The server instance has 36 cores. I am specifying a SparkSession instance where I am mentioning the number of cores…

Atihska
- 4,803
- 10
- 56
- 98
0
votes
0 answers
Writing MapReduce and YARN application together
I want to run a MapReduce application using Hadoop 2.6.5 (on my own native cluster), and I want to update some things in YARN; thus, I have seen that I can write my own YARN application…

Or Raz
- 39
- 2
- 11
0
votes
0 answers
Why does a partition need to be sorted prior to being reduced?
From here:
As per the Hadoop definitive guide: "Within each partition, the background
thread performs an in-memory sort by key, and if there is a combiner
function, it is run on the output of the sort"
I thought a partition corresponds to one key,…
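A partition generally holds many keys (all keys that hash to the same reducer), and the reducer must see all values for a key contiguously in a single streaming pass. Sorting the partition by key is what makes that possible, exactly like `itertools.groupby`, which only groups adjacent equal keys. A small self-contained illustration:

```python
from itertools import groupby

# One partition's records: several keys, interleaved.
records = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

# Without sorting, groupby (like a streaming reducer) sees each key
# in multiple fragments.
unsorted_groups = [k for k, _ in groupby(records, key=lambda kv: kv[0])]

# After sorting, each key's values are contiguous and one pass suffices.
sorted_groups = {k: [v for _, v in g]
                 for k, g in groupby(sorted(records), key=lambda kv: kv[0])}
```

This is why the map side sorts each partition before the reduce phase consumes it.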

Mario Ishac
- 5,060
- 3
- 21
- 52
0
votes
1 answer
Map-Reduce job failing to deliver expected partitioned files
In a Map-Reduce job, I am using five different files, where my dataset contains values under two categories, P and I. When I-specific values are found, I pass them into the I-part-r-00000 file, and likewise for P. I am using…

Mohit Sudhera
- 341
- 1
- 4
- 16
0
votes
1 answer
How does the AM select the node for each reduce task?
I am running two word-count example jobs on the same cluster (I run Hadoop 2.6.5 locally with a multi-node cluster), where my code runs the two jobs one after the other.
Both of the jobs share the same mapper, reducer, etc., but each one of them…

Or Raz
- 39
- 2
- 11
0
votes
2 answers
Combine Multiple Hive Tables as single table in Hadoop
Hi, I have multiple Hive tables, around 15-20. All the tables share a common schema. I need to combine all the tables into a single table. The single table should be queryable from a reporting tool, so performance also needs care.
I tried…
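If the tables really share one schema, a view over UNION ALL gives the reporting tool a single queryable name without copying data; materializing it into one table is the alternative when query speed matters more than freshness. A sketch with hypothetical table names:

```sql
-- Hypothetical source tables t1..t3 with identical schemas.
CREATE VIEW all_data AS
SELECT * FROM t1
UNION ALL
SELECT * FROM t2
UNION ALL
SELECT * FROM t3;

-- Or materialize once for faster repeated reporting queries:
CREATE TABLE all_data_mat STORED AS ORC AS
SELECT * FROM all_data;
```

The view stays current as the source tables change; the materialized table must be refreshed but scans faster.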

Teju Priya
- 595
- 3
- 8
- 18
0
votes
1 answer
Convert value while inserting into HIVE table
I have created a bucketed table called emp_bucket with 4 buckets, clustered on the salary column. The structure of the table is as below:
hive> describe Consultant_Table_Bucket;
OK
id int
age …
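Values can be converted inline in the SELECT that feeds the bucketed table; there is no need to pre-transform the staging data. A sketch, assuming a hypothetical staging table and that the conversion wanted is a CAST (the question's full column list is not shown):

```sql
-- Hypothetical staging table; transform columns on the way in.
SET hive.enforce.bucketing=true;

INSERT INTO TABLE emp_bucket
SELECT id,
       CAST(age AS INT),
       CAST(salary AS DECIMAL(10,2))   -- bucketing column, converted inline
FROM emp_staging;
```

Any Hive expression (CASE, concat, arithmetic) can replace the CASTs in the SELECT list.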

Sunil
- 553
- 1
- 12
- 30
0
votes
1 answer
Hadoop Spark - Store in one Large File instead of Many Small ones and Index
On a daily basis I calculate some stats and store them in a file (about 40 rows of data). df below is calculated daily. The issue is that when I store it, each day becomes a new file, and I do not want this, as Hadoop doesn't deal well…
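A common pattern is to append each day's ~40 rows into one Hive table instead of writing standalone files, then periodically compact the accumulated small files. A sketch with hypothetical table names; the CONCATENATE merge applies to ORC tables:

```sql
-- Append the daily stats (df written first to a staging table / temp view).
INSERT INTO TABLE daily_stats
SELECT * FROM daily_stats_staging;

-- Periodically merge the accumulated small files in place (ORC only).
ALTER TABLE daily_stats CONCATENATE;
```

Querying one table also sidesteps indexing individual files by date: a date column in the table serves as the index.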

SecretAgent
- 97
- 10
0
votes
1 answer
Record count for Hive partitioned table
I have a table called "transaction" in Hive which is partitioned on a column called "DS" which will have data like "2018-05-05", "2018-05-09", "2018-05-10" and so on
This table is populated overnight for the day that has just completed. At any point,…
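Per-partition counts can be read either by grouping on the partition column, or, without scanning the data, from metastore statistics after an ANALYZE:

```sql
-- Count rows in each daily partition by scanning.
SELECT ds, COUNT(*) AS row_cnt
FROM transaction
GROUP BY ds;

-- Or compute stats once and read counts from the metastore afterwards.
ANALYZE TABLE transaction PARTITION (ds) COMPUTE STATISTICS;
```

After the ANALYZE, `DESCRIBE FORMATTED transaction PARTITION (ds='2018-05-05')` shows the stored numRows for that partition.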

Prashanth G B
- 1
- 1
- 1
0
votes
1 answer
Hadoop MapReduce - How to create dynamic partition
How do I create a dynamic partition using Java MapReduce, like SQL's GROUP BY on a country column? For example, I have a country-based dataset and need to separate the records based on country (partition). We can't limit the countries, since every day we will get…
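With an unbounded set of countries, the usual MapReduce answer is `MultipleOutputs`: in the reducer, call `multipleOutputs.write(key, value, country + "/part")` so each country gets its own output path without declaring the countries up front. The routing idea itself is just grouping by a dynamic key, sketched here in plain Python:

```python
from collections import defaultdict

# Sketch of the MultipleOutputs idea: route each record to an output
# named after its country, discovering the countries as they appear.
def partition_by_country(records):
    outputs = defaultdict(list)   # one "output file" per country
    for country, row in records:
        outputs[country].append(row)
    return dict(outputs)
```

In the real job, each dict bucket corresponds to a separate named output file under the job's output directory.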

Learn Hadoop
- 2,760
- 8
- 28
- 60