Questions tagged [hadoop-partitioning]

Hadoop partitioning deals with questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).

339 questions
1 vote, 1 answer

Creating view in HIVE

I want to create a view on a Hive table which is partitioned. My view definition is as below: create view schema.V1 as select t1.* from scehma.tab1 as t1 inner join (select record_key, max(last_update) as last_update from scehma.tab1 group by…
1 vote, 1 answer

TotalOrderPartitioner and mrjob

How does one specify the TotalOrderPartitioner when using mrjob? Is this the default, or must it be specified explicitly? I've seen inconsistent behavior on different data sets.
vy32
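For reference on the second part of the question: in plain Hadoop MapReduce, total ordering is never the default (HashPartitioner is), so TotalOrderPartitioner always has to be requested explicitly on the job; how mrjob exposes that is not shown here. A minimal Java sketch of the configuration, with an illustrative partition-file path:

    // Sketch in the plain Java MapReduce API, not mrjob: TotalOrderPartitioner
    // must be set on the job explicitly, together with the partition file that
    // holds the key split points (one split point fewer than there are reducers).
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

    public class TotalOrderSetup {
        public static void configure(Job job, Path partitionFile) {
            job.setPartitionerClass(TotalOrderPartitioner.class);
            TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
        }
    }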
1 vote, 1 answer

Set date function as variable and use in beeline and hql file (hive)

Could anyone please explain to me how to solve this issue? I want to use from_unixtime(unix_timestamp() - 86400, 'yyyyMMdd') as the value for a variable and use it in a query's where clause that is stored in an hql file. I have tried: beeline…
smastika
1 vote, 1 answer

Facing an error when using TotalOrderPartitioner MapReduce

I have written the program below. I ran it without using the TotalOrderPartitioner and it ran fine, so I don't think there are any issues with the Mapper or Reducer classes as such. But when I include the code for the TotalOrderPartitioner, i.e. write…
Don Sam
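A common source of errors with TotalOrderPartitioner (impossible to confirm without the truncated code) is the partition file: it must be generated before the job runs, and the sampled keys must match the map output key type. A hedged sketch using InputSampler, with arbitrary sampling parameters:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.partition.InputSampler;

    public class PartitionFileSetup {
        // Assumes the job already uses TotalOrderPartitioner and the partition file
        // location has been registered via TotalOrderPartitioner.setPartitionFile().
        public static void writePartitions(Job job) throws Exception {
            // Randomly sample the input to choose split points; the sampled keys come
            // from the job's InputFormat, so their type must match the map output key
            // type (Text here) or the partitioner fails at runtime.
            InputSampler.Sampler<Text, Text> sampler =
                    new InputSampler.RandomSampler<>(0.1, 10000, 10);
            InputSampler.writePartitionFile(job, sampler);
        }
    }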
1 vote, 1 answer

Different keys go into 1 file even when using a Hadoop custom Partitioner

I am running into a minor issue. I am trying to get a different file for each key from the Reducer. Partitioner: public class customPartitioner extends Partitioner implements Configurable { private Configuration…
USB
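The usual culprit when every key lands in one file is that the job still runs with a single reducer: each output file corresponds to one reducer, and the partitioner's result is only meaningful relative to the reducer count. A minimal sketch of the contract (this is not the asker's customPartitioner, and the key/value types are assumptions):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Each distinct value returned by getPartition corresponds to one reducer and
    // therefore one part-r-NNNNN file; with the default of one reducer, every key
    // still ends up in part-r-00000 no matter what the partitioner returns.
    public class KeyHashPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }

The driver also needs job.setPartitionerClass(KeyHashPartitioner.class) and job.setNumReduceTasks(n) with n greater than one before keys can spread across files.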
1 vote, 1 answer

Spark-SQL DataFrame partitions

I need to load a Hive table using spark-sql and then run some machine-learning algorithm on it. I do that by writing: val dataSet = sqlContext.sql(" select * from table") It works well, but if I wanted to increase the number of partitions of the dataSet…
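The asker's snippet uses the Spark 1.x sqlContext in Scala; as a hedged illustration of the same idea in the Java API (assuming Spark 2.x or later with Hive support available), repartition() reshuffles the loaded table into a chosen number of partitions:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class RepartitionSketch {
        public static void main(String[] args) {
            // Hive support lets spark.sql() resolve tables in the Hive metastore.
            SparkSession spark = SparkSession.builder()
                    .appName("repartition-sketch")
                    .enableHiveSupport()
                    .getOrCreate();

            // "my_table" is a placeholder for the asker's Hive table.
            Dataset<Row> dataSet = spark.sql("select * from my_table");
            // 200 is arbitrary; a full shuffle spreads the rows over that many partitions.
            Dataset<Row> repartitioned = dataSet.repartition(200);
            System.out.println(repartitioned.rdd().getNumPartitions());
        }
    }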
1 vote, 0 answers

How to do a secondary sort on filenames with numbers in hadoop streaming?

I'm trying to sort file names such as cat1.pdf, cat2.pdf, ... cat10.pdf ... I'm utilizing a sort right now with the following parameters: -D…
1 vote, 0 answers

How to select top rows in hadoop?

I am reading a 138MB file from Hadoop and trying to assign sequence numbers to each record. Below is the approach I followed: I read the entire file using Cascading and assigned the current slice number and a current record counter to each record. This was…
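One hedged pattern for this (a sketch, not the asker's Cascading flow): derive a unique sequence tag in each map task from the task id plus a local counter, which avoids funnelling the whole file through a single reducer. OFFSET is an assumed upper bound on records per task:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Tags every record with taskId * OFFSET + localCounter, which is unique across
    // the job as long as no task emits more than OFFSET records.
    public class SequenceTagMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private static final long OFFSET = 1_000_000_000L; // assumed max records per task
        private long counter = 0;
        private long taskId;

        @Override
        protected void setup(Context context) {
            taskId = context.getTaskAttemptID().getTaskID().getId();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            long seq = taskId * OFFSET + counter++;
            context.write(new Text(seq + "\t" + value.toString()), NullWritable.get());
        }
    }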
1 vote, 3 answers

hadoop mapreduce unordered tuple as map key

Based on the wordcount example from Hadoop - The Definitive Guide, I've developed a mapreduce job to count the occurrence of unordered tuples of Strings. The input looks like this (just larger): a b c c d d b a a …
user3365
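The standard trick for unordered tuples is to canonicalize the pair before it becomes the map output key, so that (a, b) and (b, a) hash to the same partition and are counted together. A sketch of such a mapper, with tokenization simplified to whitespace-separated pairs as in the sample input:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class UnorderedPairMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] tokens = value.toString().trim().split("\\s+");
            if (tokens.length < 2) {
                return; // skip malformed lines
            }
            // Order the two strings lexicographically so that "a b" and "b a"
            // produce the same key and therefore reach the same reducer.
            String a = tokens[0];
            String b = tokens[1];
            String pair = a.compareTo(b) <= 0 ? a + "\t" + b : b + "\t" + a;
            context.write(new Text(pair), ONE);
        }
    }

A plain summing reducer, as in the wordcount example, then yields the per-pair counts.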
1 vote, 1 answer

Using Hadoop Partitioner and Comparator Class Together

I have a file that has two columns, id and timestamp. I'm counting the number of sessions each value has, determined by inactivity of more than 30 minutes. However, I'm having trouble with the streaming commands. An example of a few rows is as…
cloud36
1 vote, 2 answers

How to get the most uniform partition results?

I don't know if there is any algorithm to get the optimal partition for a key-based data partition (I need to ensure that records with the same key end up in the same result data set). For example: I have a data set that needs to be divided into two parts: key …
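Splitting whole key groups into two sets of equal total size is essentially the NP-hard partition problem, but a greedy heuristic (place each key group, largest first, on whichever side is currently lighter) usually comes close. A plain-Java sketch over per-key record counts:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    public class GreedyKeySplit {
        /** Returns the keys assigned to side A; all remaining keys form side B. */
        public static List<String> splitIntoTwo(Map<String, Long> recordsPerKey) {
            List<Map.Entry<String, Long>> groups = new ArrayList<>(recordsPerKey.entrySet());
            // Biggest key groups first, so the small ones can even out the totals later.
            groups.sort((x, y) -> Long.compare(y.getValue(), x.getValue()));

            List<String> sideA = new ArrayList<>();
            long sizeA = 0;
            long sizeB = 0;
            for (Map.Entry<String, Long> group : groups) {
                if (sizeA <= sizeB) {
                    sideA.add(group.getKey());
                    sizeA += group.getValue();
                } else {
                    sizeB += group.getValue();
                }
            }
            return sideA;
        }
    }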
1 vote, 1 answer

How to override shuffle/sort in map/reduce, or else, how can I get the sorted list in map/reduce from the last element to the partitioner

Assuming only one reducer, my scenario is to get the list of the top N scorers in the university. The data is in format. The Map/Reduce framework, by default, sorts the data in ascending order. But I want the list in descending order, or at least if…
Jack Daniel
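One answer-shaped sketch: the shuffle's sort order can be flipped by registering a sort comparator that negates the natural comparison, so the single reducer receives keys from highest to lowest. IntWritable score keys are an assumption about the asker's job:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Registered with job.setSortComparatorClass(DescendingIntComparator.class);
    // the framework then presents keys to the reducer in descending order, so the
    // first N keys it sees are the top N scores.
    public class DescendingIntComparator extends WritableComparator {
        public DescendingIntComparator() {
            super(IntWritable.class, true); // true = instantiate keys for comparing
        }

        @SuppressWarnings("rawtypes")
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return -super.compare(a, b); // flip ascending order into descending
        }
    }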
1 vote, 0 answers

How to split a log file based on the timestamp/date

I have to analyze a huge log file for management reporting purposes. The format of the log file is as below: [2014-08-28 08:49:40 GMT][Level:DEBUG] Connection from UGUBUKBBBHJGJ.mt.site (123.131.21.20) , user : 12345678 for compositeId :…
user3548788
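A hedged MapReduce approach: have the mapper pull the date out of each line and emit it as the key, so the shuffle groups a day's lines together; a reducer using MultipleOutputs can then write one file per date. The regex below is an assumption based on the sample line:

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits (date, full log line); the date key drives grouping and, with a custom
    // partitioner, also which reducer (and hence which output file) a day goes to.
    public class LogDateMapper extends Mapper<LongWritable, Text, Text, Text> {
        // Matches the leading "[2014-08-28 08:49:40 GMT]" timestamp from the sample.
        private static final Pattern DATE = Pattern.compile("^\\[(\\d{4}-\\d{2}-\\d{2}) ");

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Matcher m = DATE.matcher(value.toString());
            if (m.find()) {
                context.write(new Text(m.group(1)), value);
            }
            // Continuation lines without a timestamp are dropped here; a real job
            // might instead attach them to the preceding entry.
        }
    }

On the reduce side, org.apache.hadoop.mapreduce.lib.output.MultipleOutputs lets each group be written under a base path named after the date, e.g. mos.write(key, line, key.toString()).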
1 vote, 1 answer

isSplitable in combineFileInputFormat does not work

I have thousands of small files, and I want to process them with CombineFileInputFormat. With CombineFileInputFormat, multiple small files go to one mapper and each file will not be split. The snippet of one of the small input files looks like…
alec.tu
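For reference, the override is usually written as a thin subclass of CombineTextInputFormat, as sketched below; this shows the usual shape of the override for comparison with the asker's code, not an explanation of why it has no effect in their setup.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

    // Asks the framework never to split an individual file; the files are still
    // packed together into combined splits, one mapper per combined split.
    public class WholeFileCombineTextInputFormat extends CombineTextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;
        }
    }

The job would then use job.setInputFormatClass(WholeFileCombineTextInputFormat.class).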
1 vote, 1 answer

Hadoop getting time difference between dates

I am struggling with something like this in Hadoop. I get the following as the result of my mapper: KeyValue1, 2014-02-01 20:42:00 KeyValue1, 2014-02-01 20:45:12 KeyValue1, 2014-05-01 10:35:02 KeyValue2, 2014-03-01 01:45:12 KeyValue2, 2014-03-01…
Bedi Egilmez
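A hedged sketch of the reducer side for this shape of data: buffer the timestamps of each key, sort them, and emit the gap between consecutive events. The timestamp pattern is taken from the sample mapper output:

    import java.io.IOException;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Emits, for every key, the difference in seconds between consecutive events.
    // All values of a key are buffered in memory; a secondary sort on the timestamp
    // would avoid that for very large groups.
    public class TimeDiffReducer extends Reducer<Text, Text, Text, LongWritable> {
        private static final SimpleDateFormat FORMAT =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<Long> times = new ArrayList<>();
            for (Text value : values) {
                try {
                    times.add(FORMAT.parse(value.toString().trim()).getTime());
                } catch (ParseException e) {
                    // skip malformed timestamps instead of failing the job
                }
            }
            Collections.sort(times);
            for (int i = 1; i < times.size(); i++) {
                long diffSeconds = (times.get(i) - times.get(i - 1)) / 1000L;
                context.write(key, new LongWritable(diffSeconds));
            }
        }
    }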