Highest Voted 'hadoop-partitioning' Questions

0

votes

1 answer

Copying Hive managed table by copying partition directories into warehouse

I have an existing bucketed table that has YEAR, MONTH, DAY partitioning, but I want to add additional partitioning by INGESTION_KEY, a column that doesn't exist in the existing table. This is to accommodate future table inserts so that I don't have…

asked Feb 06 '17 at 14:49

ktmq

43
1
8

0

votes

1 answer

Avoid unbalanced partitions in Spark

I have a performance problem with code I'm revising: it throws an OOM every time it performs a count. I think I found the cause: after a keyBy transformation, an aggregateByKey is executed. The problem lies in the fact that almost…

apache-spark rdd shuffle hadoop-partitioning

asked Jan 16 '17 at 15:53

Giorgio

1,073
3
15
33

0

votes

1 answer

How to read multi-line records in Spark, where each log record starts with a yyyy-MM-dd date and spans multiple lines?

I have implemented the logic below in Scala so far: val hadoopConf = new Configuration(sc.hadoopConfiguration); //hadoopConf.set("textinputformat.record.delimiter", "2016-") hadoopConf.set("textinputformat.record.delimiter",…

scala hadoop apache-spark data-science hadoop-partitioning

asked Dec 24 '16 at 17:30

Ashish Tyagi

33
7

0

votes

1 answer

I can't ping a Windows Azure VM's VIP from my local machine

I have created a Windows Azure VM and installed Hadoop on it. Now I want to access HDFS by URL from my local machine so that I can perform read and write operations. Please guide me through the steps to do this. Thanks in advance.

windows azure hadoop azure-virtual-machine hadoop-partitioning

asked Nov 25 '16 at 08:10

sourabh pandey

31
1
4

0

votes

3 answers

Hive: dynamic partitioning and inserting into specific columns

There is a Hive table with around 100 columns, partitioned by the columns ClientNumber and Date. I am trying to insert data from another Hive table into only 30 columns and to create Date partitions dynamically. The issue is that all data gets…

hadoop hive hadoop-partitioning

asked Nov 21 '16 at 10:44

VasiliK

1
1
1

0

votes

0 answers

How can I use the custom Writable in the mapper? Hadoop

I am trying to write a MapReduce program for the following problem: determine the length of each tweet stored in a CSV file, count how many times each tweet length occurs, and compute their averages. The custom Writable (Pair) below was…

java hadoop mapreduce hadoop-streaming hadoop-partitioning

asked Nov 05 '16 at 02:22

elyon

37
6

0

votes

1 answer

What is the difference between Hadoop 2.7.3 and Hadoop 2.6.5?

I recently looked at the Hadoop releases and noticed that 2.6.5 and 2.7.3 are being developed in parallel. Could someone please explain the difference between them? 08 October, 2016: Release 2.6.5 available A point release…

hadoop hadoop2 hadoop-streaming cloudera-cdh hadoop-partitioning

asked Oct 19 '16 at 19:51

Devendra Bhat

1,149
2
14
19

0

votes

1 answer

Hive select query failed on ORC table

Exception: Failed with exception java.io.IOException:java.io.IOException: Somehow read -1 bytes trying to skip 6257 more bytes to seek to position 6708, size: 1290047 Does anyone have any idea how to fix this on Cloud Dataproc?

hadoop hive hadoop-partitioning google-cloud-dataproc orc

asked Oct 13 '16 at 03:10

Revan

541
1
5
13

0

votes

3 answers

How to check partition data sets in an Oozie workflow?

How can I check whether a partition location exists in an Oozie workflow using a decision node? Example: /user/cloudera/year=2016/month=201609/day=20150912. In my HDFS location I will get one data set every day like…

hadoop oozie cloudera-cdh oozie-coordinator hadoop-partitioning

asked Sep 12 '16 at 09:54

Sai

1,075
5
31
58

0

votes

1 answer

Hadoop partitioning. How do you efficiently design a Hive/Impala table?

How do you efficiently design a Hive/Impala table considering the following facts? The table receives tool data of about 100 million rows every day. The date on which it receives the data is stored in a column in the table along with its tool…

hadoop hive impala hadoop-partitioning

asked Sep 02 '16 at 16:26

Outlander

25
3

0

votes

0 answers

Elasticsearch monthly index on nested field

How can I create a monthly index based on a field in a nested document? For example, for the document below I want to partition based on Joindate. My purging and query search logic is based on that. { "pkClmn": "100", "organizationName": "Microsoft", …

elasticsearch spring-data-elasticsearch hadoop-partitioning nosql

asked Aug 07 '16 at 16:00

user2526641

319
1
4
19

0

votes

1 answer

Distributing Hadoop Streaming Output files on basis of Keys

I have written a mapper function that parses the XML and outputs the result as columns separated by "\t" as shown below Name Age ABC 23 XYZ 24 ERT 25 Using the Hadoop Streaming code mentioned below, I am trying to partition the data on the…

python hadoop mapreduce hadoop-streaming hadoop-partitioning

asked Jun 13 '16 at 09:08

Rohit Guglani

1
3

0

votes

1 answer

Hive/Hadoop: error when selecting data from a table

After I created an external table in Hive, I wanted to know the number of tweets, so I wrote the following query, but I got this error. How can I solve this problem? This is the configuration of mapred-site.xml …

hadoop hive hadoop-streaming hadoop-partitioning flume-twitter

asked May 29 '16 at 22:20

javac

2,819
1
20
22

0

votes

1 answer

Aggregate queries fail in hive if partition directory doesn't exist

I am using Hive v1.2.1 with Tez. I have an external partitioned table. The partitions are hourly and of the form p=yyyy_mm_dd_hh. These partition directories in HDFS are likely to be deleted at some point. After they are deleted,…

hadoop hive hadoop-partitioning

asked May 26 '16 at 09:35

Ankit Khettry

997
1
13
33

0

votes

1 answer

What are the advantages of increasing the partition size and decreasing the number of partitions in Spark?

I have 1 master and 3 slaves (4 cores each). By default the minimum partition size in my Spark cluster is 32 MB and my file size is 41 GB. So I am trying to reduce the number of partitions by changing the minsize to…

scala apache-spark hadoop-partitioning

asked Apr 13 '16 at 06:00

Pavan Kumar Aryasomayajulu

948
10
18
