Highest Voted 'hadoop-partitioning' Questions

1

vote

1 answer

How to add one extra partition to external hive table?

I have a Hive table like below: create external table transaction ( id int, name varchar(60), month string ) PARTITIONED BY ( year string, transaction_type_code varchar(20) ) STORED AS PARQUET LOCATION 'hdfs://xyz'; I am…

asked Aug 12 '21 at 13:35

Manoj Kumar Dhakad

1,862
1
12
26

1

vote

1 answer

How to drop rows from partitioned hive table?

I need to drop specific rows from a partitioned Hive table. The rows to delete match certain conditions, so entire partitions cannot be dropped. Let's say the table Table has three columns: partner, date and…

hive hdfs delete-row hadoop-partitioning table-partitioning

asked Apr 30 '21 at 01:08

Dr Potato

168
1
15

1

vote

1 answer

How partitioning and clustered by works in Hive table?

I'm trying to understand how data will be laid out by the query below. CREATE TABLE mytable ( name string, city string, employee_id int ) PARTITIONED BY (year STRING, month STRING, day STRING) CLUSTERED BY…

hadoop hive hadoop-partitioning hive-partitions hiveddl

asked Apr 10 '21 at 10:28

nut

51
7

1

vote

1 answer

hive script failing due to heap space issue to process too many partitions

My script is failing due to a heap space issue when processing too many partitions. To avoid the issue I am trying to insert all the partitions into a single partition, but I am facing the error below: FAILED: SemanticException [Error 10044]: Line 1:23 Cannot…

hive hiveql hadoop2 hadoop-partitioning

asked Jan 19 '21 at 11:35

Never_Give_Up

126
1
9

1

vote

1 answer

joining hive partitioned , bucketed table with only bucketed table (not partitioned table) in hive

I have 2 tables: q6_cms_list_key1 (bucketed by cm and se), partitioned by tr_dt ... 99 000 000 000 rows; q6_cm_first_visit (bucketed by cm and se), 25 000 000 000 rows. I am making another table using the conditions below: insert into table…

hive query-optimization hiveql bucket hadoop-partitioning

asked Dec 28 '20 at 11:07

vashi

9
2

1

vote

1 answer

Hive: why to use partition by in selects?

I do not completely understand the partitioning concept in Hive. I understand what partitions are and how to create them. What I cannot get is why people write select statements with a "partition by" clause, as is done here: SQL most…

sql hive hiveql hadoop-partitioning hive-partitions

asked Oct 19 '20 at 15:17

MiamiBeach

3,261
6
28
54

1

vote

1 answer

Can I create buckets in a Hive External Table?

I am creating an external table that refers to ORC files in an HDFS location. The ORC files are stored in such a way that the external table is partitioned by date (mapping to date-wise folders on HDFS, as partitions). However, I am wondering if I…

hadoop hive hiveql bucket hadoop-partitioning

asked Jul 30 '20 at 12:26

Sai Geetha M N

33
10

1

vote

1 answer

How to insert Hive partition column and value into data (parquet) file?

Request: How can I insert the partition key-value pair into each Parquet file while inserting data into a Hive/Impala table? Hive table DDL [ create external table db.tbl_name ( col1 string, col2 string) Partitioned BY (date_col string) STORED AS…

hadoop hive parquet impala hadoop-partitioning

asked Jul 22 '20 at 09:27

Peace_Dude

11
3

1

vote

1 answer

Drop partitions in Hive with different date format in the same partition column

I have 2 types of value in the partition column of string datatype: yyyyMMdd and yyyy-MM-dd. E.g. there are partition column values 20200301, 2020-03-05, 2020-05-07, 20200701, etc. I need to drop partitions less than 20200501 with a DDL statement…

hive comparison date-format hadoop-partitioning

asked Jul 17 '20 at 17:59

Keerthana Somu

13
2

1

vote

1 answer

HDFS: Exact meaning of dfs.block.size

In our cluster dfs.block.size is configured as 128M, but I have seen quite a few files with a size of 68.8M, which is a weird size. I am confused about how exactly this configuration option affects what files look like on HDFS. First…

hadoop hive hdfs hadoop-partitioning

asked Apr 16 '20 at 09:32

Boyu Zhang

219
2
12

1

vote

1 answer

How to call Partitioner in Hadoop v 0.21

In my application I want to create as many reducer jobs as possible based on the keys. Now my current implementation writes all the keys and values in a single (reducer) output file. So to solve this, I have used one partitioner but I cannot call…

hadoop mapreduce hadoop-partitioning

asked May 17 '11 at 16:42

Kal

161
3
14

1

vote

1 answer

Reducer Selection in Hive

I have the following record set to process: 1000, 1001, 1002 to 1999, 2000, 2001, 2002 to 2999, 3000, 3001, 3002 to 3999. I want to process this record set using Hive in such a way that reducer-1 will process data 1000 to 1999…

hadoop hive hiveql reduce hadoop-partitioning

asked Jan 23 '20 at 15:51

Suvo

19
1

1

vote

1 answer

Hive Date Partitioned table - Streaming Data in S3 with mixed dates

I have extensive experience working with Hive partitioned tables. I use Hive 2.X. I was interviewing for a Big Data Solution Architect role and I was asked the question below. Question: How would you ingest streaming data in a Hive table…

amazon-s3 hive streaming database-partitioning hadoop-partitioning

asked Dec 21 '19 at 15:09

ZeroDecibels

115
1
5

1

vote

0 answers

How to accelerate large hive table spark group by query?

I have an input table intab: create table intab ( ds string comment 'date partition filed' , id1 string comment 'id1' , id2 string comment 'id2' , n int comment 'n' ) comment 'test' partition by list(ds)(partition default); I need to…

sql apache-spark hive bigdata hadoop-partitioning

asked Nov 18 '19 at 15:12

Changwang Zhang

2,467
7
38
64

1

vote

0 answers

Spark Save job is taking a long time

I am trying to save a DataFrame to an HDFS location, but the save is taking a long time. The action before this joins two tables using Spark SQL. I need to know why the save has four stages and how to improve the performance. I have attached…

scala apache-spark hadoop apache-spark-sql hadoop-partitioning

asked Oct 08 '19 at 14:04

Talib aman

19
2
