Questions tagged [data-partitioning]

Data partitioning deals with dividing a collection of data into smaller collections for faster processing, easier statistics gathering, and a smaller memory/persistence footprint.
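As a minimal illustration of the idea, the sketch below hash-partitions a list of (key, value) records into a fixed number of smaller lists. It is not tied to any particular database or framework, although it is conceptually similar to how Spark's `HashPartitioner` assigns a key to "hash of key modulo number of partitions". The function names `partition_of` and `hash_partition` and the sample records are hypothetical, chosen only for this example.

```python
import zlib


def partition_of(key, num_partitions):
    # Python's built-in hash() for str is randomized per process, so this sketch
    # uses CRC32 of the key's string form to get stable partition assignments.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def hash_partition(records, num_partitions):
    """Split (key, value) records into num_partitions lists by key hash.

    Every record with the same key lands in the same partition, which is what
    lets a later per-key step (e.g. a sum per key) run on each partition independently.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions


if __name__ == "__main__":
    # Hypothetical sample data.
    records = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("b", 5)]
    for index, part in enumerate(hash_partition(records, 3)):
        print(index, part)
```

Hash partitioning spreads keys roughly evenly but ignores their order. Range partitioning (for example, by date, as in the SQL Server sliding-window questions below) is the usual alternative when old data must be dropped in whole partitions.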

337 questions
0 votes, 0 answers

Explicit conversion of column in Table Partition in SQL Server

I have a table like the one below: CREATE TABLE [dbo].[PartitionExample] ( [dateTimeColumn1] [datetime] NOT NULL, CONSTRAINT [PK_PartitionExample] PRIMARY KEY CLUSTERED ( [dateTimeColumn1] ASC )WITH (PAD_INDEX = OFF,…
0 votes, 1 answer

Date difference over consecutive rows filtered to one instance of consecutive values using LEAD function

Basically, the result I need is the price-change ranges for each item: I need to extract an item's price and the date of the transaction, with an end date set to the next time that item's price changes. Given this table create table myTable(id int, Price…
mhDuke
0 votes, 0 answers

Nested partition in postgres

I am attempting to create a referral code infrastructure. I would like to use up to the first 12 characters of a user's email, deduplicated with an integer. So, given three users with the emails bob@gmail.com, bob@hotmail.com, and bob@yahoo.com, I would like…
Abraham P
0 votes, 1 answer

In Spark, when no partitioner is specified, does the reduceByKey operation repartition the data by hash before it starts aggregating?

If we do not specify any partitioner for a reduceByKey operation, does it perform hash partitioning internally before the reduction? For example, my test code is: val rdd = sc.parallelize(Seq((5, 1), (10, 2), (15, 3), (5, 4), (5, 1), (5,3),…
Sayantan Ghosh
0 votes, 0 answers

Partitioning Huge SQL Database for data management

I have a DB with 170 GB+ of data, and one table contains 90% of it. We are planning to do a sliding-window partition on the ChangeLog table. What would be the best solution to manage the data, so that I can remove old data with the least amount of down…
drac13
0 votes, 1 answer

Recover the value above in Postgres

I want to recover the value above by name. See the table. I would like to have a result like in the column before last number_week. Thank you
Bak
0 votes, 0 answers

How to define partitions of a DataFrame in PySpark?

Suppose I read a Parquet file as a DataFrame in PySpark. How can I specify how many partitions it should have? I read the Parquet file like this: df = sqlContext.read.format('parquet').load('/path/to/file') How may I specify the number of partitions…
Ani Menon
0 votes, 1 answer

Kafka Streams - How to efficiently join with a large, non-copartitioned store/topic

We have a stream of web events. The event is partitioned by (domain, uid). All events explained here are from same domain. There are thousands of domains, very uneven in traffic (hence that partitioning). Let's say we have events from one…
xmar
0 votes, 1 answer

Can the Hive metastore virtually partition data based on a column value without physically changing the directory structure?

As an example, consider that I have data on all the major sports events that have happened. The schema is given below: EventName,Date,Month,Year,City. This data is physically structured in HDFS by year, date, month. Now I want to create virtual partitions on that based…
0 votes, 0 answers

Partition by most recent data - SQL Server 2012

How can I partition my table into two: one partition with the latest month of data and the rest in another partition? I am using SQL Server 2012 Standard Edition.
Pandiarajan
0 votes, 1 answer

Kafka Streams: Partial reprocessing by key

Scenario: In a KafkaStreams web sessioning scenario, with unlimited (or years-long) retention, with interactive queries (this can be reviewed if necessary), with many clients, which have many users each (each user particular to each client), and…
0 votes, 1 answer

MySQL query to create a rank chart over time

I have a table with statistical data (roughly 100,000 rows) that has the following format:

WeekNum  Name  Value  Category
201751   Joe   15     X
201751   Max   23     X
201751   Jim   7      X
201752   Joe   18     X
201752   …
Matth
0 votes, 0 answers

Apache Spark - write only the changed partitions

Is there a built-in way to define a DataFrame as a set of partition paths (each with one or more files), use that DataFrame as the basis of a set of so-called "mutation" queries which are defined as a separate DataFrame, and partition the resulting…
jennykwan
0 votes, 2 answers

Suitable data structure for a set (graph) partition

I need to store data that groups the nodes of a graph partition, something like:

[node1, node2]
[node3]
[node4, node5, node6]

My first idea was to have just a simple vector or array of ints, where the position in the array denoted the node_id and it's…
zenna
0 votes, 1 answer

How to decide which column to index to optimize performance

I have the query below, which takes too much time, and I have to optimize its performance. There is no index on any of the tables. To improve query performance, I am now thinking of creating an index, but I am not sure on particularly which…
Andrew