Data partitioning is the division of a collection of data into smaller collections for faster processing, easier statistics gathering, and a smaller memory or storage footprint.
Questions tagged [data-partitioning]
337 questions
0
votes
1 answer
When to create local and global indexes in RANGE INTERVAL partition
We are making use of Oracle 12c RANGE INTERVAL partitioning where Oracle creates the partitions automatically based on the data.
Parent tables are partitioned based on RANGE INTERVAL, child tables are based on REFERENCE partition.
We have about 33…

AJORA
0
votes
1 answer
Oracle Automatic LIST partition using virtual column not allowing REFERENCE partition on child table
I tried to create a partition on a test table using a virtual column. This approach works well for PARENT or standalone tables. However, I cannot create a REFERENCE partition on a CHILD table if the PARENT table is PARTITIONED using…

AJORA
0
votes
1 answer
1D Clustering with categorical variables
I have logged operations that I am trying to analyse. For the analysis, I would like to learn whether a user is in page/navigation mode or in quiz mode (determined by which kind of operation is more prevalent). The mode is given by the frequency of the…

navige
0
votes
1 answer
Split string values in equal and same partition
I need to split my data into 80 partitions regardless of what the key of the data is, and the same data should return the same partition value every time. Is there any algorithm that can be used to implement this?
The key is a combination of multiple…
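A common way to get a repeatable partition number is to hash the key with a deterministic hash function and take the result modulo the partition count. The sketch below is only an illustration under stated assumptions: the function name `partition_for`, the separator used to join the key parts, and the choice of SHA-256 are all hypothetical and not taken from the question. It spreads keys approximately evenly across partitions; it does not guarantee exactly equal sizes.

```python
import hashlib

NUM_PARTITIONS = 80  # partition count mentioned in the question


def partition_for(key_parts, num_partitions=NUM_PARTITIONS):
    """Map a composite key to a stable partition index in [0, num_partitions).

    Assumption: the key parts can be converted to strings and do not
    contain the unit-separator character used to join them.
    """
    raw = "\x1f".join(str(part) for part in key_parts).encode("utf-8")
    # hashlib digests are deterministic across runs and machines, unlike
    # Python's built-in hash(), which is salted per process for strings.
    digest = hashlib.sha256(raw).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same key always maps to the same partition.
first = partition_for(("customer-42", "2020-01-01"))
second = partition_for(("customer-42", "2020-01-01"))
print(first == second)
```

With a well-mixed hash the partition sizes should be close to uniform for a large number of distinct keys, but skewed key distributions (many rows sharing one key) still end up in a single partition.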

Rafa
0
votes
0 answers
Dynamic creation of partition key using Sqoop command from MySQL to Hive
I want to create a Hive table by importing data from MySQL. The following command can create the table -
sqoop import \
-D mapred.job.name=name \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
-connect "connection_detail" \
-username…

TeeKay
0
votes
2 answers
How to compute the average event frequency over a partition in BigQuery
I have a BigQuery table that is essentially an event trace for a given user session. I would like to partition the data by track in the example and produce a frequency distribution of events averaged over tracks.
track Event Name
1 A
1 B
1 …

Sean K
0
votes
2 answers
How to Partition Database Table in Azure Data Explorer?
I started exploring ADX a few days ago. I imported my data from Azure SQL into ADX using an ADF pipeline, but queries against that data take a long time. Looking for a workaround, I researched table data partitioning, and I am much clearer on…

DSA
0
votes
1 answer
Data can't be written to a DolphinDB distributed database
I built a distributed database in DolphinDB to store stock minute data. I partitioned the data by date and symbol and appended the data using the following script for testing. I'm confused about why the last query returns 0.
data = loadText…

HIPO.L
0
votes
1 answer
Are Spark partitioning and bucketing similar to the DataFrame repartition method?
I know that partitioning and bucketing are used to avoid data shuffles.
Bucketing also solves the problem of partitioning creating many directories,
and DataFrame's repartition method can partition in memory.
Except that partitioning and…

C.Moon
0
votes
3 answers
Custom partition problem
I have the following problem: given a set of N integers, divide them into two almost equal partitions in such a way that the sum of the greater partition is minimal. This sounds almost like the classical partition problem with one exception: the even…

ZLMN
0
votes
1 answer
Hazelcast: Difference in data distribution across partitions in IMap and ISemaphore
My question comes from the link https://hazelcast.org/mastering-hazelcast/#controlled-partitioning
It says:
Hazelcast has two types of distributed objects.
One type is the truly partitioned data structure, like the IMap, where
each partition will store a…

Reena Upadhyay
0
votes
1 answer
Is expression-based partitioning supported in Hive?
I have a table with a column. Can I create a partition based on an expression using that column?
I read that IBM's Big SQL technology has this feature.
I also know we can partition in Hive by a column, but what about an expression?
In this case I am…

david
0
votes
2 answers
Partition a Set into k Disjoint Subsets
Given a set S, partition the set into k disjoint subsets such that the difference of their sums is minimal.
Say S = {1,2,3,4,5} and k = 2, so { {3,4}, {1,2,5} } since their sums {7,8} have minimal difference. For S = {1,2,3}, k = 2 it will be…
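For very small inputs, the problem statement above can be checked by exhaustive search. The sketch below is a brute-force illustration, not an efficient algorithm: it tries all k^n assignments of elements to subsets. Two things are assumptions rather than part of the question: every subset must be non-empty, and "difference of their sums" is read as the largest subset sum minus the smallest. The function name `best_k_partition` is hypothetical.

```python
from itertools import product


def best_k_partition(values, k):
    """Return (groups, spread) minimising max(sum) - min(sum) over k groups.

    Brute force over every assignment of elements to groups, so it is
    only practical for a handful of elements.
    """
    values = list(values)
    best_groups, best_spread = None, None
    for labels in product(range(k), repeat=len(values)):
        if len(set(labels)) < k:
            continue  # assumption: no subset may be left empty
        sums = [0] * k
        for value, group in zip(values, labels):
            sums[group] += value
        spread = max(sums) - min(sums)
        if best_spread is None or spread < best_spread:
            best_spread = spread
            best_groups = [
                [v for v, g in zip(values, labels) if g == i] for i in range(k)
            ]
    return best_groups, best_spread


groups, spread = best_k_partition([1, 2, 3, 4, 5], 2)
print(sorted(sum(g) for g in groups), spread)  # sums 7 and 8, spread 1
```

Several partitions can tie for the minimal spread, so the groups returned here are not necessarily the ones listed in the example, only a pair with the same sums.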

st0le
0
votes
1 answer
Split large hash-structured JSON file into multiple smaller files
I am working with a very large JSON file that has a hash-like structure:
{
"1893": {
"foo": {
"2600": {
...[snip]...
},
"3520": {
...[snip]...
}
}
"id": "foobar"
},
"123": {
"bar": {
…
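Since the excerpt is cut off, the exact splitting rule the asker wants is unknown. As a minimal sketch, assuming the goal is to group the top-level keys into smaller objects and that the parsed document fits in memory, one possible approach is shown below. The function name `chunk_top_level` and the `keys_per_chunk` parameter are hypothetical.

```python
from itertools import islice


def chunk_top_level(obj, keys_per_chunk):
    """Yield dicts holding at most keys_per_chunk top-level entries each.

    Assumption: obj is an already-parsed JSON object (a dict), so this
    does not address files too large to load at once.
    """
    items = iter(obj.items())
    while True:
        chunk = dict(islice(items, keys_per_chunk))
        if not chunk:
            return
        yield chunk


# Tiny stand-in for the structure in the question (values are made up).
data = {"1893": {"id": "foobar"}, "123": {"id": "example"}, "7": {}}
chunks = list(chunk_top_level(data, 2))
print(len(chunks))
```

Each yielded chunk could then be written to its own file with `json.dump`. For a file that does not fit in memory, a streaming parser would be needed instead, which this sketch does not cover.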

linkyndy
0
votes
1 answer
Specifying a selected range of data to be used in leave-one-out (jack-knife) cross-validation for use in the caret::train function
This question builds on the question that I asked here: Creating data partitions over a selected range of data to be fed into caret::train function for cross-validation.
The data I am working with looks like this:
df <- data.frame(Effect =…

André.B