Questions tagged [data-partitioning]

Data partitioning is the division of a collection of data into smaller collections for the purpose of faster processing, easier statistics gathering, and a smaller memory/persistence footprint.

337 questions
0
votes
1 answer

When to create local and global indexes in RANGE INTERVAL partition

We are making use of Oracle 12c RANGE INTERVAL partitioning where Oracle creates the partitions automatically based on the data. Parent tables are partitioned based on RANGE INTERVAL, child tables are based on REFERENCE partition. We have about 33…
0
votes
1 answer

Oracle Automatic LIST partition using virtual column not allowing REFERENCE partition on child table

I tried to create a partition on a test table using a virtual column. This approach works well for PARENT or standalone tables. However, I cannot create a REFERENCE partition on a CHILD table if the PARENT table is PARTITIONED using…
0
votes
1 answer

1D Clustering with categorical variables

I have log operations which I am trying to analyse. For the analysis, I would like to learn whether a user is in page/navigation mode or in quiz mode (determined by which kind of operation is more prevalent). The mode is given by the frequency of the…
navige
  • 2,447
  • 3
  • 27
  • 53
0
votes
1 answer

Split string values in equal and same partition

I need to split my data into 80 partitions regardless of the key of the data, and the same data should return the same partition value every time. Is there an algorithm that can be used to implement this? The key is a combination of multiple…
Rafa
  • 487
  • 7
  • 22
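One common way to approach the question above is stable hash partitioning. The following is a minimal sketch, not the asker's method: the function name `partition_for`, the field separator, and the choice of MD5 as the digest are all assumptions made for illustration. It avoids Python's built-in `hash()` because string hashes are salted per process by default, so they are not repeatable across runs.

```python
import hashlib

NUM_PARTITIONS = 80  # partition count taken from the question


def partition_for(key_parts, num_partitions=NUM_PARTITIONS):
    """Map a composite key to a partition index in [0, num_partitions).

    Hypothetical helper: key_parts is any iterable of key fields.
    """
    # Join the fields with a separator that is unlikely to occur in the data
    # (an assumption), so ("ab", "c") and ("a", "bc") do not collide trivially.
    joined = "\x1f".join(str(part) for part in key_parts)
    # A cryptographic digest is used only for its stable, well-mixed output,
    # not for security.
    digest = hashlib.md5(joined.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it.
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    print(partition_for(("customer-17", "2020-01-31")))
```

The same key always maps to the same index, on any machine and in any run. The partitions are only approximately equal in size, since the spread depends on how the keys hash.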
0
votes
0 answers

Dynamic creation of partition key using Sqoop command from MySQL to Hive

I want to create a Hive table by importing data from MySQL. The following command can create the table - sqoop import \ -D mapred.job.name=name \ -Dorg.apache.sqoop.splitter.allow_text_splitter=true \ -connect "connection_detail" \ -username…
TeeKay
  • 1,025
  • 2
  • 22
  • 60
0
votes
2 answers

How to compute the average event frequency over a partition in BigQuery

I have a BigQuery table that is essentially an event trace for a given user session. I would like to partition the data by track in the example and produce a frequency distribution of events averaged over tracks. track Event Name 1 A 1 B 1 …
Sean K
  • 113
  • 1
  • 7
0
votes
2 answers

How to Partition Database Table in Azure Data Explorer?

I started exploring ADX a few days ago. I imported my data from Azure SQL to ADX using an ADF pipeline, but when I query the data, it takes a long time. To find a workaround, I researched Table Data Partitioning and I am much clear on…
DSA
  • 720
  • 2
  • 9
  • 30
0
votes
1 answer

Data can't be written in dolphindb distributed database

I built a distributed database in dolphindb to store stock minute data. I partitioned the data by date and symbol and appended the data using the following script for testing. I'm confused about why the last query returns 0. data = loadText…
HIPO.L
  • 146
  • 1
  • 7
0
votes
1 answer

Are Spark partitioning and bucketing similar to the DataFrame repartition method?

I know that partitioning and bucketing are used to avoid data shuffles. Bucketing also solves the problem of partitioning creating too many directories, and DataFrame's repartition method can partition in memory. Except that partitioning and…
C.Moon
  • 48
  • 3
0
votes
3 answers

Custom partition problem

I have the following problem: given a set of N integers, divide them into two almost equal partitions in such a way that the sum of the greater partition is minimal. This sounds almost like the classical partition problem with one exception: the even…
ZLMN
  • 731
  • 2
  • 10
  • 24
0
votes
1 answer

Hazelcast : Difference in data distribution across partition in IMap and ISemaphore

My question comes from the link https://hazelcast.org/mastering-hazelcast/#controlled-partitioning. It says: Hazelcast has two types of distributed objects. One type is the truly partitioned data structure, like the IMap, where each partition will store a…
Reena Upadhyay
  • 1,977
  • 20
  • 35
0
votes
1 answer

Is expression-based partitioning supported in Hive?

I have a table with a column. Can I create a partition based on an expression using that column? I read that IBM's Big SQL technology has this feature. I also know we can partition in Hive by a column, but what about an expression? In this case I am…
david
  • 88
  • 6
0
votes
2 answers

Partition a Set into k Disjoint Subsets

Given a set S, partition it into k disjoint subsets such that the difference of their sums is minimal. Say S = {1,2,3,4,5} and k = 2; the result is { {3,4}, {1,2,5} }, since their sums {7,8} have minimal difference. For S = {1,2,3}, k = 2 it will be…
st0le
  • 33,375
  • 8
  • 89
  • 89
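For the k-subset question above, one simple starting point is a greedy heuristic: place the largest remaining number into the subset with the smallest running sum. This is a sketch of a swapped-in technique (largest-first greedy), not an exact solver and not taken from the question's answers; the function name `greedy_k_partition` is hypothetical. It does not guarantee the minimal difference in general.

```python
import heapq


def greedy_k_partition(values, k):
    """Split values into k subsets with roughly balanced sums (heuristic)."""
    # Min-heap of (current_sum, subset_index); ties break on the lower index.
    heap = [(0, i) for i in range(k)]
    heapq.heapify(heap)
    subsets = [[] for _ in range(k)]
    # Visit the numbers from largest to smallest.
    for value in sorted(values, reverse=True):
        total, index = heapq.heappop(heap)  # subset with the smallest sum
        subsets[index].append(value)
        heapq.heappush(heap, (total + value, index))
    return subsets


if __name__ == "__main__":
    parts = greedy_k_partition([1, 2, 3, 4, 5], 2)
    print(parts, [sum(p) for p in parts])
```

On the question's example S = {1,2,3,4,5} with k = 2, this produces subsets with sums 8 and 7, matching the expected {1,2,5} and {3,4}. For inputs where the heuristic falls short, an exact search or dynamic programming over sums would be needed.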
0
votes
1 answer

Split large hash-structured JSON file into multiple smaller files

I am working with a very large JSON file, that has a hash-like structure: { "1893": { "foo": { "2600": { ...[snip]... }, "3520": { ...[snip]... } } "id": "foobar" }, "123": { "bar": { …
linkyndy
  • 17,038
  • 20
  • 114
  • 194
0
votes
1 answer

Specifying a selected range of data to be used in leave-one-out (jack-knife) cross-validation for use in the caret::train function

This question builds on the question that I asked here (Creating data partitions over a selected range of data to be fed into caret::train function for cross-validation). The data I am working with looks like this: df <- data.frame(Effect =…
André.B
  • 617
  • 8
  • 17