Data partitioning is the division of a data collection into smaller collections to speed up processing, simplify statistics gathering, and reduce the memory or persistence footprint.
Questions tagged [data-partitioning]
337 questions
0 votes, 1 answer
Generating multiple equally sized output files in Hadoop
What are some methods for finding X data ranges in Hadoop so that one can use these ranges as partitions in the reducer step?
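One common way to approach this kind of question is to sample the keys, use quantiles of the sample as split points, and route each key to the range it falls in; Hadoop's TotalOrderPartitioner works along similar lines. The following is only a minimal Python sketch of that idea, not Hadoop code: `sample_split_points` and `partition_for` are hypothetical names, and the sample size is an arbitrary assumption.

```python
import bisect
import random


def sample_split_points(keys, num_partitions, sample_size=1000, seed=0):
    """Pick num_partitions - 1 split points from a random sample of the keys."""
    rng = random.Random(seed)
    keys = list(keys)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sorted sample become the range boundaries.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_for(key, split_points):
    """Return the index of the range (partition) that the key falls into."""
    return bisect.bisect_right(split_points, key)
```

With roughly uniform sampling, each range receives a similar number of keys, so the per-partition outputs come out close to equal in size.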

syker (reputation 10,912; badges 16, 56, 68)

0 votes, 1 answer
Use conditional criteria to change Row_Number like operation
Using SQL Server 2012. I am using variables to count the number of times "various criteria" are met within my overall dataset. I want to take half of these instances and do one thing ("first_half_thing") and with the other half do a…

Frank Horatio (reputation 23; badges 4)

0 votes, 1 answer
Global indexes while renaming the partition name
I have an existing table with some indexes on it. I am going to partition that table using DBMS_REDEFINITION. I also have to rename the partitions every 24 hours.
Is there any problem with global indexes after I rename the partition…

user1947949 (reputation 43; badges 2, 8)

0 votes, 1 answer
Parallel bulk loading using partition switching of indexed table in SQL Server 2008
This is a follow-up to a previous question of mine, after definitely deciding on partition switching as the best way to quickly get data into a heavily indexed fact-type table that needs to remain available to readers.
While it seems to be the best…

Chris Woodward (reputation 382; badges 3, 12)

0 votes, 1 answer
Trying to use partitioning instead of subqueries
I have two tables: one in my current system that holds the current email, and one that I created by exporting all the email addresses from the Global Address Book, along with a BIT value indicating which is the primary email address and a user id value.…

user802599 (reputation 787; badges 2, 12, 35)

0 votes, 2 answers
Having trouble grokking multiple exclusive combinations in C++
I have an ordered string that I need to present to a user:
ABCCDDCBBBCBBDDBCAAA
Objects represented by 'B' are tagged, such that 2 Bs will have a '~' after them.
AB~CCDDCB~BBCBBDDBCAAA
AB~CCDDCBB~BCBBDDBCAAA
AB~CCDDCBBB~CBBDDBCAAA
and so…

notwithoutend (reputation 235; badges 1, 9)

0 votes, 1 answer
Efficient way to find unequal partitions of an integer
I have all the partitions of an integer and I want only those partitions whose values are all distinct. For example, the partitions of 4 are {1,1,1,1}, {2,2}, {3,1}, {1,1,2} and {4}. So the required unequal partitions are {3,1} and {4} because they contain…
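Rather than generating every partition and filtering, the distinct-part partitions can be produced directly by forcing each part to be strictly smaller than the one before it. This is a minimal Python sketch of that idea, not the accepted answer; `distinct_partitions` is a hypothetical name.

```python
def distinct_partitions(n, max_part=None):
    """Yield partitions of n whose parts are strictly decreasing (all distinct)."""
    if max_part is None:
        max_part = n
    if n == 0:
        # Nothing left to split: the empty partition completes the current one.
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        # Every later part must be smaller than `first`, which rules out repeats.
        for rest in distinct_partitions(n - first, first - 1):
            yield [first] + rest


print(list(distinct_partitions(4)))  # [[4], [3, 1]]
```

Because repeated parts are never generated, the recursion avoids visiting partitions such as {2,2} or {1,1,2} at all.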

Sushant (reputation 66; badges 1, 9)

-1 votes, 1 answer
Python semisort list of objects by attribute
I've got a list of objects:
class packet():
    def __init__(self, id, data):
        self.id, self.data = id, data
my_list = [packet(1,"blah"),packet(2,"blah"),packet(1,"blah"),packet(3,"blah"),packet(4,"blah")]
I want to extract all objects…
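The excerpt is cut off, so the exact requirement is unknown. Going only by the title, a semisort places items with equal keys next to each other without otherwise ordering the groups. A minimal sketch under that assumption, grouping by `id`; the `semisort` helper and the capitalized `Packet` class are illustrative and not taken from the question.

```python
from collections import defaultdict


class Packet:
    def __init__(self, id, data):
        self.id, self.data = id, data


def semisort(items, key):
    """Group items with equal keys together, keeping first-seen group order."""
    groups = defaultdict(list)  # dicts preserve insertion order (Python 3.7+)
    for item in items:
        groups[key(item)].append(item)
    return [item for group in groups.values() for item in group]


my_list = [Packet(1, "blah"), Packet(2, "blah"), Packet(1, "blah"),
           Packet(3, "blah"), Packet(4, "blah")]
print([p.id for p in semisort(my_list, key=lambda p: p.id)])  # [1, 1, 2, 3, 4]
```

This runs in linear time because it only hashes the keys and never compares them.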

Jakob Lovern (reputation 1,301; badges 7, 24)

-1 votes, 1 answer
How to groupby user_id and time using SQL Bigquery
I have a table that contains user_id, time (in six-hour intervals), and average margin. I want to group by user_id and time (with time in ascending order).
The table looks like this:
user_id  time         average_margin
5696     2020-10-12…

Data Beginner (reputation 61; badges 1, 1, 6)

-1 votes, 1 answer
Formatting an eMMC to SD format
I've been working with a Micron BGA eMMC chip and prototyping a communication scheme with the eMMC chip inside an adapter board that connects to the GPIO pins of a TI microcontroller.
I've essentially created a communication scheme written in C code…

Tyler Mckean (reputation 1; badges 1)

-1 votes, 1 answer
How to insert data from a temporary table into a partitioned table in Oracle/SQL using a MERGE statement
I have to write a MERGE statement to insert data from a temporary table into a partitioned table, and I'm getting the error below:
Error report -
SQL Error: ORA-14400: inserted partition key does not map to any partition
I have to do it session-wise and as…

Vivek Thakur (reputation 63; badges 4)

-1 votes, 1 answer
jq: groupby and nested json arrays
Let's say I have: [[1,2], [3,9], [4,2], [], []]
I would like to know the scripts to get:
The number of nested lists that are and are not non-empty, i.e. I want to get: [3,2]
The number of nested lists that do and do not contain the number 3, i.e. I want to…

Maths noob (reputation 1,684; badges 20, 42)

-1 votes, 2 answers
Unable to create exactly equal data partitions using createDataPartition in R: getting 1396 and 1398 observations but need 1397 each
I am quite familiar with R but have never had this requirement, where I need to create exactly equal data partitions randomly using createDataPartition in R.
index = createDataPartition(final_ts$SAR,p=0.5, list = F)
final_test_data =…
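createDataPartition samples within strata of the outcome, which is probably why the two halves do not come out exactly equal. If stratification can be given up, an exact split only needs a shuffle of the row indices and a cut at the midpoint. The idea is sketched here in Python rather than R; `equal_split` is a hypothetical helper, and the 2,794 row count is inferred from the 1396 + 1398 in the title.

```python
import random


def equal_split(n_rows, seed=0):
    """Shuffle row indices and cut them at the midpoint."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    half = n_rows // 2
    # With an odd n_rows the second part gets the one extra row.
    return indices[:half], indices[half:]


train_idx, test_idx = equal_split(2794)
print(len(train_idx), len(test_idx))  # 1397 1397
```

The trade-off is that class proportions in each half are only approximately preserved, since nothing ties the cut to the outcome variable.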

Bharat Ram Ammu (reputation 174; badges 2, 16)

-1 votes, 1 answer
Pyspark: Why show() or count() of a joined spark dataframe is so slow?
I have two large Spark DataFrames. I joined them on one common column as follows:
df_joined = df1.join(df2.select("id",'label'), "id")
I got the result, but when I want to work with df_joined, it's too slow. As far as I know, we need to repartition df1 and df2 to…

Saeid SOHEILY KHAH (reputation 747; badges 3, 10, 23)

-1 votes, 1 answer
STDEVP for calculated fields
I have a table that looks like this:
ID      CHANNEL   VENDOR  num_PERIOD  SALES.A  SALES.B
000001  Business  Shop    1           40       30
000001  Business  Shop    2           60       20
000001  Business  Shop    3           NULL     …

Also (reputation 101; badges 1, 2, 6)