Data munging is the process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.
Questions tagged [data-munging]
236 questions
0 votes, 1 answer
pandas how to get sorted value in groupby object
I have a dataframe
df =
Col Val
a. 8
a. 9
c. 4
c. 0
d. 3
d. 9
I want to sort the groups by the smallest Val within each group, and then for each row get the index of its Col group.
So the new df will be:
df_new =
Col Val Idx
c. 4. 0
c. 0.…

Cranjis
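One possible reading of this question, sketched in pandas: order the groups by their smallest Val, keep the rows of each group in their original order, and number the groups in that new order. The `_group_min` helper column and the interpretation of `Idx` as a group number are assumptions, since the expected output is cut off in the excerpt.

```python
import pandas as pd

df = pd.DataFrame({
    "Col": ["a", "a", "c", "c", "d", "d"],
    "Val": [8, 9, 4, 0, 3, 9],
})

# Smallest Val of each group, broadcast back onto every row of that group.
group_min = df.groupby("Col")["Val"].transform("min")

# Stable sort on the group minimum keeps the original row order inside a group.
df_new = (
    df.assign(_group_min=group_min)          # hypothetical helper column
      .sort_values("_group_min", kind="mergesort")
      .drop(columns="_group_min")
      .reset_index(drop=True)
)

# Number the groups in order of first appearance after sorting (assumed meaning of Idx).
df_new["Idx"] = pd.factorize(df_new["Col"])[0]
print(df_new)
```

If two groups share the same minimum, this sketch leaves them in their original relative order; a different tie-break would need an extra sort key.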
0 votes, 2 answers
dataframe get rows between certain value to a certain value in previous row
I have a dataframe:
df = c1 c2 c3 code
1. 2. 3. 200
1. 5. 7. 220
1. 2. 3. 200
2. 4. 1. 340
6. 1. 1. 370
6. 1. 5. 270
9. 8. 2. 300
1. 6. 9. 700
9. 2. 1. 200
8. 1. 2 400
1. 2 1. 200
2. 5.…

Cranjis
0 votes, 2 answers
Joining dataframes of different dimensions with varying merge by criterion
Good evening, I am trying to merge a couple of datasets and my normal tools in R are failing me tonight. Consider df1 and df2 below.
df1 = data.frame(a = c("a", "b", "c"),
b = c("1", "2", "3"),
c = c("x", "y",…

Aswiderski
0 votes, 1 answer
Transform sequential 2d array to time-windowed dataset
I have a 2d dataframe:
C1. C2. C3
0. 2. 3. 6
1. 8. 2. 1
2. 8. 6. 2
3. 4. 9. 0
4. 6. 7. 1
5. 2. 3. 0
I want it to be a 3d data with
So if window size is 5, the shape…

Cranjis
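A minimal sketch of one way to build such windows with NumPy, using the six rows shown in the excerpt. It assumes overlapping windows that advance one row at a time; the question's intended stride and final shape are cut off, so that part is a guess.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# The six rows (C1, C2, C3) from the excerpt.
data = np.array([
    [2, 3, 6],
    [8, 2, 1],
    [8, 6, 2],
    [4, 9, 0],
    [6, 7, 1],
    [2, 3, 0],
])

window = 5  # window size mentioned in the question

# sliding_window_view appends the window axis last: (n_windows, n_cols, window).
windows = sliding_window_view(data, window_shape=window, axis=0)

# Reorder to (n_windows, window, n_cols) so each window looks like a small 2d frame.
windows = windows.transpose(0, 2, 1)
print(windows.shape)
```

The result is a read-only view onto `data`; call `.copy()` on it before modifying the windows.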
0 votes, 0 answers
Many-one to one-many mapping in pandas
[Image: csv jpg]
The first column in the image is of union ministry and second column contains schemes introduced in that ministry. I want to create one-many mapping using pandas in jupyter notebook.
Output should be something like this:
{Department of…
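A rough sketch of how a one-to-many dictionary could be built with pandas. The column names `ministry` and `scheme` and the placeholder values are hypothetical, since the original image and the expected output are not available here.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: one row per (ministry, scheme) pair.
df = pd.DataFrame({
    "ministry": ["Ministry A", "Ministry A", "Ministry B"],
    "scheme": ["Scheme 1", "Scheme 2", "Scheme 3"],
})

# Collect every scheme belonging to the same ministry into one list.
mapping = df.groupby("ministry")["scheme"].apply(list).to_dict()
print(mapping)
```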
0 votes, 1 answer
In a dataframe, replace first 'n' entries of a column with other values from another dataframe
I have a dataframe with a column called 'household', which has 2000 rows. I want to replace the first 200 rows with values that I have in another column.
So the final result of 'household' would be, first 200 is the…

pkha
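A small sketch of replacing the first `n` entries of a column by position. The second frame `other`, its column `new_values`, and the tiny sizes are assumptions made for illustration; the question uses n = 200 on 2000 rows.

```python
import pandas as pd

n = 3  # 200 in the question

df = pd.DataFrame({"household": [10, 20, 30, 40, 50]})
other = pd.DataFrame({"new_values": [1, 2, 3, 4]})  # hypothetical source of replacements

# Select the first n index labels, and assign a plain array so values are
# matched by position rather than by index label.
df.loc[df.index[:n], "household"] = other["new_values"].to_numpy()[:n]
print(df)
```

Converting to a NumPy array avoids surprises from index alignment when the two frames do not share the same index.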
0 votes, 1 answer
pandas dataframe add rows that are shuffle of values of specific columns
I have the dataframe:
df = b_150 h_200 b_250 h_300 b_350 h_400 c1 c2 q4
1. 2. 3. 4 5. 6. 3. 4. 4
I want to add rows with possible shuffles between values of b_150, b_250, b_350 and h_200, h_300, h_400
So for example
df…

Cranjis
0 votes, 2 answers
Aggregate sum of multiple columns by date in pandas
My df looks like this
Date Col Col1
01/01/2022 A 500
01/01/2022 B 100
01/01/2022 C 400
02/01/2022 A 400
02/01/2022 B 150
02/01/2022 C 450
My desired output looks…

Amit Kumar
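The desired output is cut off in the excerpt, so the following is only a sketch of two common readings, using the rows shown above: a total per date, and a wide table with one summed column per value of Col. The variable names `totals` and `wide` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01/01/2022"] * 3 + ["02/01/2022"] * 3,
    "Col": ["A", "B", "C", "A", "B", "C"],
    "Col1": [500, 100, 400, 400, 150, 450],
})

# Reading 1: one total per date.
totals = df.groupby("Date", as_index=False)["Col1"].sum()

# Reading 2: one row per date and one summed column per value of Col.
wide = df.pivot_table(index="Date", columns="Col", values="Col1", aggfunc="sum")

print(totals)
print(wide)
```

The dates are kept as strings here; parsing them with `pd.to_datetime` would be needed before any date-based sorting or resampling.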
0 votes, 0 answers
pandas dataframe how to take only rows with max value on one col, per group
I have a dataframe with categorical value, numerical values and a counter.
I want to keep only rows with the highest counter per category.
So, if my dataframe is:
df = Tr. N1. N2. counter
T1. 0. 3. 0
T1. 2. 3. 0
T1. 7. 1.…

Cranjis
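A minimal sketch of keeping only the rows with the highest counter per category. The data below is made up for illustration (the table in the excerpt is truncated), and keeping all tied rows is an assumption about the desired behavior.

```python
import pandas as pd

# Hypothetical data with the same column layout as the question.
df = pd.DataFrame({
    "Tr": ["T1", "T1", "T2", "T2"],
    "N1": [0, 2, 5, 1],
    "N2": [3, 3, 4, 4],
    "counter": [0, 1, 2, 2],
})

# Broadcast each group's maximum counter back to its rows, then compare.
is_max = df["counter"] == df.groupby("Tr")["counter"].transform("max")
result = df[is_max]
print(result)
```

Rows that tie for the maximum are all kept; if only one row per group is wanted, `df.loc[df.groupby("Tr")["counter"].idxmax()]` would pick the first one instead.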
0 votes, 1 answer
Pre-Processing / Formatting Data
I have two vectors in R:
list1 <- c("ABCDEF", "FEDCBA", "AA-BB-CCCC", "ABCDEFGH-IJK", "ZZZZ")
list2 <- c("ABCDEF", "FEDCBA:XA",
"AA-BB-CCCC-01","AA-BB-CCCC-21:ABC", "ABCDEFGH-IJK-1X",
"AKDWXFE-XXY")
I'd like to compare…

HelpMeCode
0 votes, 1 answer
R function to increment numbers at each row
I'm trying to create a new column "ID" in a dataframe.
Each row must have a unique ID, incremented by 5 each time. It should not start at 0, but at a chosen number (let's say N = max of another dataset's column).
What would be the easiest way…

Andre230
0 votes, 2 answers
Merge rows with same index and prioritize column values
I have a dataframe with some duplicate index values, with columns containing values for two different experiments. I want to prioritize Col_A if values are present across both index instances. I am trying to solve this using the following…

Cody Glickman
0 votes, 1 answer
Pandas get mean value per (row,col) across list of dataframes
I have a dictionary of dataframes:
{1 : df1, 2 : df2 .. }
All the dataframes have the same columns (but different numbers of rows).
I want to create a dataframe in which every cell is the mean of that (row, column) position across all the dataframes.
So if:
df1 : A B C
…

Cranjis
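A sketch of one way to average cell by cell across a dictionary of dataframes. The frames below are invented for illustration, and the sketch assumes rows are matched by index label, with rows missing from a shorter frame simply left out of that row's mean.

```python
import pandas as pd

# Hypothetical frames: same columns, different numbers of rows.
frames = {
    1: pd.DataFrame({"A": [1, 2], "B": [3, 4]}),
    2: pd.DataFrame({"A": [3, 4, 5], "B": [5, 6, 7]}),
}

# Stack all frames, then average the rows that share an index label.
stacked = pd.concat(list(frames.values()))
mean_df = stacked.groupby(level=0).mean()
print(mean_df)
```

Row 2 exists only in the second frame, so its mean is just that frame's values.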
0 votes, 2 answers
Merge columns in Python/Pandas of Dataframe1 from Dataframe2 only if specific column contains at least one of the words of the other column
Consider the Dataframes:
Employees:
Employee City
Ernest Tel Aviv
Merry New York
Mason Cairo
Clients:
Client Words
Ernest New vacuum Tel
Mason Tel Aviv is so pretty
Merry Halo! I live in the city York
I'm trying to…

JAN
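The question body is cut off, so this is a sketch based on the title: join the two frames on the person's name, then keep a row only when Words shares at least one whitespace-separated word with City. Joining Employee to Client by name and the helper `shares_word` are assumptions.

```python
import pandas as pd

employees = pd.DataFrame({
    "Employee": ["Ernest", "Merry", "Mason"],
    "City": ["Tel Aviv", "New York", "Cairo"],
})
clients = pd.DataFrame({
    "Client": ["Ernest", "Mason", "Merry"],
    "Words": ["New vacuum Tel", "Tel Aviv is so pretty", "Halo! I live in the city York"],
})

# Assumed join key: the employee name equals the client name.
merged = employees.merge(clients, left_on="Employee", right_on="Client")

def shares_word(row):
    """True when City and Words have at least one word in common (case-sensitive)."""
    return bool(set(row["City"].split()) & set(row["Words"].split()))

matched = merged[merged.apply(shares_word, axis=1)].reset_index(drop=True)
print(matched)
```

Words are compared exactly as written, so punctuation attached to a word (as in "Halo!") or a difference in case would prevent a match unless the text is normalized first.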
0 votes, 3 answers
Pivot a character vector to a data.frame with specified number of columns
I have a vector of data where every 4th row starts a new observation. I need to pivot the data so that first four values become the first row, the next four values become the second, and so on.
Here is a simplified example...
Given the following…

Stan