Questions tagged [data-munging]

The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.

236 questions
0 votes, 1 answer

pandas how to get sorted value in groupby object

I have a dataframe df = Col Val a. 8 a. 9 c. 4 c. 0 d. 3 d. 9 I want to sort by the smallest Val within each group and then, for each row, get the index of its Col group. So the new df will be df_new = Col Val Idx c. 4. 0 c. 0.…
Cranjis
0 votes, 2 answers

dataframe get rows between certain value to a certain value in previous row

I have a dataframe: df = c1 c2 c3 code 1. 2. 3. 200 1. 5. 7. 220 1. 2. 3. 200 2. 4. 1. 340 6. 1. 1. 370 6. 1. 5. 270 9. 8. 2. 300 1. 6. 9. 700 9. 2. 1. 200 8. 1. 2 400 1. 2 1. 200 2. 5.…
Cranjis
0 votes, 2 answers

Joining dataframes of different dimensions with varying merge by criterion

Good evening, I am trying to merge a couple datasets and my normal tools in R are failing me tonight. Consider df1 and df2 below. df1 = data.frame(a = c("a", "b", "c"), b = c("1", "2", "3"), c = c("x", "y",…
Aswiderski
0 votes, 1 answer

Transform sequential 2d array to time-windowed dataset

I have a 2d dataframe: C1. C2. C3 0. 2. 3. 6 1. 8. 2. 1 2. 8. 6. 2 3. 4. 9. 0 4. 6. 7. 1 5. 2. 3. 0 I want it to be a 3d data with So if window size is 5, the shape…
Cranjis
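One possible way to build overlapping time windows from a 2d frame is NumPy's `sliding_window_view`. The sketch below reuses the six rows from the excerpt and assumes the target layout is `(n_windows, window, n_columns)`; the question's exact desired shape is cut off, so that layout is an assumption.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Sample rows taken from the excerpt above.
df = pd.DataFrame({
    "C1": [2, 8, 8, 4, 6, 2],
    "C2": [3, 2, 6, 9, 7, 3],
    "C3": [6, 1, 2, 0, 1, 0],
})

window = 5  # window size mentioned in the question

# Slide along the row axis; the result has shape (n_windows, n_columns, window).
windows = sliding_window_view(df.to_numpy(), window_shape=window, axis=0)

# Assumed target layout: (n_windows, window, n_columns).
windows = windows.transpose(0, 2, 1)

print(windows.shape)
```

Each `windows[i]` then holds rows `i` through `i + window - 1` of the original frame. `sliding_window_view` needs NumPy 1.20 or later and returns a read-only view, so copy it before modifying.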
0 votes, 0 answers

Many-one to one-many mapping in pandas

The first column in the image is the union ministry, and the second column contains the schemes introduced by that ministry. I want to create a one-to-many mapping using pandas in a Jupyter notebook. Output should be something like this: {Department of…
0 votes, 1 answer

In a dataframe, replace first 'n' entries of a column with other values from another dataframe

I have a dataframe with a column called 'household', which has 2000 rows. I want to replace the first 200 rows with values that I have in another column. So the final result of 'household' would be, first 200 is the…
pkha
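A minimal sketch of overwriting the first `n` entries of a column, assuming the replacement values live in a second frame. The names `other` and `new_values` and the sample numbers are hypothetical; only the 'household' column and the 2000 and 200 row counts come from the question.

```python
import pandas as pd

# Hypothetical data: 2000 rows in 'household', 200 replacement values.
df = pd.DataFrame({"household": range(2000)})
other = pd.DataFrame({"new_values": range(-200, 0)})

n = 200

# Select the first n row labels and assign by position.
# to_numpy() drops the index of `other`, so values are not aligned by label.
df.loc[df.index[:n], "household"] = other["new_values"].to_numpy()[:n]

print(df["household"].head())
```

Converting to a NumPy array is a deliberate choice here: assigning a Series directly would align on index labels, which can silently produce NaN when the two frames are indexed differently.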
0 votes, 1 answer

pandas dataframe add rows that are shuffle of values of specific columns

I have the dataframe: df = b_150 h_200 b_250 h_300 b_350 h_400 c1 c2 q4 1. 2. 3. 4 5. 6. 3. 4. 4 I want to add rows with possible shuffles between values of b_150, b_250, b_350 and h_200, h_300, h_400 So for example df…
Cranjis
0 votes, 2 answers

Aggregate sum of multiple columns by date in pandas

My df looks like this Date Col Col1 01/01/2022 A 500 01/01/2022 B 100 01/01/2022 C 400 02/01/2022 A 400 02/01/2022 B 150 02/01/2022 C 450 My desired output looks…
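The desired output is cut off in the excerpt, so the following is only a sketch of one plausible reading: summing `Col1` per `Date` with `groupby`. The sample rows come from the excerpt; that a per-date total is what was wanted is an assumption.

```python
import pandas as pd

# Rows taken from the excerpt above.
df = pd.DataFrame({
    "Date": ["01/01/2022"] * 3 + ["02/01/2022"] * 3,
    "Col": ["A", "B", "C", "A", "B", "C"],
    "Col1": [500, 100, 400, 400, 150, 450],
})

# Assumed goal: one row per date with the total of Col1.
totals = df.groupby("Date", as_index=False)["Col1"].sum()

print(totals)
```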
0 votes, 0 answers

pandas dataframe how to take only rows with max value on one col, per group

I have a dataframe with categorical value, numerical values and a counter. I want to keep only rows with the highest counter per category. So, if my dataframe is: df = Tr. N1. N2. counter T1. 0. 3. 0 T1. 2. 3. 0 T1. 7. 1.…
Cranjis
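One common pattern for keeping only the rows with the highest counter per category is to compare each row against its group maximum via `transform("max")`. The column names follow the excerpt; the sample values are hypothetical because the original data is truncated.

```python
import pandas as pd

# Hypothetical sample; only the column names come from the question.
df = pd.DataFrame({
    "Tr": ["T1", "T1", "T1", "T2", "T2"],
    "N1": [0, 2, 7, 1, 4],
    "N2": [3, 3, 1, 5, 6],
    "counter": [0, 0, 1, 2, 2],
})

# Broadcast each group's maximum counter back onto its rows.
max_per_group = df.groupby("Tr")["counter"].transform("max")

# Keep every row whose counter equals its group's maximum (ties are kept).
result = df[df["counter"] == max_per_group]

print(result)
```

This keeps all tied rows. If exactly one row per category were needed instead, `df.loc[df.groupby("Tr")["counter"].idxmax()]` would be an alternative.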
0 votes, 1 answer

Pre-Processing / Formatting Data

I have two vectors in R: list1 <- c("ABCDEF", "FEDCBA", "AA-BB-CCCC", "ABCDEFGH-IJK", "ZZZZ") list2 <- c("ABCDEF", "FEDCBA:XA", "AA-BB-CCCC-01","AA-BB-CCCC-21:ABC", "ABCDEFGH-IJK-1X", "AKDWXFE-XXY") I'd like to compare…
HelpMeCode
0 votes, 1 answer

R function to increment numbers at each row

I'm trying to create a new column "ID" in a dataframe. Each row must have a unique ID incremented by 5 each time. But it should not start at 0, but from a desired number (let's say N = max of another dataset's column). What would be the easiest way…
Andre230
0 votes, 2 answers

Merge rows with same index and prioritize column values

I have a dataframe with some duplicate index values and columns containing values for two different experiments. I want to prioritize Col_A if values are present across both index instances. I am trying to solve this using the following…
Cody Glickman
0 votes, 1 answer

Pandas get mean value per (row,col) across list of dataframes

I have a dictionary of dataframes: {1 : df1, 2 : df2 .. } All dataframes have the same columns (but different numbers of rows). I want to create a dataframe in which every cell is the mean of that (row, column) position across the dataframes. So if: df1 : A B C …
Cranjis
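A possible sketch for the cell-wise mean across a dictionary of frames: stack them with `pd.concat` and group by the original row index. The sample frames are hypothetical, and the code assumes rows are matched by index label, with rows missing from a shorter frame simply left out of that row's mean.

```python
import pandas as pd

# Hypothetical frames with the same columns but different row counts.
dfs = {
    1: pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]}),
    2: pd.DataFrame({"A": [3.0, 4.0, 5.0], "B": [5.0, 6.0, 7.0]}),
}

# Stack all frames, then average rows that share an index label.
mean_df = pd.concat(list(dfs.values())).groupby(level=0).mean()

print(mean_df)
```

Row 2 exists only in the second frame here, so its mean is just that frame's values.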
0 votes, 2 answers

Merge columns in Python/Pandas of Dataframe1 from Dataframe2 only if specific column contains at least one of the words of the other column

Consider the Dataframes: Employees: Employee City Ernest Tel Aviv Merry New York Mason Cairo Clients: Client Words Ernest New vacuum Tel Mason Tel Aviv is so pretty Merry Halo! I live in the city York I'm trying to…
JAN
0 votes, 3 answers

Pivot a character vector to a data.frame with specified number of columns

I have a vector of data where every 4th row starts a new observation. I need to pivot the data so that first four values become the first row, the next four values become the second, and so on. Here is a simplified example... Given the following…
Stan