The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.
Questions tagged [data-munging]
236 questions
0 votes
1 answer
How do you assign groups to larger groups in dplyr
I would like to assign groups to larger groups in order to assign them to cores for processing. I have 16 cores. This is what I have so far:
test <- data_extract %>% group_by(group_id) %>% sample_n(16, replace = TRUE)
This takes samples of 16 from each…

Dominic Naimool
- 313
- 2
- 11
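
One possible reading of the question above is that each distinct `group_id` should be mapped to one of 16 buckets, one per core. Below is a minimal sketch in plain Python, assuming round-robin assignment is acceptable. The function name `assign_to_buckets` and the sample ids are hypothetical and do not come from the question.

```python
def assign_to_buckets(group_ids, n_buckets=16):
    """Map each distinct group id to a bucket in 0..n_buckets-1, round-robin."""
    distinct = sorted(set(group_ids))
    return {gid: i % n_buckets for i, gid in enumerate(distinct)}


# Hypothetical group ids: g00, g01, ..., g39
group_ids = [f"g{n:02d}" for n in range(40)]
buckets = assign_to_buckets(group_ids)

# Each row can then look up its bucket (core) through its group id.
print(buckets["g00"], buckets["g17"])
```

Round-robin keeps the number of groups per bucket balanced, but not necessarily the number of rows; if group sizes vary a lot, a size-aware assignment may be a better fit.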
0 votes
2 answers
Create a column which references other columns and their values?
I am trying to create a time series which shows what the value of a specific column was at a particular time. All I currently have access to is a table which logs all the changes, the current value of the columns, dates and the names of the column…

Dominic Naimool
- 313
- 2
- 11
0 votes
0 answers
Generating Artificial data from real data
I have a dataframe consisting of 2000 rows and 5 features (columns) as follows:
my_data:
Id, f1, f2, f3, f4(target_value)
u1 34 sd 43 1
u1 30 fd 3 0
u1 01 …

Spedo
- 355
- 3
- 13
0 votes
1 answer
Finding missing pair combinations
I have a dataframe.
I'd like to find out which disease is not recorded in an area.
So for example:
Area A does not have Mumps
What I'd like to do is wherever an area doesn't have a disease, I'd like to record a zero in the n column.
I thought it…

damo
- 463
- 4
- 14
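
The missing-pair question above can be approached by building every area/disease combination and filling in a zero wherever no record exists. Below is a minimal sketch in plain Python. The `observed` counts are made-up sample data, since the question's dataframe is not shown in the excerpt.

```python
from itertools import product

# Hypothetical observed counts, keyed by (area, disease)
observed = {
    ("A", "Measles"): 3,
    ("A", "Flu"): 7,
    ("B", "Mumps"): 2,
    ("B", "Measles"): 1,
    ("B", "Flu"): 4,
}

areas = sorted({area for area, _ in observed})
diseases = sorted({disease for _, disease in observed})

# Every (area, disease) pair, with n = 0 where nothing was recorded
complete = [
    (area, disease, observed.get((area, disease), 0))
    for area, disease in product(areas, diseases)
]

# Pairs that were absent from the original data
missing = [(area, disease) for area, disease, n in complete if n == 0]
print(missing)
```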
0 votes
2 answers
How to write a text column containing digits separated by "," into a .csv file using R
I have a data frame with one column. It contains digits separated by ',' and its type is chr. (I'm using R and RStudio.)
I'm going to write this data frame to a .csv file using the code below, but in the file it turns into a large number in each…

Z_DEV
- 1
- 2
0 votes
1 answer
pandas dataframe group rows based on specific column
I have a table that looks like this:
P_id S_id Time
1 20 A 15
2 30 B 50
3 50 A 99
4 70 A 60
I want to group the table based on the column "S_id" and sort it by the column "Time", so it will look like this:
…

oren_isp
- 729
- 1
- 7
- 22
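
For the grouping question above, the desired output is cut off in the excerpt, so the following is only one plausible reading: keep rows with the same `S_id` together and order them by `Time` within each group. A minimal sketch in plain Python, using the four rows shown in the question as tuples of `(P_id, S_id, Time)`:

```python
from operator import itemgetter

# Rows from the question: (P_id, S_id, Time)
rows = [
    (20, "A", 15),
    (30, "B", 50),
    (50, "A", 99),
    (70, "A", 60),
]

# Sort by S_id first, then by Time within each S_id
ordered = sorted(rows, key=itemgetter(1, 2))

for row in ordered:
    print(row)
```

In pandas, the same ordering can be obtained with `df.sort_values(["S_id", "Time"])`.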
0 votes
5 answers
R: Looping through every 5 rows of a data frame and imputing an incremental value
I am trying to impute incremental values for every 5 rows of the data frame. I am new to R and not sure how to achieve this.
Input data:
state Value
a 1
b 2
a 3
c 4
a 5
e 6
f 7
w 8
f 9
s 10
e …

suri
- 39
- 2
- 10
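
The expected output of the question above is truncated, so this sketch assumes one interpretation: rows 1 to 5 receive the value 1, rows 6 to 10 receive 2, and so on. Under that assumption no loop over blocks is needed, because the block number follows from integer division of the row position. The names `block_size` and `block_ids` are illustrative only.

```python
# The ten "Value" entries visible in the question
values = list(range(1, 11))

block_size = 5

# Zero-based row position // block size gives the block; +1 makes it 1-based
block_ids = [i // block_size + 1 for i in range(len(values))]

print(block_ids)
```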
0 votes
0 answers
Wrangling a time series data set in an awkward format from a text file to a Pandas dataframe in Python
I have a .txt containing a time series data set, formatted in the following manner, as rows separated with \n:
N>New Section
A>1, 2, 3
L>Label_1
G>1, 2, 3
A>3, 2, 1
G>3, 1, 1
A>2, 2, 1
...many rows of G> and A> pairs, of varying…

Robert Constable
- 1
- 3
0 votes
1 answer
pandas dataframe take rows before certain indexes
I have a dataframe and a list of indexes, and I want to get a new dataframe such that for each index (from the given list), I take all the preceding rows that match the value of the given column at that index.
C1 C2 C3
0 1 2…

oren_isp
- 729
- 1
- 7
- 22
0 votes
0 answers
Keeping columns (but skipping index) in unpivot in geopandas data.frame
I'd like to use some 'unpivot' method in python to reshape my pandas data frame. However, one of the columns that I want to keep is a 'geometry' column of geopandas after reshaping it. My dataset has 5 columns and there's no way to put this…

Renan Xavier Cortes
- 141
- 1
- 2
- 7
0 votes
1 answer
python pandas create a list group by value
I have a dataframe in python:
pID sID time
0 2133 152414 2018-06-16
1 1721 152912 2018-06-17
2 2264 152912 2018-06-18
I want to create a new table with sID as the key and a list of pID:
pID time
152414 2133…

oren_isp
- 729
- 1
- 7
- 22
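
Collecting the `pID` values per `sID`, as asked above, is a plain group-and-append operation. The sketch below uses only the standard library and the three rows shown in the question. The target table is truncated in the excerpt, so handling of the `time` column is left out here.

```python
from collections import defaultdict

# Rows from the question: (pID, sID, time)
rows = [
    (2133, 152414, "2018-06-16"),
    (1721, 152912, "2018-06-17"),
    (2264, 152912, "2018-06-18"),
]

# sID -> list of pID, in the order the rows appear
pids_by_sid = defaultdict(list)
for pid, sid, _time in rows:
    pids_by_sid[sid].append(pid)

print(dict(pids_by_sid))
```

With a pandas dataframe, `df.groupby("sID")["pID"].apply(list)` produces the same grouping.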
0 votes
0 answers
Never worked with data structure this messy
I have this file for work (and 7000 others of the same format) that is very messy and not tidy in any way. I've been reading about tidying data using Pandas but feel I'm spinning my wheels at this point...
Here is the raw data viewed in Excel:
Here…

CaptainPlanet
- 43
- 9
0 votes
3 answers
Extracting year from unformatted date character vector
I have a character vector, which represents the year of coverage in an unformatted date, and it looks like this:
Period of coverage
1 1/1/2011 to 31/12/2011
2 1/1/2010 to 31/12/2010
3 1/1/2012 to 31/12/2012
4 1/1/2010 to 31/12/2010
5 …

Nautica
- 2,004
- 1
- 12
- 35
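
For strings shaped like the ones in the question above, a regular expression that picks out the first four-digit run is usually enough. A minimal sketch in plain Python, assuming every entry contains at least one four-digit year and that the first one is the year wanted; the helper name `first_year` is hypothetical.

```python
import re

# The four complete entries visible in the question
periods = [
    "1/1/2011 to 31/12/2011",
    "1/1/2010 to 31/12/2010",
    "1/1/2012 to 31/12/2012",
    "1/1/2010 to 31/12/2010",
]


def first_year(text):
    """Return the first standalone four-digit number in text, or None."""
    match = re.search(r"\b(\d{4})\b", text)
    return int(match.group(1)) if match else None


years = [first_year(p) for p in periods]
print(years)
```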
0 votes
2 answers
Transpose or gather wide to long dataframe with multiple keys and values
I'm trying to transpose a wide dataset to a long tidy one. I use the tidyr::gather() function a lot for these kinds of tasks, only now I have a pretty weird dataset.
The following is a small version of mine. As you can imagine that the columns with…

Tdebeus
- 1,519
- 5
- 21
- 43
0 votes
0 answers
Is there a "fill-up" or "fill-down" command in R?
Is it possible to "fill up" NA values based on a condition in two other columns?
A similar answer is found here:
Replace missing values (NA) with most recent non-NA by group
This question is different because I need the fill up to be based on the…

Jordan
- 1,415
- 3
- 18
- 44