Questions tagged [data-munging]

The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.

236 questions
0
votes
1 answer

How do you assign groups to larger groups in dplyr

I would like to assign groups to larger groups in order to assign them to cores for processing. I have 16 cores. This is what I have so far: test <- data_extract %>% group_by(group_id) %>% sample_n(16, replace = TRUE) This takes samples of 16 from each…
Dominic Naimool
  • 313
  • 2
  • 11
0
votes
2 answers

Create a column which references other columns and their values?

I am trying to create a time series which shows what the value of a specific column was at a particular time. All I currently have access to is a table which logs all the changes, the current value of the columns, dates and the names of the column…
Dominic Naimool
  • 313
  • 2
  • 11
0
votes
0 answers

Generating Artificial data from real data

I have a dataframe consisting of 2000 rows and 5 features (columns) as follows: my_data: Id, f1, f2, f3, f4(target_value) u1 34 sd 43 1 u1 30 fd 3 0 u1 01 …
Spedo
  • 355
  • 3
  • 13
0
votes
1 answer

Finding missing pair combinations

I have a dataframe. I'd like to find out which disease is not recorded in an area. So for example: Area A does not have Mumps What I'd like to do is wherever an area doesn't have a disease, I'd like to record a zero in the n column. I thought it…
damo
  • 463
  • 4
  • 14
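One way to approach the missing-pairs question above is to build every (area, disease) combination and fill in a zero wherever no count was recorded. This is only a minimal sketch in plain Python: the `counts` dictionary and its values are made-up illustration data (the excerpt only tells us that Area A has no Mumps record), and the asker's actual dataframe layout is not shown.

```python
from itertools import product

# Hypothetical recorded counts, keyed by (area, disease).
# Only "Area A has no Mumps" comes from the question; the numbers are invented.
counts = {
    ("A", "Measles"): 3,
    ("B", "Measles"): 1,
    ("B", "Mumps"): 2,
}

# Collect the distinct areas and diseases that appear anywhere in the data.
areas = sorted({area for area, _ in counts})
diseases = sorted({disease for _, disease in counts})

# Every possible pairing; pairs with no record get n = 0.
complete = {pair: counts.get(pair, 0) for pair in product(areas, diseases)}

# The pairs that were absent from the original data.
missing = [pair for pair in product(areas, diseases) if pair not in counts]
```

Here `missing` holds the combinations that were never recorded and `complete` is the filled-in table. In R, `tidyr::complete()` with a `fill` argument covers the same idea.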
0
votes
2 answers

How to write a text column containing digits separated by "," into a .csv file using R

I have a data frame with a column. It contains digits separated by ',' and its type is chr. (I'm using R and RStudio.) I'm going to write this data frame to a .csv file using the code below, but it changes in the file as a large number in each…
Z_DEV
  • 1
  • 2
0
votes
1 answer

pandas dataframe group rows based on specific column

I have a table that looks like this: P_id S_id Time 1 20 A 15 2 30 B 50 3 50 A 99 4 70 A 60 I want to group the table based on the column "S_id", sorted by the column "Time", so it will look like this: …
oren_isp
  • 729
  • 1
  • 7
  • 22
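The grouping described in the question above amounts to a two-key sort: first by S_id, then by Time within each S_id. The sketch below uses the four rows from the excerpt as plain tuples. Because the desired output is truncated in the excerpt, the ascending order on both keys is an assumption.

```python
# Rows from the excerpt as (P_id, S_id, Time) tuples.
rows = [
    (20, "A", 15),
    (30, "B", 50),
    (50, "A", 99),
    (70, "A", 60),
]

# Sort by S_id first, then by Time within each S_id (ascending is assumed).
grouped = sorted(rows, key=lambda row: (row[1], row[2]))
```

With pandas, the equivalent call would be `df.sort_values(["S_id", "Time"])`.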
0
votes
5 answers

R: Looping through each 5 rows of data frame and imputing incremental value

I am trying to impute incremental values for each 5 rows of the data frame. I am new to R and not sure how to achieve this. Input data: state Value a 1 b 2 a 3 c 4 a 5 e 6 f 7 w 8 f 9 s 10 e …
suri
  • 39
  • 2
  • 10
0
votes
0 answers

Wrangling a time series data set in an awkward format from a text file to a Pandas dataframe in Python

I have a .txt containing a time series data set, formatted in the following manner, as rows separated with \n: N>New Section A>1, 2, 3 L>Label_1 G>1, 2, 3 A>3, 2, 1 G>3, 1, 1 A>2, 2, 1 ...many rows of G> and A> pairs, of varying…
0
votes
1 answer

pandas dataframe take rows before certain indexes

I have a dataframe and a list of indexes, and I want to get a new dataframe such that for each index (from the given list), I take all the preceding rows that match the value of the given column at that index. C1 C2 C3 0 1 2…
oren_isp
  • 729
  • 1
  • 7
  • 22
0
votes
0 answers

Keeping columns (but skipping index) in unpivot in geopandas data.frame

I'd like to use some 'unpivot' method in Python to reshape my pandas data frame. However, one of the columns that I want to keep is a 'geometry' column of geopandas after reshaping it. My dataset has 5 columns and there's no way to put this…
0
votes
1 answer

python pandas create a list group by value

I have a dataframe in python: pID sID time 0 2133 152414 2018-06-16 1 1721 152912 2018-06-17 2 2264 152912 2018-06-18 I want to create a new table with sID as the key and list of pID: pID time 152414 2133…
oren_isp
  • 729
  • 1
  • 7
  • 22
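The question above asks for one list of pID values per sID. A minimal sketch of that grouping in plain Python follows, using the three rows shown in the excerpt. How the `time` column should be carried over is cut off in the excerpt, so it is simply ignored here.

```python
from collections import defaultdict

# Rows from the excerpt as (pID, sID, time) tuples.
rows = [
    (2133, 152414, "2018-06-16"),
    (1721, 152912, "2018-06-17"),
    (2264, 152912, "2018-06-18"),
]

# Map each sID to the list of pIDs that share it, in input order.
pids_by_sid = defaultdict(list)
for pid, sid, _time in rows:
    pids_by_sid[sid].append(pid)

# Convert to a plain dict for display or comparison.
result = dict(pids_by_sid)
```

In pandas the same result is usually written as `df.groupby("sID")["pID"].apply(list)`.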
0
votes
0 answers

Never worked with data structure this messy

I have this file for work (and 7000 others of the same format) that is very messy and not tidy in any way. I've been reading about tidying data using Pandas but feel I'm spinning my wheels at this point... Here is the raw data viewed in Excel: Here…
0
votes
3 answers

Extracting year from unformatted date character vector

I have a character vector, which represents the year of coverage in an unformatted date, and it looks like this: Period of coverage 1 1/1/2011 to 31/12/2011 2 1/1/2010 to 31/12/2010 3 1/1/2012 to 31/12/2012 4 1/1/2010 to 31/12/2010 5 …
Nautica
  • 2,004
  • 1
  • 12
  • 35
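For strings shaped like the ones in the question above, a regular expression that picks out four-digit runs is often enough. The sketch below uses the four sample values from the excerpt. Taking the last year in each string is an assumption (start and end year are the same in every sample), and `coverage_year` is a hypothetical helper name.

```python
import re

# Sample values copied from the excerpt.
periods = [
    "1/1/2011 to 31/12/2011",
    "1/1/2010 to 31/12/2010",
    "1/1/2012 to 31/12/2012",
    "1/1/2010 to 31/12/2010",
]

def coverage_year(period: str) -> int:
    """Return the last standalone four-digit number in the string as an int."""
    matches = re.findall(r"\b(\d{4})\b", period)
    if not matches:
        raise ValueError(f"no four-digit year found in {period!r}")
    return int(matches[-1])

years = [coverage_year(p) for p in periods]
```

The same pattern carries over to R, for example with `stringr::str_extract()` or `sub()`.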
0
votes
2 answers

Transpose or gather wide to long dataframe with multiple keys and values

I'm trying to transpose a wide dataset to a long tidy one. I use the tidyr::gather() function a lot for these kinds of tasks, only now I have a pretty weird dataset. The following is a small version of mine. As you can imagine that the columns with…
Tdebeus
  • 1,519
  • 5
  • 21
  • 43
0
votes
0 answers

Is there a "fill-up" or "fill-down" command in R?

Is it possible to "fill up" NA values based on a condition in two other columns? A similar answer is found here: Replace missing values (NA) with most recent non-NA by group This question is different because I need the fill up to be based on the…
Jordan
  • 1,415
  • 3
  • 18
  • 44