Questions tagged [data-munging]

The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or for input to an algorithm or system.

236 questions
0
votes
2 answers

How to filter lists within a list in R iteratively or how to filter a data.table using two criteria simultaneously, creating objects at run time

I'm working on a data.table which contains, among other data, the demand for certain products at certain stores of a business franchise. The goal is to predict the demand for every product at every store. Here is a "head" of my…
0
votes
1 answer

filter specific values in dataframe with unique prefix in column name (e.g. 'UniqueID_commonsuffix')

I have a dataframe with > 300 unique samples, there are 2 columns of similar information per sample, and I'd like to filter for 34 specific values in one of those columns per sample. I've included a screenshot of the data to help visualize this…
srajpara
  • 51
  • 1
  • 9
0
votes
1 answer

Turn user product views into network matrix/graph in python spark (pyspark)

I'm working with website data that includes user IDs and the products/items those users viewed. I've created a pyspark dataframe that looks something like this: +--------+----------+-------+----------+---------+ | UserId| productA| itemB| …
Jed
  • 1,823
  • 4
  • 20
  • 52
0
votes
3 answers

pandas get percentile of value within

I have a dataframe: d = [f1 f2 f3 1 2 3 5 1 2 3 3 1 2 4 7 .. .. ..] I want to add, per feature, the percentile of the value for this feature in the row (for a subset of features). So for subset =…
Cranjis
  • 1,590
  • 8
  • 31
  • 64
0
votes
1 answer

Python pandas dataframe merge 2 rows into one with keys

I have 2 dataframes, each with a single row and the same columns: df1 = feat_1 feat_2 ... feat_n a b c d ... z A B N 1 2 3 4 9 df2 = feat_1 feat_2 ... feat_n a b c d ... z A B N 5 6 1 8 …
Cranjis
  • 1,590
  • 8
  • 31
  • 64
0
votes
2 answers

How to assign NA to "NO"

In my dataset I have one column where I need to replace blanks with "No". How can I do this? The column is a character variable and has only two values, yes and no. data<- as.data.frame(Upper_GI_2ww) data$`Direct to Test?` = ifelse(nchar(data$`Direct…
tashu
  • 11
  • 5
0
votes
1 answer

Create new dataframe from repeated exposure and participants and only add new data

Happy Tuesday. I am currently collecting survey data. The surveys sometimes ask the same questions and other times do not. Why? Because there are 700+ questions and asking a participant to answer all of these (without payment) is not very…
Aswiderski
  • 166
  • 9
0
votes
1 answer

Python pandas apply function on column values (based on column name pattern)

I have a dataframe: a b val1_b1 val1_b2 val2_b1 val2_v2 1 2 5 9 4 6 I want to take the max by column group, so the dataframe will be: a b val1 val2 1 2 9 6 or the RMS: a b val1 val2 1 2 sqrt(106) …
Cranjis
  • 1,590
  • 8
  • 31
  • 64
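One way to picture the grouping asked about in the question above is the following minimal sketch. It is written in plain Python rather than pandas so that the logic is visible, and it is not the accepted answer. The helper name `collapse_by_prefix` and the rule "the group is the text before the first underscore" are assumptions made for illustration. Note that the excerpt's `sqrt(106)` equals `sqrt(5**2 + 9**2)`, so the second reducer is the root of the sum of squares.

```python
import math
from collections import defaultdict


def collapse_by_prefix(row, reducer, sep="_"):
    """Reduce values whose column names share the text before `sep`.

    `collapse_by_prefix` is a hypothetical helper, not a pandas function.
    Columns without `sep` in their name are passed through unchanged.
    """
    groups = defaultdict(list)
    result = {}
    for name, value in row.items():
        if sep in name:
            # "val1_b1" -> group "val1"
            groups[name.split(sep, 1)[0]].append(value)
        else:
            result[name] = value
    for prefix, values in groups.items():
        result[prefix] = reducer(values)
    return result


# The single row shown in the excerpt, keyed by column name.
row = {"a": 1, "b": 2, "val1_b1": 5, "val1_b2": 9, "val2_b1": 4, "val2_v2": 6}

maxima = collapse_by_prefix(row, max)
root_sum_sq = collapse_by_prefix(
    row, lambda values: math.sqrt(sum(v * v for v in values))
)

print(maxima)
print(root_sum_sq)
```

In pandas itself the same idea is usually expressed by deriving a group key from the column names and aggregating across columns per key; the exact call depends on the pandas version in use.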
0
votes
1 answer

data frame with mixed date format

I would like to change all the mixed date formats into one format, for example d-m-y. Here is the data frame: x <- data.frame("Name" = c("A","B","C","D","E"), "Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25")) I have tried…
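The question above is about R, but the underlying logic is language-independent: try a list of known formats in order and emit a single target format. Below is a minimal Python sketch of that idea using the five sample values from the excerpt. Two things are assumptions, not facts from the question: that `"36085.0"` is an Excel-style serial day number (counted from 1899-12-30), and that month abbreviations are English. The function name `to_dmy` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: purely numeric values are Excel-style serial day numbers.
EXCEL_EPOCH = datetime(1899, 12, 30)

# Candidate input formats, tried in order. Month names assume an English locale.
FORMATS = ("%Y-%b-%d", "%b-%d-%Y", "%m/%d/%y", "%Y-%m-%d")


def to_dmy(text):
    """Return the date as 'dd-mm-YYYY', or None if no format matches."""
    text = text.strip()
    try:
        serial = float(text)
    except ValueError:
        pass  # not a number, fall through to the string formats
    else:
        return (EXCEL_EPOCH + timedelta(days=serial)).strftime("%d-%m-%Y")
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%d-%m-%Y")
        except ValueError:
            continue
    return None


birthdates = ["36085.0", "2001-sep-12", "Feb-18-2005", "05/27/84", "2020-6-25"]
for raw in birthdates:
    print(raw, "->", to_dmy(raw))
```

Ambiguous inputs such as `05/06/84` cannot be resolved by format matching alone; the order of `FORMATS` decides which reading wins, so it should reflect what is known about the data source.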
0
votes
2 answers

What is the easiest way to split a string into a first name and last name?

The dataset has 14k rows and has many titles, etc. I am a beginner in Pandas and Python and I'd like to know how to proceed with getting the output of first name and last name from this dataset. Dataset: 0 Pr.Doz.Dr. Klaus Semmler Facharzt…
0
votes
1 answer

How to find the average of a list of elements embedded in a Pandas data frame column

I'm in the process of cleaning a data frame, and one particular column contains values that are lists. I'm trying to find the average of those lists and update the existing column with an int while preserving the indices. I can…
Cole
  • 1
  • 1
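The operation described in the question above (replace each list with its average, as an int, keeping the original index) can be sketched in plain Python as follows. The sample data, the name `mean_or_none`, the choice to round rather than truncate, and the handling of empty lists are all assumptions for illustration; the excerpt does not show the actual data.

```python
from statistics import mean


def mean_or_none(values):
    """Return the rounded integer mean of `values`, or None for an empty list."""
    if not values:
        return None
    return int(round(mean(values)))


# Hypothetical column: index label -> list of numbers. The non-contiguous
# index labels stand in for a data frame index that must be preserved.
column = {0: [1, 2, 3], 3: [10, 20], 7: [5], 9: []}

averaged = {index: mean_or_none(values) for index, values in column.items()}
print(averaged)
```

With pandas, applying a function like this element-wise to the column keeps the index automatically, because the result is aligned to the original rows.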
0
votes
1 answer

Dataframe apply balanced allocation of rows to lists based on row type

I have a dataframe with 192 rows; each row represents a sentence with some metadata and a specific type (H or L). So, for each type I have a total of 96 sentences. I need to allocate them to 10 different lists with the following conditions: Each…
Cranjis
  • 1,590
  • 8
  • 31
  • 64
0
votes
2 answers

dummy code collapsed column using data.table in R

It is quite easy to dummy code a collapsed column using the tidyverse. Here is a quick example of how I've done it in the past. First, I'll load the iris data and create a custom collapsed column of randomly sampled letters: library(tidyverse) #…
Trent
  • 771
  • 5
  • 19
0
votes
0 answers

How to create an array that repeats a filter function for every unique value in another column in Google Sheets

I have a sheet that does auto-analysis of surveys, and I'd like to have it analyze subsets of the data based on another variable. I've got two separate tables: Table 1 is the variable I want to group the analysis by, and Table 2 has all the…
Josh
  • 311
  • 3
  • 11
0
votes
1 answer

R: Filling in missing values in a column based on other columns

I have a large data set where each zip code has its corresponding latitude and longitude. In the data set some zip codes are missing. I need to fill in the missing zip codes on the basis of their corresponding lat long where that data is not…