Highest Voted 'data-munging' Questions

0 votes, 2 answers

How to filter lists within a list in R iteratively or how to filter a data.table using two criteria simultaneously, creating objects at run time

I'm working on a data.table which contains, among other data, the demand for certain products in certain stores of a business franchise. The goal is to predict the demand for every single product in every single store. Here is a "head" of my…

r data-science data-wrangling data-munging

asked Mar 24 '21 at 20:19 by Kíron Hashimoto (reputation 1)

0 votes, 1 answer

filter specific values in dataframe with unique prefix in column name (e.g. 'UniqueID_commonsuffix')

I have a dataframe with > 300 unique samples, there are 2 columns of similar information per sample, and I'd like to filter for 34 specific values in one of those columns per sample. I've included a screenshot of the data to help visualize this…

python pandas dataframe filtering data-munging

asked Jan 05 '21 at 17:07 by srajpara (reputation 51)
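
A minimal pandas sketch of one way to do this, assuming (hypothetically) that each sample contributes columns named `<SampleID>_value` and `<SampleID>_count`, and that the target values (34 in the question) are passed as a list. The helper `filter_per_sample`, the column names, and the data are illustrative only, not the asker's schema.

```python
import pandas as pd

def filter_per_sample(df: pd.DataFrame, suffix: str, targets) -> dict:
    """Hypothetical helper: for every column named '<sample>_<suffix>', keep the
    rows whose value is in `targets`, returning only that sample's columns."""
    result = {}
    for col in df.columns:
        if not col.endswith("_" + suffix):
            continue
        sample = col[: -(len(suffix) + 1)]  # "S1_value" -> "S1"
        # All columns belonging to this sample (same unique prefix).
        sample_cols = [c for c in df.columns if c.startswith(sample + "_")]
        result[sample] = df.loc[df[col].isin(targets), sample_cols]
    return result

# Illustrative data: two samples, each with a "value" and a "count" column.
df = pd.DataFrame({
    "S1_value": ["x", "y", "z"], "S1_count": [1, 2, 3],
    "S2_value": ["y", "q", "x"], "S2_count": [4, 5, 6],
})
filtered = filter_per_sample(df, "value", ["x", "y"])
```

Each sample gets its own filtered frame keyed by its prefix; with 300+ samples, `pd.concat(filtered)` could stack them back into one frame if a single result is preferred.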

0 votes, 1 answer

Turn user product views into network matrix/graph in python spark (pyspark)

I'm working with website data that includes user IDs and the products/items those users viewed. I've created a pyspark dataframe that looks something like this: +--------+----------+-------+----------+---------+ | UserId| productA| itemB| …

python pyspark parallel-processing dot-product data-munging

asked Dec 04 '20 at 14:14 by Jed (reputation 1,823)
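
The "dot-product" tag suggests the usual trick: with a 0/1 user-by-item view matrix V, the product Vᵀ·V counts, for every pair of items, how many users viewed both, which is the weighted adjacency matrix of an item network. The question asks about PySpark; this is a small pandas sketch on hypothetical in-memory data that only illustrates the matrix product, not a distributed PySpark implementation.

```python
import pandas as pd

# Hypothetical 0/1 view flags: one row per user, one column per product/item.
views = pd.DataFrame(
    {"UserId": ["u1", "u2", "u3"],
     "productA": [1, 1, 0],
     "itemB": [1, 0, 1],
     "itemC": [0, 1, 1]}
).set_index("UserId")

# Item-item co-view counts: entry (i, j) = number of users who viewed both i and j.
co_views = views.T.dot(views)

# Turn the matrix into a weighted edge list (upper triangle only, no self-loops).
edges = [
    (a, b, int(co_views.loc[a, b]))
    for i, a in enumerate(co_views.index)
    for b in co_views.columns[i + 1:]
    if co_views.loc[a, b] > 0
]
```

The diagonal of `co_views` holds each item's total view count; the edge list can be handed to any graph library that accepts (source, target, weight) triples.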

0 votes, 3 answers

pandas get percentile of value within

I have a dataframe:
d =
f1 f2 f3
1  2  3
5  1  2
3  3  1
2  4  7
.. .. ..
I want to add, per feature, the percentile of the value for this feature in the row (for a subset of features). So for subset =…

python pandas dataframe data-science data-munging

asked Nov 30 '20 at 08:11 by Cranjis (reputation 1,590)
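
Reading the question as a row-wise percentile (each value ranked against the other subset features in the same row), `DataFrame.rank(axis=1, pct=True)` does this directly. A minimal sketch using the rows shown in the excerpt; the choice of subset and the `_pct` column suffix are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"f1": [1, 5, 3, 2], "f2": [2, 1, 3, 4], "f3": [3, 2, 1, 7]})
subset = ["f1", "f2", "f3"]  # assumed subset of features to compare within each row

# Percentile rank of each value among the subset features of the same row;
# ties get the average rank (pandas' default method="average").
pct = df[subset].rank(axis=1, pct=True)
df = df.join(pct.add_suffix("_pct"))
```

If the intended percentile is instead within each column (over all rows), dropping `axis=1` gives that variant.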

0 votes, 1 answer

Python pandas dataframe merge 2 rows into one with keys

I have 2 dataframes, each with a single row and the same columns: df1 = feat_1 feat_2 ... feat_n a b c d ... z A B N 1 2 3 4 9 df2 = feat_1 feat_2 ... feat_n a b c d ... z A B N 5 6 1 8 …

python pandas dataframe data-science data-munging

asked Oct 26 '20 at 12:05 by Cranjis (reputation 1,590)
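
One way to combine two single-row frames into one row, keyed by their source, is `pd.concat(..., axis=1, keys=...)`, which builds a two-level column index. A minimal sketch; the three columns and their values are illustrative, not the asker's full data.

```python
import pandas as pd

df1 = pd.DataFrame({"feat_1": [1], "feat_2": [2], "feat_3": [3]})
df2 = pd.DataFrame({"feat_1": [5], "feat_2": [6], "feat_3": [1]})

# Side-by-side concatenation; both frames share index 0, so the result is one row.
# `keys` labels which frame each column came from.
merged = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])

# Optional: flatten the two-level column index into names like "df1_feat_1".
merged.columns = [f"{src}_{col}" for src, col in merged.columns]
```

Keeping the MultiIndex (skipping the last step) lets `merged["df2"]` recover the second frame's columns later.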

0 votes, 2 answers

How to assign NA to "NO"

In my dataset I have one column where I need to replace blanks with "No". How can I do this? The column is a character variable and has only two values, yes and no. data<- as.data.frame(Upper_GI_2ww) data$`Direct to Test?` = ifelse(nchar(data$`Direct…

r data-munging

asked Sep 30 '20 at 15:42 by tashu (reputation 11)

0 votes, 1 answer

Create new dataframe from repeated exposure and participants and only add new data

Happy Tuesday. I am currently collecting survey data. The surveys sometimes ask the same questions and other times do not. Why? Because there are 700+ questions and asking a participant to answer all of these (without payment) is not very…

r dplyr data-munging

asked Aug 12 '20 at 00:41 by Aswiderski (reputation 166)

0 votes, 1 answer

Python pandas apply function on column values (based on column name pattern)

I have a dataframe:
a b val1_b1 val1_b2 val2_b1 val2_v2
1 2 5       9       4       6
I want to take the max by column group, so the dataframe will be:
a b val1 val2
1 2 9    6
or the RMS:
a b val1      val2
1 2 sqrt(106) …

python pandas dataframe data-munging

asked Aug 11 '20 at 05:55 by Cranjis (reputation 1,590)
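
A minimal pandas sketch, assuming the group is the part of the column name before the first underscore (`val1_b1` → `val1`). It transposes, groups the rows with a callable on their labels, aggregates, and transposes back, which avoids `groupby(axis=1)` (deprecated in recent pandas). The column names and values come from the excerpt; the helper `prefix` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2],
                   "val1_b1": [5], "val1_b2": [9],
                   "val2_b1": [4], "val2_v2": [6]})

def prefix(name: str) -> str:
    """Group key for a column name: the text before the first underscore."""
    return name.split("_")[0]

value_cols = [c for c in df.columns if c.startswith("val")]

# Max per column group.
max_df = df[value_cols].T.groupby(prefix).max().T
# Square root of the sum of squares per group.
rss_df = np.sqrt((df[value_cols] ** 2).T.groupby(prefix).sum().T)

result = df[["a", "b"]].join(max_df)
```

The excerpt's value sqrt(106) equals sqrt(5² + 9²), i.e. a root of the *sum* of squares, which is what `rss_df` computes; a true RMS would use `.mean()` instead of `.sum()` and give sqrt(53) for `val1`.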

0 votes, 1 answer

data frame with mixed date format

I would like to change all the mixed date formats into one format, for example d-m-y. Here is the data frame: x <- data.frame("Name" = c("A","B","C","D","E"), "Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25")) I have tried…

r data-science data-munging data-wrangling

asked Jun 25 '20 at 16:03 by learner96 (reputation 1)

0 votes, 2 answers

What is the easiest way to split a string into a first name and last name?

The dataset has 14k rows and has many titles, etc. I am a beginner in Pandas and Python and I'd like to know how to proceed with getting the output of first name and last name from this dataset. Dataset: 0 Pr.Doz.Dr. Klaus Semmler Facharzt…

python-3.x pandas jupyter-notebook data-munging data-wrangling

asked May 19 '20 at 13:47 by jerof (reputation 73)
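
A heuristic pandas sketch, assuming that titles are the leading tokens ending in "." (as in "Pr.Doz.Dr.") and that the next two tokens are the first and last name. Titles without a trailing dot, middle names, or trailing text such as a specialty would need extra rules; the sample names other than the one quoted in the excerpt are hypothetical.

```python
import pandas as pd

# Hypothetical sample; real rows may carry more titles or trailing text.
names = pd.Series(["Pr.Doz.Dr. Klaus Semmler", "Dr. Anna Meier", "Max Mustermann"])

# Drop leading tokens that end with "." (treated as academic titles).
stripped = names.str.replace(r"^(?:\S+\.\s+)+", "", regex=True)

# Split the remainder once on whitespace: first token vs. everything after it.
parts = stripped.str.split(n=1, expand=True)
result = pd.DataFrame({"first_name": parts[0], "last_name": parts[1]})
```

Because of `n=1`, a name like "Anna Maria Meier" would put "Maria Meier" in `last_name`; `str.rsplit(n=1, expand=True)` is the alternative if the last token should always be the surname.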

0 votes, 1 answer

How to find the average of a list of elements embedded in a Pandas data frame column

I'm in the process of cleaning a data frame, and one particular column contains values that are lists. I'm trying to find the average of those lists and update the existing column with an int while preserving the indices. I can…

python pandas data-munging

asked Apr 22 '20 at 23:23 by Cole (reputation 1)
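
A minimal sketch: `Series.apply` maps each list to its mean and keeps the original index. The column name and data are hypothetical, and rounding to a nullable integer is an assumption about how the asker wants the mean turned into an int.

```python
import pandas as pd

# Hypothetical column whose cells are Python lists of numbers, with a non-default index.
df = pd.DataFrame({"scores": [[1, 2, 3], [4, 6], [10]]}, index=[10, 20, 30])

# Replace each list by its mean (NaN for an empty list); .apply preserves the index.
df["scores"] = df["scores"].apply(lambda xs: sum(xs) / len(xs) if xs else float("nan"))

# Assumption: round to the nearest integer; the nullable "Int64" dtype keeps NaN as <NA>.
df["scores"] = df["scores"].round().astype("Int64")
```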

0 votes, 1 answer

Dataframe apply balanced allocation of rows to lists based on row type

I have a dataframe with 192 rows; each row represents a sentence with some metadata and a specific type (H or L). So, for each type I have a total of 96 sentences. I need to allocate them to 10 different lists with the following conditions: Each…

python random data-science python-itertools data-munging

asked Feb 25 '20 at 09:52 by Cranjis (reputation 1,590)
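
The question's full conditions are cut off here, so this sketch only assumes the goal is that every list receives a near-equal number of H and L sentences. It shuffles each type separately (seeded `random.Random`) and deals the rows round-robin with one shared `itertools.cycle`, so the second type resumes where the first stopped and list totals differ by at most one. The data frame and the `allocate` helper are hypothetical.

```python
import itertools
import random

import pandas as pd

# Hypothetical stand-in for the 192 sentences: 96 of type "H" and 96 of type "L".
df = pd.DataFrame({"sentence_id": range(192), "type": ["H"] * 96 + ["L"] * 96})

def allocate(df: pd.DataFrame, n_lists: int = 10, seed: int = 0) -> list:
    """Deal each type's shuffled rows round-robin into n_lists lists of sentence ids."""
    rng = random.Random(seed)
    lists = [[] for _ in range(n_lists)]
    slots = itertools.cycle(lists)  # shared across types, so types continue the rotation
    for _, group in df.groupby("type"):
        ids = list(group["sentence_id"])
        rng.shuffle(ids)
        # ids first in zip(): when ids run out, no extra slot is consumed.
        for sid, target in zip(ids, slots):
            target.append(sid)
    return lists

lists = allocate(df)
```

With 96 rows per type and 10 lists, each list gets 9 or 10 of each type and 19 or 20 sentences in total; any further constraints from the original question would have to be added on top.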

0 votes, 2 answers

dummy code collapsed column using data.table in R

It is quite easy to dummy code a collapsed column using the tidyverse. Here is a quick example of how I've done it in the past. First, I'll load the iris data and create a custom collapsed column of randomly sampled letters: library(tidyverse) #…

r data.table tidyverse data-munging

asked Feb 20 '20 at 18:25 by Trent (reputation 771)

0 votes, 0 answers

How to create an array that repeats a filter function for every unique value in another column in Google Sheets

I have a sheet that does auto-analysis of surveys, and I'd like to have it analyze subsets of the data based on another variable. I've got two separate tables: Table 1 is the variable I want to group the analysis by, and Table 2 has all the…

arrays google-sheets data-munging

asked Feb 06 '20 at 21:04 by Josh (reputation 311)

0 votes, 1 answer

R: Filling in missing values in a column based on other columns

I have a large data set where each zipcode has its corresponding latitude and longitude. In the data set some zipcodes are missing. I need to fill in the missing zipcodes on the basis of their corresponding lat long where that data is not…

r dataframe missing-data data-cleaning data-munging

asked Jan 27 '20 at 20:07 by wickedpanda (reputation 17)
