Questions tagged [data-wrangling]

1242 questions

4 votes, 5 answers

R: create new rows from a pre-existing data frame

I want to create new rows based on the values of pre-existing rows in my dataset. There are two catches: first, some cell values need to remain constant while others have to increase by 1. Second, I need to cycle through every row the same amount of…
YouLocalRUser

4 votes, 2 answers

Rearranging data by year

I have a table with 30,000 observations and a snippet looks like this:
    x <- rep(c("TX"), times=10)
    y <- rep(c("CA"), times=10)
    z <- rep(c("WI"), times=10)
    State <- c(x, y, z)
    Proj_ID <- c("TX01", "TX02", "TX03", "TX04", "TX05", "TX06", "TX07",…
Tathagato

4 votes, 3 answers

Updating based on Condition of Previous Occurrence

I have a data frame:
       stim1 stim2 Chosen Rejected
    1:     2     1      2        1
    2:     3     2      2        3
    3:     3     1      1        3
    4:     2     3      3        2
    5:     1     3      1        3
My objective is at each trial to add a…
user15791858

4 votes, 2 answers

Use RSQLite to manipulate a data frame in R directly using SQL

I have a data set of the form that I would like to change to this form below in R using SQL. I know that I could do this easily with dplyr, but the point here is to learn to use SQL to create and manipulate a small relational database. Price…
user849541

4 votes, 3 answers

How to get Pandas df.merge() mismatch column name

Given the following data:
    data_df = pd.DataFrame({
        "Reference": ("A", "A", "A", "B", "C", "C", "D", "E"),
        "Value1": ("U", "U", "U--", "V", "W", "W--", "X", "Y"),
        "Value2": ("u", "u--", "u", "v", "w", "w", "x", "y")
    }, index=[1, 2, 3,…
Ricardo Sanchez

4 votes, 4 answers

Top "n" rows of each group using dplyr -- with different number per group

I'll use the built-in chickwts data as an example. Here's the data; there are 6 feed types.
    > head(chickwts)
      weight      feed
    1    179 horsebean
    2    160 horsebean
    3    136 horsebean
    4    227 horsebean
    5    217 horsebean
    6    168 horsebean
    >…
max

4 votes, 3 answers

Check if values of one dataframe exist in another dataframe in exact order

I have one data frame of data and multiple "reference" data frames. I'm trying to automate checking whether the values of the data frame match the values of the reference data frames. Importantly, the values must also be in the same order as the values in the…
psychcoder

3 votes, 1 answer

Data wrangling problem with labelled sound files

Let's say I have a large dataframe with a column for 'soundfile' and then 'start' and 'end' columns for when a particular bird is vocalising. Each vocalisation can vary significantly in length. An example of the dataframe is sound_df below. Each row…
davidj444

3 votes, 5 answers

Selecting number from string based on criteria

I have the following data set: PATH = c("5-8-10-8-17-20", "56-85-89-89-0-15-88-10", "58-85-89-65-49-51") INDX = c(18, 89, 50) data.frame(PATH,…
R_Student

3 votes, 1 answer

Error when merging: Error in `vectbl_as_row_location()`: ! Must subset rows with a valid subscript vector. x Subscript `x` has the wrong type

I am trying to merge two data frames in R, and this error message keeps coming up even though the variable types should all be correct. Here is my code:
    team_info <- baseballr::mlb_teams(season = 2022)
    team_info_mlb <- subset(team_info, sport_name ==…
sproff22

3 votes, 2 answers

Summarizing repeated items by student attempt

Problem and explanation of the expected output: I have raw data on how many attempts each student had in an exam and which items they responded to (input). For this example, I have a pool of 5 items, but students only respond to 3 of those items. They…
Ruam Pimentel

3 votes, 1 answer

How to sum across rows with all NAs to be 0/NA

I have a dataframe: dat <- data.frame(X1 = c(0, NA, NA), X2 = c(1, NA, NA), X3 = c(1, NA, NA), Y1 = c(1, NA, NA), Y2 = c(NA, NA, NA), Y3 = c(0, NA, NA)) I…
jo_

3 votes, 2 answers

Fill up missing values based on other entries in R

I have a dataset, input, with a couple of missing values, and I have to create a dataset, output, with the following logic: if there is a missing value in any of the columns b, c, or d, then check the corresponding a column and fill up the missing value with…
Ruam Pimentel

3 votes, 2 answers

Inserting new values into a data frame using mutate and case_when in dplyr

I have the following data frame of letters with some blank (NA) slots for the lowercase letters:
    letters_df <- data.frame(caps = LETTERS[1:10], lows = letters[c(1,2,11,11,11,11,11,11,11,10)])
    letters_df[letters_df == "k"] <- NA
    letters_df
To fill in some…
rainbird

3 votes, 5 answers

Add +1 (>1) after every time a condition is met

(sorry, really don't know how to better phrase this question) I have a column "have" with 1s and 0s. I want to create a new column "want" where, each time a 1 has occurred, the value of 0 increases to 2, then 3, then 4, etc. 0 should never be…
user303287