Questions tagged [data-wrangling]
1242 questions
4
votes
5 answers
R: create new rows from preexistent dataframe
I want to create new rows based on the value of pre-existent rows in my dataset. There are two catches: first, some cell values need to remain constant while others have to increase by +1. Second, I need to cycle through every row the same amount of…

YouLocalRUser
- 309
- 1
- 9
4
votes
2 answers
Rearranging data by year
I have a table with 30,000 observations and a snippet looks like this:
x <- rep(c("TX"), times=10)
y <- rep(c("CA"), times=10)
z <- rep(c("WI"), times=10)
State <- c(x, y, z)
Proj_ID <- c("TX01", "TX02", "TX03", "TX04", "TX05", "TX06", "TX07",…

Tathagato
- 348
- 1
- 11
4
votes
3 answers
Updating based on Condition of Previous Occurrence
I have a data frame
stim1 stim2 Chosen Rejected
1: 2 1 2 1
2: 3 2 2 3
3: 3 1 1 3
4: 2 3 3 2
5: 1 3 1 3
My objective is at each trial to add a…

user15791858
- 175
- 5
4
votes
2 answers
Use RSQLite to manipulate data frame in R directly using SQL
I have a data set of the form
that I would like to change to this form below in R using SQL.
I know that I could do this simply with dplyr, but the point here is to learn to use SQL to create and manipulate a small relational database.
Price…

user849541
- 176
- 8
4
votes
3 answers
How to get Pandas df.merge() mismatch column name
Given the following data:
data_df = pd.DataFrame({
"Reference": ("A", "A", "A", "B", "C", "C", "D", "E"),
"Value1": ("U", "U", "U--","V", "W", "W--", "X", "Y"),
"Value2": ("u", "u--", "u","v", "w", "w", "x", "y")
}, index=[1, 2, 3,…

Ricardo Sanchez
- 4,935
- 11
- 56
- 86
4
votes
4 answers
Top "n" rows of each group using dplyr -- with different number per group
I'll use the built-in chickwts data as an example.
Here's the data; there are 6 feed types.
> head(chickwts)
weight feed
1 179 horsebean
2 160 horsebean
3 136 horsebean
4 227 horsebean
5 217 horsebean
6 168 horsebean
>…

max
- 4,141
- 5
- 26
- 55
4
votes
3 answers
Check if values of one dataframe exist in another dataframe in exact order
I have 1 dataframe of data and multiple "reference" dataframes. I'm trying to automate checking if values of the dataframe match the values of the reference dataframes. Importantly, the values must also be in the same order as the values in the…

psychcoder
- 543
- 3
- 14
3
votes
1 answer
Data wrangling problem with labelled sound files
Let's say I have a large dataframe with a column for 'soundfile' and then 'start' and 'end' columns for when a particular bird is vocalising. Each vocalisation can vary significantly in length. An example of the dataframe is sound_df below. Each row…

davidj444
- 115
- 5
3
votes
5 answers
Selecting number from string based on criteria
I have the following data set:
PATH = c("5-8-10-8-17-20",
"56-85-89-89-0-15-88-10",
"58-85-89-65-49-51")
INDX = c(18, 89, 50)
data.frame(PATH,…

R_Student
- 624
- 2
- 14
3
votes
1 answer
Error when merging: Error in `vectbl_as_row_location()`: ! Must subset rows with a valid subscript vector. x Subscript `x` has the wrong type
I am trying to merge two dataframes in R, and this error message keeps coming up even though the variable types should all be correct.
Here is my code:
team_info <- baseballr::mlb_teams(season = 2022)
team_info_mlb <- subset(team_info, sport_name ==…

sproff22
- 31
- 2
3
votes
2 answers
Summarizing repeated items by student attempt
Problem and explanation of the expected output
I have raw data of how many attempts each student had in an exam and which items they responded to (input). For this example, I have a pool of 5 items, but students only respond to 3 of those items. They…

Ruam Pimentel
- 1,288
- 4
- 16
3
votes
1 answer
How to sum across rows with all NAs to be 0/NA
I have a dataframe:
dat <- data.frame(X1 = c(0, NA, NA),
X2 = c(1, NA, NA),
X3 = c(1, NA, NA),
Y1 = c(1, NA, NA),
Y2 = c(NA, NA, NA),
Y3 = c(0, NA, NA))
I…

jo_
- 677
- 2
- 11
3
votes
2 answers
Fill up missing values based on other entries on R
I have a dataset input with a couple of missing values, and I have to create a dataset output with the following logic:
If there is a missing value in any of the columns b, c, or d, then
check the corresponding a column and fill in the missing value with…

Ruam Pimentel
- 1,288
- 4
- 16
3
votes
2 answers
Inserting new values into a data frame using mutate and case_when in dplyr
I have the following data frame of letters with some blank (NA) slots for the lower cases
letters_df <- data.frame(caps = LETTERS[1:10], lows = letters[c(1,2,11,11,11,11,11,11,11,10)])
letters_df[letters_df == "k"] <- NA
letters_df
To fill in some…

rainbird
- 193
- 1
- 9
3
votes
5 answers
Add +1 (>1) after every time a condition is met
(sorry, really don't know how to better phrase this question)
I have a column "have" with 1s and 0s. I want to create a new column "want" where, each time a 1 has occurred, the value of 0 increases to 2, then 3, then 4, etc. 0 should never be…

user303287
- 131
- 5