2

I am completely new to any kind of coding, nevermind R in particular, so my days of googling have not been very helpful. I would really appreciate any kind of help/insights!

I would like to know how to get two new variables out of the original variable, and attach new values to it - basically I start with this:

starting point

and want to obtain this:

desired result

I managed to get it in long format with melt(dataname, id.vars=c("ID")) and the ID & value I get are good. But there is only one variable with my four headers (loudHot, quietHot, loudCold, quietCold) repeated - how do I create two new variables out of this and assign the values to it (e.g. that "Volume" has the value 1 when the original variable is loudHot or loudCold and 0 if its quietHot or quietCold, and then "Temp" is 1 when the original variable is loudHot or quietHot and 0 when its loudCold or quietCold)?

DaveArmstrong
  • 18,377
  • 2
  • 13
  • 25
Bommby
  • 35
  • 4
  • Thank you so much everyone for taking the time to help me, I really appreciate it - especially as I didn’t give you the data properly (thank you for making me aware in that regard too), so thanks for taking the extra steps! – Bommby Feb 04 '22 at 13:49

3 Answers3

4

I wouldn't be too hard on yourself - this isn't really trivial. Anyway, you can use pivot_longer from tidyr and some data manipulation with dplyr to achieve your desired outcome:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-ID) %>%
  mutate(Volume = as.numeric(grepl("loud", name)),
         Temp   = as.numeric(grepl("Hot",  name))) %>%
  select(ID, Volume, Temp, value)
#> # A tibble: 32 x 4
#>       ID Volume  Temp value
#>    <dbl>  <dbl> <dbl> <dbl>
#>  1     2      1     1    14
#>  2     2      0     1    16
#>  3     2      1     0    16
#>  4     2      0     0    15
#>  5     4      1     1    19
#>  6     4      0     1    15
#>  7     4      1     0    10
#>  8     4      0     0     8
#>  9     6      1     1    11
#> 10     6      0     1    17
#> # ... with 22 more rows

Data

df <- data.frame(ID        = (1:8) * 2,
                 loudHot   = c(14, 19, 11, 20, 18, 17, 16, 2),
                 quietHot  = c(16, 15, 17, 5, 10, 10, 15, 0),
                 loudCold  = c(16, 10, 10, 4, 3, 2, 14, 2),
                 quietCold = c(15, 8, 17, 8 ,10, 12, 5, 0))

As a tip for any future SO questions, please don't post images of data. Folks here need to be able to cut and paste the text of your data to test and verify solutions. Ideally, you should do this by the output of the dput function into a code block. People rarely go to the effort of manually transcribing data from your images.

Created on 2022-02-04 by the reprex package (v2.0.1)

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
1

Lest approach your problem using dplyr an tidyr packages.

The first recommendation for you is to always add a minimal reproducible example of your data in order for us to use it and help you faster. This is not complicated, you can use the dput(head(yourdata, 10)), for example, or simulate some observations.

I did a simulation as follow:

library(dplyr)
library(tidyr)

data <- data.frame(
  id = 1:5,
  loudHot = sample(10:20, 5, replace = TRUE),
  quieHot = sample(10:20, 5, replace = TRUE),
  loudCold = sample(0:12, 5, replace = TRUE),
  quiteCold = sample(0:12, 5, replace = TRUE)
)

Now that we have the data, lest turn it to long format using tidyr::pivot_longer. This function recibe as argument the dataframe in wide format, de columns you want to gather (or those you do not want to gather using the - symbol).

# Data to long format
data_long <- pivot_longer(
  data, cols = -id, 
  names_to = 'variable', values_to = 'value'
  )

With that, now you only have to create the dummys, which is simple.

# Adding new variables
data_with_dummy <- mutate(
  data_long,
  volume = as.numeric(variable %in% c('loudHot', "loudCold")),
  temp = as.numeric(variable %in% c('loudHot', "quietCold"))
  )
Johan Rosa
  • 2,797
  • 10
  • 18
1

Here's a base R approach:

# Original data
df <- data.frame(
  ID = c(2, 4, 5, 7, 8, 11, 12, 16),
  loudHot = c(14, 19, 11, 20, 18, 17, 16, 2),
  quietHot = c(16, 15, 17, 5, 10, 10, 15, 0),
  loudCold = c(16, 10, 10, 4, 3, 2, 14, 2),
  quietCold = c(15, 8, 17, 8, 10, 12, 5, 0)
)

# Stacked data
df_stacked <- stack(
  df,
  select = c(
    "loudHot", "quietHot", "loudCold", "quietCold"
  )
)

# New variable for volume
df_stacked$Volume <- as.numeric(grepl("loud", df_stacked$ind))

# New variable for Temp
df_stacked$Temp <- as.numeric(grepl("Hot", df_stacked$ind))

# Replace "ind" values with "ID"
df_stacked$ind <- rep(df$ID, times = 4)

# Reorder columns
new_df <- df_stacked[,c(2:4,1)]

# Rename columns
colnames(new_df) <- c("ID", "Volume", "Temp", "Value")

# Order by ID
new_df[order(new_df$ID),]

I believe your columns for "Volume" and "Temp" should be alternating sequences:

Result of R Code