
I want to find the difference between rows based on specific criteria. I managed to do this with dplyr, using mutate() with lag(). My data has about 10 columns and 500 rows, and this works for most columns except a couple. The problem is that two of those columns are factors, so my code gives the warning "In Ops.factor: not meaningful for factors". To get around this, I tried converting the factor to character and then to numeric:

y <- mutate(df, d_f = df$L - lag(df$L) + n())    

x <- as.numeric(as.character(df$z))

This also produces a warning, and even when I wrap it in suppressWarnings(), every value in the column becomes NA ("NAs introduced by coercion").

How can I convert these factor columns so that I can take the difference between rows? The columns causing the problem contain percentages, if that makes any difference.

On a side note: I'm new to R and it does seem pretty cool.

Answer

Example Data

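df <- data.frame(
    id = c("A", "A", "A", "A", "B", "B", "B"), 
    num = c("1", "8", "6", "3", "7", "7", "9"),
    # make both columns factors, as in the question (the default changed in R 4.0.0)
    stringsAsFactors = TRUE)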

Solution with dplyr

library(dplyr)
df_new <- df %>% 
    # factor to numeric
    mutate(num = as.numeric(as.character(num))) %>% 
    # group by condition
    group_by(id) %>% 
    # find difference
    mutate(diff = num - lag(num))
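The as.character() call in the conversion step matters. Calling as.numeric() directly on a factor returns the internal level codes, not the values printed on screen. A small base-R check, using a hypothetical factor f (not from the question):

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "5", "20"))

# Levels are sorted as strings, so "5" comes after "10" and "20"
levels(f)

# as.numeric() on a factor returns the internal level codes...
as.numeric(f)
#> [1] 1 3 2

# ...while converting to character first recovers the actual values
as.numeric(as.character(f))
#> [1] 10  5 20
```

This is why the answer converts with as.numeric(as.character(num)) rather than as.numeric(num).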

Output

df_new
#       id   num  diff
#   <fctr> <dbl> <dbl>
# 1      A     1    NA
# 2      A     8     7
# 3      A     6    -2
# 4      A     3    -3
# 5      B     7    NA
# 6      B     7     0
# 7      B     9     2
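The example data above has no "%" signs, so it does not reproduce the percentage columns from the question. If the factor labels look like "12.5%" (an assumption; the question does not show the actual values), as.numeric(as.character(x)) turns every value into NA, which matches the symptom described. A minimal base-R sketch that strips the sign before converting, using a hypothetical column named rate:

```r
# Hypothetical factor of percentage strings, similar to the question's columns
rate <- factor(c("10%", "12.5%", "11%", "40%", "35%"))

# Fails: "%" makes every string unparseable, so all values become NA
bad <- suppressWarnings(as.numeric(as.character(rate)))

# Works: remove the literal "%" first, then convert to numeric
good <- as.numeric(sub("%", "", as.character(rate), fixed = TRUE))
good
#> [1] 10.0 12.5 11.0 40.0 35.0
```

The same expression can replace the conversion inside the mutate() call above, e.g. mutate(num = as.numeric(sub("%", "", as.character(num), fixed = TRUE))). Divide by 100 afterwards if you want proportions rather than percentage points.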