Forgive me in advance for trying to use my Excel logic in R, but I can't seem to figure this out. In a function, given x, I am trying to find out whether the row prior to each row has a greater value, using simple logic. If it does, the new column should show "yes"; if not, "no".

Here is the code I am using:

temp <- data
GetFUNC<- function(x){
         temp <- cbind(temp, NewCol = ifelse(temp[2:nrow(temp),8] > temp[1:(nrow(temp)-1),8], "yes","no"))
         write.csv(temp, file = paste0(x,".csv"))
}
lapply(example,GetFUNC)

Just so you can see column 8 it looks like this:

testdata$numbers
 [1] 32216510 10755328  8083097  6878500  8377025  6469979 10675856  8189887  5337239
[10]  5156737

The error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 11, 10

Thanks for any insight you can provide!

frameworkgeek
  • Your NewCol is missing the first element. You can try `NewCol = c(NA, ifelse[2:... )`. Also, where are you using your `x`? What is `example`? – Damiano Fantini Aug 30 '17 at 21:57
  • Also, take a look at `?diff` - which will return the `diff`erence between values. So you can do `c(NA,diff(nums) < 0)` for instance. – thelatemail Aug 30 '17 at 21:58

2 Answers

There are several problems:

  • You don't need lapply since all the operations you are using are already vectorized.
  • : binds more tightly than - (see ?Syntax), so 1:nrow(temp)-1 means (1:nrow(temp))-1. You want 1:(nrow(temp)-1). For example, compare these:

    3:5-1
    ## [1] 2 3 4
    
    (3:5) - 1   # same
    ## [1] 2 3 4
    
    3:(5-1)    # different
    ## [1] 3 4
    
  • Even once that is corrected, your ifelse expression returns a vector that is one element shorter than the number of rows in the data frame, which is why cbind complains about differing numbers of rows. Add an NA at the beginning.
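
To make the last point concrete, here is a minimal base R sketch using the ten values printed in the question (the names x, prev_greater and new_col are only illustrative). Comparing the two shifted slices gives one result fewer than there are rows, and prepending NA restores the full length:

```r
# Values of column 8 as printed in the question; `x` is an illustrative name.
x <- c(32216510, 10755328, 8083097, 6878500, 8377025,
       6469979, 10675856, 8189887, 5337239, 5156737)
n <- length(x)

# Is the previous value greater than the current one?
# This is only defined for rows 2..n, so the result has n - 1 elements.
prev_greater <- x[1:(n - 1)] > x[2:n]
length(prev_greater)

# Pad with NA for the first row, which has no predecessor.
new_col <- c(NA, ifelse(prev_greater, "yes", "no"))
length(new_col) == n
```

Note that the comparison here is previous > current, which matches the goal stated in the question; the code in the question compares the other way round.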

1) Even better would be this, assuming the input data frame is testdata, defined as in the Note at the end:

transform(testdata, NewCol = c(NA, ifelse(diff(numbers) < 0, "yes", "no")))

giving:

    numbers NewCol
1  32216510   <NA>
2  10755328    yes
3   8083097    yes
4   6878500    yes
5   8377025     no
6   6469979    yes
7  10675856     no
8   8189887    yes
9   5337239    yes
10  5156737    yes

2) The above is likely what you want but here is a second solution using rollapplyr in the zoo package. It takes a rolling window of length 2 and performs a diff on each one filling the first value with NA.

library(zoo)

transform(testdata, New = ifelse(rollapplyr(numbers, 2, diff, fill = NA) < 0, "yes", "no"))
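
As a rough illustration of what the rolling window of length 2 is doing, the same pairs can be built in base R with embed. This is only a sketch that avoids zoo, not how rollapplyr is implemented; the names x, w and d are illustrative:

```r
x <- c(32216510, 10755328, 8083097, 6878500, 8377025,
       6469979, 10675856, 8189887, 5337239, 5156737)

# Each row of w is one window of length 2:
# column 1 holds the current value, column 2 the previous value.
w <- embed(x, 2)

# Current minus previous for every window, with NA for the first row,
# which has no complete window.
d <- c(NA, w[, 1] - w[, 2])

ifelse(d < 0, "yes", "no")
```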

Note: The input testdata in reproducible form is:

testdata <- data.frame(numbers = c(32216510, 10755328, 8083097, 6878500, 
    8377025 , 6469979, 10675856, 8189887, 5337239, 5156737))
G. Grothendieck
  • I like solution 1 the best, but it isn't working in the lapply. When I try it in the above function it is returning "no" for every row. However, if I isolate it like you have listed there, it works fine. Any suggestions? – frameworkgeek Aug 30 '17 at 22:34
  • If your problem is to produce a yes or no for each element in the numbers column of testdata then, as stated at the top of the answer, you don't need `lapply`. If the problem is otherwise then please clarify. – G. Grothendieck Aug 30 '17 at 22:39
  • My fault, I left out a step in the lapply which is to save to CSV. So I was hoping it would work within lapply. – frameworkgeek Aug 30 '17 at 22:49
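
For the follow-up about saving to CSV inside lapply, one possible sketch is below. It assumes the data frames live in a named list (called frames here, a hypothetical name with made-up values), that the column of interest is called numbers, and that the files go to a temporary directory; none of these details come from the question.

```r
# Hypothetical helper: add the yes/no column to one data frame.
add_flag <- function(df) {
  transform(df, NewCol = c(NA, ifelse(diff(numbers) < 0, "yes", "no")))
}

# Hypothetical input: a named list of data frames, one per output file.
frames <- list(
  a = data.frame(numbers = c(5, 3, 4)),
  b = data.frame(numbers = c(1, 2, 0))
)

out_dir <- tempdir()

# Loop over the names so each file can be named after its list element.
results <- lapply(names(frames), function(nm) {
  res <- add_flag(frames[[nm]])
  write.csv(res, file = file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
  res
})
names(results) <- names(frames)
```

The vectorized step stays inside add_flag; lapply is only used to repeat it once per data frame and to write one file each time.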

Here's a dplyr solution using lag to look at the previous row and mutate to add the new column.

library(dplyr)
df1 <- data.frame(numbers = c(32216510, 10755328, 8083097, 6878500, 8377025,
                               6469979, 10675856, 8189887, 5337239, 5156737))

df1 %>% 
  mutate(NewCol = ifelse(lag(numbers) > numbers, "yes", "no"))

    numbers NewCol
1  32216510   <NA>
2  10755328    yes
3   8083097    yes
4   6878500    yes
5   8377025     no
6   6469979    yes
7  10675856     no
8   8189887    yes
9   5337239    yes
10  5156737    yes
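
For comparison, with its default arguments lag shifts the vector down by one position and fills the first slot with NA. A base R sketch of the same idea, without dplyr (prev is an illustrative name):

```r
df1 <- data.frame(numbers = c(32216510, 10755328, 8083097, 6878500, 8377025,
                               6469979, 10675856, 8189887, 5337239, 5156737))

# Shift the column down by one: NA first, then every value except the last.
prev <- c(NA, head(df1$numbers, -1))

# The NA in the first position propagates through ifelse, so row 1 stays NA.
df1$NewCol <- ifelse(prev > df1$numbers, "yes", "no")
df1
```
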
neilfws