I have the following data frame, df1:
df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
year=c(2014,2014,2015,2015),
month=c(1,2),
new.employee=c(4,6,2,6,23,2,5,34))
id year month new.employee
1 A 2014 1 4
2 A 2014 2 6
3 A 2015 1 2
4 A 2015 2 6
5 B 2014 1 23
6 B 2014 2 2
7 B 2015 1 5
8 B 2015 2 34
and I get the desired outcome with the following code:
library(data.table) # V1.9.6+
temp <- setDT(df1)[month == 2L, .(id, frank(-new.employee)), by = year]
df1[temp, new.employee.rank := i.V2, on = c("year", "id")]
df1
# id year month new.employee new.employee.rank
# 1: A 2014 1 4 1
# 2: A 2014 2 6 1
# 3: A 2015 1 2 2
# 4: A 2015 2 6 2
# 5: B 2014 1 23 2
# 6: B 2014 2 2 2
# 7: B 2015 1 5 1
# 8: B 2015 2 34 1
Now I want to wrap this in a user-defined function so that I can vary the input column, which is new.employee in the example above. I tried several approaches, but none of them worked.
First try:
myRank <- function(data, var) {
  temp <- setDT(data)[month == 2L, .(id, frank(-var)), by = year]
  data[temp, new.employee.rank := i.V2, on = c("year", "id")]
  return(data)
}
myRank(df1, new.employee)
Error in is.data.frame(x) : object 'new.employee' not found
Second try:
myRank(df1,df1$new.employee)
Nothing was printed.
Third try (I changed the function a bit):
myRank <- function(data, var) {
  temp <- setDT(data)[month == 2L, .(id, rank(data$var)), by = year]
  data[temp, new.employee.rank := i.V2, on = c("year", "id")]
  return(data)
}
myRank(df1, df1$new.employee)
Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In `[.data.table`(setDT(data), month == 2L, .(id, rank(data$var)), :
  Item 2 of j's result for group 1 is zero length. This will be filled with 2 NAs to match the longest column in this result. Later groups may have a similar problem but only the first is reported to save filling the warning buffer.
3: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
I looked at similar questions, but I do not have enough R experience to understand the answers.
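For illustration, here is a minimal sketch of the kind of function I am after. It assumes the column is passed as a character string and looked up with get() inside j. The names rank_by_var, new_col, and rnk are hypothetical, and this is only one possible approach, not necessarily the recommended one:

```r
library(data.table)

# Hypothetical helper: `var` is a column name given as a character string.
rank_by_var <- function(data, var, new_col = paste0(var, ".rank")) {
  # Explicit copy so that := below does not modify the caller's object.
  dt <- copy(as.data.table(data))
  # Rank within each year using only the month-2 rows;
  # get(var) looks the column up by its name.
  temp <- dt[month == 2L, .(id, rnk = frank(-get(var))), by = year]
  # Join the ranks back onto every row with the same year and id.
  dt[temp, (new_col) := i.rnk, on = c("year", "id")]
  # Trailing [] so the result prints when called at top level.
  dt[]
}

df1 <- data.frame(id = c("A", "A", "A", "A", "B", "B", "B", "B"),
                  year = c(2014, 2014, 2015, 2015),
                  month = c(1, 2),
                  new.employee = c(4, 6, 2, 6, 23, 2, 5, 34))

res <- rank_by_var(df1, "new.employee")
print(res)
```

Passing the name as a string avoids the need for the function to evaluate an unquoted column name, and the new_col argument lets the output column follow the input column instead of being fixed to new.employee.rank.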