
I'm trying to sum the digits of integers in the last 2 columns of my data frame. I have found a function that does the summing, but I think I may have an issue with applying the function - not sure?

#Dataframe
a = c("a", "b", "c")
b = c(1, 11, 2)
c = c(2, 4, 23)
data <- data.frame(a,b,c)

#Digitsum function
digitsum <- function(x) sum(floor(x / 10^(0:(nchar(as.character(x)) - 1))) %% 10)

#Applying function
data[2:3] <- lapply(data[2:3], digitsum)

This is the warning that I get:

Warning messages:
1: In 0:(nchar(as.character(x)) - 1) :
  numerical expression has 3 elements: only the first used
2: In 0:(nchar(as.character(x)) - 1) :
  numerical expression has 3 elements: only the first used
smci
dsnOwhiskey

2 Answers


Your function digitsum at the moment works fine for a single scalar input, for example,

digitsum(32)
# [1] 5

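But it cannot take a vector input. The expression 0:(nchar(as.character(x)) - 1) needs a single number on the right-hand side of ":"; given a vector, ":" uses only the first element and issues the warning you see. You need to vectorize the function, using Vectorize: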

vec_digitsum <- Vectorize(digitsum)

Then it works for a vector input:

b = c(1, 11, 2)
vec_digitsum(b)
# [1] 1 2 2

Now you can use lapply without trouble.
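
For completeness, here is a minimal sketch of that last step, assuming the data frame and digitsum definition exactly as posted in the question:

```r
# Data and digitsum() as posted in the question
data <- data.frame(a = c("a", "b", "c"), b = c(1, 11, 2), c = c(2, 4, 23))
digitsum <- function(x) sum(floor(x / 10^(0:(nchar(as.character(x)) - 1))) %% 10)
vec_digitsum <- Vectorize(digitsum)

# Replace columns 2 and 3 with their element-wise digit sums
data[2:3] <- lapply(data[2:3], vec_digitsum)
data
#   a b c
# 1 a 1 2
# 2 b 2 4
# 3 c 2 5
```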

Zheyuan Li

@Zheyuan Li's answer solved your problem with lapply, though I'd like to add several points:

  • Vectorize is just a wrapper around mapply, so it doesn't give you the performance of true vectorization.

  • The function itself can be rewritten to be much more readable. Compare:

digitsum <- function(x) sum(floor(x / 10^(0:(nchar(as.character(x)) - 1))) %% 10)
vec_digitsum <- Vectorize(digitsum)

sumdigits <- function(x){
  digits <- strsplit(as.character(x), "")[[1]]
  sum(as.numeric(digits))
}
vec_sumdigits <- Vectorize(sumdigits)

microbenchmark::microbenchmark(digitsum(12324255231323),
  sumdigits(12324255231323), times = 100)

Unit: microseconds
                      expr    min     lq     mean median     uq    max neval cld
  digitsum(12324255231323) 12.223 12.712 14.50613 13.201 13.690 96.801   100   a
 sumdigits(12324255231323) 13.689 14.667 15.32743 14.668 15.157 38.134   100   a

The performance of the two versions is similar, but the second one is much easier to understand.

Interestingly, the Vectorize wrapper adds considerable overhead for a single input:

microbenchmark::microbenchmark(vec_digitsum(12324255231323), 
  vec_sumdigits(12324255231323), times = 100)

Unit: microseconds
                          expr    min     lq     mean  median      uq      max neval cld
  vec_digitsum(12324255231323) 92.890 96.801 267.2665 100.223 108.045 16387.07   100   a
 vec_sumdigits(12324255231323) 94.357 98.757 106.2705 101.445 107.556   286.00   100   a

Another advantage of sumdigits is that it still works if you have really big numbers stored as strings (with the small modification of removing the as.character call). The first version will have problems with big numbers, or may silently introduce errors.
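
A rough sketch of that string-input variant follows. The name sumdigits_chr is hypothetical (not from the code above), and it assumes every input is a string made up only of the digits 0-9:

```r
# Hypothetical string-input variant: the as.character() call is dropped,
# so the digits never pass through a floating-point number.
sumdigits_chr <- function(s) {
  digits <- strsplit(s, "")[[1]]
  sum(as.numeric(digits))
}

# A 30-digit number, far beyond what a double can represent exactly
sumdigits_chr("123456789012345678901234567890")
# [1] 135
```

Keeping the value as a string from the start is what matters here; converting such a number to numeric first would already lose digits.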

Note: At first my benchmark compared the vectorized version of the OP's function with the non-vectorized version of mine, which gave me the wrong impression that my function was much faster. It turned out the difference was caused by the Vectorize overhead.
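
If the Vectorize overhead matters, one possible alternative is a digit sum that operates on the whole vector at once. This is only a sketch: digitsum_arith is a hypothetical name, and it assumes whole numbers with no NA values that are small enough to be stored exactly as doubles (below 2^53). The sign is ignored.

```r
# Hypothetical arithmetic version: loops over digit positions, not elements,
# so each step works on the whole vector at once.
digitsum_arith <- function(x) {
  x <- abs(x)                    # ignore the sign
  total <- numeric(length(x))
  while (any(x > 0)) {
    total <- total + x %% 10     # add the current last digit of every element
    x <- x %/% 10                # drop that digit
  }
  total
}

digitsum_arith(c(1, 11, 2))
# [1] 1 2 2
digitsum_arith(c(2, 4, 23))
# [1] 2 4 5
```

It can be passed to lapply directly, e.g. lapply(data[2:3], digitsum_arith), without wrapping it in Vectorize.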

dracodoc