I'm working with a list of data frames. In each data frame, I would like to pad a single ID variable with leading zeros. The ID variables are character vectors and are always the first variable in the data frame. In each data frame, however, the IDs span a different range, so the padded width differs. For example:
df1_id ranges from 1 to 20, so I need to pad with up to one zero; df2_id ranges from 1 to 100, so I need to pad with up to two zeros; and so on.
My question is: how can I pad each data frame without having to write a separate line of code for each data frame in the list?
I can solve this problem by calling the str_pad function on each data frame separately. For example, see the code below:
#Load stringr package
library(stringr)
#Create sample data frames
df1 <- data.frame("x" = as.character(1:20), "y" = rnorm(20, 10, 1),
stringsAsFactors = FALSE)
df2 <- data.frame("v" = as.character(1:100), "y" = rnorm(100, 10, 1),
stringsAsFactors = FALSE)
df3 <- data.frame("z" = as.character(1:1000), "y" = rnorm(1000, 10, 1),
stringsAsFactors = FALSE)
#Combine data frames into list
dfl <- list(df1, df2, df3)
#Pad ID variables with leading zeros
dfl[[1]]$x <- str_pad(dfl[[1]]$x, width = 2, pad = "0")
dfl[[2]]$v <- str_pad(dfl[[2]]$v, width = 3, pad = "0")
dfl[[3]]$z <- str_pad(dfl[[3]]$z, width = 4, pad = "0")
While this solution works reasonably well for a short list, it becomes unwieldy as the number of data frames increases.
I would love a way to pass some sort of "sequence" vector to the width argument of the str_pad function. Something like this:
dfl <- lapply(dfl, function(x) {x[,1] <- str_pad(x[,1], width = SEQ, pad = "0"); x})
where SEQ is a vector of widths, one per data frame. Using the above example it would look something like:
SEQ <- c(2, 3, 4)
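For reference, here is a minimal sketch of the kind of thing I have in mind. It is not a definitive solution. It assumes that base R's Map() can pair each data frame with its own width, and that the width could alternatively be derived from the longest ID in each data frame. The helper name pad_first_col and the variables widths, padded and padded_auto are made up for illustration.

```r
library(stringr)

# Sample list mirroring the data frames above
dfl <- list(
  data.frame(x = as.character(1:20), y = rnorm(20, 10, 1),
             stringsAsFactors = FALSE),
  data.frame(v = as.character(1:100), y = rnorm(100, 10, 1),
             stringsAsFactors = FALSE),
  data.frame(z = as.character(1:1000), y = rnorm(1000, 10, 1),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: left-pad the first column of `df` with zeros
# up to `width` characters, then return the whole data frame
pad_first_col <- function(df, width) {
  df[[1]] <- str_pad(df[[1]], width = width, pad = "0")
  df
}

# Option 1: one explicit width per data frame; Map() walks both
# arguments in parallel, so dfl[[i]] is padded to widths[i]
widths <- c(2, 3, 4)
padded <- Map(pad_first_col, dfl, widths)

# Option 2: derive each width from the longest ID in that data frame,
# so no separate width vector has to be maintained
padded_auto <- lapply(dfl, function(df) {
  pad_first_col(df, max(nchar(df[[1]])))
})
```

Option 2 only gives the intended widths if the longest ID in each data frame already has the target number of characters, which holds for the sample data above.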
Thanks in advance, and please let me know if you have any questions.
~kj