How to split a number into digits in R

Question

I have a data frame with a numerical ID variable which identifies the Primary, Secondary and Ultimate Sampling Units from a multistage sampling scheme. I want to split the original ID variable into three new variables that identify the different sampling units separately:

Example:

>df[1:2,]
ID Var        var1     var2      var3     var4         var5  
501901          9    SP.1          1        W         12.10    
501901          9    SP.1          2        W         17.68

What I want:

>df[1:2,]
ID1    ID2     ID3   var1   var2  var3     var4    var5  
5      01      901    9    SP.1    1        W     12.10    
5      01      901    9    SP.1    2        W     17.68

I know there are some functions available in R to split character strings, but I could not find similar facilities for numbers.

Thank you,

Juan

Why don't you try converting your ID to a string with `as.character()`, splitting it with `strsplit()`, and then converting back to numbers with `as.numeric()`? – user974514, Mar 19 '13 at 11:35
Yes, it could be done that way, but I was looking for a function that could be used directly on numeric values. Thank you anyway. – jrs-x, Mar 19 '13 at 14:20
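
A minimal sketch of the approach suggested in the first comment, assuming a single 6-digit ID. The names `id` and `digits` are only for illustration.

```r
id <- 501901

# Convert to text, split into single characters, then convert back to numbers
digits <- as.numeric(strsplit(as.character(id), "")[[1]])
digits
```

Converting back to numeric drops leading zeros, so a group such as "01" would become 1 if it were converted on its own.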

score 21 · Answer 1 · answered Mar 19 '13 at 11:38

You could, for example, use substring:

df <- data.frame(ID = c(501901, 501902))

splitted <- t(sapply(df$ID, function(x) substring(x, first=c(1,2,4), last=c(1,3,6))))
cbind(df, splitted)
#      ID 1  2   3
#1 501901 5 01 901
#2 501902 5 01 902

EDi

Exactly what I need, and just in one line. Thank you very much. – jrs-x Mar 19 '13 at 14:18

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2013-03-19T12:16:25.713

13

Yet another alternative is to re-read the first column using read.fwf and specify the widths:

cbind(read.fwf(file = textConnection(as.character(df[, 1])), 
               widths = c(1, 2, 3), colClasses = "character", 
               col.names = c("ID1", "ID2", "ID3")), 
      df[-1])
#   ID1 ID2 ID3 var1 var2 var3 var4  var5
# 1   5  01 901    9 SP.1    1    W 12.10
# 2   5  01 901    9 SP.1    2    W 17.68

One advantage here is being able to set the resulting column names in a convenient manner, and ensure that the columns are characters, thus retaining any leading zeroes that might be present.
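
A self-contained sketch of the same idea, assuming a cut-down version of the question's data frame (the columns kept here are an assumption for illustration). It shows that the character columns keep the leading zero and how to convert a column to integer if that is preferred.

```r
# Reduced stand-in for the question's data frame
df <- data.frame(ID = c(501901, 501901), var1 = c(9, 9), var3 = c(1, 2))

# Re-read the ID column as fixed-width text: 1, 2 and 3 characters wide
ids <- read.fwf(textConnection(as.character(df$ID)),
                widths = c(1, 2, 3), colClasses = "character",
                col.names = c("ID1", "ID2", "ID3"))

ids$ID2               # character, leading zero kept
as.integer(ids$ID2)   # integer, leading zero dropped

out <- cbind(ids, df[-1])
out
```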

edited Mar 19 '13 at 12:16

answered Mar 19 '13 at 12:04

A5C1D2H2I1M1N2O1R2T1

This is even faster than the earlier approach. Thank you!! – jrs-x Mar 19 '13 at 14:22

score 5 · Answer 3 · answered Mar 19 '13 at 11:39

This should work:

# paste() converts the numeric ID to text; gsub() then inserts a space between the
# three groups, strsplit() splits on those spaces, and rbind stacks the pieces.
df <- cbind(do.call(rbind, strsplit(gsub('(.)(..)(...)', '\\1 \\2 \\3', paste(df[,1])), ' ')), df[,-1])

Or with substr()

df <- cbind(ID1=substr(df[, 1],1,1), ID2=substr(df[, 1],2,3), ID3=substr(df[, 1],4,6), df[, -1])
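
One caveat for the string-based approaches, sketched here: when a number is coerced to character, R may render some values (for example 100000) in scientific notation, which would break extraction by fixed positions. Formatting the ID explicitly with `sprintf()` is one way to avoid this. The example data and the names `id_chr` and `parts` are assumptions for illustration.

```r
df <- data.frame(ID = c(501901, 501902, 100000))

# Format as a fixed-width, zero-padded string without an exponent
id_chr <- sprintf("%06.0f", df$ID)

parts <- data.frame(
  ID1 = substr(id_chr, 1, 1),
  ID2 = substr(id_chr, 2, 3),
  ID3 = substr(id_chr, 4, 6),
  stringsAsFactors = FALSE
)
parts
```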

Rcoster

Another smart approach. Thank you! – jrs-x Mar 19 '13 at 14:25

score 4 · Answer 4 · answered Mar 19 '13 at 11:47

Since they are numbers, you will have to do some math to extract the digits you want. A number represented in radix-10 can be written as:

d0*10^0 + d1*10^1 + d2*10^2 ... etc. where d0..dn are the digits of the number.

Thus, consider extracting the most significant digit from a 6-digit number, which is mathematically represented as:

number = d5*10^5 + d4*10^4 + d3*10^3 + d2*10^2 + d1*10^1 + d0*10^0

As you can see, dividing this number by 10^5 will get you:

number / 10^5 = d5*10^0 + d4*10^(-1) + d3*10^(-2) + d2*10^(-3) + d1*10^(-4) + d0*10^(-5)

Voilà! If you interpret the result as an integer (that is, truncate it), you have extracted the most significant digit, because every other digit now has a weight with a negative exponent, and together they sum to less than 1. The other digits can be extracted in a similar way. For digits in the least significant positions, you can use the modulo operation instead of division.

Examples (here / denotes integer division and % the modulo operation):

501901 / 10^5 = 5 // first digit
501901 % 10^1 = 1 // last digit
(501901 / 10^4) % 10^1 = 0 // second digit
(501901 / 10^2) % 10^2 = 19 // third and fourth digit
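
The same arithmetic can be sketched in R, where integer division is written `%/%` and modulo is written `%%`. The variable names are only for illustration, and the last three lines apply the idea to the 1-, 2- and 3-digit groups from the question.

```r
id <- 501901

first_digit  <- id %/% 10^5           # most significant digit
last_digit   <- id %% 10              # least significant digit
second_digit <- (id %/% 10^4) %% 10
third_fourth <- (id %/% 10^2) %% 10^2

# The three sampling-unit IDs from the question
id1 <- id %/% 10^5                    # first digit
id2 <- (id %/% 10^3) %% 10^2          # digits 2 and 3
id3 <- id %% 10^3                     # digits 4 to 6
c(id1, id2, id3)
```

Because the results are numbers, the second group comes out as 1 and not "01".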

In R, integer division is `%/%` and the modulo operator is `%%`. – hadley, Mar 19 '13 at 12:54

score 4 · Answer 5 · answered Mar 23 '15 at 12:16

Several neat answers were posted years ago, but a solution I find useful, based on the outer function, has not been mentioned. In this age of search engines, I am putting it here in case others find it handy.

I was faced with a slightly simpler problem: turning a column of 6-digit numbers into 6 columns representing each digit. This can be solved using a combination of outer, integer division (%/%) and modulo (%%).

 DF <- data.frame("ID" = runif(3)*10^6, "a" = sample(letters, 3, TRUE))
 DF <- cbind(DF, "ID" = outer(DF$ID, 10^c(5:0), function(a, b) a %/% b %% 10))
 DF
#       ID a ID.1 ID.2 ID.3 ID.4 ID.5 ID.6
# 1 814895 z    8    1    4    8    9    5
# 2 417209 q    4    1    7    2    0    9
# 3 545797 c    5    4    5    7    9    7

The question asked here is slightly more complex, requiring different values for both integer division and modulo.

 DF <- data.frame("ID" = runif(3)*10^6, "a" = sample(letters, 3, TRUE))
 DF <- cbind(DF, "ID" = outer(DF$ID, c(1:3), function(a,b) a %/% 10^c(5,3,0)[b] %% 10^b))
 DF
#      ID a ID.1 ID.2 ID.3
# 1 809372 q    8    9  372
# 2 954790 g    9   54  789
# 3 166970 l    1   66  969
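
For a reproducible check, here is a sketch of the same `outer` technique applied to the fixed IDs from the question instead of random ones. The helper vectors `div` and `mod` and the column names are assumptions for illustration.

```r
ids <- c(501901, 501902)

div <- 10^c(5, 3, 0)   # divisor that shifts each group to the right-hand end
mod <- 10^c(1, 2, 3)   # modulus that keeps only as many digits as the group is wide

# outer() calls the function once with all (id, group index) pairs, vectorised
parts <- outer(ids, 1:3, function(a, b) a %/% div[b] %% mod[b])
colnames(parts) <- c("ID1", "ID2", "ID3")
parts
```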

score 3 · Answer 6 · answered Mar 19 '13 at 11:47

If you don't want to convert to character for some reason, the following is one way to achieve what you want:

DF <- data.frame(ID = c(501901, 501902), var1 = c("a", "b"), var2 = c("c", "d"))

result <- t(sapply(DF$ID, function(y) {
    c(y%/%1e+05, (y - y%/%1e+05 * 1e+05)%/%1000, y - y%/%1000 * 1000)
}))


DF <- cbind(result, DF[, -1])

names(DF)[1:3] <- c("ID1", "ID2", "ID3")

DF
##   ID1 ID2 ID3 var1 var2
## 1   5   1 901    a    c
## 2   5   1 902    b    d
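
In the numeric result above, ID2 is 1 and not 01. If the leading zero matters, one option (a sketch, not part of the original answer) is to pad the numeric pieces back to a fixed width with `sprintf()`. The small data frame below stands in for the result shown above.

```r
# Stand-in for the numeric result of the split
DF <- data.frame(ID1 = c(5, 5), ID2 = c(1, 1), ID3 = c(901, 902))

# Zero-pad to the known group widths (2 and 3 digits); the columns become character
DF$ID2 <- sprintf("%02d", as.integer(DF$ID2))
DF$ID3 <- sprintf("%03d", as.integer(DF$ID3))
DF
```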

CHP

OK, got it. Today I have to say that I learnt all I need, now and for the near future, about how to split a number!! ;-) – jrs-x Mar 19 '13 at 14:52

Tyler Rinker · Answer 7 · 2013-03-19T14:20:51.560

2

With so many answers it felt like I needed to come up with something :)

library(qdap)
x <- colSplit(dat$ID_Var, col.sep="")
data.frame(ID1=x[, 1], ID2=paste2(x[, 2:3], sep=""), 
    ID3=paste2(x[, 4:6],sep=""), dat[, -1])

##   ID1 ID2 ID3 var1 var2 var3 var4  var5
## 1   5  01 901    9 SP.1    1    W 12.10
## 2   5  01 901    9 SP.1    2    W 17.68

edited Mar 19 '13 at 14:20

answered Mar 19 '13 at 12:30

Tyler Rinker

Very interesting function. Thank you! – jrs-x Mar 19 '13 at 14:50
