
I found this code:

string = c("G1:E001", "G2:E002", "G3:E003")
> sapply(strsplit(string, ":"), "[", 2)
[1] "E001" "E002" "E003"

Clearly, strsplit(string, ":") returns a vector of length 3, where each component i is a vector of length 2 containing Gi and E00i.

But why do the two extra arguments "[", 2 have the effect of selecting only the E00i values? As far as I can see, the only arguments accepted by the function are:

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) 
Leonardo

3 Answers


You could use sub to get the expected output instead of strsplit/sapply:

 sub('.*:', '', string)
 #[1] "E001" "E002" "E003"
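
The two approaches agree on the example data, but they are not interchangeable in general. The sketch below assumes a hypothetical input with more than one colon (not part of the question) to show where they diverge: the greedy `.*:` removes everything up to the last colon, while indexing with 2 picks the second field.

```r
# Hypothetical input with two colons, used only for illustration
s <- "G1:E001:X"

# Greedy regex: strips everything up to and including the LAST ":"
after_last_colon <- sub('.*:', '', s)

# strsplit + `[`: picks the SECOND field after splitting on ":"
second_field <- sapply(strsplit(s, ":"), `[`, 2)

print(after_last_colon)
print(second_field)
```

On the question's data each string has exactly one colon, so both give the same result.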

Regarding your code, the strsplit output is a list, and a list can be processed with the apply family of functions (sapply/lapply/vapply/rapply etc.). In this case, each list element has a length of 2 and we are selecting the second element.

strsplit(string, ":")
#[[1]]
#[1] "G1"   "E001"

#[[2]]
#[1] "G2"   "E002"

#[[3]]
#[1] "G3"   "E003"

lapply(strsplit(string, ":"), `[`, 2)
#[[1]]
#[1] "E001"

#[[2]]
#[1] "E002"

#[[3]]
#[1] "E003"

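In the case of sapply, the default option is simplify=TRUE, which collapses the result into a character vector. With simplify=FALSE it returns a list, just like lapply: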

 sapply(strsplit(string, ":"), `[`, 2, simplify=FALSE)
#[[1]]
#[1] "E001"

#[[2]]
#[1] "E002"

#[[3]]
#[1] "E003"

The [ can be replaced by an anonymous function:

sapply(strsplit(string, ":"), function(x) x[2], simplify=FALSE)
#[[1]]
#[1] "E001"

#[[2]]
#[1] "E002"

#[[3]]
#[1] "E003"
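
Since vapply was mentioned above, here is a minimal sketch of the same extraction with it. It assumes every piece should be a single string, which is declared through FUN.VALUE; the extra 2 is still forwarded to `[` through `...`.

```r
string <- c("G1:E001", "G2:E002", "G3:E003")

# FUN.VALUE = character(1) declares that each call must return one string;
# the trailing 2 is passed on to `[` as its index argument
res <- vapply(strsplit(string, ":"), `[`, FUN.VALUE = character(1), 2)

print(res)
```

Unlike sapply, vapply raises an error if any call returns something that does not match FUN.VALUE, so the result type does not depend on the input.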
akrun

Look at the docs for ?sapply:

 sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

 FUN: the function to be applied to each element of ‘X’: see
      ‘Details’.  In the case of functions like ‘+’, ‘%*%’, the
      function name must be backquoted or quoted.

 ...: optional arguments to ‘FUN’.

Therein lies your answer. In your case, FUN is [. The "optional arguments to FUN" is 2, since it gets matched to ... in your call. So sapply is calling [ with each element of the list as the first argument, and 2 as the second. Consider:

x <- c("G1", "E001")   # this is the result of `strsplit` on the first value

Then:

`[`(x, 2)      # equivalent to x[2]
# [1] "E001"

This is what sapply is doing in your example, except it is applying it to every length-2 character vector returned by strsplit.
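
The mechanism can be sketched with a simplified stand-in for sapply. The helper name apply_each is hypothetical and this is not the real implementation; it only illustrates how FUN is resolved with match.fun and how `...` is forwarded to every call.

```r
string <- c("G1:E001", "G2:E002", "G3:E003")
parts <- strsplit(string, ":")

# Hypothetical helper, for illustration only: apply FUN to each element of X,
# forwarding any extra arguments to FUN
apply_each <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)            # accepts "[" as a string or `[` as a function
  out <- vector("list", length(X))
  for (i in seq_along(X)) {
    out[[i]] <- FUN(X[[i]], ...)   # here this amounts to `[`(X[[i]], 2)
  }
  unlist(out)                      # collapse the list into a vector
}

res <- apply_each(parts, "[", 2)
print(res)
```

The 2 never appears in the helper's own argument list; it travels through `...` exactly as it does in the sapply call.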

BrodieG

Because the output of strsplit() is a list. The "[" indexes into each element of the list, and the 2 indicates that the second item of each list member is selected. The sapply() function ensures that this is done for every member of the list. Here [ is the function passed to sapply(); it is applied to each element of the list returned by strsplit() and called with the additional argument 2.

> strsplit(string, ":")
#[[1]]
#[1] "G1"   "E001"
#
#[[2]]
#[1] "G2"   "E002"
#
#[[3]]
#[1] "G3"   "E003"
#
> str(strsplit(string, ":"))
#List of 3
# $ : chr [1:2] "G1" "E001"
# $ : chr [1:2] "G2" "E002"
# $ : chr [1:2] "G3" "E003"
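
One consequence of selecting "the second item" is worth a quick check. The sketch below assumes a hypothetical input in which one string has no colon: its list element then has length 1, and indexing it with 2 yields NA rather than an error.

```r
# Hypothetical input: the second string contains no ":"
mixed <- c("G1:E001", "G4")

pieces <- strsplit(mixed, ":")           # second element is just "G4"
second <- sapply(pieces, "[", 2)         # out-of-range index gives NA

print(lengths(pieces))
print(second)
```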
RHertel