You can use gsub
and backreference to cut the string to the desired length before you count how many substrings are > 0:
DATA:
df1 <- data.frame(a = "0,1,17,200,6,0,1")
df1$a <- as.character(df1$a)
SOLUTION:
First cut the string to whatever number of substrings you want--here, I'm cutting it to three numeric characters (the first two of which are followed by a comma)--and store the result in a new vector:
df1$a_3 <- gsub("^(\\d+,\\d+,\\d+)(.*)", "\\1", df1$a)
df1$a_3
[1] "0,1,17"
Now insert the new vector into your sapply
statement to count how many substrings are greater than 0:
sapply(strsplit(df1$a_3, ","), function(x) length(which(x>0)))
[1] 2
To vary the number of substrings, vary the number of repetitions of \\d+
in the pattern accordingly. For example, this works for 6 substrings:
df1$a_6 <- gsub("^(\\d+,\\d+,\\d+,\\d+,\\d+,\\d+)(.*)", "\\1", df1$a)
sapply(strsplit(df1$a_6, ","), function(x) length(which(x>0)))
[1] 4
EDIT TO ACCOUNT FOR NEW SET OF QUESTIONS:
To compute the maximum value of substrings > 0, exemplified here for df1$a
, the string as a whole (for the restricted strings, just use the relevant vector accordingly, e.g., df1$a_3
, df1$a_6
etc.):
First split the string using strsplit
, then unlist the resulting list using unlist
, and finally convert the resulting vector from character to numeric, storing the result in a vector, e.g., string_a
:
string_a <- as.numeric(unlist(strsplit(df1$a, ",")))
string_a
[1] 0 1 17 200 6 0 1
On that vector you can perform all sorts of functions, including max
for the maximum value, and sum
for the sum of the values:
max(string_a)
[1] 200
sum(string_a)
[1] 225
Re the number of values that are equal to 0, adjust your sapply
statement by setting x == 0
:
sapply(strsplit(df1$a, ","), function(x) length(which(x == 0)))
[1] 2
Hope this helps!