I have figured out how to create a new column in my data frame that is TRUE if the character string in "Column 5" is contained within the longer string in "Column 6". Can I do this by referring to the names of my columns rather than using [r,c] locational references?

rows = NULL

for(i in 1:length(excptn1[,1]))
{
    rows[i] <- grepl(excptn1[i,5],excptn1[i,6], perl=TRUE)
}

As a programmer, I'm nervous about referring to things as "Column 5" and "Column 6". I want to refer to the names of the variables captured in those columns so that I'm not reliant on my source file always having the columns in the same order. Furthermore, I might forget about that locational reference and add something earlier in the code that causes it to fail later. When you can think in terms of column names (rather than their particular ordering at a point in time), it's a lot easier to build robust, production-strength code.

I found a related question on this site, but it uses the same kind of locational references I want to avoid:

How do I perform a function on each row of a data frame and have just one element of the output inserted as a new column in that row

While R does seem very flexible, it seems to lack a lot of features that you'd want in scalable, production-strength code. I'm hoping I'm wrong and can learn otherwise.

Thanks!

user3381203

2 Answers

You could refer to the columns by name rather than by index in two ways:

rows[i] <- grepl(excptn1[i,"colname"],excptn1[i,"othercolname"], perl=TRUE)

or

rows[i] <- grepl(excptn1$colname[i],excptn1$othercolname[i], perl=TRUE)

Finally, note that most R programmers would do this as:

rows <- sapply(seq_len(nrow(excptn1)), function(i) grepl(excptn1$colname[i], excptn1$othercolname[i], perl=TRUE))

One thing this avoids is the overhead of increasing the size of the vector in each iteration.
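
As a further sketch along the same lines, the row-by-row comparison can also be written with mapply(), which walks both columns in parallel and needs no index variable at all. The data frame contents and the column names `needle` and `haystack` below are hypothetical stand-ins. Using `fixed = TRUE` instead of `perl = TRUE` is an assumption about intent: it tests literal containment, so characters such as `.` in the search string are not treated as regular-expression syntax.

```r
# Hypothetical data; the column names `needle` and `haystack` are illustrative.
excptn1 <- data.frame(
  id       = 1:3,
  needle   = c("abc", "xyz", "a.c"),
  haystack = c("xxabcxx", "no match here", "abc"),
  stringsAsFactors = FALSE
)

# mapply() pairs up the two columns element by element.
# fixed = TRUE treats each needle as a literal substring, not a regex
# (an assumption about what "contained within" should mean here).
excptn1$found <- mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  excptn1$needle,
  excptn1$haystack,
  USE.NAMES = FALSE
)
```

Because the columns are addressed by name, reordering or inserting columns in the source file does not change the result.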

David Robinson

If you want to do this faster, use the stri_match_first_regex function from the stringi package. It is vectorized over both the strings and the patterns, so no explicit loop is needed.

Example:

require(stringi)

ramka <- data.frame(foo=letters[1:3], bar=c("ala","ma","koteczka"), stringsAsFactors=FALSE)

> ramka
  foo      bar
1   a      ala
2   b       ma
3   c koteczka

> stri_match_first_regex(str=ramka$bar, pattern=ramka$foo)
     [,1]
[1,] "a" 
[2,] NA  
[3,] "c" 
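
The call above returns the matched text (or NA), whereas the question asks for a TRUE/FALSE column. A minimal sketch of one way to get that directly, assuming stringi is installed, is its stri_detect_regex function, which is likewise vectorized over both arguments. The column name `found` is a hypothetical choice.

```r
library(stringi)

ramka <- data.frame(foo = letters[1:3],
                    bar = c("ala", "ma", "koteczka"),
                    stringsAsFactors = FALSE)

# stri_detect_regex() compares bar[i] against pattern foo[i] for each row
# and returns a logical vector, so it can be assigned straight to a column.
ramka$found <- stri_detect_regex(str = ramka$bar, pattern = ramka$foo)
```

If the search strings should be matched literally rather than as regular expressions, stri_detect_fixed takes the same two arguments.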
bartektartanus