
I am trying to find all the palindromic words in the name column of a data frame 'data', which looks like:

name     year  amount
James    2010  934706
Aza      2010   21042
Rory     2010  869691
Suzanne  2010  651674
Felicity 2010  386115
Oliver   2010  382388
Anna     2010   43211

I have tried:

palindrome <- function(word) {
    rawWord <- charToRaw(tolower(word)) ## lower-case, then convert to raw bytes
    sprintf("%s is %sa palindrome", word,
            c("not ", "")[identical(rawWord, rev(rawWord)) + 1])
}
palindrome(data)

But this returns a vector of strings such as "mary is not a palindrome", "anna is not a palindrome", etc. I want to subset only the palindromic words and keep them in the data frame, so that I can relate them to the other columns and find when they occurred and how many times.

zx8754
beck8
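A likely reason every word in the attempt above gets the same verdict: `charToRaw()` converts only a single string. Given a longer character vector it warns and ignores all but the first element, so `identical()` yields one `TRUE`/`FALSE` that `sprintf()` then recycles across every word. A minimal vectorized sketch (the name `is_palindrome` is hypothetical, not from the thread) that returns one logical per word, suitable for subsetting:

```r
# Hypothetical helper: TRUE for each element of `words` that reads the same
# backwards, ignoring case. Each word is converted to raw bytes on its own,
# so the verdicts are no longer tied to the first word.
is_palindrome <- function(words) {
  words <- tolower(as.character(words))  # as.character() also handles factors
  vapply(words, function(w) {
    bytes <- charToRaw(w)
    identical(bytes, rev(bytes))
  }, logical(1), USE.NAMES = FALSE)
}

is_palindrome(c("James", "Aza", "Anna", "Oliver"))
```

Subsetting the data frame would then be `data[is_palindrome(data$name), ]`.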

2 Answers


You can do the following steps.

# Convert each lower-cased name to a raw vector (one list element per name)
rawdata <- lapply(tolower(as.character(data$name)), charToRaw)

# Logical vector: TRUE if the name is palindromic, FALSE if not
ispalindrom <- vapply(rawdata, function(x) identical(x, rev(x)), logical(1))

# Palindromic words
data[ispalindrom, ]

# Non-palindromic words
data[!ispalindrom, ]
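A pitfall to watch for if `sapply()` is used for the conversion step: when every name has the same number of characters, `sapply()` simplifies the raw vectors into a matrix instead of a list, and a following `lapply()` then walks over single bytes rather than whole words. `lapply()` for the conversion (and `vapply()` for the test) keeps exactly one result per name. A small demonstration with made-up four-letter names:

```r
words <- c("anna", "otto", "rory")  # example names, all four letters long

# sapply() simplifies equal-length results into a 4 x 3 raw matrix
simplified <- sapply(words, charToRaw)
print(dim(simplified))

# lapply() over a matrix visits its 12 individual bytes, so the test
# returns 12 values (all TRUE) instead of one per word
per_byte <- unlist(lapply(simplified, function(x) identical(x, rev(x))))
print(length(per_byte))

# lapply() keeps one raw vector per word; vapply() guarantees a logical vector
per_word <- vapply(lapply(words, charToRaw),
                   function(x) identical(x, rev(x)), logical(1))
print(per_word)
```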
Pop
  • This returns all of the non-palindromic words, but for some reason it returns <0 rows> (or 0-length row.names) when I use "data[ispalindrom,]", even though I know there are palindromic words. – beck8 Oct 13 '14 at 12:53
  • Could you give an example of the data you are working on? – Pop Oct 13 '14 at 12:58
  • I have included the first few rows of the data frame; I'm not sure if that helps. It was imported from a .txt file. – beck8 Oct 13 '14 at 13:09
  • See my edited answer which should work with your data – Pop Oct 13 '14 at 13:19
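For the question's final goal (seeing when the palindromic names occur and how often), here is a minimal sketch on the sample rows from the question, using base R's `table()` and `aggregate()`. It assumes, without confirmation from the thread, that the `name` column may be a factor carrying stray whitespace from the `.txt` import, so it applies `trimws()` before testing.

```r
# Sample rows copied from the question
data <- data.frame(
  name   = c("James", "Aza", "Rory", "Suzanne", "Felicity", "Oliver", "Anna"),
  year   = rep(2010L, 7),
  amount = c(934706, 21042, 869691, 651674, 386115, 382388, 43211),
  stringsAsFactors = TRUE  # mimic an older read.table() import
)

# Assumption: names may carry leading/trailing spaces, so trim them first
clean <- tolower(trimws(as.character(data$name)))
ispalindrom <- vapply(lapply(clean, charToRaw),
                      function(x) identical(x, rev(x)), logical(1))

pal <- data[ispalindrom, ]
print(pal)

# Number of rows per palindromic name and year
print(table(droplevels(pal$name), pal$year))

# Total amount per palindromic name and year
print(aggregate(amount ~ name + year, data = pal, FUN = sum))
```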

I wondered about efficiency, so I wrote the same algorithm using characters (via strsplit()) instead of raw bytes:

palchar <- function(nfoo) {
    # split each lower-cased word into a vector of single characters
    rawdata <- sapply(seq_along(nfoo), function(j) strsplit(tolower(nfoo[j]), ''))
    # TRUE where a word's characters equal their reverse
    ispalindrom <- unlist(sapply(seq_along(nfoo), function(j) identical(rawdata[[j]], rev(rawdata[[j]]))))
    return(ispalindrom)
}

The relative timings are:

library(microbenchmark)
nfoo <- rep(nfoo, 10)
microbenchmark(palbyte(nfoo), palchar(nfoo))
Unit: milliseconds
          expr      min       lq   median       uq      max neval
 palbyte(nfoo) 7.154999 7.435734 7.538363 7.648477 124.8712   100
 palchar(nfoo) 9.224697 9.531945 9.713685 9.850097 127.2356   100

(Yes, both functions return the same answers.)
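The benchmark calls `palbyte()`, which is not shown in this answer. Presumably it wraps the `charToRaw()` approach from the other answer; the following is a minimal sketch under that assumption (the body is hypothetical), run on a sample input built from the question's names:

```r
# Hypothetical byte-based counterpart to palchar(): one raw vector per word,
# compared with its reverse
palbyte <- function(nfoo) {
  bytes <- lapply(tolower(as.character(nfoo)), charToRaw)
  vapply(bytes, function(x) identical(x, rev(x)), logical(1))
}

# Sample input from the question's names, repeated as in the benchmark
nfoo <- rep(c("James", "Aza", "Rory", "Anna", "Oliver"), 10)
res <- palbyte(nfoo)
print(sum(res))  # 20: "Aza" and "Anna" in each of the 10 repetitions
```

With `palchar()` from this answer also defined, `microbenchmark::microbenchmark(palbyte(nfoo), palchar(nfoo))` runs the same kind of comparison.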

Carl Witthoft