
I want to abbreviate each word in an object that is longer than 5 characters and replace the removed characters with a "."

i.e.

x <- "this example sentence I have given here"

would become

"this examp. sente. I have given here"

I imagine this would have to be done with a loop, and the sentence may also need splitting into separate strings, but I'm very new to R and really struggling to get it to do this. Any help would be greatly appreciated!

Many thanks!

EcoEvoJen
  • 3
    `gsub("(?<=\\w{5})\\w+", ".", x, perl=TRUE)` – user20650 Nov 20 '19 at 22:35
  • 1
    @user20650, really nice ! – dc37 Nov 20 '19 at 22:45
  • 1
    @user20650 Nice! You should add that as an answer. – eipi10 Nov 20 '19 at 22:47
  • 1
    Thanks dc37 / eipi10. eipi, It works for this simple example but I'm not that confident about regex or possible edge cases so I'll let it loiter here (or feel free to add to your answer if you're confident) – user20650 Nov 20 '19 at 22:56
  • @user20650, that's fantastic, thank you! It seems to work perfectly. If you don't mind, it would be amazing if you could please also explain what's going on here. I'm familiar with the gsub function, but some of the arguments here are new to me and I'd love to better understand for future use. – EcoEvoJen Nov 20 '19 at 23:22
  • @EcoEvoJen; https://www.regular-expressions.info/lookaround.html – user20650 Nov 20 '19 at 23:33
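
Since an explanation of the pattern was requested, here is a hedged, piece-by-piece reading of the one-liner from the first comment. The variable name `abbreviated` is only for illustration, and the caveat about untested edge cases raised in the comments still applies; this sketch was checked against the example sentence only.

```r
x <- "this example sentence I have given here"

# (?<=\\w{5})  lookbehind: the match must be preceded by five word characters,
#              but those five characters are not part of the match itself
# \\w+         one or more further word characters (the part that is replaced)
# "."          in the replacement argument this is a literal period
# perl = TRUE  switches to the PCRE engine, which is needed for lookbehind
abbreviated <- gsub("(?<=\\w{5})\\w+", ".", x, perl = TRUE)
abbreviated
# [1] "this examp. sente. I have given here"
```

Because `\\w+` needs at least one character after the first five, a word of exactly five letters such as "given" is left alone. Note that `\\w` also matches digits and underscores, so check how that interacts with your own data.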

2 Answers

3

My answer is below, but consider using @user20650's answer instead. It is much more concise and elegant (though perhaps inscrutable if you're not familiar with Regular Expressions). As per @user20650's second comment, check to make sure that it's robust enough to work on your actual data.

Here's a tidyverse option:

library(tidyverse)

vec = c("this example sentence I have given here",
      "and here is another long example")

vec.abbrev = vec %>% 
  map_chr(~ str_split(.x, pattern=" ", simplify=TRUE) %>% 
            gsub("(.{5}).*", "\\1.", .) %>% 
            paste(., collapse=" "))
vec.abbrev
[1] "this examp. sente. I have given. here"
[2] "and here is anoth. long examp."

In the code above, we use map_chr to iterate over each sentence in vec. The pipe (%>%) passes the result of each function on to the next function.

The period character is potentially confusing, because it has more than one meaning, depending on context. "(.{5}).*" is a Regular Expression in which . means "match any character". In "\\1." the . is a literal period. The final . in gsub("(.{5}).*", "\\1.", .) and the . in paste(., collapse=" ") are "pronouns" that represent the output of the previous function, which is passed into the current function.
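
To make the three roles of the period concrete, here is a small sketch. The names `words`, `piped` and `direct` are made up for this illustration. It compares the piped call with the same call written without the pipe:

```r
library(magrittr)  # provides %>%; the pipe is also available after library(tidyverse)

words <- c("example", "here")

# 1st argument: "." inside the regex matches any character
# 2nd argument: "." after \\1 is a literal period in the replacement
# 3rd argument: a bare . stands for whatever was piped in (here, `words`)
piped <- words %>% gsub("(.{5}).*", "\\1.", .)

# The same call written without the pipe
direct <- gsub("(.{5}).*", "\\1.", words)

identical(piped, direct)
# [1] TRUE
```

"here" has fewer than five characters, so the pattern does not match and the word comes back unchanged.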

Here is the process one step at a time:

# Split each string into component words and return as a list
vec.abbrev = str_split(vec, pattern=" ", simplify=FALSE)

# For each sentence, remove all letters after the fifth letter in 
#  a word and replace with a period
vec.abbrev = map(vec.abbrev, ~ gsub("(.{5}).*", "\\1.", .x)) 

# For each sentence, paste the component words back together again, 
#  each separated by a space, and return the result as a vector, 
#  rather than a list
vec.abbrev = map_chr(vec.abbrev, ~paste(.x, collapse=" "))
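
One thing to check against your data: as the "given." in the output above shows, the pattern "(.{5}).*" also appends a period to words of exactly five characters, because .* is allowed to match nothing. If only words longer than five characters should be abbreviated, one possible tweak is to require at least one extra character with .+ instead. This is a sketch in base R, checked only against these two sentences; `words_list` and `vec.abbrev2` are names invented here.

```r
vec <- c("this example sentence I have given here",
         "and here is another long example")

# Split each sentence into its words (base R, so no packages are needed)
words_list <- strsplit(vec, " ")

# ".+" requires at least one character after the first five, so
# five-letter words such as "given" are left unchanged
vec.abbrev2 <- vapply(
  words_list,
  function(w) paste(sub("(.{5}).+", "\\1.", w), collapse = " "),
  character(1)
)
vec.abbrev2
# [1] "this examp. sente. I have given here"
# [2] "and here is anoth. long examp."
```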
eipi10
  • Base R version would be: `sapply(strsplit(x, " "), function(x) paste0(sub("(.{5}).*", "\\1.", x), collapse = " "))` – Ronak Shah Nov 21 '19 at 01:37
1

Using a for loop, you can do:

x <- "this example sentence I have given here"

x2 <- unlist(strsplit(x," "))

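x3 <- NULL
for (w in x2) {
  # Truncate words longer than 5 characters and append a period
  if (nchar(w) > 5) {
    w <- paste0(substr(w, 1, 5), ".")
  }
  x3 <- c(x3, w)
}
# Paste the words back together into a single string
x_final <- paste(x3, collapse = " ")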

And the final output:

> x_final
[1] "this examp. sente. I have given here"
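
The loop above handles a single string. If you have several sentences, one option is to wrap the same loop in a function and apply it to each element. The function name `abbreviate_words` and its `max_chars` argument are made up for this sketch:

```r
abbreviate_words <- function(sentence, max_chars = 5) {
  # Split the sentence into individual words
  words <- unlist(strsplit(sentence, " "))
  for (i in seq_along(words)) {
    # Truncate words longer than max_chars and append a period
    if (nchar(words[i]) > max_chars) {
      words[i] <- paste0(substr(words[i], 1, max_chars), ".")
    }
  }
  # Reassemble the words into one string
  paste(words, collapse = " ")
}

sentences <- c("this example sentence I have given here",
               "and here is another long example")
result <- vapply(sentences, abbreviate_words, character(1), USE.NAMES = FALSE)
result
# [1] "this examp. sente. I have given here"
# [2] "and here is anoth. long examp."
```

Using vapply with character(1) makes R check that every call returns exactly one string, which can catch surprises earlier than sapply would.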
dc37
  • I had something similar: `sapply(strsplit(x, " "), function(x) { inds <- nchar(x) > 5; x[inds] <- paste0(substr(x[inds], 1, 5), "."); paste0(x, collapse = " ") })` – Ronak Shah Nov 21 '19 at 01:37