How to censor ONLY swear words with gsub

Question

I've got a text corpus containing some swear words and tried to censor them. On closer inspection, though, the regular expression I used isn't quite right: it also censors ordinary words.

x <- c("ass", "badass", "class")
gsub("ass\\b", "a*s", x)

This censors the first two words correctly but also returns "cla*s", and I obviously want to keep "class". What do I need to add to the regex to prevent that? I tried "\w\." but it didn't work.

score 1 · Accepted Answer · answered Jan 22 '19 at 10:27

You can make a list of bad words and censor only the elements that appear in it, e.g.

bad.words <- c('ass', 'badass', 'dumbass')
c(x[!x %in% bad.words], gsub("ass\\b", "a*s", x[x %in% bad.words]))
#[1] "class"  "a*s"    "bada*s"
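
Note that the one-liner above returns the unlisted words first, so the result is no longer in the order of x. A minimal sketch that keeps the original order, assuming the same x and bad.words as above (is.bad and censored are names introduced here for illustration):

```r
x <- c("ass", "badass", "class")
bad.words <- c("ass", "badass", "dumbass")

# Flag the elements of x that appear on the list
is.bad <- x %in% bad.words

# Work on a copy and censor only the flagged elements in place,
# so every word stays at its original position
censored <- x
censored[is.bad] <- gsub("ass\\b", "a*s", x[is.bad])
censored
#[1] "a*s"    "bada*s" "class"
```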

Sotos


Thanks for the input, but that's not really what I'm looking for. I just need a regular expression that limits the match to the few letters that should be changed, like the "\\b" at the end. Isn't there something that can be added to exclude the letters before the ones that are supposed to be converted? – ZaLa Jan 22 '19 at 10:29
How would it tell badass from class? It is not possible. And what if new words appear, like asshole vs. associate? – Sotos Jan 22 '19 at 10:31
Okay, fair enough, I didn't think about that. I guess I'll try the list approach, thanks! – ZaLa Jan 22 '19 at 10:34
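
For running text rather than a vector of single words, the list idea from this thread can still be expressed as one regular expression built from the list. The following is only a sketch under assumptions: the sentences in txt are made up for illustration, and pattern and m are hypothetical names. It wraps the listed words in word boundaries so that "class" and "associate" never match, then censors only the matched pieces.

```r
bad.words <- c("ass", "badass", "dumbass")

# Hypothetical example sentences, not taken from the question
txt <- c("what a badass class", "don't be an ass", "associate professor")

# Build \b(ass|badass|dumbass)\b so that only whole listed words match
pattern <- paste0("\\b(", paste(bad.words, collapse = "|"), ")\\b")

# Locate every match, then rewrite just the matched words
m <- gregexpr(pattern, txt, perl = TRUE)
regmatches(txt, m) <- lapply(regmatches(txt, m),
                             function(w) sub("ass$", "a*s", w))
txt
#[1] "what a bada*s class" "don't be an a*s"     "associate professor"
```

New swear words only need to be appended to bad.words; the pattern is rebuilt from whatever the list contains.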

score 0 · Answer 2 · answered Jan 22 '19 at 11:25

It seems your list above is limited to variants of a*s. If not:

GitHub List of 'Bad words'

You can pull from this list to find the matching words in your data, then write a censored copy, with the 2nd character replaced by *, to another column.
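
A rough sketch of that idea, assuming the downloaded list has already been read into a character vector (bad.words below is a small stand-in, and the data frame df with its columns word and censored is hypothetical):

```r
# Stand-in for the downloaded word list
bad.words <- c("ass", "badass", "dumbass")

# Hypothetical data: one word per row
df <- data.frame(word = c("ass", "badass", "class"),
                 stringsAsFactors = FALSE)

# Rows whose word is on the list
is.bad <- df$word %in% bad.words

# Copy the words, overwrite the 2nd character of the listed ones with "*",
# and store the result in a separate column
censored <- df$word
substr(censored[is.bad], 2, 2) <- "*"
df$censored <- censored
df$censored
#[1] "a*s"    "b*dass" "class"
```

Because the star always lands on the 2nd character, "badass" becomes "b*dass" here rather than "bada*s" as in the accepted answer.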

blacktj

