We want to create an API that takes an input string and identifies whether it contains an abusive word. I have read about profanity filters but haven't found a satisfactory solution. There are a couple of challenges:
- Obfuscation: the word "SUCK", which is considered abusive, can be written as SUUCK, SUCCK, SU C K, or in many other ways. The letters may be separated by special characters, or a deliberate misspelling may be used that still sounds like the original word.
- Multi-lingual: abusive words could be written in any language.
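One idea I've considered for the obfuscation case is normalizing the input before checking it: lowercase, strip non-letter characters, and collapse repeated letters. A minimal sketch (the blocklist here is a hypothetical placeholder; a real system would load a maintained word list):

```python
import re

# Hypothetical blocklist for illustration only.
BLOCKLIST = {"suck"}

def normalize(text: str) -> str:
    """Lowercase, drop non-letters, and collapse runs of the same letter."""
    letters = re.sub(r"[^a-z]", "", text.lower())  # "SU C K" -> "suck"
    return re.sub(r"(.)\1+", r"\1", letters)       # "suuck"  -> "suck"

def is_abusive(text: str) -> bool:
    return normalize(text) in BLOCKLIST
```

Collapsing repeated letters can cause false positives on legitimate words (e.g. "assess" loses its doubled letters), so a production filter would probably check both the collapsed and uncollapsed forms.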
How can we identify this? I read *Comparing strings with tolerance* to get an idea of how strings can be compared based on their similarity.
This must be a concern for many organizations, especially those running chat services, so there should be established ways to identify such language. Can anyone point me to references for this? And how can we block words that sound similar to an abusive word, or that differ from one by only 1 or 2 characters?
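For the "similar sounding" part specifically, a classic technique is a phonetic algorithm such as Soundex (Metaphone and Double Metaphone are more accurate successors): words are reduced to a phonetic code, and two words match if their codes match. A minimal Soundex sketch, again against a hypothetical blocklist:

```python
def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    out = []
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # vowels reset prev, so repeats across vowels count
    return (word[0].upper() + "".join(out) + "000")[:4]

BLOCKLIST = {"suck"}  # hypothetical word list
BLOCK_CODES = {soundex(w) for w in BLOCKLIST}

def sounds_abusive(word: str) -> bool:
    return soundex(word) in BLOCK_CODES
```

Soundex is English-centric, which is why the multi-lingual requirement probably needs per-language phonetic rules or a library that supports them.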