I need a script or regex, to be used with JavaScript / jQuery for checking form input on a website, that detects when someone has entered words which are mostly gibberish.

Normal words or sentences should pass the test:

This is a normal sentence (pass)

Peterborough (pass)

Words like this should fail the test:

bfygrydyyisg (fail)

hjrrjmsjsinz (fail)

yqymuqawsioy (fail)

I'd thought of checking for around 6 consonants or 6 vowels in a row, but the last example above would still pass, and I know some English words like 'rhythms' would fail (although that is very unlikely to be needed).
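
As a rough sketch of that check (the function name `hasLongRun` and the default limit of 6 are placeholders, and 'y' is treated as a consonant here):

```javascript
// Hypothetical helper: returns true when the text contains `limit` or more
// consonants in a row, or `limit` or more vowels in a row.
// Assumption: 'y' counts as a consonant.
function hasLongRun(text, limit = 6) {
  const consonants = new RegExp(`[bcdfghjklmnpqrstvwxyz]{${limit},}`, "i");
  const vowels = new RegExp(`[aeiou]{${limit},}`, "i");
  return consonants.test(text) || vowels.test(text);
}

// true means "looks like gibberish" under this rule.
console.log(hasLongRun("This is a normal sentence"));
console.log(hasLongRun("bfygrydyyisg"));
console.log(hasLongRun("yqymuqawsioy"));
console.log(hasLongRun("rhythms"));
```

This flags the first two gibberish examples, but it lets 'yqymuqawsioy' through and wrongly flags 'rhythms', which is exactly the weakness described above.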

Any ideas? Thanks!

Shaun
  • I guess the word "rhythms" is also valid gibberish then? – Mark Byers Apr 18 '12 at 14:03
  • Or indeed, several of the words on this list of acceptable scrabble words: http://www.tnellen.com/ted/scrabble/scrabble_words_others.html – Paddy Apr 18 '12 at 14:21
  • Wow. I'm in a bit of a shock here. Does the English language actually consider 'y' a consonant? (I'm Swedish by the way, and we don't) – Per Salbark Apr 18 '12 at 14:26
  • Unindented code is gibberish, too... – ThiefMaster Apr 18 '12 at 15:12
  • @PerSalbark: Yes it is considered a consonant in English. – Dale Apr 18 '12 at 17:25
  • I very much doubt people are going to be typing 'rhythms' or any of those scrabble words on a business website. It's more to catch the gibberish spam that often gets sent in the contact forms. – Shaun Apr 19 '12 at 08:07
  • I've made some big changes to my question now to try and get an answer. – Shaun Apr 19 '12 at 10:26

3 Answers

I ran into this same problem just recently. Basically, we needed to find out whether form fields contained gibberish answers. We wanted to detect this quickly (so as not to slow down form filling) and nudge the user to provide proper answers.

There are some newish front-end and backend libraries doing this detection.

  1. https://www.npmjs.com/package/gibberish-detector (fairly self-describing)
  2. Aptly named https://www.npmjs.com/package/asdfjkl
  3. In Python there's also: https://github.com/rrenaud/Gibberish-Detector
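
I won't reproduce any of these libraries' APIs here. As a rough idea of how this kind of detection can work, here is a minimal sketch that scores how likely each character is to follow the previous one. This is, I believe, similar in spirit to the Python project above, but that is an assumption. The function names, the tiny training sentence and the alphabet are all made up for illustration.

```javascript
// Sketch only: a character-bigram scorer trained on a tiny sample.
// A real detector needs a much larger training text and a tuned threshold.
const ALPHABET = "abcdefghijklmnopqrstuvwxyz ";

// Lowercase the text and collapse every run of non-letters to a single space.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z]+/g, " ").trim();
}

// Count how often each character follows each other character.
// Every count starts at 1 so unseen pairs never get probability 0.
function train(corpus) {
  const counts = {};
  for (const a of ALPHABET) {
    counts[a] = {};
    for (const b of ALPHABET) counts[a][b] = 1;
  }
  const text = normalize(corpus);
  for (let i = 0; i + 1 < text.length; i++) {
    counts[text[i]][text[i + 1]] += 1;
  }
  // Convert the counts to log-probabilities per leading character.
  const model = {};
  for (const a of ALPHABET) {
    const total = Object.values(counts[a]).reduce((sum, n) => sum + n, 0);
    model[a] = {};
    for (const b of ALPHABET) model[a][b] = Math.log(counts[a][b] / total);
  }
  return model;
}

// Average log-probability per transition; higher means more word-like.
function score(model, input) {
  const text = normalize(input);
  if (text.length < 2) return 0;
  let sum = 0;
  for (let i = 0; i + 1 < text.length; i++) {
    sum += model[text[i]][text[i + 1]];
  }
  return sum / (text.length - 1);
}

// Hypothetical training text; use a large body of real text in practice.
const model = train(
  "this is a normal sentence and the quick brown fox jumps over the lazy dog"
);
console.log(score(model, "This is a normal sentence"));
console.log(score(model, "bfygrydyyisg"));
```

With real training data you would pick a cut-off score between typical sentences and typical gibberish, then treat anything below it as suspect.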

Hope this helps others.

tayfun

Maybe you could use a spellchecker API like http://www.javascriptspellcheck.com/ or you could refer to John Resig's http://ejohn.org/blog/revised-javascript-dictionary-search/
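
I can't show either of those tools' actual APIs from memory, so here is only a minimal sketch of the underlying dictionary idea: count how many of the typed words appear in a word list. The word list and the function name `knownWordRatio` are made up for illustration.

```javascript
// Hypothetical word list; a real check would load a full dictionary.
const KNOWN_WORDS = new Set([
  "this", "is", "a", "normal", "sentence", "peterborough",
]);

// Fraction of the words in `text` that appear in the dictionary.
// Returns 0 when the input contains no letters at all.
function knownWordRatio(text, dictionary = KNOWN_WORDS) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  if (words.length === 0) return 0;
  const known = words.filter((word) => dictionary.has(word)).length;
  return known / words.length;
}

console.log(knownWordRatio("This is a normal sentence"));
console.log(knownWordRatio("bfygrydyyisg"));
```

You could then reject input whose ratio falls below some threshold. Note that place names and surnames need to be in the word list, or they will be treated as gibberish.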

walmik

Maybe this discussion will give you some direction: Help on JS gibberish detection

Steve C