2

How can I alter a string so that variations of approximate string matching can't match it with the original?

I made an IRC bot that runs a game based on the channel's logfile. It prints quotes from the logs, and players collect points by guessing who said it. The channel is rather geeky, and it took no more than 30 minutes for one of the players to build a bot that wins the game every time. I realize manual cheating is also easy and impossible to defend against, but consider this a competition between automated bots. I want to update my bot so that no fully automated bot will be able to play the game :)

I've considered randomly deleting a character from the quotes, but agrep would still be able to match the string. I've considered replacing some of the characters with similar-looking alternate characters, but that would be trivial to reverse-engineer. I'm looking for ideas that will be harder to break.
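To illustrate why deleting a single character does not help, here is a minimal sketch. It uses Python's `difflib.SequenceMatcher` as a stand-in for agrep (an assumption; agrep's actual matching algorithm differs), and the helper names `delete_random_char` and `similarity` are hypothetical.

```python
import difflib
import random


def delete_random_char(text, rng):
    """Return text with one randomly chosen character removed."""
    if not text:
        return text
    i = rng.randrange(len(text))
    return text[:i] + text[i + 1:]


def similarity(a, b):
    """Similarity ratio in [0, 1]; difflib here is only a stand-in for agrep."""
    return difflib.SequenceMatcher(None, a, b).ratio()


quote = ("minulla ainakin paperin tekemisessä 1% ajasta menee algon "
         "suunnitteluun ja 99% menee paperin kirjoittamiseen")
altered = delete_random_char(quote, random.Random(0))
score = similarity(quote, altered)
print(altered)
print(round(score, 3))
```

The score stays very close to 1, so a bot that picks the log line with the highest similarity would most likely still find the original quote.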

Example line:

[14:15] <baobot> [QUOTE 13/15] Who famously declared "minulla ainakin paperin tekemisessä 1% ajasta menee algon suunnitteluun ja 99% menee paperin kirjoittamiseen"?
Atte Juvonen
  • As long as the output is plain text, it's highly unlikely that you can come up with a method that is both understandable to humans and not easily broken by an approximate string matching algorithm. – Jim Mischel Dec 22 '15 at 17:02
  • Google translate the quote to a random language and back to the original. – user58697 Dec 22 '15 at 19:27

3 Answers

1

Print your quote as ASCII art.

Use something similar to the command-line tools figlet or toilet (explanation).

For a quick example, see something like a string2ascii generator.

To get started, you might want to copy the source code from figlet.

wotanii
  • It's a nice idea, but it's not practical enough. ASCII art wouldn't display properly on systems with different fonts, and it would be hard to fit more than a few words without using a huge number of lines. – Atte Juvonen Dec 22 '15 at 15:00
  • I suspect making it ASCII art would be just a minor road block. The bot writer would just have to examine the output and then code his bot to do the decoding. – Jim Mischel Dec 22 '15 at 17:03
1

Anything that can be used to scramble can most likely be unscrambled. Below are some suggestions though for your experiment:

  • Humans can read words if the first and last letter are in place and the inner portion is scrambled.

  • You can also do substitution, such as leetspeak ("elite speak"), which replaces some characters with numbers.

  • You might be able to find characters from other languages that look similar to the letters used, which means you can randomly substitute them as well.

  • You can also try to randomize the positions of the spaces: remove them from their original positions and move them around, or remove them completely.

  • Reverse some words.

  • Find ways to phoneticize words. In English, "ph" sounds like "f", so you can find and replace some of them.

  • Try a combination of the things above: remove all spaces, CaMEl CaSE the words, then do character substitutions, etc.

Overall, there are lots of ways to make it harder. However, if you follow a set pattern every time, it will be easier to program something to undo it. If you randomly do different things, so that one input can yield several different outputs, it will be harder for someone to write a program to reverse the process.
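As a rough sketch of the randomized approach, the following Python picks a different transform for each word. The `LEET` table, the function names, and the 50% substitution rate are all hypothetical choices made for illustration, and only three of the suggestions above are covered.

```python
import random

# Hypothetical look-alike table; extend or change it as needed.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}


def scramble_inner(word, rng):
    """Shuffle every letter except the first and the last."""
    if len(word) <= 3:
        return word
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]


def leet(word, rng, rate=0.5):
    """Replace each mapped letter with its digit with probability `rate`."""
    return "".join(
        LEET[c] if c in LEET and rng.random() < rate else c
        for c in word
    )


def reverse(word, rng):
    """Reverse the word; rng is unused but keeps the signatures uniform."""
    return word[::-1]


TRANSFORMS = [scramble_inner, leet, reverse]


def obfuscate(quote, rng=None):
    """Apply one randomly chosen transform to each whitespace-separated word."""
    if rng is None:
        rng = random.Random()
    return " ".join(rng.choice(TRANSFORMS)(w, rng) for w in quote.split())


print(obfuscate("menee algon suunnitteluun ja menee paperin kirjoittamiseen"))
```

Because the transform is chosen at random per word, the same quote produces different output on each run. A determined bot author could still normalize each variant before matching, so this raises the effort required rather than ruling out automation.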

James Oravec
0

Use Google translate.

For example, I ran your quote through Russian, then English, and back to Finnish, and got:

Minulla on ainakin 1% ajasta kirjassa otetaan suunnittelussa Algon ja 99% menee kirjoituspaperia

I have no idea whether it is correct Finnish; as far as I can tell, it is still somewhat recognizable. If you think it is too recognizable for an approximate search, do more intermediate translations.

user58697
  • That's thinking outside the box! :) Unfortunately, the result is often garbled too much. Also, part of the fun is recognizing a person from the style that they write in. – Atte Juvonen Dec 22 '15 at 20:40