
My problem is the following. I have a list of substitutions, including one substitution for each letter of the alphabet, but also some substitutions for groups of more than one letter. For example, in my cipher p becomes b, l becomes w, e becomes i, but le becomes by, and ple becomes memi.

So, while I can think of a few simple/naïve ways of implementing this cipher, they're not very efficient, and I was wondering what the most efficient way to do it would be. The answer doesn't have to be in any particular language; a general structured English algorithm would be fine, but if it must be in some language I'd prefer C++ or Java or similar.

EDIT: I don't need this cipher to be decipherable; an algorithm that mapped every single letter to the letter 'w' but mapped the string 'had' to the string 'jon' instead should be OK too (then the string "Mary had a little lamb." would become "Wwww jon w wwwwww wwww.").

I'd like the algorithm to be fully general.
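One naive way to implement the cipher described above is sketched here in C++. It assumes, as the examples suggest, that when several rules match at the same position the longest one wins ("ple" over "p", "le" over "l", "had" over "h") and that characters without a rule (spaces, punctuation) are copied unchanged; `naiveCipher` is a hypothetical name. Because it tries every rule at every position, its cost grows with the number of rules as well as with the length of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Naive baseline: at every position, scan all rules and keep the longest one
// that matches there. Characters no rule covers are copied unchanged.
// Cost is roughly O(n * R * L): n input chars, R rules, L = longest pattern.
std::string naiveCipher(const std::string& text,
                        const std::map<std::string, std::string>& rules) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        const std::string* best = nullptr;  // replacement for longest match
        std::size_t bestLen = 0;
        for (const auto& rule : rules) {
            const std::string& from = rule.first;
            assert(!from.empty());  // patterns are assumed to be non-empty
            // compare() clamps the substring at the end of text, so a pattern
            // running past the end simply fails to match.
            if (from.size() > bestLen &&
                text.compare(i, from.size(), from) == 0) {
                best = &rule.second;
                bestLen = from.size();
            }
        }
        if (best != nullptr) {
            out += *best;
            i += bestLen;
        } else {
            out += text[i];  // no rule starts here: keep the character
            ++i;
        }
    }
    return out;
}
```

With the five example rules, "apple" becomes "abmemi". With the rules from the EDIT (each lowercase letter to w, plus an assumed rule mapping each uppercase letter to W, and had to jon), "Mary had a little lamb." becomes "Wwww jon w wwwwww wwww.".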

Pedro Carvalho
  • I assume you have ensured that the alphabet is unique and unambiguous? Do you have the whole alphabet somewhere, i.e. all the substitution rules? – Lasse V. Karlsen Jun 20 '15 at 11:42
  • What do you mean by unique and unambiguous? I don't need this cipher to be decipherable; an algorithm that mapped all single letters to the letter 'w' but mapped the string 'had' to the string 'jon' instead should be doable, too. I'd like the algorithm to be fully general. – Pedro Carvalho Jun 20 '15 at 13:14
  • Uhm, what? Why don't you need this cipher to be decipherable? That makes no sense to me. The word "cipher" is almost always used in conjunction with encryption, which has decryption as its counterpart, hence decipherable. Are you building a hash-like algorithm? – Lasse V. Karlsen Jun 20 '15 at 16:09
  • No, I'm building a roleplaying "language" in Second Life. The translation will be sent secretly to the intended target already, I just need it to sound cool to outsiders. – Pedro Carvalho Jun 21 '15 at 10:16
  • Ah, then I understand. I guess "kek" is the appropriate response here then :) – Lasse V. Karlsen Jun 21 '15 at 12:48

1 Answer


One possible approach is to use a deterministic automaton. The closest commonly used example to your problem is the Aho–Corasick string-matching algorithm. The difference is that instead of only matching, you emit cipher text on some transitions: in general, each transition either emits cipher text or emits nothing. In your example:

p -> b
l -> w
e -> i
le -> by
ple -> memi

The automaton (in Erlang-like pseudocode):

start(p) -> p(next());
start(l) -> l(next());
start(e) -> e(next());
...

p(l) -> pl(next());
p(X) -> emit(b), start(X).

l(e) -> emit(by), start(next());
l(X) -> emit(w), start(X).

e(X) -> emit(i), start(X).

pl(e) -> emit(memi), start(next());
pl(X) -> emit(b), l(X).

If you are not familiar with Erlang: start(), p() and so on are functions, one for each state. Each line with -> is one transition, and the actions follow the ->. emit() is a function that emits cipher text, and next() is a function that returns the next character. X is a variable that matches any other character.
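To make the control flow concrete, here is a hand translation of the same five-rule automaton into C++. It is a sketch under assumptions the pseudocode leaves open: characters with no rule are copied unchanged, and at end of input any pending prefix is flushed (p gives b, l gives w, pl gives bw). The names `State` and `cipherExample` are hypothetical. Instead of one function per state, it keeps the state in a variable and does not consume the current character whenever a transition such as `p(X) -> emit(b), start(X)` hands X on to another state. The separate e state is folded into Start, since no longer rule begins with e and its output can be emitted as soon as it is read.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hand-coded transducer for exactly these rules:
//   p->b, l->w, e->i, le->by, ple->memi
// Each state names the pattern prefix read so far but not yet emitted.
enum class State { Start, P, L, PL };

std::string cipherExample(const std::string& text) {
    std::string out;
    State s = State::Start;
    std::size_t i = 0;
    while (i < text.size()) {
        char c = text[i];
        switch (s) {
        case State::Start:
            if (c == 'p') s = State::P;
            else if (c == 'l') s = State::L;
            else if (c == 'e') out += "i";   // e: no longer rule, emit now
            else out += c;                   // no rule: copy through
            ++i;
            break;
        case State::P:
            if (c == 'l') { s = State::PL; ++i; }
            else { out += "b"; s = State::Start; }     // re-read c in Start
            break;
        case State::L:
            if (c == 'e') { out += "by"; s = State::Start; ++i; }
            else { out += "w"; s = State::Start; }     // re-read c in Start
            break;
        case State::PL:
            if (c == 'e') { out += "memi"; s = State::Start; ++i; }
            else { out += "b"; s = State::L; }  // "p" done, "l" still pending
            break;
        }
    }
    // End of input: flush whatever prefix is still pending.
    switch (s) {
    case State::P:     out += "b";  break;
    case State::L:     out += "w";  break;
    case State::PL:    out += "bw"; break;
    case State::Start: break;
    }
    return out;
}
```

Every non-consuming transition moves to a state closer to Start, so the loop always makes progress.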

Hynek -Pichi- Vychodil
  • Wouldn't that require me to hard-code the transition rules, though? This automaton looks very specific to the example I gave. I'd like something that takes the substitution rules as input rather than as part of the code. – Pedro Carvalho Jun 20 '15 at 14:08
  • There is an algorithm that generates an automaton for any cipher defined the way you described in your question. Writing that generator is the hard part of the solution, but you can take inspiration from the algorithm used to build the Aho–Corasick automaton. – Hynek -Pichi- Vychodil Jun 20 '15 at 14:14
  • And what's the time and space complexity of it, as a function of the number of transitions and the size of the string to be ciphered? I already have an algorithm that does what I want, but it takes too long (about 0.25 to 0.5 s for a two-line sentence). Also, I don't think the language I'm restricted to (https://en.wikipedia.org/wiki/Linden_Scripting_Language) is powerful enough to generate this automaton from the input efficiently. – Pedro Carvalho Jun 20 '15 at 14:18
  • If you read the linked Wikipedia article, you would know that _When the pattern dictionary is known in advance (e.g. a computer virus database), the construction of the automaton can be performed once off-line and the compiled automaton stored for later use. In this case, its run time is linear in the length of the input plus the number of matched entries._ This answers both of your questions. You can generate the automaton in a different language and emit LSL code. – Hynek -Pichi- Vychodil Jun 20 '15 at 14:28
  • I read that, and the point is that the pattern dictionary is not known in advance; it is part of the input, and I cannot use a different language because the input comes from within SL, too. The user gives the program an arbitrary pattern dictionary as input, and then gives it a string to cipher using that pattern. – Pedro Carvalho Jun 20 '15 at 14:43
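One way to meet that constraint, sketched here in C++ under the assumption that longest-match-wins is the intended rule, is to skip generating a full automaton and instead build a trie from the user's rules at run time, then walk it from each position of the input. This is a plain trie with greedy longest-match lookup, not the Aho–Corasick construction itself (there are no failure links), and `SubstitutionTrie`, `add` and `cipher` are hypothetical names. Nodes are stored in a vector and referenced by integer index rather than by pointer, which may be easier to port to a language that only offers flat lists.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Trie over the rule patterns, built at run time from user-supplied rules.
// Nodes live in a vector and refer to each other by index (no pointers).
class SubstitutionTrie {
public:
    SubstitutionTrie() : nodes_(1) {}  // node 0 is the root (empty prefix)

    // Add one rule, e.g. add("ple", "memi"). A repeated pattern overwrites.
    void add(const std::string& from, const std::string& to) {
        assert(!from.empty());  // an empty pattern would be silently ignored
        int n = 0;
        for (char c : from) {
            auto it = nodes_[n].next.find(c);
            if (it == nodes_[n].next.end()) {
                nodes_.push_back(Node{});  // may reallocate: re-index below
                int child = static_cast<int>(nodes_.size()) - 1;
                nodes_[n].next[c] = child;
                n = child;
            } else {
                n = it->second;
            }
        }
        nodes_[n].terminal = true;
        nodes_[n].output = to;
    }

    // Leftmost-longest substitution: from each position, follow the trie as
    // far as the text allows and remember the deepest node that ends a rule.
    std::string cipher(const std::string& text) const {
        std::string out;
        std::size_t i = 0;
        while (i < text.size()) {
            int n = 0;
            std::size_t matchLen = 0;
            const std::string* match = nullptr;
            for (std::size_t j = i; j < text.size(); ++j) {
                auto it = nodes_[n].next.find(text[j]);
                if (it == nodes_[n].next.end()) break;  // no longer rule
                n = it->second;
                if (nodes_[n].terminal) {
                    matchLen = j - i + 1;
                    match = &nodes_[n].output;
                }
            }
            if (match != nullptr) {
                out += *match;
                i += matchLen;
            } else {
                out += text[i];  // no rule starts here: copy the character
                ++i;
            }
        }
        return out;
    }

private:
    struct Node {
        std::map<char, int> next;  // child index for each next character
        bool terminal = false;     // does some rule end at this node?
        std::string output;        // its replacement, if terminal
    };
    std::vector<Node> nodes_;
};
```

Building the trie visits each character of each pattern once, so it is linear in the total length of the rules (with a logarithmic factor from `std::map`), and memory is proportional to the total length of patterns plus outputs. Ciphering reads at most about L characters ahead from each position, where L is the longest pattern, so an n-character input costs roughly O(n × L). A full Aho–Corasick automaton avoids that re-reading, but with patterns only a few letters long the difference is likely small.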