I want to break repeating-key XOR. I don't know anything about the key or the message; the only thing I know is that a repeating key was used. The encoded message has been base64'd after being encrypted with repeating-key XOR, so I first converted the base64 to base16 to make it easier to work with. I have instructions, but I don't understand them very well.

  1. Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40. Write a function to compute the edit distance/Hamming distance between two strings.

  2. For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.

  3. The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.

Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length, etc. I understand this part and the rest fine for now; what I need to know first is whether the KEYSIZE I found is right, so that I can try to decode.
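As a minimal sketch of steps 1 and 2 (written for Python 3, and not taken from the instructions themselves), the bit-level Hamming distance can be computed on raw bytes rather than on hex strings. The names `hamming_distance` and `normalized_distance` are made up for this illustration, and the ciphertext is assumed to be available as a `bytes` object, for example from `base64.b64decode`.

```python
def hamming_distance(a, b):
    """Count the differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # XOR leaves a 1 bit wherever the two inputs differ; count those bits.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def normalized_distance(data, keysize):
    """Step 2: distance between the first two KEYSIZE-byte blocks, per key byte."""
    first = data[:keysize]
    second = data[keysize:2 * keysize]
    return hamming_distance(first, second) / keysize
```

A quick sanity check is the pair given on the Cryptopals challenge page: `hamming_distance(b"this is a test", b"wokka wokka!!!")` should be 37.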

I wrote code for this in Python. It runs, but I am not completely sure I have done it correctly:

    def compute_distance(str1,str2,keysize):
      count=0
      str1=str1.replace("\n", "")
      str2=str2.replace("\n", "")
      keysize=str(keysize*8)
      sbin1=format(int(str1,16),'0'+keysize+'b')
      sbin2=format(int(str2,16),'0'+keysize+'b')

      for c1,c2 in zip(sbin1, sbin2):
        if  c1!=c2:
          count+=1

      return count


    def keysize_dist(filelocation):
      f=open(filelocation,'r')
      lines=[]
      for line in f.readlines():
        line=line.strip('\n')
        lines.append(line)
      lines=''.join(lines).strip('\n')
      normalized=[]
      for keysize in range(2,40):
        count=compute_distance(lines[0:keysize*2],lines[keysize*2:keysize*4],keysize)

        normalized.append(float(count)/keysize)


      return lines,int(min(normalized))
philippe lhardy
Nikola Lošić
  • If it's working why do you think you might not have done it correctly? –  Nov 02 '14 at 08:41
  • Because the minimum normalized distance is for keysize 5 and I didn't succeed in decoding with keysize 5.. – Nikola Lošić Nov 02 '14 at 08:42
  • So then it's not working... –  Nov 02 '14 at 08:43
  • @MikeW perhaps because in crypto we should take care never to do things ourselves but rely on existing knowledge or libraries. – philippe lhardy Nov 02 '14 at 08:43
  • I wouldn't ask here and do it by myself if I had found a good library to do this for me.. What library would you suggest? – Nikola Lošić Nov 02 '14 at 08:46
  • @NikolaLošić it was more a general comment to explain why you would submit working code. It is good in the security field not to be overconfident. – philippe lhardy Nov 02 '14 at 09:29
  • int(str1,16) => by always using 2x8 bits, I fear you can't correctly compute any odd KEYSIZE. – philippe lhardy Nov 02 '14 at 12:56
  • And a file sample or a generator for your base16 file could be useful. – philippe lhardy Nov 02 '14 at 12:57
  • Here is all you need. Just note that in the breaking function the splitting is not done right, because I was trying to solve with different parts of the blocks. That part isn't important to you, because the keysize is the only problem, not the splitting; I know how to do that. The function xor_single is the brute-force function; it also has a few errors because of the previously mentioned experiments with different blocks. The function I wrote is tested and worked perfectly with 60-character lines. [Python code and material](https://drive.google.com/folderview?id=0B_lFRVMSVOmQNVBHMGZWM09EZ1U&usp=sharing) – Nikola Lošić Nov 02 '14 at 13:17
  • And for that int(str1,16): keysize=str(keysize*8) solves the problem for odd keysizes. If keysize is, for example, 3, then for 3 bytes we need 3*8 binary digits, and that is always an even number :) Edit: If you want to see the complete problem, go to [Cryptopals](http://cryptopals.com/sets/1/challenges/6/) – Nikola Lošić Nov 02 '14 at 13:40

1 Answer

This is how I understood your post. I wrote a Python program that generates the XOR-ciphered stream with a cycling key and then tries the normalized Hamming distance method to find the most likely cycling key size. I don't convert anything to or from base64, and I apply the distance directly to the characters of the string (character distance), not to their bits (binary distance).

    #!/usr/bin/python

    import sys
    from itertools import cycle

    def xor_file_with_cycling_strkey(filelocation,outfile,key):
      # XOR every character of the input file with the repeating key.
      print(filelocation)
      f=open(filelocation,'r')
      f2=open(outfile,'w')
      text=f.read()
      if text != '':
        for c,k in zip(text,cycle(key)):
          r=chr(ord(c)^ord(k))
          f2.write(r)
      f2.close()
      f.close()

    # Not used here (it expects base16 input); see compute_distance_char,
    # which is based on the same idea.
    def compute_distance(str1,str2,keysize):
      print('%s %s' % (str1,str2))
      str1=str1.replace("\n", "")
      str2=str2.replace("\n", "")
      keysize=str(keysize*8)
      sbin1=format(int(str1,16),'0'+keysize+'b')
      sbin2=format(int(str2,16),'0'+keysize+'b')
      return hamming_distance_str(sbin1,sbin2)

    # Prefer hamming_distance_bin, which is quicker.
    def compute_distance_char(str1,str2,keysize):
      str1=str1.replace("\n", "")
      str2=str2.replace("\n", "")
      keysize=str(keysize*8)
      sbin1=''
      sbin2=''
      # Every character is zero-padded to the same width, so the extra
      # leading zeros do not change the number of differing bits.
      for c in str1:
        sbin1=sbin1 + format(ord(c),'0'+keysize+'b')
      for c in str2:
        sbin2=sbin2 + format(ord(c),'0'+keysize+'b')
      return hamming_distance_str(sbin1,sbin2)

    # Character-level distance: number of positions whose characters differ.
    def hamming_distance_str(str1,str2):
      count=0
      for c1,c2 in zip(str1, str2):
        if  c1!=c2:
          count+=1
      return count

    # Bit-level distance: number of differing bits.
    def hamming_distance_bin(str1,str2):
      count=0
      for c1,c2 in zip(str1, str2):
        if  c1!=c2:
          # quick hamming distance, counting number of differing bits.
          s=ord(c1)^ord(c2)
          # count the number of bits set using Wegner's algorithm
          while s !=0:
            s&=(s-1)
            count+=1
      return count

    def keysize_dist(filelocation):
      potential_keysize=0
      min_dist=40.0
      f=open(filelocation,'r')
      lines=[]
      # note: this also drops any ciphertext character equal to '\n'
      for line in f.readlines():
        line=line.strip('\n')
        lines.append(line)
      f.close()
      lines=''.join(lines).strip('\n')
      for keysize in range(2,40):
        # compute_distance needs base16 input, so it is not used here:
        # count_bin1=compute_distance(lines[0:keysize*2],lines[keysize*2:keysize*4],keysize)
        # The slices are 2*keysize characters long (kept from the hex-based
        # code in the question), so each side spans two key periods.
        # proof that both functions compute the same value:
        count_bin1=compute_distance_char(lines[0:keysize*2],lines[keysize*2:keysize*4],keysize)
        count_bin2=hamming_distance_bin(lines[0:keysize*2],lines[keysize*2:keysize*4])
        if ( count_bin1 != count_bin2 ):
          print('Discrepancy between compute_distance_char->%i and hamming_distance_bin->%i' % (count_bin1,count_bin2))
        count=hamming_distance_str(lines[0:keysize*2],lines[keysize*2:keysize*4])

        normalized_distance=float(count)/keysize
        print('%s %f' % (keysize,normalized_distance))
        if ( normalized_distance < min_dist ):
          potential_keysize=keysize
          min_dist=normalized_distance
      # we are more interested in the keysize corresponding to the minimal
      # distance than in the minimal distance itself.
      return potential_keysize,min_dist

    def main(args=sys.argv):
      if ( len(args) < 2 ):
        print('Please enter the cleartext file to be ciphered then checked, and optionally a key string (max length 40)')
        return 1
      if ( len(args) > 2):
        key=args[2]
      else:
        # on purpose, default to a key with a KEYSIZE of 5 characters.
        key='12345'
      xor_file_with_cycling_strkey(args[1],args[1]+'.ciphered',key)
      xor_file_with_cycling_strkey(args[1]+'.ciphered',args[1] + '.cleartext',key)

      # raw, not base64 encoded.
      print(keysize_dist(args[1] + '.ciphered'))

    if __name__ == "__main__":
        main()

With that code you can get all the inputs needed to fully resolve your problem.

    ./hamming_detect_xor_cycle.py cleartext 123456789ABCDE
    ...
    (14, 1.7857142857142858)

It does not detect every key size correctly, but I think this is a statistical effect that depends on the cleartext, which can itself have cyclic properties. As your question says, using more blocks can give a better result.
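The "more blocks" idea can be sketched roughly as follows, in Python 3. This is an illustration rather than the program above: `rank_keysizes` and its parameters are hypothetical names, the ciphertext is assumed to be raw `bytes` (already base64-decoded), and the distance is the bit-level one. It averages the distance over every pair among the first `blocks` blocks and returns all candidates in order, so the best two or three can be tried.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of differing bits between two byte strings of equal length."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def rank_keysizes(data, min_size=2, max_size=40, blocks=4):
    """Return candidate key sizes, lowest average normalized distance first."""
    scores = []
    for keysize in range(min_size, max_size + 1):
        chunks = [data[i * keysize:(i + 1) * keysize] for i in range(blocks)]
        if len(chunks[-1]) < keysize:
            break  # not enough ciphertext for this many full blocks
        pairs = list(combinations(chunks, 2))
        total = sum(hamming_distance(a, b) for a, b in pairs)
        # Average over the pairs, then normalize by the block length.
        scores.append((total / (len(pairs) * keysize), keysize))
    return [keysize for _, keysize in sorted(scores)]
```

Multiples of the real key length line up with the key as well, so they tend to score low too; it is worth checking whether the top candidates share a common divisor before committing to one.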

philippe lhardy
  • I am not sure why xor_file_with_cycling_strkey returns t, it is just an empty string, it has not been changed so why to return it? And can string distance really be used when I don't have clear strings, I have encoded strings into hexadecimal characters or maybe it is the same as I have clear strings? – Nikola Lošić Nov 02 '14 at 11:34
  • Decode your base64 to get the ciphertext as a ciphered char stream, then apply the str/char distance to it. I tested it with binary distance and it seems worse than with str/char. And I added here a quicker implementation for binary than using format() and Hamming over the resulting string. – philippe lhardy Nov 02 '14 at 12:37
  • I really appreciate that :) One more question about xor_file_with_cycling_strkey. If I am right, we don't actually need that function if we don't know the key? In other words, if I have understood your program, it first encodes string, then decodes it with the key given and then finally computes distance to check if it works correctly. Am I right? – Nikola Lošić Nov 02 '14 at 13:02
  • Yes, you are right, this program contains its own input generation, and you can check that it correctly encrypts the stream by looking at the decrypted file with the .cleartext extension. Nobody is above bugs (and certainly not me), so checking and tests are always needed. You can tweak it to adapt it to your final needs. – philippe lhardy Nov 02 '14 at 15:55
  • given your last comments it is in fact the binary hamming computation you have to use. – philippe lhardy Nov 02 '14 at 16:50
  • So my code for calculating hamming distance was good from the beginning? – Nikola Lošić Nov 02 '14 at 20:24
  • Somehow perhaps, but how did you use it in your code to obtain a keysize of 29 ... BTW I broke ex 6 too but with a much more complex algorithm... "Play #hat f)nky 1usic Com5won" – philippe lhardy Nov 02 '14 at 23:08
  • If I remember correctly that is the number someone who solved it told me, but I am not 100% sure.. Are you saying "Play #hat f)nky 1usic Com5won" is the key ? – Nikola Lošić Nov 02 '14 at 23:48
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/64152/discussion-between-philippe-lhardy-and-nikola-losic). – philippe lhardy Nov 03 '14 at 07:15