I am reading a generated text file, "generated.txt", in 64 KB blocks and writing each block to its own txt file. Every written file except the first one starts with strange data.

The generated test file

Script to generate the file

If I use:

with open('generated.txt', 'rb') as f:

instead of:

f = open('generated.txt', 'rb')

I get the same strange data in the first file.

Beginning of the second 64 KB block of the original file in hex:

0a31303935300d0a31303935310d0a31

"Strange" data from the second txt file in hex:

e7fadb0930588fb74d1aba3fd3bafc84

Beginning of the second encrypted file in hex:

bde07ad1e305193105655a42998a1fc9

Unfortunately, they are not the same.

Full Code below:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from Crypto.Cipher import AES
from Crypto.Protocol.KDF import PBKDF2
from Crypto import Random

key_size = 32
iterations = 10000
key = 'password'

salt = Random.new().read(key_size)
iv = Random.new().read(AES.block_size)
derived_key = PBKDF2(key, salt, key_size, iterations)
cipher = AES.new(derived_key, AES.MODE_CFB, iv)

i = 1
f = open('generated.txt', 'rb')


while True:

    data = f.read(65536)

    if not data:
        break

    encodedtext = iv + cipher.encrypt(data)
    decodedtext = str(cipher.decrypt(encodedtext))[16:]
    print 'Writing ' + str(i) 

    g = open('LOG_' + str(i) + '.txt', 'wb')
    g.write(decodedtext)
    g.close()

    d = open('LOG_' + str(i) + '_ENC.txt', 'wb')
    d.write(encodedtext)
    d.close()
    i = i+1

f.close()

Thanks for your help :)

D3473R
  • The link to `txt` file is broken. If its content is not big, put it in the post body or use something like pastebin.org to avoid dependency of any account based storage service. – DontVoteMeDown Nov 21 '13 at 19:07
  • Are you using Linux? If not, I'd be wary of opening the ASCII file in 'b'inary mode. – gregb212 Nov 21 '13 at 19:22
  • No change if using "w" and "r" instead of "wb" and "rb". I want to encrypt other files later, like images or video files ;) – D3473R Nov 21 '13 at 19:26
  • Could you show the start of the encrypted text, including the "weird data"? Note that encryption results are *not text*. – Maarten Bodewes Nov 21 '13 at 19:30
  • OK, could you give the first 32 bytes of input and output in *hexadecimals*? Printing binary as a string is not going to help much. – Maarten Bodewes Nov 21 '13 at 22:02

1 Answer

Please make sure that you understand character encoding and the encoding of bytes as text.

Character encoding defines what text looks like if you represent it as bytes. There are many encoding standards; the most common ones include ASCII, Windows-1252, UTF-8 and UTF-16. ASCII is a limited 7-bit character set, Windows-1252 contains mostly European/Latin characters, and the latter two are encodings used for Unicode text. If a file is viewed in the wrong encoding it may look like nonsense, but most text editors can guess the text encoding pretty well nowadays.
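As a small illustration of the paragraph above, the following Python 3 sketch encodes one sample string (chosen here only for demonstration, it is not taken from the question) with three of the encodings mentioned and then deliberately decodes the UTF-8 bytes with the wrong encoding:

```python
# Sample text with two non-ASCII characters (an assumption for illustration).
text = "na\u00efve caf\u00e9"

# The same text becomes different byte sequences under different encodings.
utf8_bytes = text.encode("utf-8")        # non-ASCII characters take 2 bytes each
cp1252_bytes = text.encode("cp1252")     # Windows-1252: 1 byte per character
utf16_bytes = text.encode("utf-16-le")   # 2 bytes per character for this text

print(utf8_bytes.hex())
print(cp1252_bytes.hex())
print(utf16_bytes.hex())

# Viewing UTF-8 bytes as Windows-1252 gives readable but wrong characters.
misread = utf8_bytes.decode("cp1252")
print(misread)
```

The byte lengths differ (12, 10 and 20 bytes for this string), which is why a reader has to know, or guess, the encoding before the bytes mean anything as text.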

Note that in most encodings not every byte value, and sometimes not even most of them, can be interpreted as text. A byte may not represent a character at all, or it may encode (all or part of) a control character. You can work around this issue by encoding your bytes as text. A common encoding uses hexadecimals: each byte is encoded as exactly two characters, 4 bits per character. This makes it easy for programmers to see the contents of the bytes. Base64, on the other hand, is less legible because it encodes each group of 6 bits as one character. Base64 is therefore the more efficient encoding, but it is not very easy for humans to read.
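A minimal Python 3 sketch of both text encodings, using the 16 "strange" bytes quoted in the question as sample input. The variable names are illustrative only:

```python
import base64
import binascii

# The 16 bytes reported in the question, rebuilt from their hex form.
raw = bytes.fromhex("e7fadb0930588fb74d1aba3fd3bafc84")

# Hexadecimal: two characters per byte, 4 bits per character.
hex_text = binascii.hexlify(raw).decode("ascii")

# Base64: one character per 6 bits, padded to a multiple of 4 characters.
b64_text = base64.b64encode(raw).decode("ascii")

print(len(raw), len(hex_text), len(b64_text))
print(hex_text)
print(b64_text)

# Both encodings are reversible, so no information is lost.
assert binascii.unhexlify(hex_text) == raw
assert base64.b64decode(b64_text) == raw
```

For 16 input bytes the hex form needs 32 characters and the Base64 form 24, which shows the efficiency difference described above.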

Make sure you use a suitable character encoding if you want your plaintext input to be legible. The ciphertext, despite the name, consists of bytes. If you want to create characters from that, use Base64.

Maarten Bodewes
  • The wrong data is only at the beginning of a new 64kb block, the rest of the data is encoded right. – D3473R Nov 21 '13 at 21:00
  • Unfortunately this is hard for us to debug. But please note that using string functions on any data that you view as binary is dangerous. For instance, you call `str()` on 64 KiB blocks, but it could well be that you split one character in two if you have an encoding different from ASCII. One of the hex blocks shows both binary and ASCII digits. If you want to encrypt any file, you should not use *any* character encoding (except for the password). – Maarten Bodewes Nov 25 '13 at 02:12
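
The risk of splitting one character in two, mentioned in the last comment, can be sketched in Python 3 as follows. The sample text and the tiny chunk size are assumptions chosen so that a chunk boundary falls inside a multi-byte UTF-8 character; the script in the question uses 65536-byte chunks instead:

```python
# '\u20ac' (the euro sign) occupies three bytes in UTF-8.
data = "ab\u20accd".encode("utf-8")

# Hypothetical chunk size, small enough to cut the euro sign apart.
chunk_size = 3
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Treating a single chunk as text fails, because it ends mid-character.
try:
    chunks[0].decode("utf-8")
    split_detected = False
except UnicodeDecodeError:
    split_detected = True

print(split_detected)

# Handling the chunks purely as bytes is lossless.
rejoined = b"".join(chunks)
print(rejoined == data)
```

Keeping every chunk as bytes from read to write avoids the problem entirely, which matches the advice not to apply any character encoding to the file contents.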