I am experiencing unexpected behavior with file-reading and hashing (in Python 3.7).

I have a file that simply has the text "helloworld" in it, without a newline at the end:

>>hexdump -C input.txt
00000000  68 65 6c 6c 6f 77 6f 72  6c 64 0a                 |helloworld.|
0000000b

I run the following Python script:

import hashlib

def hashit(inp):
    # Hash the UTF-8 encoding of the string and return the hex digest
    return hashlib.md5(inp.encode('utf-8')).hexdigest()

from_var = 'helloworld'

with open('input.txt', 'r') as fo:
    from_file = fo.read()

print(f' from_file      : { repr(from_file) }')
print(f' from_var       : { repr(from_var) }')

print(f' from_file hash : { hashit(from_file) }')
print(f' from_var  hash : { hashit(from_var) }')

I get the following output:

from_file      : 'helloworld\n'
from_var       : 'helloworld'
from_file hash : d73b04b0e696b0945283defa3eee4538
from_var  hash : fc5e038d38a57032085441e7fe7010b0

The first thing I notice is the newline at the end when I read the file. Where does this come from?

Given the trailing newline, it is not surprising that the hashes are different for the two strings.

To check, I then ran the md5sum utility directly on the file:

>>md5sum input.txt 
d73b04b0e696b0945283defa3eee4538  input.txt

This I don't get at all. The md5sum from the shell is the same as the md5sum of the string with the trailing newline - even though there is no newline in the file.

So my questions are:

  1. Why does .read() append a newline to the end of the file?
  2. Why does the md5sum from the command line correspond to the string **with** the trailing newline, even though the file has no newline?
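
To narrow this down, one check I could run is to read the file in binary mode, so that Python does no decoding or newline translation, and then look at the raw bytes and hash them directly. This is only a sketch: `inspect_file` is a hypothetical helper name, and it assumes the same input.txt sits in the current directory.

```python
import hashlib
import os

NEWLINE = b'\n'  # the single byte 0x0a


def inspect_file(path):
    """Return the raw bytes of the file and the MD5 hex digest of those bytes.

    Hypothetical helper for this question, not part of any library.
    """
    # 'rb' = binary mode: no text decoding and no newline translation
    with open(path, 'rb') as fo:
        raw = fo.read()
    return raw, hashlib.md5(raw).hexdigest()


# Only run the check if the file from the question is present
if os.path.exists('input.txt'):
    raw, digest = inspect_file('input.txt')
    ends_with_newline = raw.endswith(NEWLINE)
    print(f' raw bytes      : {raw!r}')
    print(f' length         : {len(raw)}')
    print(f' ends with 0x0a : {ends_with_newline}')
    print(f' md5 of bytes   : {digest}')
```

If the raw bytes end in 0x0a, as the final byte and the 0000000b (11) length in the hexdump output seem to indicate, then the newline would be in the file itself rather than something .read() adds, and md5sum would be hashing those same 11 bytes.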