
I have some binary files. When I store them as local files, I can read them in binary mode.

with open("binary_file", 'rb') as f:
    print("Binary file:  ", f.read())

Result:

Binary file:   b'Ix\x9d\xdf\xd2\xf6\x83\xe8B\x95.... (a long binary)

But I want to store them in HDFS and retrieve them from there. When I use the following commands:

f = os.popen("hdfs dfs -cat binary_file")
print("Binary file:  ", f.read())

I get an error on the 'print':

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in position 2: invalid start byte

I think that this command reads the file as a text file. How can I explicitly read the file as binary?

Ermolai

1 Answer


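os.popen() opens the pipe in text mode (the default mode is 'r'), so the command's output is decoded as text, which fails on arbitrary binary data.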

Use subprocess.check_output() instead; by default it returns the output as bytes:

import subprocess

# check_output() returns the command's stdout as a bytes object
output = subprocess.check_output("hdfs dfs -cat binary_file", shell=True)
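
If you would rather not go through the shell, the same idea can be sketched with an argument list. This is only a minimal sketch: read_command_bytes is a hypothetical helper name, not part of any library, and the demo command is a stand-in child Python process, since I'm assuming HDFS may not be available where you try it.

```python
import subprocess
import sys


def read_command_bytes(args):
    """Run a command given as a list of arguments and return its stdout as bytes.

    Hypothetical helper for illustration; the name is not from any library.
    """
    # No shell is involved. check=True raises CalledProcessError
    # if the command exits with a non-zero status.
    result = subprocess.run(args, stdout=subprocess.PIPE, check=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in for the HDFS call: a child Python process that writes raw bytes.
    demo = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(b'Ix\\x9d\\xdf')"]
    data = read_command_bytes(demo)
    print("Binary output:", data)
```

For the file in the question, the call would presumably be read_command_bytes(["hdfs", "dfs", "-cat", "binary_file"]). Passing a list avoids shell quoting problems if the path contains spaces or special characters.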

AKX