I have a lot of CSV files that I want to convert into binary files, so I want to write a Python script that automates the task. My CSV files contain only the values 0 and 255 (every file has 80 rows and 320 columns).

I wrote this code:

import numpy as np
import csv

csv_filename = '320x80_ImageTest_1_255.csv'
filename = "output.bin"

with open(csv_filename) as f:
    reader = csv.reader(f, delimiter =';')
    lst = list(reader)

array = np.array(lst)

with open ('new_binary.bin','wb') as FileToWrite:
    for i in range(len(array)):
        for j in range(len(array[0])):
            FileToWrite.write(''.join(chr(int(array[i][j]))).encode())

The problem is that the output file looks like this: (screenshot of the output file)

But instead of this character I want ff, which corresponds to 255 in hex. What am I doing wrong? Can someone help me?

  • https://docs.python.org/3/library/functions.html#ord – rasjani Jun 06 '23 at 12:52
  • I'm still wondering what you are trying to achieve there. This looks like you want to figure out the most complex way to make a copy of a file. What output do you expect from what input? – Klaus D. Jun 06 '23 at 12:53
  • Let's say for example my csv file is something like this : [['0;0;0;0;255;255;0;0;255], ['255;255;0;0;0;255;0;0;255], ['0;255;0;0;255;0;255;0;0], ['0;0;255;0;0;255;0;0;0]] I want to get a binary file like this : 0 0 0 0 ff ff 0 0 ff ff ff 0 0 0 ff 0 0 ff 0 ff 0 0 ff 0 ff 0 0 0 0 ff 0 0 ff 0 0 0 – mohamed AFASSI Jun 06 '23 at 13:01
  • If you want the file to actually contain pairs of `f` characters, that's not binary at all - it would be exactly as much of a text file as your original CSV, just with numbers represented in a different base. The code you posted achieves exactly what you said you wanted, that's just apparently not actually what you wanted. – jasonharper Jun 06 '23 at 13:19
  • I think you're right. Looking at the hex view of the output file, I see [this](https://imgur.com/a/casG274). So instead of C3BF I want FF; that is my goal. I'm very new to the field, so I might have misunderstood something, and I'm still not fully used to working with binary. – mohamed AFASSI Jun 06 '23 at 13:43
  • I think you are doing it right (opening as binary) - but from your source and what you want, you seem to serialize the arrays incorrectly. I understand that from `[[0;0;0;0;255;255;0;0;255], [255;255;0;0;0;255;0;0;255],..` you want `0000XX00XXX000X00X...` (0 and X representing bytes 0x00 and 0xFF respectively. Is this correct ? Note: is there really a single quote at the start of each block ? Like `[['0;0;0;0;255;255;0;0;255], ['255;..` - or is that just your editing here ? – MyICQ Jun 06 '23 at 14:22
  • Please [*do not* use (sole) images of code/data/errors](https://meta.stackoverflow.com/a/285557/3439404) in your [mcve]. Copy the actual text, paste it into the question, then format it as code. Post a hexadecimal dump in case of (partially) binary content. Please [edit] your question to improve your [mcve]. In particular, share your `csv` file (about first 3-5 lines should suffice). BTW, your output file could be good as `chr(255)` returns `ÿ`? – JosefZ Jun 06 '23 at 15:36
  • @MyICQ Yes, that's what I want. As for the quote, there really is one at the start of each block. I don't know how the client produced their CSV files, but that's how it looks when I print the lst variable. – mohamed AFASSI Jun 07 '23 at 07:04
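
The C3 BF bytes mentioned in the comments above can be reproduced in a few lines. This is a minimal sketch (the variable names are illustrative, not from the question): `chr(255)` is the text character U+00FF, and `str.encode()` defaults to UTF-8, which stores that character as two bytes, whereas a raw byte of value 255 is the single byte FF.

```python
# chr(255) is the text character U+00FF; str.encode() defaults to UTF-8,
# which represents that character with two bytes.
as_text = chr(255).encode()
print(as_text.hex())  # c3bf

# A single raw byte with value 255, built three equivalent ways.
raw_a = bytes([255])
raw_b = (255).to_bytes(1, "big")    # explicit arguments also work before Python 3.11
raw_c = chr(255).encode("latin-1")  # Latin-1 maps code points 0-255 to one byte each
print(raw_a.hex())  # ff
```

So the loop in the question probably writes the right values, but as UTF-8 text rather than as raw bytes.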

2 Answers

Do you want something like the following:

rows = [
    ["0", "0", "0", "0", "255", "255", "0", "0", "255"],
    ["255", "255", "0", "0", "0", "255", "0", "0", "255"],
    ["0", "255", "0", "0", "255", "0", "255", "0", "0"],
    ["0", "0", "255", "0", "0", "255", "0", "0", "0"],
]

with open("output.bin", "wb") as f_out:
    for row in rows:
        for field in row:
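            # to_bytes() without arguments needs Python 3.11+;
            # on older versions use int(field).to_bytes(1, "big")
            f_out.write(int(field).to_bytes())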

Then, inspecting output.bin:

with open("output.bin", "rb") as f_in:
    while True:
        x = f_in.read(9)
        if len(x) == 0:
            break
        print(x)

Output:

b'\x00\x00\x00\x00\xff\xff\x00\x00\xff'
b'\xff\xff\x00\x00\x00\xff\x00\x00\xff'
b'\x00\xff\x00\x00\xff\x00\xff\x00\x00'
b'\x00\x00\xff\x00\x00\xff\x00\x00\x00'

Thanks to "Writing integers in binary to file in python" for showing me the to_bytes(...) method, and to MyICQ for pointing out the defaults.
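
To connect this back to the original goal of converting many CSV files, here is a rough sketch that reads each file with `csv.reader` and writes one byte per field. The helper names `csv_to_bin` and `convert_folder` are hypothetical, and the code assumes clean `;`-separated integer fields in the range 0-255; stray characters such as the quotes mentioned in the comments would need to be stripped first.

```python
import csv
import tempfile
from pathlib import Path


def csv_to_bin(csv_path, bin_path, delimiter=";"):
    """Write every numeric field of a CSV file as one raw byte (values 0-255)."""
    with open(csv_path, newline="") as f_in, open(bin_path, "wb") as f_out:
        for row in csv.reader(f_in, delimiter=delimiter):
            # Skip empty fields, e.g. those produced by a trailing delimiter.
            values = [int(field) for field in row if field.strip()]
            # bytes() raises ValueError if any value is outside 0-255.
            f_out.write(bytes(values))


def convert_folder(folder):
    """Convert every *.csv file in `folder` to a .bin file next to it."""
    for csv_path in sorted(Path(folder).glob("*.csv")):
        csv_to_bin(csv_path, csv_path.with_suffix(".bin"))


# Small demonstration on a temporary folder with a made-up sample file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("0;0;255\n255;0;0\n")
    convert_folder(tmp)
    result = (Path(tmp) / "sample.bin").read_bytes()

print(result)  # b'\x00\x00\xff\xff\x00\x00'
```

Building a whole row with `bytes(values)` avoids the `to_bytes()` version difference and writes each row in a single call.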

Zach Young
  • Note that `to_bytes()` changed in Python 3.11. Earlier versions have no default arguments, so both are required: `to_bytes(1, 'big')` (although the byte order makes no difference for a single byte). In Python 3.11 both arguments are optional: `length` defaults to 1 and `byteorder` to `'big'`, so you can simply call `to_bytes()`. – MyICQ Jun 07 '23 at 14:03
  • @MyICQ, thank you for pointing that out; I've updated my post. Cheers. – Zach Young Jun 07 '23 at 20:24
  • I use Python 3.9 (and used the full arguments), but I still get the same characters when I view the output file. I literally copied and pasted your code, but we get different results. I'll try upgrading Python to the latest version and keep you updated. – mohamed AFASSI Jun 08 '23 at 13:07

This does pretty much what is described.

I left out reading the input into a variable; that should be trivial. Because the input contains unbalanced ' characters, it can't be parsed as JSON. Instead, I treat it as a series of numbers separated by something, and use a regular expression to extract the numbers into a list.

# Regular expression support
import re

# the input, should be read from file
dirtyinput = "[['0;0;0;0;255;255;0;0;255], ['255;255;0;0;0;255;0;0;255], ['0;255;0;0;255;0;255;0;0], ['0;0;255;0;0;255;0;0;0]]"

# extract numbers
numbers = re.findall(r'\d+', dirtyinput)

# Using function from answer by Zach Young
with open("output.bin", "wb") as f_out:
    for n in numbers:
        f_out.write(int(n).to_bytes(1, 'big'))

# --------- another method, iterating the data (efficient if the data is large)
#
with open("output2.bin", "wb") as f:
    for x in re.finditer(r'\d+', dirtyinput):
        f.write(int(x.group()).to_bytes(1,'big'))

# -------- testing result
# 
with open("output.bin", "rb") as f_in:
    while True:
        x = f_in.read(9)
        if len(x) == 0:
            break
        print(x)

Output:

b'\x00\x00\x00\x00\xff\xff\x00\x00\xff'
b'\xff\xff\x00\x00\x00\xff\x00\x00\xff'
b'\x00\xff\x00\x00\xff\x00\xff\x00\x00'
b'\x00\x00\xff\x00\x00\xff\x00\x00\x00'

I get the same result as the answer above.
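
For the real files, a quick sanity check may be easier than reading the printed bytes. The sketch below assumes the 80 x 320 layout stated in the question; `check_bin` is a hypothetical helper, and the demo file is synthetic rather than real data.

```python
import os
import tempfile

ROWS, COLS = 80, 320  # dimensions stated in the question


def check_bin(path, rows=ROWS, cols=COLS):
    """Return True if the file holds rows*cols bytes, each 0x00 or 0xff."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data) == rows * cols and set(data) <= {0x00, 0xFF}


# Demonstration on a synthetic file of alternating 0x00 / 0xff bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(bytes([0, 255] * (ROWS * COLS // 2)))
    ok = check_bin(path)
    with open(path, "rb") as f:
        first_bytes = f.read(4).hex(" ")  # the separator argument needs Python 3.8+

print(ok)           # True
print(first_bytes)  # 00 ff 00 ff
```

A file that is twice the expected size, or that contains the bytes c3 and bf, would suggest the values were written as UTF-8 text instead of raw bytes.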

MyICQ