CSV to binary using Python

Question

I have a lot of CSV files that I want to convert into binary files, so I want to write a Python script that automates the task. Each CSV file contains only the values 0 and 255, arranged in 80 rows and 320 columns.

I wrote this code:

import numpy as np
import csv

csv_filename = '320x80_ImageTest_1_255.csv'
filename = "output.bin"

with open(csv_filename) as f:
    reader = csv.reader(f, delimiter =';')
    lst = list(reader)

array = np.array(lst)

with open ('new_binary.bin','wb') as FileToWrite:
    for i in range(len(array)):
        for j in range(len(array[0])):
            FileToWrite.write(''.join(chr(int(array[i][j]))).encode())

The problem is that the output file looks like this: [screenshot of the output file]

Instead of this character I want ff, which corresponds to 255 in hex. What am I doing wrong? Can someone help me?

I'm still wondering what you are trying to achieve there. This looks like you want to figure out the most complex way to make a copy of a file. What output do you expect from what input? — Klaus D., Jun 06 '23 at 12:53
Let's say for example my csv file is something like this : [['0;0;0;0;255;255;0;0;255], ['255;255;0;0;0;255;0;0;255], ['0;255;0;0;255;0;255;0;0], ['0;0;255;0;0;255;0;0;0]] I want to get a binary file like this : 0 0 0 0 ff ff 0 0 ff ff ff 0 0 0 ff 0 0 ff 0 ff 0 0 ff 0 ff 0 0 0 0 ff 0 0 ff 0 0 0 — mohamed AFASSI, Jun 06 '23 at 13:01
If you want the file to actually contain pairs of `f` characters, that's not binary at all - it would be exactly as much of a text file as your original CSV, just with numbers represented in a different base. The code you posted achieves exactly what you said you wanted, that's just apparently not actually what you wanted. — jasonharper, Jun 06 '23 at 13:19
I think you're right. Looking at the hex view of the output file, I see [this](https://imgur.com/a/casG274). So instead of C3BF I want FF; that is my goal. I'm very new to the field, so I might be misunderstanding something, and I'm still not fully used to working with binary — mohamed AFASSI, Jun 06 '23 at 13:43
I think you are doing it right (opening as binary) - but from your source and what you want, you seem to serialize the arrays incorrectly. I understand that from `[[0;0;0;0;255;255;0;0;255], [255;255;0;0;0;255;0;0;255],..` you want `0000XX00XXX000X00X...` (0 and X representing bytes 0x00 and 0xFF respectively. Is this correct ? Note: is there really a single quote at the start of each block ? Like `[['0;0;0;0;255;255;0;0;255], ['255;..` - or is that just your editing here ? — MyICQ, Jun 06 '23 at 14:22
Please [*do not* use (sole) images of code/data/errors](https://meta.stackoverflow.com/a/285557/3439404) in your [mcve]. Copy the actual text, paste it into the question, then format it as code. Post a hexadecimal dump in case of (partially) binary content. Please [edit] your question to improve your [mcve]. In particular, share your `csv` file (about first 3-5 lines should suffice). BTW, your output file could be good as `chr(255)` returns `ÿ`? — JosefZ, Jun 06 '23 at 15:36
@MyICQ Yes, that's what I want. As for the quote, there really is one at the start of each block. I don't know how the client produced their CSV files, but that is what I see when I print the lst variable. — mohamed AFASSI, Jun 07 '23 at 07:04

Zach Young · Answer 1 · 2023-06-07T20:22:01.590

Do you want something like the following?

rows = [
    ["0", "0", "0", "0", "255", "255", "0", "0", "255"],
    ["255", "255", "0", "0", "0", "255", "0", "0", "255"],
    ["0", "255", "0", "0", "255", "0", "255", "0", "0"],
    ["0", "0", "255", "0", "0", "255", "0", "0", "0"],
]

with open("output.bin", "wb") as f_out:
    for row in rows:
        for field in row:
            # The no-argument form needs Python 3.11+; on older versions use to_bytes(1, "big").
            f_out.write(int(field).to_bytes())

Then, inspecting output.bin:

with open("output.bin", "rb") as f_in:
    while True:
        x = f_in.read(9)
        if len(x) == 0:
            break
        print(x)

b'\x00\x00\x00\x00\xff\xff\x00\x00\xff'
b'\xff\xff\x00\x00\x00\xff\x00\x00\xff'
b'\x00\xff\x00\x00\xff\x00\xff\x00\x00'
b'\x00\x00\xff\x00\x00\xff\x00\x00\x00'

Thanks to "Writing integers in binary to file in python" for showing me the to_bytes(...) method, and to MyICQ for pointing out the defaults.
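For context on the C3BF seen in the question's hex view, here is a minimal sketch of what happens when a value goes through `chr(...).encode()` compared with writing a raw byte. The variable names are only illustrative; `bytes([n])` is shown as an alternative to `to_bytes` that behaves the same on Python versions before 3.11.

```python
# chr(255) is the text character U+00FF; str.encode() defaults to UTF-8,
# which needs two bytes for any code point above 127.
as_text = chr(255).encode()
print(as_text)  # b'\xc3\xbf'

# bytes([n]) builds a single raw byte directly from an integer in range(256).
as_byte = bytes([255])
print(as_byte)  # b'\xff'

# Same result as to_bytes with explicit arguments (works before 3.11 too).
assert as_byte == (255).to_bytes(1, "big")

# 0 is below 128, so both routes happen to agree for it.
assert chr(0).encode() == bytes([0]) == b"\x00"
```

This is why the zeros in the original output looked correct while every 255 turned into two bytes.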

edited Jun 07 '23 at 20:22

answered Jun 06 '23 at 20:21

Notice that `to_bytes()` changed in Python 3.11. Earlier versions have no default arguments, so both are required: `to_bytes(1, 'big')` (although technically byteorder makes no difference for a single byte). From Python 3.11 on, both arguments are optional: `length` defaults to 1 and `byteorder` to `'big'`, so `to_bytes()` works. – MyICQ Jun 07 '23 at 14:03
@MyICQ, thank you for pointing that out; I've updated my post. Cheers. – Zach Young Jun 07 '23 at 20:24
I use Python 3.9 (and used the full arguments) but still get the same characters when I view the output file. I literally copied and pasted your code, but we get different results. I'll try upgrading Python to the latest version and keep you updated. – mohamed AFASSI Jun 08 '23 at 13:07

MyICQ · Answer 2 · 2023-06-08T22:40:38.303

This does pretty much what is described.

I left out reading the input into a variable; that should be trivial. Since the input contains the ' character, it can't be read as JSON. Instead I treat it as a series of numbers separated by something, and apply a regular expression to extract the numbers into a list.

# Regular expression support
import re

# the input, should be read from file
dirtyinput = "[['0;0;0;0;255;255;0;0;255], ['255;255;0;0;0;255;0;0;255], ['0;255;0;0;255;0;255;0;0], ['0;0;255;0;0;255;0;0;0]]"

# extract numbers
numbers = re.findall(r'\d+', dirtyinput)

# Using function from answer by Zach Young
with open("output.bin", "wb") as f_out:
    for n in numbers:
        f_out.write(int(n).to_bytes(1, 'big'))

# --------- another method, iterating the data (efficient if the data is large)
#
with open("output2.bin", "wb") as f:
    for x in re.finditer(r'\d+', dirtyinput):
        f.write(int(x.group()).to_bytes(1,'big'))

# -------- testing result
# 
with open("output.bin", "rb") as f_in:
    while True:
        x = f_in.read(9)
        if len(x) == 0:
            break
        print(x)

b'\x00\x00\x00\x00\xff\xff\x00\x00\xff'
b'\xff\xff\x00\x00\x00\xff\x00\x00\xff'
b'\x00\xff\x00\x00\xff\x00\xff\x00\x00'
b'\x00\x00\xff\x00\x00\xff\x00\x00\x00'

I get the same result as the answer above.

This was tested using Python 3.9. This is why I had to use `to_bytes(1,'big')` instead of just `to_bytes()`. See my comment above. — MyICQ, Jun 08 '23 at 22:34
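Putting the two answers together, a rough sketch of the whole conversion could look like the following. The function name `csv_to_binary` and the sample file names are hypothetical, and the sketch assumes every run of digits in the file is one value in the range 0 to 255.

```python
import re
import tempfile
from pathlib import Path


def csv_to_binary(csv_path, bin_path):
    """Write every integer found in csv_path to bin_path as one raw byte each."""
    total = 0
    with open(csv_path, newline="") as f_in, open(bin_path, "wb") as f_out:
        for line in f_in:
            # \d+ skips the ';' separators and any stray quote characters.
            values = [int(tok) for tok in re.findall(r"\d+", line)]
            # bytes(...) raises ValueError if a value is outside 0..255.
            f_out.write(bytes(values))
            total += len(values)
    return total


# Small demonstration: two rows, the first with a stray leading quote.
workdir = Path(tempfile.mkdtemp())
sample_csv = workdir / "sample.csv"
sample_csv.write_text("'0;0;255;255\n255;0;0;255\n")
sample_bin = workdir / "sample.bin"

count = csv_to_binary(sample_csv, sample_bin)
print(count, sample_bin.read_bytes().hex(" "))  # 8 00 00 ff ff ff 00 00 ff
```

For the files described in the question (80 rows of 320 values), the returned count and the output file size should both be 80 * 320 = 25600, which makes a quick sanity check. To convert many files, the function could be called in a loop over something like `Path(folder).glob("*.csv")`.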
