
Hi everyone, I'm new here. I'm running into a problem: after some signal processing, I have a list object with 436 elements; each element is an array, and their lengths range from 1,000 to 450,000. I need to save them to a *.dat file, which may be used as the input to a colleague's ANN. Here is my code:

#pulse_data_list is the mentioned list object
filename = 'F:/work/signal.dat'
file = open(filename, "wb")
file.write(pulse_data_list)
file.close()
print('Data has been saved, dir=', filename)

It doesn't work; it raises "TypeError: a bytes-like object is required, not 'list'". What should I do?

Qbz

2 Answers


To answer your specific question, you probably want to use numpy's savez or savez_compressed. But then don't call it a .dat file; call it a .npz file.
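
For example, a minimal sketch of that approach, reusing pulse_data_list and the path from the question (numpy names positional arrays arr_0, arr_1, ... automatically):

import numpy as np

# save all 436 arrays into one compressed .npz archive
np.savez_compressed('F:/work/signal.npz', *pulse_data_list)

# your colleague can load them back later with:
data = np.load('F:/work/signal.npz')
first_pulse = data['arr_0']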

I'm answering this because I've seen a number of questions that refer to *.dat files as though this were a specific, well-defined format. It's not. I guess it needs a longer explanation, since comments to that effect don't seem to get the point across.

A file with a *.dat extension usually indicates that the file contains data but does not conform to a standard format. Therefore, whatever format you store the data in, you will need some additional documentation to tell the person who reads the file how the data is represented in it. There's nothing wrong with using a *.dat file, but be aware that data you think you won't need later often turns out to matter, and it can be very difficult to recover data that's stored in an unknown format.

For example, here's a Wikipedia list of the uses of the DAT extension. Notice that except for one, they are all just highly specific types of data files, and certainly, the generic "Data file in special format or ASCII" is going to be what you see the most.

Here are some guidelines to consider when deciding on a format:

1) If you are going to use a standard format, then use an extension that indicates that format, even if there isn't a standard extension. You don't want to be looking back four years later at 1000 different *.dat files in 127 different formats. For example, if you use the pickle format, call the file *.p or *.pickle or something like that. Also, with large arrays you're probably using numpy or scipy, and they have standard output formats, so use an extension that gives a hint of the format (e.g., *.npy or *.npz for numpy's own formats, or something descriptive like *.numpydata).

2) You need to decide whether your format has a header, like a *.wav audio file. Generally a header is a fixed number of bytes at the front of the file that tells whoever reads the file how the data is stored. In the case of a *.wav file, it contains things like the sample rate and whether the data is stereo or mono. As a wav file demonstrates, the advantage of a header is that you can store more variations of the same type of data. A header also makes it easy to store multiple things in a single file, like your case of multiple arrays, because it can say where one array ends and the next begins.

3) You need to decide whether to use ASCII or binary. ASCII files are easy to read and recover (because they can be read in a standard word processor), but they take more space. Binary formats are more efficient to store, read, and write, but you can't see anything useful if you don't know the format. (It's also common to store the data as an ASCII file and then compress it, which somewhat mitigates the storage size issue.)
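
To put rough numbers on point 3, here's a quick sketch using numpy (which the question already implies); the file names and the single test array are made up for illustration:

import os
import numpy as np

a = np.random.rand(450_000)          # one pulse-sized array of float64

np.save('pulse.npy', a)              # binary: 8 bytes per value plus a small header
np.savetxt('pulse.txt', a)           # ASCII: roughly 25 characters per value by default

print(os.path.getsize('pulse.npy'))  # roughly 3.6 MB
print(os.path.getsize('pulse.txt'))  # roughly 11 MB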

With that in mind, if you want to write your own *.dat file format:

If you want to use ASCII, you can use the standard file.write.
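
For example, a sketch of the ASCII route using file.write, with one pulse per line and values separated by spaces; this particular layout is just an assumption that you would need to document:

with open('F:/work/signal.dat', 'w') as f:          # text mode, not 'wb'
    for pulse in pulse_data_list:
        f.write(' '.join(str(x) for x in pulse))    # one pulse per line
        f.write('\n')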

If you want a binary format, you can use the standard module struct.
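
And a sketch of the binary route with struct, including a small header in the spirit of point 2 above. The layout here (a count, then each array's length, then all samples as little-endian float64) is one made-up convention; whoever reads the file needs to know it:

import struct

with open('F:/work/signal.dat', 'wb') as f:
    # header: number of arrays, then the length of each one (unsigned 32-bit ints)
    f.write(struct.pack('<I', len(pulse_data_list)))
    for pulse in pulse_data_list:
        f.write(struct.pack('<I', len(pulse)))
    # body: all samples as little-endian float64, one array after another
    for pulse in pulse_data_list:
        f.write(struct.pack(f'<{len(pulse)}d', *pulse))

Reading it back is the mirror image: unpack the count and the lengths, then read 8 bytes per sample for each array.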

tom10
  • Thank you for your answer, it's really helpful! Actually our team is still discussing which format to use, since we have different backgrounds and use different tools (Matlab, Cool Edit), and the others need to use what I produce as their input. The original signal is usually delivered as a *.dat file containing a mix of int16, int32, float, and string values, with a *.pls file created automatically alongside it as a kind of metadata description. I'm trying to combine those files. – Qbz Oct 22 '18 at 22:21
  • @Qbz: Personally, I think having two files is a big mistake; it's too easy for them to get separated. Combining them is good. Since you already have a large binary file as a starting point, you could put the metadata as binary data in the first N bytes of that larger file, then have a small program that can quickly read that header. This can be made as convenient as needed, from a shell script to a GUI to a right-click or mouse-over. – tom10 Oct 24 '18 at 02:20

You can use the pickle module (Python object serialization): https://docs.python.org/3/library/pickle.html
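
For example, a minimal sketch of that (keeping in mind that whoever reads the file then needs Python and pickle too, which matters if your colleagues work in Matlab or other tools):

import pickle

# write the whole list in one go; a *.pickle extension makes the format obvious later
with open('F:/work/signal.pickle', 'wb') as f:
    pickle.dump(pulse_data_list, f)

# read it back
with open('F:/work/signal.pickle', 'rb') as f:
    pulse_data_list = pickle.load(f)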

user803422