
I'm reading data from an LSM9DS1 IMU over SPI and want to store the data in a file. I tried saving it as a .txt file using with open ... as file and .write(); each sample takes about 0.002 s.

while flag:
    file_path_g = '/home/pi/Desktop/LSM9DS1/gyro.txt'
    with open(file_path_g, 'a') as out_file_g:
        dps = dev.get_gyro()
        out_file_g.write(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f'))
        out_file_g.write(" {0:0.3f}, {1:0.3f}, {2:0.3f}\n".format(dps[0], dps[1], dps[2]))

    file_path_a = '/home/pi/Desktop/LSM9DS1/accel.txt'
    with open(file_path_a, 'a') as out_file_a:
        acc = dev.get_acc()
        out_file_a.write(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f'))
        out_file_g.write(" {0:0.3f}, {1:0.3f}, {2:0.3f}\n".format(acc[0], acc[1], acc[2]))
    # time.sleep(0.2)

print("interrupt occured")

dev.close()

I also tried using pandas to save the data as a .csv file; this was slower than the first approach.

while flag:
    t = time.time()
    acc = dev.get_acc()
    dps = dev.get_gyro()
    ax = acc[0]
    ay = acc[1]
    az = acc[2]
    gx = dps[0]
    gy = dps[1]
    gz = dps[2]
    result = pd.DataFrame({'time': t, 'ax': ax, 'ay': ay, 'az': az, 'gx': gx, 'gy': gy, 'gz': gz}, index=[0])
    result.to_csv('/home/pi/Desktop/LSM9DS1/result.csv', mode='a', float_format='%.6f',
                  header=False, index=False)

dev.close()

What can I do to improve the reading speed?

Update: I changed the code to move the file path outside the loop.

file_path = '/home/pi/Desktop/LSM9DS1/result.txt'
while flag:
    with open(file_path, 'a') as out_file:
        acc = dev.get_acc()
        dps = dev.get_gyro()
        out_file.write(datetime.datetime.now().strftime('%S.%f'))
        out_file.write(" {0:0.3f}, {1:0.3f}, {2:0.3f}".format(acc[0], acc[1], acc[2]))
        out_file.write(" {0:0.3f}, {1:0.3f}, {2:0.3f}\n".format(dps[0], dps[1], dps[2]))

This is the other approach:

while flag:
    t = time.time()
    acc = dev.get_acc()
    dps = dev.get_gyro()
    arr = [[t, acc[0], acc[1], acc[2], dps[0], dps[1], dps[2]]]  # one row, 2-D
    np_data = np.array(arr)
    result = pd.DataFrame(np_data, index=[0])
    result.to_csv('/home/pi/Desktop/LSM9DS1/result.csv', mode='a', float_format='%.6f', header=False, index=False)

Thanks to Mark for his answer. I followed his suggestion and changed the code as below.

samples=[]
for i in range(100000):
    t = time.time()
    acc = dev.get_acc()
    dps = dev.get_gyro()
    # Append a tuple (containing time, acc and dps) onto sample list
    samples.append((t, acc, dps))

name = ['t', 'acc', 'dps']
f = pd.DataFrame(columns=name, data=samples)
f.to_csv('/home/pi/Desktop/LSM9DS1/result.csv', mode='a', float_format='%.6f', header=False, index=False)
print('done')

I measured the interval between consecutive samples (over the first 600); the average is 0.000265 s, almost 10 times faster than before.
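One issue with this DataFrame: the acc and dps columns hold whole lists, so every CSV cell is a stringified list. Flattening each tuple into seven scalars first avoids that; a minimal sketch, assuming get_acc and get_gyro return 3-element sequences:

import pandas as pd

# Flatten each (t, acc, dps) tuple into seven scalars so that
# no CSV cell contains a list.
flat = [(t, a[0], a[1], a[2], d[0], d[1], d[2]) for t, a, d in samples]
name = ['t', 'ax', 'ay', 'az', 'gx', 'gy', 'gz']
f = pd.DataFrame(flat, columns=name)
f.to_csv('/home/pi/Desktop/LSM9DS1/result.csv', mode='a',
         float_format='%.6f', header=False, index=False)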

Yu Bohang

2 Answers


As I said in the comments: "The answer is vastly different depending on what you are trying to do! If the gyro is on a drone and you are sending the data to a PC to control the direction, you need to get the latest reading to the PC with the minimum latency - this requires no storage, and data from 4 seconds ago is useless. If you are running an experiment for 4 hours and analysing the results later, you probably want to read the gyro at the maximum rate, storing it all locally and transferring it at the end - this requires more storage."

The fastest place to store a large number of samples is in a list in RAM:

samples=[]
while flag:
    t = time.time()
    acc = dev.get_acc()
    dps = dev.get_gyro()
    # Append a tuple (containing time, acc and dps) onto sample list
    samples.append((t, acc, dps))

Benchmark

Running in IPython on my desktop, this can store 2.8 million tuples per second, each containing the time and 2 lists of 3 elements each:

In [92]: %%timeit 
...:  
...: samples=[] 
...: for i in range(2800000): 
...:     t = time.time() 
...:     acc = [1,2,3] 
...:     dps = [4,5,6] 
...:     # Append a tuple (containing time, acc and dps) onto sample list 
...:     samples.append((t, acc, dps))

1.05 s ± 7.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
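At the end of the experiment you can flush the whole list to disk in one go, when a few extra seconds no longer matter. A minimal sketch using the standard csv module, assuming acc and dps are 3-element sequences, flattening each tuple so the file contains only scalar columns:

import csv

# After the acquisition loop: write all samples in a single pass,
# flattening each (t, acc, dps) tuple into seven scalar columns.
with open('/home/pi/Desktop/LSM9DS1/result.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['t', 'ax', 'ay', 'az', 'gx', 'gy', 'gz'])
    for t, acc, dps in samples:
        writer.writerow([t, *acc, *dps])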
Mark Setchell
  • Hello Mark, I want to know whether the speed of saving data to a file matters: if the saving speed is slower than the reading speed, will we lose some data? – Yu Bohang Feb 22 '20 at 09:00
  • The code I show does not write to a file. It stores the samples in a Python list in memory (RAM). – Mark Setchell Feb 22 '20 at 09:44
  • What should I do if I want to store the samples in a file? Another answer said it is better to save in a binary format. I tried using struct, but got a TypeError: "a bytes-like object is required, not 'tuple'" – Yu Bohang Feb 22 '20 at 09:57
  • I am suggesting you store your data in memory during your experiment because you said you wanted to go as fast as possible and memory is 1000s of times faster than disk. I am then suggesting you write your data to disk at the end of the experiment when it doesn't matter if it takes 12 or 15 seconds. So the format on disk is unimportant. – Mark Setchell Feb 22 '20 at 10:27
  • Thanks a lot! I tried to save the data as a csv file, but the values in the second and third columns are lists. How can I read the data in MATLAB? – Yu Bohang Feb 22 '20 at 14:24

Some ideas that may improve speed, which you could try:

  1. Use a binary format instead of text: write a binary timestamp (see: Write and read Datetime to binary format in Python) and binary floats. You can process them offline later. (A sketch combining this with idea 3 follows the list.)
  2. Call get_acc and get_gyro in parallel.
  3. Buffer a number of measurements in memory and write the whole buffer at once instead of calling write many times.
  4. Use a separate thread for writing and another for taking measurements.
  5. Rewrite in C.
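A minimal sketch of ideas 1 and 3 together, reusing dev and flag from the question and assuming get_acc and get_gyro each return three floats; the output path is just an example. Each record is packed as one double (timestamp) plus six floats, and records are flushed in batches. Note that file.write() on a binary file needs bytes, not a tuple (the TypeError mentioned in the comments above); struct converts the scalars to bytes first.

import struct
import time

RECORD = struct.Struct('<d6f')  # little-endian: 1 double + 6 floats = 32 bytes
BATCH = 1000                    # records to buffer before each write

with open('/home/pi/Desktop/LSM9DS1/result.bin', 'wb') as out:
    buf = bytearray()
    while flag:
        acc = dev.get_acc()
        dps = dev.get_gyro()
        buf += RECORD.pack(time.time(), *acc, *dps)  # scalars -> 32 bytes
        if len(buf) >= BATCH * RECORD.size:
            out.write(buf)   # one write per batch instead of per sample
            buf.clear()
    out.write(buf)              # flush the remainder

Since every record has a fixed 32-byte layout, the file can be read back with struct.iter_unpack in Python or fread in MATLAB.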
Łukasz Ślusarczyk