I am writing a script that adds a "column" to a Python list of lists at 500 Hz. Here is the code that generates test data and appends to it from a separate thread:
fileA:
import random, time, threading

data = [[] for _ in range(4)]              # list with 4 empty lists (4 rows)
column = [random.random() for _ in data]   # synthetic column of data

def synthesize_data():
    while True:
        for x, y in zip(data, column):
            x.append(y)
        time.sleep(0.002)                  # equivalent to 500 Hz

t1 = threading.Thread(target=synthesize_data)
t1.start()
# example of data
# [[0.61523098235, 0.61523098235, 0.61523098235, ... ],
# [0.15090349809, 0.15090349809, 0.15090349809, ... ],
# [0.92149878571, 0.92149878571, 0.92149878571, ... ],
# [0.41340918409, 0.41340918409, 0.41340918409, ... ]]
fileB (in Jupyter Notebook):
[1] import fileA, copy
[2] # get a copy of the data at this instant.
data = copy.deepcopy(fileA.data)
    for row in data:
        print len(row)
If you run cell [2] in fileB, you should see that the lengths of the "rows" in data
are not equal. Here is example output when I run the script:
8784
8786
8787
8787
I thought I might be grabbing the data in the middle of the for loop, but that would suggest the lengths would be off by at most 1. Instead, the differences get more severe over time. My question: why is rapidly adding columns to a list of lists unstable like this, and is it possible to make this process more stable?
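For context, if synchronization turns out to be the answer, this is the kind of lock-guarded variant I have in mind (an untested sketch; the lock and the snapshot() helper are names I made up, not code I am currently running):

import copy, random, threading, time

lock = threading.Lock()                    # hypothetical: guards all access to data
data = [[] for _ in range(4)]
column = [random.random() for _ in data]

def synthesize_data():
    while True:
        with lock:                         # writer appends a whole column atomically
            for x, y in zip(data, column):
                x.append(y)
        time.sleep(0.002)

def snapshot():
    with lock:                             # reader waits out any in-progress column,
        return copy.deepcopy(data)         # so every row copies at the same length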
You might suggest I use something like Pandas, but I want to use Python lists because of their speed advantage (the code needs to be as fast as possible). I tested the for loop, the map() function, and a Pandas DataFrame. Here is my test code (in Jupyter Notebook):
import pandas as pd
import random
channels = ['C3','C4','C5','C2']
a = [[] for _ in channels]
b = [random.random() for _ in a]
def add_col((x, y)):  # Python 2 tuple-parameter unpacking
    x.append(y)
df = pd.DataFrame(index=channels)
b_pandas = pd.Series(b, index=df.index)
%timeit for x,y in zip(a,b): x.append(y) # 1000000 loops, best of 3: 1.32 µs per loop
%timeit map(add_col, zip(a,b)) # 1000000 loops, best of 3: 1.96 µs per loop
%timeit df[0] = b # 10000 loops, best of 3: 82.8 µs per loop
%timeit df[0] = b_pandas # 10000 loops, best of 3: 58.4 µs per loop
You might also suggest that I append the samples to data as rows and then transpose when it's time to analyze. I would rather not do that either, in the interest of speed. This code will be used in a brain-computer interface, where analysis happens in a loop; the transpose would also have to happen inside that loop, and it would get slower as the data grows. For reference, the row-wise version I am trying to avoid is sketched below.
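Here is roughly what that rejected row-wise approach would look like (a sketch only; add_sample and as_channels are hypothetical names, and I have not benchmarked this at scale):

import random

channels = ['C3', 'C4', 'C5', 'C2']
samples = []                               # one row per 2 ms tick

def add_sample():
    # appending one row is a single list operation...
    samples.append([random.random() for _ in channels])

def as_channels():
    # ...but every analysis pass must transpose the whole history
    # back into one-list-per-channel form, and this cost grows
    # with len(samples)
    return [list(col) for col in zip(*samples)]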