One answer was flagged as low quality for not explaining itself. But none of the other three explain themselves either, and they are just replicas of each other.
In [227]: names = """
...: 1 2 1
...: 1 1 0
...: 0 1 1
...: """
In [238]: np.genfromtxt(StringIO(names), dtype=int)
Out[238]:
array([[1, 2, 1],
       [1, 1, 0],
       [0, 1, 1]])
In [239]: timeit np.genfromtxt(StringIO(names), dtype=int)
135 µs ± 286 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Actually we don't need the StringIO layer; just split the string into lines (sometimes an encoding=None parameter is needed as well):
In [242]: np.genfromtxt(names.splitlines(), dtype=int)
Out[242]:
array([[1, 2, 1],
       [1, 1, 0],
       [0, 1, 1]])
The original function is 10x faster than the accepted one(s):
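def orig(names):
    names_list = names.splitlines()
    tem = []
    for i in [row for row in names_list if row]:  # skip blank lines
        tem.append([col for col in list(i) if col != ' '])  # keep every non-space character
    return np.array(tem, dtype=int)  # np.int was removed in NumPy 1.24; plain int is equivalent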
In [244]: orig(names)
Out[244]:
array([[1, 2, 1],
       [1, 1, 0],
       [0, 1, 1]])
In [245]: timeit orig(names)
11.1 µs ± 194 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
genfromtxt does basically the same thing: split lines, collect values in a list of lists, and turn that into an array. It is not compiled, so its extra generality only adds overhead.
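To make that description concrete, here is a minimal sketch of the same split/collect/convert loop. The name `simple_loadtxt` is hypothetical, and this is only a stripped-down imitation of the idea, not NumPy's actual implementation (the real function also handles delimiters, comments, missing values, and dtype inference).

```python
import numpy as np

def simple_loadtxt(lines, dtype=int):
    # Hypothetical helper: imitates the split / collect / convert steps
    # described above. It is NOT the real genfromtxt code.
    rows = []
    for line in lines:
        fields = line.split()      # split the line on whitespace
        if not fields:             # skip blank lines
            continue
        rows.append([dtype(f) for f in fields])
    return np.array(rows, dtype=dtype)

names = "\n1 2 1\n1 1 0\n0 1 1\n"
arr = simple_loadtxt(names.splitlines())
```

Everything here runs as ordinary Python loops, which is why a hand-written version doing less work can easily be faster.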
The flagged answer replaces the list comprehension with a split method:
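def czisws(names):
    names_list = names.splitlines()
    tem = []
    for i in [row for row in names_list if row]:  # skip blank lines
        tem.append(i.split())  # split each row on whitespace
    return np.array(tem, dtype=int)  # np.int was removed in NumPy 1.24; plain int is equivalent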
In [247]: timeit czisws(names)
8.58 µs ± 274 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
It is faster, which isn't surprising: split is a string method. Builtin methods are typically faster, and preferable even when they aren't.
Split is also more general purpose:
In [251]: 'abc de f'.split()
Out[251]: ['abc', 'de', 'f']
In [252]: [i for i in list('abc de f') if i!=' ']
Out[252]: ['a', 'b', 'c', 'd', 'e', 'f']
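The same difference matters for numeric input as soon as a value has more than one digit. The sketch below uses two hypothetical helpers, `parse_chars` and `parse_split`, which are compact rewrites of the two approaches above, plus a made-up input `wide`.

```python
import numpy as np

def parse_chars(text):
    # Character-by-character approach: every non-space character
    # becomes its own element.
    return np.array(
        [[c for c in row if c != ' '] for row in text.splitlines() if row],
        dtype=int,
    )

def parse_split(text):
    # split() approach: every whitespace-separated token is one element.
    return np.array(
        [row.split() for row in text.splitlines() if row],
        dtype=int,
    )

wide = "10 2 1\n1 11 0\n0 1 12\n"   # made-up input with two-digit values

by_split = parse_split(wide)   # 3x3 array, two-digit values kept whole
by_chars = parse_chars(wide)   # 3x4 array, '10' broken into 1 and 0
```

With this input the character-based version does not raise an error; it silently returns an array of the wrong shape, which is a good reason to prefer split.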