I have a pandas Series of lists, say:
import pandas as pd
a = pd.Series([
[1, 2, 3, 4, 5],
[6, 7, 8, 3, 334],
[333, 4, 5, 3, 4]
])
I want to find the largest element across all the lists, which is 334 here. What is an easy way to do it?
Option 1
Only works if the elements are actually lists, because sum concatenates lists. This is also likely to be very slow.
max(a.sum())
334
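To see why Option 1 depends on the elements being lists, here is a minimal sketch using the built-in sum with an empty-list start value. The plain nested list rows is an assumption for illustration; it stands in for the elements of the Series.

```python
# Plain nested lists stand in for the Series elements (illustrative assumption).
rows = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 3, 334],
    [333, 4, 5, 3, 4],
]

# sum() combines the elements with +, and + on lists concatenates them.
# Each step copies the accumulated list, so the total work grows roughly
# quadratically with the number of rows.
flat = sum(rows, [])

largest = max(flat)
print(len(flat), largest)  # 15 334
```

The repeated copying is the likely reason the sum-based variants are slow on large inputs.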
Option 2
A minimal two-tiered application of max: take the max of each list, then the max of those results.
max(map(max, a))
334
Option 3
Only works if all lists are the same length.
np.max(a.tolist())
334
Option 4
A single application of max over a flattened generator.
max(x for l in a for x in l)
334
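One practical difference between Options 2 and 4 shows up with empty inner lists. The sketch below uses a hypothetical ragged input (not from the question) to illustrate it: the flattened generator ignores empty rows, while the two-tiered form needs them filtered out, because max of an empty list raises ValueError.

```python
# Hypothetical ragged input with an empty row (illustrative assumption).
ragged = [
    [1, 2, 3],
    [],
    [6, 7, 8, 3, 334],
]

# Option 4 style: flatten first, so empty rows contribute nothing.
flat_max = max(x for row in ragged for x in row)

# Option 2 style: skip empty rows explicitly before taking each row's max.
tiered_max = max(max(row) for row in ragged if row)

# Without the filter, the inner max() fails on the empty row.
try:
    max(map(max, ragged))
    raised = False
except ValueError:
    raised = True

print(flat_max, tiered_max, raised)  # 334 334 True
```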
This is one way:
max(max(i) for i in a)
Functional variant:
max(map(max, a))
An alternative method that computes only a single max:
from toolz import concat
max(concat(a))
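If toolz is not installed, the standard library's itertools.chain.from_iterable provides the same lazy flattening. A minimal sketch, with a plain list of lists standing in for the Series a (an assumption for illustration; iterating over a Series of lists yields the same inner lists):

```python
from itertools import chain

# Plain nested lists stand in for the Series `a` (illustrative assumption).
rows = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 3, 334],
    [333, 4, 5, 3, 4],
]

# chain.from_iterable yields the inner elements one by one without
# building an intermediate list, so max() runs once over a lazy stream.
largest = max(chain.from_iterable(rows))
print(largest)  # 334
```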
For the fun of it, below is some benchmarking. The lazy function concat and the optimised map / generator expression variants do best, then come the numpy functions. The pandas methods are usually worse, and the clever sum applications come last.
import numpy as np
from toolz import concat
import pandas as pd
a = pd.Series([list(np.random.randint(0, 10, 100)) for i in range(1000)])
# times in ms
5.92 max(concat(a))
6.29 max(map(max, a))
6.67 max(max(i) for i in a)
17.4 max(x for l in a for x in l)
19.2 np.max(a.tolist())
20.4 np.concatenate(a.values).max()
64.6 pd.DataFrame(a.values.tolist()).max().max()
373 np.max(a.apply(pd.Series).values)
672 max(sum(a,[]))
696 max(a.sum())
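A rough way to reproduce timings of this kind is the standard timeit module. The harness below is only a sketch under stated assumptions: it covers the pure-Python candidates, uses a plain list of lists built with random instead of the Series, and substitutes itertools.chain.from_iterable for toolz's concat. Absolute numbers will differ by machine and version.

```python
import random
import timeit
from itertools import chain

random.seed(0)
# 1000 rows of 100 small integers, mirroring the shape of the data above.
rows = [[random.randrange(10) for _ in range(100)] for _ in range(1000)]

# Candidate expressions wrapped in callables so timeit can run them.
candidates = {
    "max(chain.from_iterable(rows))": lambda: max(chain.from_iterable(rows)),
    "max(map(max, rows))": lambda: max(map(max, rows)),
    "max(max(i) for i in rows)": lambda: max(max(i) for i in rows),
    "max(x for l in rows for x in l)": lambda: max(x for l in rows for x in l),
}

results = {}
for label, func in candidates.items():
    # Best of 3 repeats of 10 calls each, converted to ms per call.
    best_ms = min(timeit.repeat(func, number=10, repeat=3)) / 10 * 1000
    results[label] = (func(), best_ms)
    print(f"{best_ms:8.3f} ms  {label}")
```

Taking the minimum over repeats reduces noise from other processes; every candidate should also return the same maximum.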
Convert to a DataFrame:
pd.DataFrame(a.values.tolist()).max().max()
Out[200]: 334
Or use numpy.concatenate:
np.concatenate(a.values).max()
Out[201]: 334
Or flatten with sum:
max(sum(a,[]))
Out[205]: 334
Yet another answer using np.max:
import numpy as np
np.max(a.apply(pd.Series).values)
Out[175]: 334