4

I have a list of tuples, corresponding to (x,y) coordinates of some points (can be from 8 to hundreds of points):

mylist = [(x0,y0), (x1,y1), ..., (xn,yn)]

I want to get the min and max values of the x and y coordinates (the min of all x, whatever the corresponding y, and so on). The goal is to choose the best scale for drawing the points within a rectangular area.

I have two solutions in mind:

  • First solution: create two lists of coordinates, [foo[0] for foo in mylist] and the same with foo[1]. Then I can easily get the min and max of each. But I have to create the lists (so as not to run the comprehension twice, once for the min and once for the max).

  • Second solution: sort the list twice, once by the first coordinate and once by the second, and each time take the first and last values. Less memory usage, but it requires sorting.

What would be the best solution?

Ch3steR
rvil76
  • Please explain in *objective* terms what you mean by "best". – TylerH Feb 03 '20 at 17:40
  • The best answer is one that works. Attempting *a* solution provides insight into that solution. Also, refactoring is a big part of programming: you learn more about the problem as you attempt to solve it, and more about a particular implementation as you write it. Additionally, as the previous comment says, "best" depends on your needs. What is best for one use case may be counterproductive in other cases, even most cases. Don't optimize prematurely. Consider the Big-O complexity of the algorithms. But often just starting with one approach will lead you to a better answer. – SherylHohman Feb 07 '20 at 18:11

5 Answers

6

You could use max with the itemgetter() function, which I think is a more efficient solution than a lambda, according to this answer.

from operator import itemgetter
max_x = max(mylist,key=itemgetter(0))[0]
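
The same idea extends to all four bounds. The following is only a minimal sketch: the function name bounds_itemgetter and the order of the returned values are assumptions made for illustration, and the input is assumed to be a non-empty list of (x, y) tuples.

```python
from operator import itemgetter


def bounds_itemgetter(points):
    """Return (min_x, max_x, min_y, max_y) for a non-empty list of (x, y) tuples.

    Hypothetical helper name; not part of the original answer.
    """
    get_x, get_y = itemgetter(0), itemgetter(1)
    # min()/max() return the whole tuple, so index it to get the coordinate.
    min_x = min(points, key=get_x)[0]
    max_x = max(points, key=get_x)[0]
    min_y = min(points, key=get_y)[1]
    max_y = max(points, key=get_y)[1]
    return min_x, max_x, min_y, max_y


mylist = [(1, 2), (3, 4), (5, 6)]
print(bounds_itemgetter(mylist))  # (1, 5, 2, 6)
```

This traverses the list four times but creates no intermediate lists.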
Mihai Alexandru-Ionut
5

You can use zip here.

In [1]: a=[(1,2),(3,4),(5,6)]

In [2]: x,y=zip(*a)

In [3]: x
Out[3]: (1, 3, 5)

In [4]: y
Out[4]: (2, 4, 6)

In [5]: min(x),max(x)
Out[5]: (1, 5)  # 1 is min and 5 is max in x

In [6]: min(y),max(y)
Out[6]: (2, 6)   # 2 is min and 6 is max in y
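
The session above can be wrapped in a small function. This is a sketch only: the name bounds_zip and the empty-input check are assumptions added for illustration, since zip(*a) cannot be unpacked into two names when the list is empty.

```python
def bounds_zip(points):
    """Return ((min_x, max_x), (min_y, max_y)) for a list of (x, y) tuples.

    Hypothetical helper name; the empty-list check is an added assumption.
    """
    if not points:
        raise ValueError("points must not be empty")
    # zip(*points) transposes the list: one tuple of all x, one of all y.
    xs, ys = zip(*points)
    return (min(xs), max(xs)), (min(ys), max(ys))


a = [(1, 2), (3, 4), (5, 6)]
print(bounds_zip(a))  # ((1, 5), (2, 6))
```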

timeit analysis on Google Colab.

%timeit minmax(z) # Ch3steR's answer
1 loop, best of 3: 546 ms per loop

%timeit  minmax1(z) #CDJB's answer
1 loop, best of 3: 1.22 s per loop

%timeit minmax2(z) #Mihai Alexandru-Ionut's answer
1 loop, best of 3: 749 ms per loop

%timeit minmax3(z) #Yevhen Kuzmovych's answer
1 loop, best of 3: 1.59 s per loop

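EDIT: We can reduce the execution time further by using a set, at least when the coordinates take only a few distinct values (see the NOTE at the end of this answer).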

In [24]: def minmax(a):
    ...:     x=set()
    ...:     y=set()
    ...:     for i,j in a:
    ...:         x.add(i)
    ...:         y.add(j)
    ...:     return max(x),min(x),max(y),min(y)

A list of 3 million (30 lakh) tuples is used for benchmarking.

from random import randint
z=[(randint(0,10),randint(0,10)) for _ in range(3000000)]

timeit analysis as of this edit (4th Feb, 12:28 AM) in Python 3.7 on Windows 10.

In [25]: timeit minmax(z) #Ch3steR's set answer.
384 ms ± 26.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [44]: timeit minmax1(z) #Ch3steR's zip answer.
626 ms ± 3.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [39]: timeit minmax2(z) #CDJB's answer max with lambda
1.18 s ± 25.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [40]: timeit minmax3(z) #Mihai Alexandru-Ionut's answer max with itemgetter
739 ms ± 42.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [41]: timeit minmax4(z) #Yevhen Kuzmovych's answer with updating max and min while iterating
1.97 s ± 42.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Ordered by execution time, fastest first: Ch3steR's set answer < Ch3steR's zip answer < Mihai Alexandru-Ionut's answer (max and min with itemgetter) < CDJB's answer (max and min with lambda) < Yevhen Kuzmovych's answer (updating max and min while iterating)

List used for benchmarking when 0 <= x,y <= 1000000:

x=[(randint(0,1000000),randint(0,1000000)) for _ in range(3000000)]

timeit analysis.

In [48]: timeit minmax(x) #Ch3steR's set answer.
1.75 s ± 92.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [49]: timeit minmax1(x) #Ch3steR's zip answer.
753 ms ± 31.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [51]: timeit minmax2(x) #CDJB's answer max with lambda
1.29 s ± 115 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [52]: timeit minmax3(x) #Mihai Alexandru-Ionut's answer max with itemgetter
794 ms ± 35.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [53]: timeit minmax4(x) #Yevhen Kuzmovych's answer with updating max and min while iterating
2.3 s ± 164 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

NOTE:

Ch3steR's set answer is efficient when 0 <= x,y <= 10, because the sets then hold very few distinct values, but when 0 <= x,y <= 1000000 it averages about 1.75 s.

I strongly suggest using Ch3steR's zip answer or Mihai Alexandru-Ionut's answer (max and min with itemgetter) when 0 <= x,y <= 1000000.
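
To repeat this kind of comparison outside IPython, something like the following sketch with the standard timeit module could be used. The function names minmax_set, minmax_zip and make_points are assumptions (the benchmarked minmax1 to minmax4 are not shown above), and the list is shortened to 300,000 points to keep the run brief, so the absolute timings will differ from those reported here.

```python
import timeit
from random import randint


def minmax_set(a):
    # Set-based variant: collects the distinct x and y values first.
    xs, ys = set(), set()
    for i, j in a:
        xs.add(i)
        ys.add(j)
    return max(xs), min(xs), max(ys), min(ys)


def minmax_zip(a):
    # zip-based variant: transposes the list into two tuples.
    xs, ys = zip(*a)
    return max(xs), min(xs), max(ys), min(ys)


def make_points(n, upper):
    # Hypothetical helper: n random points with 0 <= x, y <= upper.
    return [(randint(0, upper), randint(0, upper)) for _ in range(n)]


if __name__ == "__main__":
    for upper in (10, 1000000):
        pts = make_points(300000, upper)
        # Both variants must agree before their timings are compared.
        assert minmax_set(pts) == minmax_zip(pts)
        for fn in (minmax_set, minmax_zip):
            best = min(timeit.repeat(lambda: fn(pts), number=1, repeat=3))
            print(f"upper={upper:>7} {fn.__name__}: {best:.3f} s")
```

Varying upper changes how many distinct values the sets have to store, which is the factor the note above points to.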

Ch3steR
2

Here is another solution:

max_x, max_y = min_x, min_y = mylist[0]
for x, y in mylist:
    max_x = max(max_x, x)
    max_y = max(max_y, y)
    min_x = min(min_x, x)
    min_y = min(min_y, y)
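
A variant of the same loop, sketched here as a function, replaces the min() and max() calls with plain comparisons and also accepts any iterable. The name bounds_single_pass and the error raised on empty input are assumptions added for illustration.

```python
def bounds_single_pass(points):
    """Return (min_x, max_x, min_y, max_y) in one pass with O(1) extra memory.

    Hypothetical helper name; works on lists and on generators.
    """
    it = iter(points)
    try:
        min_x, min_y = next(it)
    except StopIteration:
        raise ValueError("points must not be empty") from None
    max_x, max_y = min_x, min_y
    for x, y in it:
        # A value below the current min cannot also exceed the current max,
        # so elif is safe here.
        if x < min_x:
            min_x = x
        elif x > max_x:
            max_x = x
        if y < min_y:
            min_y = y
        elif y > max_y:
            max_y = y
    return min_x, max_x, min_y, max_y


mylist = [(1, 2), (3, 4), (5, 6)]
print(bounds_single_pass(mylist))  # (1, 5, 2, 6)
```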
Yevhen Kuzmovych
1

You can use min() and max() with a key argument. To get your required result, you could use:

max_y = max(mylist, key=lambda x: x[1])[1]
min_y = min(mylist, key=lambda x: x[1])[1]
max_x = max(mylist, key=lambda x: x[0])[0]
min_x = min(mylist, key=lambda x: x[0])[0]
CDJB
1

Welcome to SO!

Hope this helps you.

I suggest you go for option 1. You can further optimize your approach with the following steps:

  • step1: Parse the whole list once to get x_min and y_min. Complexity is O(N).
  • step2: Store only the indexes of the tuples with x_min and y_min (more than 50% space is saved). Complexity is O(N).

If you are looking for just the min or max, never sort such a big list. Sorting can have a complexity from O(N*N) to O(NlogN).
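
The two steps above could be sketched roughly as follows. This is an illustration under assumptions: the name extreme_indexes is hypothetical, and the sketch also tracks the maxima, which the steps do not mention but the question asks for.

```python
def extreme_indexes(points):
    """Return (i_min_x, i_max_x, i_min_y, i_max_y) for a list of (x, y) tuples.

    Hypothetical helper: one traversal, storing only four indexes.
    On ties, the index of the first occurrence is kept.
    """
    if not points:
        raise ValueError("points must not be empty")
    i_min_x = i_max_x = i_min_y = i_max_y = 0
    for i, (x, y) in enumerate(points):
        if x < points[i_min_x][0]:
            i_min_x = i
        if x > points[i_max_x][0]:
            i_max_x = i
        if y < points[i_min_y][1]:
            i_min_y = i
        if y > points[i_max_y][1]:
            i_max_y = i
    return i_min_x, i_max_x, i_min_y, i_max_y


points = [(3, 9), (1, 4), (7, 2), (5, 5)]
print(extreme_indexes(points))  # (1, 2, 2, 0)
```

The coordinate values are then read back with, for example, points[i_min_x][0].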

sam
  • Your answer is a bit confusing. By "parse" you mean "traverse"/"loop over"? Even though "more than 50% space is saved" is true, it is better to say that space complexity is constant = `O(1)` instead of `O(n)`. And `O(NlogN)` is better/faster/smaller complexity than `O(N*N)`, so you probably mean "from `O(NlogN)` to `O(N*N)`". – Yevhen Kuzmovych Feb 03 '20 at 10:54
  • @YevhenKuzmovych - Yes and Yes – sam Feb 03 '20 at 10:57