2

For example, assume a given list of ints:

int_list = list(range(-10,10))
[-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

What is the most efficient way to find if any given two values in int_list sum to equal a given int, say 2?

I was asked this in a technical phone interview this morning: how to efficiently handle this scenario with an int_list of, say, 100 million items (I rambled and had no good answer :/).

My first idea was:

from itertools import combinations
int_list = list(range(-10,10))
combo_list = list(combinations(int_list, 2))
desired_int = 4
filtered_tuples = list(filter(lambda x: sum(x) == desired_int, combo_list))
filtered_tuples
[(-5, 9), (-4, 8), (-3, 7), (-2, 6), (-1, 5), (0, 4), (1, 3)]

Which doesn't even work with an input as small as range(-10000, 10000).

Also, does anyone know of a good online Python performance testing tool?

Douglas Denhartog
  • See http://stackoverflow.com/a/12775802/1899640 for a list of duplicates of this question – that other guy May 30 '14 at 22:57
  • By the way, if this is the narrow case where the list takes a single parameter, n, for range(-n,n), then I do believe my second answer is the best and most performant one. If this was the more general case of any possible list of integers, then Adam's answer is the best. – Russia Must Remove Putin May 31 '14 at 00:21
  • The sample range is just that, a sample. The real world case is any possible list of integers. – Douglas Denhartog May 31 '14 at 00:24
  • @AaronHall here you go. That said, I removed the "-1" from your second answer since the OP found it useful (though it covers only a specific case and doesn't provide a *real* answer). – Nir Alfasi Jun 02 '14 at 17:46

4 Answers

3

For any integer A there is exactly one integer B, namely N - A, that sums with it to equal integer N. So it seems easier to go through the list, do the arithmetic, and do a membership test to see if B is in the set.

int_list = set(range(-500000, 500000))
TARGET_NUM = 2

def filter_tuples(int_list, target):
    for int_ in int_list:
        other_num = target - int_
        if other_num in int_list:
            yield (int_, other_num)

filtered_tuples = filter_tuples(int_list, TARGET_NUM)

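Note that this will duplicate results. E.g. (-2, 4) is a separate response from (4, -2). You can fix this by changing your function to remember which values it has already paired:

def filter_tuples(int_list, target):
    seen = set()  # values already yielded as the first item of a pair
    for int_ in int_list:
        other_num = target - int_
        # skip a pair that was already reported in the opposite order
        if other_num in int_list and other_num not in seen:
            seen.add(int_)
            yield (int_, other_num)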
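
If only a yes/no answer is needed and the input is an arbitrary list (possibly with repeated values), a single pass with a set of previously seen values should be enough. The following is a minimal sketch assuming Python 3; `has_pair_with_sum` is a hypothetical name used for illustration, not part of the question:

```python
def has_pair_with_sum(values, target):
    """Return True if two elements at different positions sum to target."""
    seen = set()  # values encountered earlier in the iteration
    for value in values:
        # A partner seen earlier means two distinct positions form the sum.
        if target - value in seen:
            return True
        seen.add(value)
    return False


print(has_pair_with_sum(range(-10, 10), 2))  # True
print(has_pair_with_sum([1, 5, 9], 2))       # False: 1 cannot pair with itself
print(has_pair_with_sum([1, 1, 9], 2))       # True: two separate 1s
```

Because the partner is looked up before the current value is added, an element is never paired with itself, and the loop stops at the first match.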
Adam Smith
3

EDIT: See my other answer for an even better approach (with caveats).

What is the most efficient way to find if any given two values in int_list sum to equal a given int, say 2?

My first inclination was to do it with the itertools module's combinations and the short-circuiting power of any, but it can be considerably slower than Adam's approach:

>>> import itertools
>>> int_list = list(range(-10,10))
>>> any(i + j == 2 for i, j in itertools.combinations(int_list, 2))
True

Seems to be fairly responsive for larger ranges:

>>> any(i + j == 2 for i, j in itertools.combinations(xrange(-10000,10000), 2))
True
>>> any(i + j == 2 for i, j in itertools.combinations(xrange(-1000000,1000000), 2))
True

Takes about 10 seconds on my machine:

>>> any(i + j == 2 for i, j in itertools.combinations(xrange(-10000000,10000000), 2))
True
Russia Must Remove Putin
  • I was under the impression from his sample code that he needed to produce the combinations, not simply assert that they exist. I could be wrong though! Does `itertools.combinations` return a generator or a list? This could be a huge memory hog (though doubtlessly less so than `list(combinations)` in OP's code). – Adam Smith May 30 '14 at 23:04
  • That's the question as stated, call me an obtuse logician, but I think it's right. – Russia Must Remove Putin May 30 '14 at 23:05
  • Running through all the combinations is *not* the "most efficient way" of solving this question. – Nir Alfasi May 30 '14 at 23:14
  • @alfasin , see my other answer. – Russia Must Remove Putin May 30 '14 at 23:18
  • I ran a profiler that loops over your code 10K times and Adam's answer below is about 2.5 times faster (0.093 seconds vs. your code which ran in 0.245 seconds). I'll check your other answer as well. – Nir Alfasi May 30 '14 at 23:19
  • On a larger set your code took 22.037 seconds to execute while Adam's code took: 1.496 seconds – Nir Alfasi May 30 '14 at 23:26
  • @alfasin how did my other answer do? – Russia Must Remove Putin May 30 '14 at 23:31
  • @AaronHall your other answer has wrong assumptions about the structure of the list of ints. – Nir Alfasi May 30 '14 at 23:32
  • @alfasin No, you're making incorrect assumptions. The questioner provided a range incremented by one with an identical -start and stop, and he did thank me on the other answer as well. – Russia Must Remove Putin May 30 '14 at 23:37
  • @alfasin Worthwhile to note that mine is quicker because it simply builds a generator, it doesn't create any values. Aaron's builds the whole value list and asserts that a combination exists. If you made a test that returned `False` and ran `any(My_function(false_num, range(small_number,big_number)))` against Aaron's, his would be faster. – Adam Smith Jun 02 '14 at 16:57
  • @AdamSmith add an addendum to yours demonstrating! :) – Russia Must Remove Putin Jun 02 '14 at 17:27
2

A more literal approach using math:

Assume a given list of ints:

int_list = list(range(-10,10)) ... [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

What is the most efficient way to find if any given two values in int_list sum to equal a given int, say 2? ... how to efficiently handle this scenario with an int_list of say, 100 million items.

It's clear that we can deduce the requirements: a single parameter, n, defines the range of integers, of the form range(-n, n), which means every integer from negative n up to but not including positive n. From there, the requirement is clearly to determine whether some number, x, is the sum of any two integers in that range.

Any such range can be trivially shown to contain a pair that sums to every integer from -2n + 1 (the two smallest values added together) through 2n - 3 (the two largest), so it's a waste of computing power to search for it.

def x_is_sum_of_2_diff_numbers_in_range(x, n):
    if isinstance(x, int) and isinstance(n, int):
        return -(n*2) < x < (n - 1)*2
    else:
        raise ValueError('args x and n must be ints')

Computes nearly instantly:

>>> x_is_sum_of_2_diff_numbers_in_range(2, 1000000000000000000000000000)
True

Testing the edge-cases:

def main():
    print(x_is_sum_of_2_diff_numbers_in_range(x=5, n=4)) # True
    print(x_is_sum_of_2_diff_numbers_in_range(x=6, n=4)) # False
    print(x_is_sum_of_2_diff_numbers_in_range(x=-7, n=4)) # True
    print(x_is_sum_of_2_diff_numbers_in_range(x=-8, n=4)) # False
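
The bounds can also be cross-checked against an exhaustive search for small n. This is a sketch assuming Python 3; `brute_force` is a hypothetical helper added for illustration, and the closed-form function is repeated without its type check so the block is self-contained:

```python
from itertools import combinations


def x_is_sum_of_2_diff_numbers_in_range(x, n):
    # Same bounds as the closed-form check in this answer.
    return -(n * 2) < x < (n - 1) * 2


def brute_force(x, n):
    # Try every pair of different values in range(-n, n).
    return any(i + j == x for i, j in combinations(range(-n, n), 2))


for n in range(2, 8):
    for x in range(-3 * n, 3 * n):
        assert x_is_sum_of_2_diff_numbers_in_range(x, n) == brute_force(x, n)
print("closed form matches brute force for n = 2..7")
```

The comparison only covers small n, so it is a sanity check of the inequality rather than a proof.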

EDIT:

Since I can see that a more generalized version of this problem (where the list could contain any given numbers) is a common one, I can see why some people have a preconceived approach to this, but I still stand by my interpretation of this question's requirements, and I consider this answer the best approach for this more specific case.

Russia Must Remove Putin
  • I don't see how to run your answer on the following list of ints: `[1,10,7,2,15,-12,-10]` – Nir Alfasi May 30 '14 at 23:32
  • @alfasin But that's not what the questioner asked. – Russia Must Remove Putin May 30 '14 at 23:34
  • @AaronHall What about the first sentence: "For example, assume a given list of ints..." ? – Nir Alfasi May 30 '14 at 23:35
  • @alfasin No, you're making incorrect assumptions. The questioner provided a range incremented by one with an identical -start and stop, and a single such parameter. And he did thank me here. – Russia Must Remove Putin May 30 '14 at 23:38
  • @ddenhartog am I correct, or was I wrong to deduce the requirements as I stated in the body? – Russia Must Remove Putin May 31 '14 at 00:07
  • @Aaron, you did a wonderful job providing great answers! I +1'ed each, but Adam read between the lines to get me where I needed to go. Though your answers are great for other specific purposes and are ones I will use! THANKS! – Douglas Denhartog May 31 '14 at 00:26
  • Aaron: I love seeing these math-centered answers! I'm really awful at them myself. Bit twiddling and algorithms are not my strong suits, modeling and iterating are. I appreciate the (imo better) approach, even if it's a bit more limited. – Adam Smith Jun 02 '14 at 16:54
  • Assuming an interval of `[-n,n]` like you suggested, you should change your answer to `-(n*2) + 1 <= x <= (n - 1)*2 - 1` (the right side changed from `<` to `<=`) – Nir Alfasi Jun 02 '14 at 17:51
  • @alfasin it's an interval of `[-n, n)` though. `list(range(-10,10))` returns `[-10, -9, -8, ... , 7, 8, 9]` – Adam Smith Jun 02 '14 at 18:05
  • You're right, but I'm even better off getting rid of the -1 and +1 and making them both < (after assuring args are ints). – Russia Must Remove Putin Jun 02 '14 at 18:07
0

I would have thought that any solution that depends on a doubly nested iteration over the list (albeit having the inner loop concealed by a nifty Python function) is O(n^2).

It is worth considering sorting the input. For any reasonable comparison-based sort, this will be O(n.lg(n)), which is already better than O(n^2). You might do better with a radix sort or pre-sort (making something like a bucket sort) depending on the range of the input list.

Having sorted the input, it is an O(n) operation to find a pair of numbers that sum to any given number, so your overall complexity is O(n.lg(n)).
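
The sort-then-scan idea can be sketched with two indices that move inward from both ends of the sorted list. This assumes Python 3, and `has_pair_with_sum_sorted` is a hypothetical name chosen for illustration:

```python
def has_pair_with_sum_sorted(values, target):
    """Return True if two elements at different positions sum to target."""
    ordered = sorted(values)  # O(n log n) comparison sort
    lo, hi = 0, len(ordered) - 1
    while lo < hi:  # O(n) scan: each step moves one index inward
        total = ordered[lo] + ordered[hi]
        if total == target:
            return True
        if total < target:
            lo += 1  # sum too small: advance to a larger low value
        else:
            hi -= 1  # sum too large: retreat to a smaller high value
    return False


print(has_pair_with_sum_sorted([1, 10, 7, 2, 15, -12, -10], 5))    # True: -10 + 15
print(has_pair_with_sum_sorted([1, 10, 7, 2, 15, -12, -10], 100))  # False
```

Unlike a set-based lookup, this needs no extra hash table beyond the sorted copy, which may matter when memory is tight.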

In practice, it's an open question whether, for the stipulated “large number” of elements, a brute-force O(n^2) algorithm with nice cache behavior (zipping through arrays in order) would outperform the asymptotically better algorithm that moves a lot of data around, but eventually the one with the lower asymptotic complexity will win.

Emmet