
I have an application where I need to be able to distinguish between numbers and bools as quickly as possible. What are the alternatives apart from running isinstance(value, bool) first?

Edit: Thanks for the suggestions. What I actually want is a check for numbers that leaves out bools, so that I can reorder my checks (numbers are far more prevalent) and improve my negacalls. isinstance() itself is fast enough. The `x is True or x is False` idea is intriguing.
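A minimal sketch of the number-first check described in the edit. It assumes a hypothetical `NUMERIC_TYPES` tuple (the name appears in a later comment; its contents here are an assumption). Because `bool` is a subclass of `int`, `isinstance()` alone lets `True` and `False` through, so two identity tests exclude them afterwards. The function name `is_number_not_bool` is also hypothetical.

```python
from decimal import Decimal
from fractions import Fraction

# Hypothetical tuple of the numeric types the application cares about.
# bool is not listed, but it still matches because bool subclasses int.
NUMERIC_TYPES = (int, float, complex, Decimal, Fraction)


def is_number_not_bool(value):
    """True for instances of NUMERIC_TYPES, False for True/False and anything else."""
    # Numbers are the common case, so test the type first; True and False
    # are the only bool instances, so two identity tests exclude them.
    return isinstance(value, NUMERIC_TYPES) and value is not True and value is not False


for v in (123, 0, 1.5, Decimal("1"), True, False, "123"):
    print(repr(v), is_number_not_bool(v))
```

Putting `isinstance()` first means a non-number fails after one call, while a number pays for the two extra identity comparisons.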

Charlie Clark
  • I believe `isinstance(value, bool)` is already idiomatic. – miku Feb 24 '15 at 10:29
  • You don't really have any choices - non-zero numbers will otherwise be evaluated as truth-y, and `isinstance(True, int)` is `True`. – jonrsharpe Feb 24 '15 at 10:30
  • Alternatively, since bools are exclusively from `[True, False]`, you could do `value is True or value is False`, but I doubt that it's faster. – Boldewyn Feb 24 '15 at 10:30
  • @Boldewyn, it is actually a bit faster – Padraic Cunningham Feb 24 '15 at 10:34
  • @MalikBrahimi No. `True` and `False` are specific instances of the class `bool(int)`. Literal `0` and `1` are instances of `int()`. – Boldewyn Feb 24 '15 at 10:39
  • `value.__class__` is faster if you only need to differentiate them. – grc Feb 24 '15 at 11:02
  • _numbers are far more prevalent_ - Could you give an approximate fraction of ints, True, False? – user Feb 24 '15 at 11:36
  • It's not just ints. I'd say >= 98 % are numbers, and the check happens a lot, often many millions of times. The `if x is True or x is False` is probably about as fast as this is going to get. There are other limiting factors which become more important with it in place. – Charlie Clark Feb 24 '15 at 13:55
  • @CharlieClark If there are more `False` than `True` you could reverse the `if` to check `False` first to make it faster, although by what you are saying it will be probably negligible. – user Feb 24 '15 at 14:36
  • Yes, what I really want is to be able to run isinstance(value, NUMERIC_TYPES) (or equivalent) and to have this return False for bools. The proposed solution does shave some time (about 2 %) from the total which basically means the bottleneck is definitely elsewhere. `isinstance()` generally runs fast enough not to warrant clever optimising. – Charlie Clark Feb 24 '15 at 14:54
  • @CharlieClark You could use `if x is not True and x is not False` then. It runs about 6 times faster than isinstance() on my PC. I also added a corner case that might apply to your data at the bottom of my answer. – user Feb 25 '15 at 11:01
  • @user-5061 the boolean short-circuit is indeed the fastest approach, and your suggestion is logically equivalent to the original one, just more verbose. It's a nice trick, and the `if` statement is about the fastest thing in Python. Ideally `bool` would not be a number in Python, though there are good reasons for it being one. I guess the optimum would be a dispatch-based approach, but I can't see how that would work without an additional function call. Even then it's important to avoid the temptation of micro-optimisations. – Charlie Clark Feb 25 '15 at 12:14
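The dispatch-based approach mentioned in the last comment can be sketched with the standard library's `functools.singledispatch`. It picks the handler from the argument's type and its MRO, and since `bool`'s MRO is `(bool, int, object)`, a handler registered for `bool` takes precedence over one registered for `int`. As the comment anticipates, this adds a function call per value, so it buys structure rather than speed. The function `handle` and its return strings are hypothetical.

```python
from decimal import Decimal
from fractions import Fraction
from functools import singledispatch


@singledispatch
def handle(value):
    # Fallback for any type without a more specific registration.
    return "other"


# register(cls) returns a decorator that hands back the same function,
# so the registrations can be stacked.
@handle.register(int)
@handle.register(float)
@handle.register(complex)
@handle.register(Decimal)
@handle.register(Fraction)
def _(value):
    return "number"


@handle.register(bool)
def _(value):
    # bool is more specific than int in the MRO, so this handler wins for True/False.
    return "bool"


for v in (123, 1.5, True, False, "x"):
    print(repr(v), handle(v))
```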

2 Answers


Padraic Cunningham suggests that the following might be a bit faster. My own quick experiments with cProfile haven't shown any difference:

isbool = value is True or value is False

I assume that's as fast as you can get: two identity comparisons with no type coercion.
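Why the identity test is exact: `True` and `False` are the only two instances of `bool`, and `bool` cannot be subclassed. The check below uses a hypothetical helper name and compares it against `type(v) is bool`, which is also exact. Values like `1` or `0.0` compare *equal* to a bool, but they are not *identical* to one, so they are rejected.

```python
def is_bool(value):
    # bool has exactly two instances and cannot be subclassed,
    # so identity against them is an exact bool test.
    return value is True or value is False


samples = [True, False, 1, 0, 1.0, 0.0, 123, None, ""]
for v in samples:
    # 1, 0, 1.0 and 0.0 compare equal to True/False but are not the same objects.
    print(repr(v), is_bool(v), type(v) is bool)
```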

Edit: I reran the timing tests from @user 5061 and added my statement. This is my result:

>>> import timeit
>>> stmt1 = "isinstance(123, bool)"
>>> stmt2 = "123 is True or 123 is False"
>>> t1 = timeit.timeit(stmt1)
>>> t2 = timeit.timeit(stmt2)
>>> print t1
0.172112941742
>>> print t2
0.0690350532532

Edit 2: Note that I'm using Python 2.7 here. @user 5061 might be using Python 3 (judging from the print() function), so any solution provided here should be tested by the OP before putting it in production, since YMMV.
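For Python 3, the same measurement can be written with `print()` as a function. This sketch binds the value in `setup` instead of writing `123 is True` directly, because since Python 3.8 an `is` comparison with a literal operand triggers a `SyntaxWarning`. The helper `time_checks` is hypothetical, and the timings depend on interpreter version and machine, so none are quoted here.

```python
import timeit

# Bind the value in setup: `123 is True` written with a literal
# triggers a SyntaxWarning on Python 3.8+.
SETUP = "value = 123"
STATEMENTS = {
    "isinstance": "isinstance(value, bool)",
    "identity": "value is True or value is False",
}


def time_checks(number=1_000_000):
    """Return the total time in seconds for `number` runs of each statement."""
    return {name: timeit.timeit(stmt, setup=SETUP, number=number)
            for name, stmt in STATEMENTS.items()}


for name, seconds in time_checks().items():
    print(f"{name}: {seconds:.6f}")
```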

Boldewyn

Testing done using Python 3.4.

stmt5 was suggested by grc. stmt3 was suggested by Boldewyn and seems to be the fastest option in most cases (unless the data consists mostly of ints):

import timeit

setup = "a = 123; b = True"

stmt1 = "isinstance(a, bool) ; isinstance(b, bool)"
stmt2 = "isinstance(a, int) ; isinstance(b, int)"

stmt3 = "a is True or a is False; b is True or b is False"

stmt4 = "type(a) is bool; type(b) is bool"  
stmt5 = "a.__class__ is bool ; b.__class__ is bool"


repetitions = 10**6
t1 = timeit.timeit(stmt1, setup=setup, number=repetitions)
t2 = timeit.timeit(stmt2, setup=setup, number=repetitions)
t3 = timeit.timeit(stmt3, setup=setup, number=repetitions)
t4 = timeit.timeit(stmt4, setup=setup, number=repetitions)
t5 = timeit.timeit(stmt5, setup=setup, number=repetitions)


print(t1)
print(t2)
print(t3)
print(t4)
print(t5)

Results:

0.251072
0.190989
0.037483
0.140759
0.08480

Note that isinstance(123, bool) is slower than isinstance(123, int), so the result depends on which value is checked. Therefore I timed both a and b. This of course assumes that you have an equal number of ints and bools.
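Since the comparison above assumes equal numbers of ints and bools, here is a sketch that times the checks over a list closer to the asker's estimate of about 98 % numbers. The 98/2 split and the names `setup`, `checks` and `run` are assumptions for illustration. Each statement includes the same loop overhead, so only the relative differences are meaningful.

```python
import timeit

# Hypothetical sample matching the asker's estimate of about 98 % numbers:
# 98 ints plus one True and one False.
setup = "data = list(range(98)) + [True, False]"

checks = {
    "isinstance(x, bool)": "for x in data: isinstance(x, bool)",
    "type(x) is bool": "for x in data: type(x) is bool",
    "x is True or x is False": "for x in data: x is True or x is False",
}


def run(number=10_000):
    """Total seconds for `number` passes over `data` with each check."""
    return {label: timeit.timeit(stmt, setup=setup, number=number)
            for label, stmt in checks.items()}


for label, seconds in run().items():
    print(f"{label:25} {seconds:.4f}")
```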

Also, as grc pointed out in the comments, "True is faster because it short-circuits after the first comparison", so if you use b = False you'll get a slightly slower time for stmt3.
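Because `or` stops at the first true operand, a `False` value pays for both comparisons in `b is True or b is False`. If `False` is more common than `True` in the data (as suggested in the question's comments), swapping the operands lets it exit after one comparison. A sketch that times both orders for `b = False`; the actual difference is machine-dependent and may be small:

```python
import timeit

setup = "b = False"

# The first identity test fails for False, so `or` must evaluate the second.
true_first = timeit.timeit("b is True or b is False", setup=setup, number=1_000_000)
# Swapped operands: the first test already succeeds and `or` stops there.
false_first = timeit.timeit("b is False or b is True", setup=setup, number=1_000_000)

print(f"True checked first:  {true_first:.6f}")
print(f"False checked first: {false_first:.6f}")
```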


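Only usable if the data does not contain values equal to 0 or 1 (such as 0, 0.0, 1, 1.0), because set membership compares by hash and equality, and `True == 1` and `False == 0`: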

setup = "a = 123; b = True; s = {True, False}"

stmt3 = "a is True or a is False; b is True or b is False"
stmt6 = "a in s ; b in s"

Result:

0.037680588
0.03936778

If your data consists mostly of integers, this becomes the fastest option (0.045375 vs 0.0390963).
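A quick check of which values slip through the set test. The same applies to the tuple test `1 in (True, False)` from the comments, because tuple membership also falls back to `==`:

```python
s = {True, False}

candidates = [True, False, 1, 0, 1.0, 0.0, 123, -1, 0.5]
for v in candidates:
    # Set membership checks hash() and then ==; hash(1) == hash(True)
    # and 1 == True, so values equal to 1 or 0 are reported as members.
    print(repr(v), v in s)

# Values that pass the set test even though they are not bools.
false_positives = [v for v in candidates if v in s and type(v) is not bool]
print(false_positives)  # [1, 0, 1.0, 0.0]

# Tuple membership has the same problem.
print(1 in (True, False))  # True
```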

user
  • Try `stmt3 = "123 is True or 123 is False"` (basically my answer below). It is even faster than `stmt2`, because it doesn't create a new tuple `(True, False)` on each comparison. – Boldewyn Feb 24 '15 at 10:41
  • True, it is faster, even though the difference is tiny. I'll edit my post accordingly, including other tests, soon. – user Feb 24 '15 at 10:43
  • Cool! If you also have other ideas how to distinguish bools from integers, I'm curious to see how those will do. – Boldewyn Feb 24 '15 at 10:45
  • You cannot use `in`; try `1 in (True, False)` – Padraic Cunningham Feb 24 '15 at 10:48
  • Indeed! My Python 2.7 says that `1 in (True, False)` evaluates to `True`. This is surprising. :( – Boldewyn Feb 24 '15 at 10:50
  • @Boldewyn Perhaps both our answers are flawed. Checking whether `True` is a bool gives very different timings. – user Feb 24 '15 at 11:06
  • @user5061 `True` is faster because it short-circuits after the first comparison. – grc Feb 24 '15 at 11:14
  • @grc You mean in the `a is True`? Good point, I'll edit accordingly. – user Feb 24 '15 at 11:25
  • @PadraicCunningham Wow! `123 in (True, False)` works fine; would only `1`s and `0`s give True when checked like this? When I first checked what you suggested, I got the impression that anything non-zero, non-empty, etc. would be considered true by this check. – user Feb 25 '15 at 07:52
  • Because `True == 1` and `False == 0`: http://stackoverflow.com/questions/28686263/why-does-it-work/28686286#28686286 – Padraic Cunningham Feb 25 '15 at 09:08
  • @Boldewyn You forgot to edit your post and note that 0 and 1 will give false positives in stmt2. – user Feb 25 '15 at 10:26
  • @user5061 thanks! I removed the statement completely, since it doesn't work at all for this purpose. The `a.__class__ is bool` was a neat idea to add to the tests, by the way! – Boldewyn Feb 25 '15 at 10:51
  • `a.__class__` will be slow because it's a namespace lookup. – Charlie Clark Feb 25 '15 at 12:04