
I have a log file as follows:

```
error 1020
warning 3000
this is an error and warning
```

I am trying to keep only the lines that contain the word "error" or "warning".

In the first version I put parentheses around the `or` condition. In the second version I removed them.

Can you please explain why removing the parentheses gives the right result?

```python
>>> betterRDD = inputRDD.filter(lambda x: ("error" or "warning") in x)
>>> col4 = betterRDD.collect()
>>> print "The better result is %s" %col4
The better result is [u'error 1020', u'this is an error and warning']

>>> betterRDD = inputRDD.filter(lambda x: "error" or "warning" in x)
>>> col4 = betterRDD.collect()
>>> print "%s" %col4
[u'error 1020', u'warning 3000', u'this is an error and warning']
```
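To see what each lambda body actually evaluates to, the same two expressions can be checked in plain Python without Spark. This is a minimal sketch that assumes the three sample log lines are held in an ordinary list; the name `lines` is hypothetical and stands in for the contents of `inputRDD`.

```python
# The three sample log lines from the question (hypothetical name `lines`).
lines = ["error 1020", "warning 3000", "this is an error and warning"]

# First version: the parentheses are evaluated first, and
# `"error" or "warning"` returns "error", the first truthy operand.
print("error" or "warning")  # prints: error
# So the test is really just `"error" in x`.
first = [x for x in lines if ("error" or "warning") in x]

# Second version: `in` binds more tightly than `or`, so this is
# `"error" or ("warning" in x)`. "error" is a non-empty, truthy string,
# so `or` short-circuits and every line passes.
second = [x for x in lines if "error" or "warning" in x]

print(first)   # ['error 1020', 'this is an error and warning']
print(second)  # ['error 1020', 'warning 3000', 'this is an error and warning']
```

The first list matches the first Spark result exactly. It drops `warning 3000` because that line does not contain "error".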
pault
Scala-la
  • It is coincidentally returning the correct answer. Your syntax does not do what you think it does. Because `in` binds more tightly than `or`, `"error" or "warning" in x` is parsed as `"error" or ("warning" in x)`. The `or` short-circuits on `"error"`, a non-empty and therefore truthy string, so every line passes the filter. See [this post](https://stackoverflow.com/questions/15112125/how-to-test-multiple-variables-against-a-value) and [this answer](https://stackoverflow.com/a/14892812/5858851) – pault Sep 07 '18 at 18:18
  • The example log file is not a good test, because every line in it contains "error" or "warning", so a `filter` that keeps all lines looks correct. Your code is logically equivalent to `inputRDD.filter(lambda x: True)`. The correct way to do this would be `betterRDD = inputRDD.filter(lambda x: "error" in x or "warning" in x)` – pault Sep 07 '18 at 18:24
  • Possible duplicate of [How to test multiple variables against a value?](https://stackoverflow.com/questions/15112125/how-to-test-multiple-variables-against-a-value) – pault Sep 07 '18 at 18:24
  • For completeness, the first code does not work because `("error" or "warning") in x` evaluates the parentheses first. `"error" or "warning"` returns `"error"`, so the condition becomes `"error" in x`, and lines that contain only "warning" are dropped – pault Sep 07 '18 at 18:34
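The fix suggested in the comments, which tests each keyword separately, can be sketched in plain Python with the built-in `filter`, mirroring `inputRDD.filter`. The extra line `"info 42"` is a hypothetical addition so that the data contains a line that should be rejected. The function names and the `KEYWORDS` tuple are also illustrative. `any()` is shown as an alternative to chaining `or`, not as something from the original thread.

```python
# Sample lines from the question plus one hypothetical line ("info 42")
# that contains neither keyword, so the filter has something to reject.
lines = ["error 1020", "warning 3000", "this is an error and warning", "info 42"]

def contains_keyword(x):
    # Correct form from the comments: each `in` test is written out in full.
    return "error" in x or "warning" in x

# Illustrative keyword tuple; any() scales better than a long `or` chain.
KEYWORDS = ("error", "warning")

def contains_any_keyword(x):
    return any(word in x for word in KEYWORDS)

def buggy(x):
    # The second version from the question: always returns the truthy "error".
    return "error" or "warning" in x

print(list(filter(contains_keyword, lines)))      # drops 'info 42'
print(list(filter(contains_any_keyword, lines)))  # drops 'info 42'
print(list(filter(buggy, lines)))                 # keeps all four lines
```

In the PySpark shell the same function can be passed directly, for example `inputRDD.filter(contains_keyword)`. `in` on strings is a substring test, so a line containing `errors` would also match. Matching whole words would need something stricter, such as a regular expression with word boundaries.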
