I'm familiar with comparing two lists of integers or strings; however, comparing two lists of strings when extra characters are involved can be a little challenging.

Assume the output contains the following, which I split into a list of strings. I call it diff in my code.

Output

164c164
< Apples = 
---
> Apples = 0
168c168
< Berries = 
---
> Berries = false
218c218
< Cherries = 
---
> Cherries = 20
223c223
< Bananas = 
---
> Bananas = 10
233,234c233,234
< Lemons = 2
< Strawberries = 4
---
> Lemons = 4
> Strawberries = 2
264c264
< Watermelons = 
---
> Watermelons = 524288

The second list of strings is the ignore variable, which I want the first list to be compared against.

>>> ignore
['Apples', 'Lemons']

My code:

>>> def str_compare (ignore, output):
...     flag = 0
...     diff = output.strip ().split ('\n')
...     if ignore:
...         for line in diff:
...             for i in ignore:
...                 if i in line:
...                     flag = 1
...             if flag:
...                 flag = 0
...             else:
...                 print (line)
... 
>>>

The code works, with Apples and Lemons omitted.

>>> str_compare(ignore, output)
164c164
---
168c168
< Berries = 
---
> Berries = false
218c218
< Cherries = 
---
> Cherries = 20
223c223
< Bananas = 
---
> Bananas = 10
233,234c233,234
< Strawberries = 4
---
> Strawberries = 2
264c264
< Watermelons = 
---
> Watermelons = 524288
>>>

There must be a better way to compare two lists of strings that isn't O(n^2). If my diff list didn't contain extra characters like "Apples =", comparing the two lists could be done in O(n). Any suggestions or ideas for comparing without looping through the "ignore" variable for every diff element?

Update #1: To avoid confusion, I've updated the code using the suggestion from the comments.

>>> def str_compare (ignore, output):
...     diff = output.strip ().split ('\n')
...     if ignore:
...         for line in diff:
...             if not any ([i in line for i in ignore]):
...                 print (line)
...                 print ("---")
>>>

Regardless, it still loops through ignore for every diff element.

dreamzboy
  • I'm confused, why not just use `if not any([i in line for i in ignore]): print(line)` instead of using `flag` – Rocky Li Nov 02 '18 at 18:56
  • What is n? Use a set or dict for speed. – Serge Nov 02 '18 at 18:58
  • @RockyLi, doing so you'll have everything printed twice since it loops through the ignore list twice. – dreamzboy Nov 02 '18 at 18:59
  • No it does not. replace everything under `for line in diff:` with that snippet and it will just print once. Granted, this doesn't answer your question because it's still O(n^2), but if that's your worry you can use a `set`, because `set` operations are completed in O(1) time. – Rocky Li Nov 02 '18 at 19:01
  • @RockyLi, my comment was before you edited your comment to if any. If using if any, then there's no need for flag but it's still 2 nested for loop. – dreamzboy Nov 02 '18 at 19:03
  • Check the answer for a single loop. Sets are faster due to efficient hashing. – Serge Nov 02 '18 at 19:18

1 Answer

For efficiency, make ignore a set rather than a list. Use split to get the keyword from each line.

>>> def str_compare (ignore, output):
...     ignore = set (ignore)
...     diff = output.strip ().split ('\n')
...     for line in diff:
...         if line.startswith('<') or line.startswith('>'):
...             var = line.split () [1]
...             if var not in ignore:
...                 print (line)
...         else:
...             print (line)
... 

Output

>>> str_compare (ignore, output)
164c164
---
168c168
< Berries = 
---
> Berries = false
218c218
< Cherries = 
---
> Cherries = 20
223c223
< Bananas = 
---
> Bananas = 10
233,234c233,234
< Strawberries = 4
---
> Strawberries = 2
264c264
< Watermelons = 
---
> Watermelons = 524288
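
As a rough sketch of the same idea, the variant below returns the kept lines instead of printing them, which makes it easier to test. The name `filter_diff` and the handling of a bare `<` or `>` line (kept rather than raising an IndexError) are assumptions for illustration, not part of the original code.

```python
def filter_diff(ignore, output):
    """Return the diff lines whose variable name is not in `ignore`."""
    ignored = set(ignore)  # set membership is O(1) on average
    kept = []
    for line in output.strip().split('\n'):
        if line.startswith(('<', '>')):
            parts = line.split()
            # parts[0] is the marker; parts[1], if present, is the variable name
            if len(parts) > 1 and parts[1] in ignored:
                continue  # skip lines for ignored variables
        kept.append(line)
    return kept


sample = (
    "164c164\n< Apples = \n---\n> Apples = 0\n"
    "168c168\n< Berries = \n---\n> Berries = false"
)
print('\n'.join(filter_diff(['Apples', 'Lemons'], sample)))
```

Each diff line is split once and looked up once in the set, so the work grows with the number of diff lines rather than with the number of diff lines times the number of ignored names.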

You can eliminate the need for a flag by splitting and joining on "---\n" (a slightly more general solution than using flags or typing ---).

Note that the worst case for the substring test s1 in s2 should be about len(s1) * len(s2), while an equality test is at most linear in the length of the strings. While the Python implementation is pretty decent in the average case, linear-complexity algorithms seem to exist: http://monge.univ-mlv.fr/~mac/Articles-PDF/CP-1991-jacm.pdf See also "Algorithm to find multiple string matches".
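
If substring matching is really needed (a name may appear anywhere in the line), one way to avoid the explicit inner loop is to combine the ignored names into a single compiled regular expression. This is only a sketch: the helper names are made up for illustration, and Python's `re` module is a backtracking engine, so this is not one of the linear-time multi-pattern algorithms mentioned above. It just moves the per-name scan into one `search` call per line.

```python
import re


def build_ignore_pattern(ignore):
    """Compile one alternation pattern matching any ignored name."""
    # Escape each name so characters like '.' or '+' are matched literally;
    # put longer names first so they are tried before their own prefixes.
    names = sorted(ignore, key=len, reverse=True)
    return re.compile('|'.join(re.escape(name) for name in names))


def filter_by_substring(ignore, output):
    """Return the lines that contain none of the ignored names."""
    lines = output.strip().split('\n')
    if not ignore:
        # An empty alternation would match every line, so return early.
        return lines
    pattern = build_ignore_pattern(ignore)
    return [line for line in lines if not pattern.search(line)]


hunk = (
    "233,234c233,234\n< Lemons = 2\n< Strawberries = 4\n"
    "---\n> Lemons = 4\n> Strawberries = 2"
)
print('\n'.join(filter_by_substring(['Apples', 'Lemons'], hunk)))
```

Unlike the set lookup, this keeps the original "name appears anywhere in the line" behaviour, so an ignored name that is a substring of another variable name would also hide that variable.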

dreamzboy
Serge