I have a list of JSON responses that I want to parse before putting them into a DataFrame.

From my list of 15,000 responses, I want to remove those that do not contain a certain key.

What I have so far seems to be playing funny business with the looping after I delete an element and I'm not sure why.

If I run the code below, it correctly finds the 3 items out of the 15k that should be deleted.

Deleted! : 2591
Deleted! : 12306
Deleted! : 12307

try:
    for i in range(len(trans)):
        #print("checking for deletion: "+ str(i))
        if 'CashBooks' not in trans[i]:
            #del trans[i]
            print("Deleted! : " + str(i))
except Exception as e:
    print(str(e))
    print('passed')
    pass

However, when I uncomment the `del`, I get errors like this:

Deleted! : 2591
Deleted! : 12305
list index out of range
passed

The list is quite large so it's hard to post sample data but hopefully someone can easily spot where I'm going wrong.

Thanks for your time.

swifty
  • You shouldn't modify the list while iterating through it – iBug Jan 17 '19 at 03:24
  • You are modifying the length of the list during the iteration, so the list keeps getting shorter while you are still iterating over its original length – sjdm Jan 17 '19 at 03:25
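
To make the two comments above concrete, here is a small sketch with made-up data (the `trans` contents and the helper name `delete_in_place` are hypothetical, not the OP's). `range(len(...))` is evaluated once, so after a `del` every later item shifts left by one: the loop skips whatever slides into the deleted slot and eventually indexes past the shortened end. That shift would also explain why the second match in the question is reported at 12305 rather than 12306.

```python
# Hypothetical stand-in for the real list of parsed JSON responses.
trans = [{"CashBooks": 1}, {"Other": 2}, {"Other": 3}, {"CashBooks": 4}]

def delete_in_place(items):
    """Reproduce the failing pattern on a copy and report what happened."""
    items = list(items)
    try:
        for i in range(len(items)):      # the range is fixed at the original length
            if "CashBooks" not in items[i]:
                del items[i]             # everything after i shifts left by one
    except IndexError as exc:
        return items, str(exc)
    return items, None

# {"Other": 3} survives because it slid into index 1 right after the loop left it.
leftover, error = delete_in_place(trans)

# Building a new list avoids the problem entirely.
kept = [item for item in trans if "CashBooks" in item]
```

Here `leftover` still contains one item without the key and `error` holds the "list index out of range" message, while `kept` contains only the items that have the key.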

1 Answer

You can use `filter`. This may be faster, and you won't be editing the list while looping through it.

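def has_cashbooks(value):
    # True for the responses we want to keep (note the key's exact case)
    return 'CashBooks' in value

# filter() returns an iterator in Python 3, so wrap it in list()
data = list(filter(has_cashbooks, trans))

# This is only to show which ones were deleted
def missing_cashbooks(value):
    return 'CashBooks' not in value

deleted = filter(missing_cashbooks, trans)
for item in deleted: print("Deleted: {}".format(item))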
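
If you also want the positions of the removed items, as in the question's `Deleted! : 2591` output, a single pass with `enumerate` is one way to collect both. This is only a sketch with made-up sample data; `trans` here and the names `kept` and `removed_indices` are placeholders, not from the original post.

```python
# Hypothetical sample data standing in for the 15,000 responses.
trans = [
    {"CashBooks": [], "id": 0},
    {"id": 1},
    {"CashBooks": [], "id": 2},
    {"error": "timeout"},
]

kept = []             # responses that contain the key
removed_indices = []  # original positions of the responses that do not

# One pass over the list; nothing is deleted, so indices never shift.
for index, item in enumerate(trans):
    if "CashBooks" in item:
        kept.append(item)
    else:
        removed_indices.append(index)

for index in removed_indices:
    print("Deleted! : " + str(index))
```

`kept` is then a plain list that can go on to the DataFrame step, and the printed indices refer to positions in the original, unmodified list.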
Jab
  • `data = (item for item in trans if 'CashBooks' in item)` would work if you're looking for a one-liner, since that's exactly what filter does, but I like this answer. – Tomas Farias Jan 17 '19 at 03:37
  • Filter is just cleaner to write in this instance, plus with this now you have a check function written. *even though you could with that as well* – Jab Jan 17 '19 at 03:39
  • Also, @TomasFarias according to [this](https://stackoverflow.com/a/52634167/225020) post, filter can be faster if you're using Python 3.x. And the benefit can be of use if his data is large. Then again if you read down, someone says otherwise. In the end I just prefer this style more. – Jab Jan 17 '19 at 03:42
  • That post doesn't apply in this case. When the function parameter is `None`, `filter` actually runs as `(item for item in iterable if item)`, omitting the function call that we are doing in this case. Try running the tests with a function that does the same thing, like `def filter_func(n): return n`. I did, and performance was worse than the list comprehension. Also, just to clarify, I DO like the answer. I thought the one-liner added something of value if OP wanted to save a few lines, and if you're going to reuse the function, filter is the way to go, as your previous comment mentions. – Tomas Farias Jan 17 '19 at 04:07
  • Ahh, gotcha! Thanks for the insight – Jab Jan 17 '19 at 04:14
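
For anyone who wants to check the speed question from the comments on their own machine, a rough `timeit` sketch follows. The data is invented to resemble the question (15,000 dicts, three of them without the key), and the function names are placeholders; the timings will vary by Python version and hardware, so treat the numbers as indicative only.

```python
import timeit

# Hypothetical data: 15,000 dicts, three of which lack the key.
trans = [{"CashBooks": i} for i in range(15000)]
for i in (2591, 12306, 12307):
    trans[i] = {"Other": i}

def has_cashbooks(value):
    return "CashBooks" in value

def with_filter():
    # filter() calls has_cashbooks once per item
    return list(filter(has_cashbooks, trans))

def with_comprehension():
    # the membership test is inlined, so there is no per-item function call
    return [item for item in trans if "CashBooks" in item]

# Both approaches keep the same items; only the speed may differ.
same_result = with_filter() == with_comprehension()

filter_seconds = timeit.timeit(with_filter, number=20)
comprehension_seconds = timeit.timeit(with_comprehension, number=20)
print("filter + function:  {:.4f}s".format(filter_seconds))
print("list comprehension: {:.4f}s".format(comprehension_seconds))
```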