I'm trying to write a recursive generator function to flatten a nested JSON object made up of mixed types, lists and dictionaries. I'm doing this partly for my own learning, so I have avoided grabbing an example from the internet to make sure I understand what's happening. I've got stuck on what I think is the placement of the yield statement in the function in relation to the loop.

The source of the data passed to the generator function is the output of an outer loop which is iterating through a mongo collection.

When I use a print statement in the same place as the yield statement I get the results I'm expecting, but when I switch it to a yield statement the generator seems to yield only one item per iteration of the outer loop.

Hopefully someone can show me where I am going wrong.

from pymongo import MongoClient

columns = [
    '_id',
    'name',
    'personId',
    'status',
    'explorerProgress',
    'isSelectedForReview',
]
db = MongoClient().abcDatabase

coll = db.abcCollection


def dic_recurse(data, fields, counter, source_field):
    counter += 1
    if isinstance(data, dict):
        for k, v in data.items():
            if k in fields and isinstance(v, list) is False and isinstance(v, dict) is False:
                # print "{0}{1}".format(source_field, k)[1:], v
                yield "{0}{1}".format(source_field, k)[1:], v
            elif isinstance(v, list):
                source_field += "_{0}".format(k)
                [dic_recurse(l, fields, counter, source_field) for l in data.get(k)]
            elif isinstance(v, dict):
                source_field += "_{0}".format(k)
                dic_recurse(v, fields, counter, source_field)
    elif isinstance(data, list):
        [dic_recurse(l, fields, counter, '') for l in data]


for item in coll.find():
    for d in dic_recurse(item, columns, 0, ''):
        print d

Below is a sample of the data it's iterating over; the real nesting goes deeper than what's shown.

{ 
    "_id" : ObjectId("5478464ee4b0a44213e36eb0"), 
    "consultationId" : "54784388e4b0a44213e36d5f", 
    "modules" : [
        {
            "_id" : "FF", 
            "name" : "Foundations", 
            "strategyHeaders" : [
                {
                    "_id" : "FF_Money", 
                    "description" : "Let's see where you're spending your money.", 
                    "name" : "Managing money day to day", 
                    "statuses" : [
                        {
                            "pid" : "54784388e4b0a44213e36d5d", 
                            "status" : "selected", 
                            "whenUpdated" : NumberLong(1425017616062)
                        }, 
                        {
                            "pid" : "54783da8e4b09cf5d82d4e11", 
                            "status" : "selected", 
                            "whenUpdated" : NumberLong(1425017616062)
                        }
                    ], 
                    "strategies" : [
                        {
                            "_id" : "FF_Money_CF", 
                            "description" : "This option helps you get a picture of how much you're spending", 
                            "name" : "Your spending and savings.", 
                            "relatedGoals" : [
                                {
                                    "_id" : ObjectId("54784581e4b0a44213e36e2f")
                                }, 
                                {
                                    "_id" : ObjectId("5478458ee4b0a44213e36e33")
                                }, 
                                {
                                    "_id" : ObjectId("547845a5e4b0a44213e36e37")
                                }, 
                                {
                                    "_id" : ObjectId("54784577e4b0a44213e36e2b")
                                }, 
                                {
                                    "_id" : ObjectId("5478456ee4b0a44213e36e27")
                                }
                            ], 
                            "soaTrashWarning" : "Understanding what you are spending and saving is crucial to helping you achieve your goals. Without this in place, you may be spending more than you can afford. ", 
                            "statuses" : [
                                {
                                    "personId" : "54784388e4b0a44213e36d5d", 
                                    "status" : "selected", 
                                    "whenUpdated" : NumberLong(1425017616062)
                                }, 
                                {
                                    "personId" : "54783da8e4b09cf5d82d4e11", 
                                    "status" : "selected", 
                                    "whenUpdated" : NumberLong(1425017616062)
                                }
                            ], 
                            "trashWarning" : "This option helps you get a picture of how much you're spending and how much you could save.\nAre you sure you don't want to take up this option now?\n\n", 
                            "weight" : NumberInt(1)
                        }, 

Update: I've made a few changes to the generator function, although I'm not sure they've really changed anything. I've been stepping through both the print version and the yield version line by line in a debugger. The new code is below.

def dic_recurse(data, fields, counter, source_field):
    print 'Called'
    if isinstance(data, dict):
        for k, v in data.items():
            if isinstance(v, list):
                source_field += "_{0}".format(k)
                [dic_recurse(l, fields, counter, source_field) for l in v]
            elif isinstance(v, dict):
                source_field += "_{0}".format(k)
                dic_recurse(v, fields, counter, source_field)
            elif k in fields and isinstance(v, list) is False and isinstance(v, dict) is False:
                counter += 1
                yield "L{0}_{1}_{2}".format(counter, source_field, k.replace('_', ''))[1:], v
    elif isinstance(data, list):
        for l in data:
            dic_recurse(l, fields, counter, '')

When debugging, the key difference between the two versions seems to show up when this section of code is hit:

elif isinstance(data, list):
    for l in data:
        dic_recurse(l, fields, counter, '')

When I test the yield version, the dic_recurse(l, fields, counter, '') line gets hit but it doesn't seem to call the function, because any print statements I put at the start of the function aren't reached. If I do the same with the print version, the code happily calls the function at that point and runs back through the whole function.

I'm sure I'm probably misunderstanding something fundamental about generators and the use of the yield statement.
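
The behaviour described above can be reproduced in isolation. This is a minimal sketch, not the code from the question: the `numbers` function and the `calls` list are made-up names for illustration. It shows that calling a generator function only creates a generator object, and that the body does not start running until something iterates over that object.

```python
calls = []  # records each time the generator body actually starts running


def numbers(label):
    # Hypothetical generator used only for this illustration.
    calls.append(label)  # executed only once iteration begins
    yield 1
    yield 2


numbers("discarded")                          # builds a generator object, then drops it
[numbers("also discarded") for _ in range(3)]  # same: three unused generator objects
assert calls == []                             # the body has not run at all so far

result = list(numbers("consumed"))             # iterating is what drives the body
assert calls == ["consumed"]
assert result == [1, 2]
```

That would be consistent with the debugger stepping over the recursive call without entering the function: the call returns immediately with a generator object that nothing ever consumes.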

Simon Tulett

1 Answer

In the absence of any response on this, I just wanted to post my updated solution in case it proves useful for anyone else.

I needed to add additional yield statements to the function so that the results of each recursive call to the generator function are passed back up to the caller; at least, that's how I've understood it. Happy to be corrected.

def dic_recurse(data, fields, counter, source_field):
    if isinstance(data, dict):
        counter += 1  # counter tracks the nesting level
        for k, v in data.items():
            if isinstance(v, list):
                # Recurse into each list element and re-yield its results
                for field_data in v:
                    for list_field in dic_recurse(field_data, fields, counter, source_field):
                        yield list_field
            elif isinstance(v, dict):
                # Recurse into the nested dict and re-yield its results
                for dic_field in dic_recurse(v, fields, counter, source_field):
                    yield dic_field
            elif k in fields:
                # v is neither a list nor a dict here, so emit it directly
                yield counter, {"{0}_L{1}".format(k, counter): v}
    elif isinstance(data, list):
        counter += 1
        for list_item in data:
            for li2 in dic_recurse(list_item, fields, counter, ''):
                yield li2
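
The same re-yield pattern can be seen in a smaller form. This is a sketch under assumptions rather than a replacement for the function above: `flatten`, `prefix`, and the underscore-joined path naming are hypothetical choices for illustration, and the sample data is a cut-down imitation of the document in the question. Each recursive call receives its own prefix as an argument, so nothing is accumulated across sibling keys.

```python
def flatten(data, fields, prefix=""):
    """Yield (path, value) pairs for scalar values whose key is in `fields`.

    Hypothetical helper for illustration; the path naming is an assumption.
    """
    if isinstance(data, dict):
        for key, value in data.items():
            path = "{0}_{1}".format(prefix, key) if prefix else key
            if isinstance(value, (dict, list)):
                # Iterate over the nested generator and pass each item upward.
                for pair in flatten(value, fields, path):
                    yield pair
            elif key in fields:
                yield path, value
    elif isinstance(data, list):
        for element in data:
            for pair in flatten(element, fields, prefix):
                yield pair


sample = {
    "_id": "FF",
    "modules": [
        {"name": "Foundations", "statuses": [{"status": "selected"}]},
    ],
}
pairs = sorted(flatten(sample, ["_id", "name", "status"]))
assert pairs == [
    ("_id", "FF"),
    ("modules_name", "Foundations"),
    ("modules_statuses_status", "selected"),
]
```

On Python 3.3 and later, each inner `for pair in ...: yield pair` loop can be written as `yield from flatten(...)`. The code in this thread is Python 2, where the explicit loop is needed.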
Simon Tulett