I'm attempting to write a recursive generator function to flatten a nested json object of mixed types, lists and dictionaries. I am doing this partly for my own learning so have avoided grabbing an example from the internet to ensure I better understand what's happening, but have got stuck, with what I think is the correct placement of the yield statement in the function in relation to the loop.
The source of the data passed to the generator function is the output of an outer loop which is iterating through a mongo collection.
When I used a print statement in the same place as the Yield statement I get the results I am expecting but when I switch that to a yield statement the generator seems to only yield one item per iteration of the outer loop.
Hopefully someone can show me where I am going wrong.
columns = ['_id'
, 'name'
, 'personId'
, 'status'
, 'explorerProgress'
, 'isSelectedForReview'
]
db = MongoClient().abcDatabase
coll = db.abcCollection
def dic_recurse(data, fields, counter, source_field):
counter += 1
if isinstance(data, dict):
for k, v in data.items():
if k in fields and isinstance(v, list) is False and isinstance(v, dict) is False:
# print "{0}{1}".format(source_field, k)[1:], v
yield "{0}{1}".format(source_field, k)[1:], v
elif isinstance(v, list):
source_field += "_{0}".format(k)
[dic_recurse(l, fields, counter, source_field) for l in data.get(k)]
elif isinstance(v, dict):
source_field += "_{0}".format(k)
dic_recurse(v, fields, counter, source_field)
elif isinstance(data, list):
[dic_recurse(l, fields, counter, '') for l in data]
for item in coll.find():
for d in dic_recurse(item, columns, 0, ''):
print d
And below is a sample of the data it's iterating, but the nesting does increase beyond what's shown.
{
"_id" : ObjectId("5478464ee4b0a44213e36eb0"),
"consultationId" : "54784388e4b0a44213e36d5f",
"modules" : [
{
"_id" : "FF",
"name" : "Foundations",
"strategyHeaders" : [
{
"_id" : "FF_Money",
"description" : "Let's see where you're spending your money.",
"name" : "Managing money day to day",
"statuses" : [
{
"pid" : "54784388e4b0a44213e36d5d",
"status" : "selected",
"whenUpdated" : NumberLong(1425017616062)
},
{
"pid" : "54783da8e4b09cf5d82d4e11",
"status" : "selected",
"whenUpdated" : NumberLong(1425017616062)
}
],
"strategies" : [
{
"_id" : "FF_Money_CF",
"description" : "This option helps you get a picture of how much you're spending",
"name" : "Your spending and savings.",
"relatedGoals" : [
{
"_id" : ObjectId("54784581e4b0a44213e36e2f")
},
{
"_id" : ObjectId("5478458ee4b0a44213e36e33")
},
{
"_id" : ObjectId("547845a5e4b0a44213e36e37")
},
{
"_id" : ObjectId("54784577e4b0a44213e36e2b")
},
{
"_id" : ObjectId("5478456ee4b0a44213e36e27")
}
],
"soaTrashWarning" : "Understanding what you are spending and saving is crucial to helping you achieve your goals. Without this in place, you may be spending more than you can afford. ",
"statuses" : [
{
"personId" : "54784388e4b0a44213e36d5d",
"status" : "selected",
"whenUpdated" : NumberLong(1425017616062)
},
{
"personId" : "54783da8e4b09cf5d82d4e11",
"status" : "selected",
"whenUpdated" : NumberLong(1425017616062)
}
],
"trashWarning" : "This option helps you get a picture of how much you're spending and how much you could save.\nAre you sure you don't want to take up this option now?\n\n",
"weight" : NumberInt(1)
},
Update I've made a few changes to the generator function, although I'm not sure that they've really changed anything and I've been stepping through line by line in a debugger for both the print version and the yield version. The new code is below.
def dic_recurse(data, fields, counter, source_field):
print 'Called'
if isinstance(data, dict):
for k, v in data.items():
if isinstance(v, list):
source_field += "_{0}".format(k)
[dic_recurse(l, fields, counter, source_field) for l in v]
elif isinstance(v, dict):
source_field += "_{0}".format(k)
dic_recurse(v, fields, counter, source_field)
elif k in fields and isinstance(v, list) is False and isinstance(v, dict) is False:
counter += 1
yield "L{0}_{1}_{2}".format(counter, source_field, k.replace('_', ''))[1:], v
elif isinstance(data, list):
for l in data:
dic_recurse(l, fields, counter, '')
The key difference between the two versions when debugging seems to be that when this section of code is hit.
elif isinstance(data, list):
for l in data:
dic_recurse(l, fields, counter, '')
If I am testing the yield version the call to dic_recurse(l, fields, counter, '')
line get's hit but it doesn't seem to call the function because any print statements I set at the opening of the function aren't hit, but if I do the same using print then when the code hits the same section it happily calls the function and runs back through the whole function.
I'm sure I'm probably misunderstanding something fundamental about generators and the use of the yield statement.