I want to find all instances of str in any nested data structure of dicts and lists. Not all terminal items will be str.

A data example:

data = {'iso_seq_bams': [['5/X/tmp2oWhu5.tmp', 'y/H/tmp6Po0_X.tmp']], 
        'annotation': None, 
        'bams': {'BAM': {'ERR579132Aligned.sortedByCoord.out.bam': ['Y/o/tmpntzREn.tmp', 'z/c/tmp6DmQhS.tmp']}, 
                 'INTRONBAM': {}}}

The expected result would be ['5/X/tmp2oWhu5.tmp', 'y/H/tmp6Po0_X.tmp', 'Y/o/tmpntzREn.tmp', 'z/c/tmp6DmQhS.tmp']

I tried to implement this recursively, but it doesn't work: the result is currently an empty list.

def descend_object(obj):
    if isinstance(obj, dict):
        for item in obj.values():
            descend_object(item)
    elif isinstance(obj, list):
        for item in obj:
            descend_object(item)
    elif isinstance(obj, str):
        yield obj
juanpa.arrivillaga
Ian Fiddes

1 Answer

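The recursive calls in the question create generator objects but never iterate over them, so nothing they yield reaches the caller. On Python 3, delegate to the recursive call with yield from: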

In [3]: def descend_object(obj):
   ...:     if isinstance(obj, dict):
   ...:         for item in obj.values():
   ...:             yield from descend_object(item)
   ...:     elif isinstance(obj, list):
   ...:         for item in obj:
   ...:             yield from descend_object(item)
   ...:     elif isinstance(obj, str):
   ...:         yield obj
   ...:

In [4]: list(descend_object(data))
Out[4]:
['Y/o/tmpntzREn.tmp',
 'z/c/tmp6DmQhS.tmp',
 '5/X/tmp2oWhu5.tmp',
 'y/H/tmp6Po0_X.tmp']
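
To see why the original version returns nothing, it may help to run both variants side by side as a plain script. This is only a sketch: the names broken and fixed are made up for illustration, and data is copied from the question.

```python
def broken(obj):
    # Same shape as the question's function. Each recursive call builds a
    # generator object, but nothing iterates over it, so its values are lost.
    if isinstance(obj, dict):
        for item in obj.values():
            broken(item)
    elif isinstance(obj, list):
        for item in obj:
            broken(item)
    elif isinstance(obj, str):
        yield obj


def fixed(obj):
    # "yield from" iterates the nested generator and passes every value
    # it produces up to our own caller.
    if isinstance(obj, dict):
        for item in obj.values():
            yield from fixed(item)
    elif isinstance(obj, list):
        for item in obj:
            yield from fixed(item)
    elif isinstance(obj, str):
        yield obj


data = {'iso_seq_bams': [['5/X/tmp2oWhu5.tmp', 'y/H/tmp6Po0_X.tmp']],
        'annotation': None,
        'bams': {'BAM': {'ERR579132Aligned.sortedByCoord.out.bam':
                         ['Y/o/tmpntzREn.tmp', 'z/c/tmp6DmQhS.tmp']},
                 'INTRONBAM': {}}}

broken_result = list(broken(data))   # empty: the top level is a dict
fixed_result = list(fixed(data))     # all four path strings

print(broken_result)
print(fixed_result)
```

The order of the strings follows dict iteration order. On Python 3.7 and later, where dicts preserve insertion order, the fixed version should return them in the order the question expects; the different order in the output above presumably comes from an older interpreter.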

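On Python 2, which has no yield from, you have to iterate over the recursive call manually and re-yield each item. (Note that isinstance(obj, str) does not match unicode objects on Python 2; check against basestring if those should be included.)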

In [6]: def descend_object(obj):
   ...:     if isinstance(obj, dict):
   ...:         for item in obj.values():
   ...:             for d in descend_object(item):
   ...:                 yield d
   ...:     elif isinstance(obj, list):
   ...:         for item in obj:
   ...:             for d in descend_object(item):
   ...:                 yield d
   ...:     elif isinstance(obj, str):
   ...:         yield obj
   ...:

In [7]: list(descend_object(data))
Out[7]:
['Y/o/tmpntzREn.tmp',
 'z/c/tmp6DmQhS.tmp',
 '5/X/tmp2oWhu5.tmp',
 'y/H/tmp6Po0_X.tmp']
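
For plain iteration like this, the explicit inner loop and yield from should behave the same, so the loop form also runs unchanged on Python 3. A quick check, as a sketch only: the function names and the sample value are hypothetical and not part of the question.

```python
def descend_loop(obj):
    # Manual form: iterate the recursive generator and re-yield each item.
    if isinstance(obj, dict):
        for item in obj.values():
            for d in descend_loop(item):
                yield d
    elif isinstance(obj, list):
        for item in obj:
            for d in descend_loop(item):
                yield d
    elif isinstance(obj, str):
        yield obj


def descend_yield_from(obj):
    # Delegating form (Python 3.3+): "yield from" replaces the inner loop.
    if isinstance(obj, dict):
        for item in obj.values():
            yield from descend_yield_from(item)
    elif isinstance(obj, list):
        for item in obj:
            yield from descend_yield_from(item)
    elif isinstance(obj, str):
        yield obj


# Hypothetical sample mixing strings with non-string terminal items.
sample = {'a': ['x', 1, ['y', None]], 'b': {'c': 'z', 'd': {}}, 'e': 2.5}

loop_result = list(descend_loop(sample))
delegated_result = list(descend_yield_from(sample))
print(loop_result == delegated_result)
```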
juanpa.arrivillaga