
I have deeply nested data: a dictionary that mixes lists and dicts (e.g. dict -> list of dicts -> dict -> list of dicts).

I'd like to parse its values with glom (ideally into something that can be loaded into a pandas DataFrame), but I'm struggling to mix dicts and lists in glom's spec.

I can access each individual value by chaining dict keys and list indexes like this:

data['TradePlaceList'][0]['TradeList'][0]['Trade']['Message']['soap:Envelope']['soap:Body']['SetBiddingProcessInfo']

but I need help expressing the same logic in a glom spec.

Data example (reduced due to size):

{'TradePlaceList': [OrderedDict([('@INN', '6164265896'),
               ('TradeList',
                [OrderedDict([('Trade',
                               OrderedDict([('@ID_EFRSB', '1201661'),
                                            ('@ID_EXTERNAL', 'ПП-49739'),
                                            ('Message',
                                             OrderedDict([('@ID', '10958517'),
                                                          ('soap:Envelope',
                                                           OrderedDict([('@xmlns:xsi',
                                                                         'http://www.w3.org/2001/XMLSchema-instance'),
                                                                        ('@xmlns:xsd',
                                                                         'http://www.w3.org/2001/XMLSchema'),
                                                                        ('@xmlns:soap',
                                                                         'http://schemas.xmlsoap.org/soap/envelope/'),
                                                                        ('soap:Body',
                                                                         OrderedDict([('SetBiddingProcessInfo',
                                                                                       OrderedDict([('@xmlns',
                                                                                                     'https://services.fedresurs.ru/BiddingService2'),
                                                                                                    ('BiddingProcessInfo',
                                                                                                     OrderedDict([('@TradeId',
                                                                                                                   'ПП-49739'),
                                                                                                                  ('@EventTime',
                                                                                                                   '2021-05-03T00:00:00'),
                                                                                                                  ('PriceInfo',
                                                                                                                   OrderedDict([('@LotNumber',
                                                                                                                                 '1'),
                                                                                                                                ('@NewPrice',
                                                                                                                                 '3049997.96')]))]))]))]))]))]))]))]),                         
Krank

1 Answer

The following glom spec works on the example you posted:

import pandas as pd
import glom
from collections import OrderedDict

data = {'TradePlaceList': [OrderedDict([('@INN', '6164265896'),
               ('TradeList',
                [OrderedDict([('Trade',
                               OrderedDict([('@ID_EFRSB', '1201661'),
                                            ('@ID_EXTERNAL', 'ПП-49739'),
                                            ('Message',
                                             OrderedDict([('@ID', '10958517'),
                                                          ('soap:Envelope',
                                                           OrderedDict([('@xmlns:xsi',
                                                                         'http://www.w3.org/2001/XMLSchema-instance'),
                                                                        ('@xmlns:xsd',
                                                                         'http://www.w3.org/2001/XMLSchema'),
                                                                        ('@xmlns:soap',
                                                                         'http://schemas.xmlsoap.org/soap/envelope/'),
                                                                        ('soap:Body',
                                                                         OrderedDict([('SetBiddingProcessInfo',
                                                                                       OrderedDict([('@xmlns',
                                                                                                     'https://services.fedresurs.ru/BiddingService2'),
                                                                                                    ('BiddingProcessInfo',
                                                                                                     OrderedDict([('@TradeId',
                                                                                                                   'ПП-49739'),
                                                                                                                  ('@EventTime',
                                                                                                                   '2021-05-03T00:00:00'),
                                                                                                                  ('PriceInfo',
                                                                                                                   OrderedDict([('@LotNumber',
                                                                                                                                 '1'),
                                                                                                                                ('@NewPrice',
                                                                                                                                 '3049997.96')]))]))]))]))]))]))]))])])])]}

# Each [...] iterates over a list; each (...) chains steps applied to one item.
spec = ('TradePlaceList', [('TradeList', ['Trade.Message.soap:Envelope.soap:Body.SetBiddingProcessInfo'])])
print(glom.glom(data, spec))

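This uses the path you posted. Each [] in the spec iterates over a list and applies the enclosed sub-spec to every element, and a tuple chains steps in order, so the spec covers every element of a list rather than only index 0.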
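For comparison, the traversal that such a spec describes can be written as plain nested loops. The following is only a sketch under some assumptions: the sample is a cut-down copy of the posted data using plain dicts, `follow` and `extract_rows` are hypothetical helper names, the choice of output columns is mine, and every `TradePlaceList` and `TradeList` is assumed to be a list, as in the posted sample.

```python
# Cut-down stand-in for the posted data (values copied from the question).
data = {
    "TradePlaceList": [
        {
            "@INN": "6164265896",
            "TradeList": [
                {"Trade": {"Message": {"soap:Envelope": {"soap:Body": {
                    "SetBiddingProcessInfo": {
                        "BiddingProcessInfo": {
                            "@TradeId": "ПП-49739",
                            "@EventTime": "2021-05-03T00:00:00",
                            "PriceInfo": {"@LotNumber": "1",
                                          "@NewPrice": "3049997.96"},
                        }
                    }
                }}}}},
            ],
        }
    ]
}

# Keys to follow inside each element of 'TradeList' (the dict-only part of the path).
INNER_PATH = ("Trade", "Message", "soap:Envelope", "soap:Body",
              "SetBiddingProcessInfo")


def follow(node, keys):
    """Walk a chain of dict keys, like data[k1][k2]...[kn]."""
    for key in keys:
        node = node[key]
    return node


def extract_rows(data):
    """Visit every trade in every trade place and return one flat dict per trade."""
    rows = []
    for place in data["TradePlaceList"]:      # first list level
        for item in place["TradeList"]:       # second list level
            info = follow(item, INNER_PATH)["BiddingProcessInfo"]
            price = info["PriceInfo"]
            rows.append({
                "INN": place["@INN"],
                "TradeId": info["@TradeId"],
                "EventTime": info["@EventTime"],
                "LotNumber": price["@LotNumber"],
                "NewPrice": price["@NewPrice"],
            })
    return rows


rows = extract_rows(data)
print(rows)
```

A list of flat dicts like `rows` can be passed directly to `pandas.DataFrame(rows)`, which gives one row per trade and one column per key.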

Nick ODell
  • Is it possible to create a spec that gets all the required values from the nested lists, analogous to looping over each list from 0 to len(list)? – Krank Jan 09 '22 at 07:08
  • @Krank Yeah, I just added an example of how to do that. – Nick ODell Jan 09 '22 at 17:56