
Below is the sample data:

({'age': 61,
  'name': ['Emiko', 'Oliver'],
  'occupation': 'Medical Student',
  'telephone': '166.814.5565',
  'address': {'address': '645 Drumm Line', 'city': 'Kennewick'},
  'credit-card': {'number': '3792 459318 98518', 'expiration-date': '12/23'}},
 {'age': 54,
  'name': ['Wendolyn', 'Ortega'],
  'occupation': 'Tractor Driver',
  'telephone': '1-975-090-1672',
  'address': {'address': '1274 Harbor Court', 'city': 'Mustang'},
  'credit-card': {'number': '4600 5899 6829 6887',
   'expiration-date': '11/25'}})

We can apply a filter on the root elements of a dask bag as below:

b.filter(lambda record: record['age'] > 30).take(2)  # Select only people over 30

However, I need to access a nested element, i.e. credit-card.expiration-date. Any help will be appreciated.

gauravpks

1 Answer


You can simply do this:

import dask.bag as db

data = ({'age': 61,
         'name': ['Emiko', 'Oliver'],
         'occupation': 'Medical Student',
         'telephone': '166.814.5565',
         'address': {'address': '645 Drumm Line', 'city': 'Kennewick'},
         'credit-card': {'number': '3792 459318 98518', 'expiration-date': '12/23'}},
        {'age': 54,
         'name': ['Wendolyn', 'Ortega'],
         'occupation': 'Tractor Driver',
         'telephone': '1-975-090-1672',
         'address': {'address': '1274 Harbor Court', 'city': 'Mustang'},
         'credit-card': {'number': '4600 5899 6829 6887',
                         'expiration-date': '11/25'}})

bag = db.from_sequence(data)

result = bag.map(lambda record: record['credit-card']['expiration-date']).compute()

print(result)

which returns

['12/23', '11/25']
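Since the question started from filter, the same nested lookup should also work inside a filter predicate. Here is a minimal sketch, assuming the expiration dates are always 'MM/YY' strings. The helper name expires_after and the cutoff date are made up for illustration, and scheduler="synchronous" is only used to keep this tiny example in a single process:

```python
from datetime import datetime

import dask.bag as db

# Trimmed copy of the sample records (only the fields used here).
records = [
    {"name": ["Emiko", "Oliver"], "age": 61,
     "credit-card": {"number": "3792 459318 98518", "expiration-date": "12/23"}},
    {"name": ["Wendolyn", "Ortega"], "age": 54,
     "credit-card": {"number": "4600 5899 6829 6887", "expiration-date": "11/25"}},
]


def expires_after(record, cutoff):
    """Return True if the record's card expires strictly after `cutoff`.

    Hypothetical helper; assumes 'expiration-date' is an 'MM/YY' string.
    """
    raw = record["credit-card"]["expiration-date"]  # nested lookup, e.g. "11/25"
    return datetime.strptime(raw, "%m/%y") > cutoff


bag = db.from_sequence(records)
cutoff = datetime(2024, 1, 1)  # illustrative cutoff, not from the question

# filter keeps the whole record when the predicate on the nested field is True.
kept = bag.filter(lambda r: expires_after(r, cutoff)).compute(scheduler="synchronous")
names = [r["name"] for r in kept]
print(names)
```

Parsing the string into a datetime avoids comparing 'MM/YY' values as text, which would sort by month before year.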

If some individuals have several cards (a list of dicts instead of a single dict), handle both cases like this:

import dask.bag as db

data = ({
            'age': 61,
            'name': ['Emiko', 'Oliver'],
            'occupation': 'Medical Student',
            'telephone': '166.814.5565',
            'address': {'address': '645 Drumm Line', 'city': 'Kennewick'},
            'credit-card': {'number': '3792 459318 98518', 'expiration-date': '12/23'}
        },
        {
            'age': 54,
            'name': ['Wendolyn', 'Ortega'],
            'occupation': 'Tractor Driver',
            'telephone': '1-975-090-1672',
            'address': {'address': '1274 Harbor Court', 'city': 'Mustang'},
            'credit-card': [
                {'number': '4600 5899 6829 6887', 'expiration-date': '11/25'},
                {'number': '4610 5899 6829 6887', 'expiration-date': '11/26'},
            ]
        })

bag = db.from_sequence(data)

result = bag.map(lambda record: record['credit-card']['expiration-date'] 
                  if isinstance(record['credit-card'], dict) 
                  else [card['expiration-date'] for card in record['credit-card']]).compute()

print(result)

which will return

['12/23', ['11/25', '11/26']]
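If a flat result is easier to work with than the mixed string/list output above, one option is to normalize the field first. This is only a sketch under the assumption that 'credit-card' is either one dict or a list of dicts. The helper name cards_of is made up for illustration, and scheduler="synchronous" is again just a choice for a small in-memory example:

```python
import dask.bag as db

# Trimmed copy of the sample records (only the fields used here).
records = [
    {"name": ["Emiko", "Oliver"],
     "credit-card": {"number": "3792 459318 98518", "expiration-date": "12/23"}},
    {"name": ["Wendolyn", "Ortega"],
     "credit-card": [
         {"number": "4600 5899 6829 6887", "expiration-date": "11/25"},
         {"number": "4610 5899 6829 6887", "expiration-date": "11/26"},
     ]},
]


def cards_of(record):
    """Always return a list of card dicts (hypothetical helper).

    Wraps a single dict in a list so callers never need an isinstance check.
    """
    cards = record["credit-card"]
    return [cards] if isinstance(cards, dict) else list(cards)


bag = db.from_sequence(records)

# One flat list of expiration dates across all people and all cards.
dates = (bag.map(cards_of)
            .flatten()
            .pluck("expiration-date")
            .compute(scheduler="synchronous"))
print(dates)

# Keep people who hold at least one card expiring in 2026.
with_2026 = bag.filter(
    lambda r: any(c["expiration-date"].endswith("/26") for c in cards_of(r))
).compute(scheduler="synchronous")
print([r["name"] for r in with_2026])
```

With the normalization in one place, the same helper can be reused in both map and filter.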
  • Thanks Serge for the answer. I just wanted to understand how to pull the values if the data is in an array like the one below and we need both of them: data = ({'age': 54, 'name': ['Wendolyn', 'Ortega'], 'occupation': 'Tractor Driver', 'telephone': '1-975-090-1672', 'address': {'address': '1274 Harbor Court', 'city': 'Mustang'}, 'credit-card': [{'number': '4600 5899 6829 6887', 'expiration-date': '11/25'}, {'number': '4610 5899 6829 6887', 'expiration-date': '11/26'}]}) – gauravpks Mar 17 '23 at 11:40
  • @gauravpks I updated my answer to deal with several cards per individual. – Serge de Gosson de Varennes Mar 17 '23 at 13:34