0

Let's imagine you are storing a list of items in the Google Cloud DataStore. Each item has a list of sub-items (list of features) and sub-sub-items (list of infos each with a list of attribute-key).

[
    {
        "key": "product_1234",
        "name": "Bio-Tofu",
        "features" : {
            "country": "Germany",
            "currency": "Euro"
        },
        "infos": {
            "info_A": {
                "attr_A": "key_A",
                "attr_B": "key_B",
                "attr_C": "key_C",
            },
            "info_B": {
                "attr_D": "key_D",
            }
        }
    },
    {
        "key": "product_6789",
        "name": "Bio-Soya-Dring",
        "features" : {
            "country": "Austria",
            "currency": "Euro"
        },
        "infos": {
            "info_A": {
                "attr_A": "key_A",
                "attr_E": "key_E",
            },
            "info_C": {
                "attr_F": "key_F",
            }
        }
    }
]

Question: Is there any way to retrieve all 'infos' or all 'attribute-key' pairs of all items (products) at once? What I want to retrieve is for instance:

{
    "attr_A": "key_A",
    "attr_B": "key_B",
    "attr_C": "key_C",
    "attr_D": "key_D",
    "attr_E": "key_E",
    "attr_F": "key_F",
}

or

infos: [
    "info_A",
    "info_B",
    "info_C"
]

The reason why I'm trying to do this is, that I want to create a Pandas DataFrame (Pivot-Table) having the infos or features as columns and the items as rows. I could just read all of the items and create the list of features and infos in Python code, but maybe there's SQL way to ask Google Cloud Storage to give me that list (subset, union, or whatever you name it)?

Thanks for any hint.

Matthias
  • 5,574
  • 8
  • 61
  • 121
  • I'd split these massive entities into sets of smaller ones, without such nesting and yes, it'll be much easier to obtain results like the one you seek. See https://stackoverflow.com/questions/49239548/manage-nested-list-of-entities-within-entities-in-google-cloud-datastore/49261568#49261568 – Dan Cornilescu Mar 28 '18 at 05:38
  • Yeah, might be a solution. I’ll try that. Thanks. But what is the query to return all kinds of them (distinct)? – Matthias Mar 28 '18 at 06:57
  • That could be a projection query, which supports a `distinct` option, see https://stackoverflow.com/questions/44865227/google-app-engine-how-could-i-do-python-ndb-query-on-results-returned-by-a-ndb/44867305#44867305 – Dan Cornilescu Mar 28 '18 at 07:08
  • You misunderstood my question. I have features and those features belong to items. And I want to retrieve all possible features. In your example I would need to store all features and items separately and then create a third Entity that indicates which feature is used by which item (relationship). Not best for big data, but would work. Hard to maintain since you need to remove all relations/features for an item and re-insert then if the number of features used by an item changes. In my original data structure you would simply update the item with its nested features at once. – Matthias Mar 28 '18 at 07:14
  • The structure of the data is critical for good/efficient datastore use. Just because one such structure is simpler to update doesn't mean it's simpler to use overall - you need to think about retrieval as well. Only when you have the end-to-end picture in place you can really weigh it. – Dan Cornilescu Mar 29 '18 at 03:54

1 Answers1

0

You could do something like this:

[[pair for key,pair in element['infos'].items()] for index, element in enumerate(elements)]

And after that, you will need to flatten the list.

Mangu
  • 3,160
  • 2
  • 25
  • 42
  • This is Python Code that retrieved all items and iterates through them. I was thinking of a GQL Version (Google Query Language). – Matthias Mar 29 '18 at 13:37