I am currently working with the Shopify GraphQL bulk query.
This query returns a JSON Lines file. Such a file may look like this:

{"id":"gid:\/\/shopify\/Product\/5860091625632","title":"Levis Jeans","description":"Cool Jeans","vendor":"Levis","status":"ACTIVE"}
{"id":"gid:\/\/shopify\/ProductImage\/20289865679008","__parentId":"gid:\/\/shopify\/Product\/5860091625632"}
{"id":"gid:\/\/shopify\/ProductVariant\/37178118963360","title":"32","position":1,"image":null,"selectedOptions":[{"name":"Size","value":"32"}],"inventoryItem":{},"__parentId":"gid:\/\/shopify\/Product\/5860091625632"}
{"available":10,"location":{"id":"gid:\/\/shopify\/Location\/57510625440"},"__parentId":"gid:\/\/shopify\/ProductVariant\/37178118963360"}
{"id":"gid:\/\/shopify\/ProductVariant\/37178118996128","title":"31","position":2,"image":null,"selectedOptions":[{"name":"Size","value":"31"}],"inventoryItem":{},"__parentId":"gid:\/\/shopify\/Product\/5860091625632"}
{"available":5,"location":{"id":"gid:\/\/shopify\/Location\/57510625440"},"__parentId":"gid:\/\/shopify\/ProductVariant\/37178118996128"}
{"available":3,"location":{"id":"gid:\/\/shopify\/Location\/57951518880"},"__parentId":"gid:\/\/shopify\/ProductVariant\/37178118996128"}
{"id":"gid:\/\/shopify\/ProductVariant\/37178119028896","title":"34","position":3,"image":null,"selectedOptions":[{"name":"Size","value":"34"}],"inventoryItem":{},"__parentId":"gid:\/\/shopify\/Product\/5860091625632"}
{"available":5,"location":{"id":"gid:\/\/shopify\/Location\/57510625440"},"__parentId":"gid:\/\/shopify\/ProductVariant\/37178119028896"}
{"available":15,"location":{"id":"gid:\/\/shopify\/Location\/57951518880"},"__parentId":"gid:\/\/shopify\/ProductVariant\/37178119028896"}

Each line of this file is a valid JSON object, and the lines are linked to each other via __parentId.
My goal is to deserialize this into C# classes like this:

class Product
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string Description { get; set; }
    public IEnumerable<ProductImage> Images { get; set; }
    public IEnumerable<ProductVariant> Variants { get; set; }
}
class ProductImage
{
    public string Id { get; set; }
}
class ProductVariant
{
    public string Id { get; set; }
    public IEnumerable<IDictionary<string, string>> SelectedOptions { get; set; }
    public IEnumerable<InventoryLevel> Levels { get; set; }
}
class InventoryLevel
{
    public int Available { get; set; }
} 

A potential function performing the deserialization would be called like this:

var file = new System.IO.StreamReader(@"c:\test.jsonl");
var products = DeserializeJsonL<IEnumerable<Product>>(file);

Shopify suggests reading the file in reverse. I get the idea, but I cannot see how to deserialize this file in a type-safe way. How can I determine whether the current line is a ProductVariant, a ProductImage, or something else? I cannot change the JSONL output to include type information.

I am pretty sure that without type information I cannot deserialize it safely. But how should I then handle this data, for example to insert it into a database?

EDIT: The class name inside the id, as in {"id":"gid:\/\/shopify\/Product\/5860091625632"}, cannot be used to determine the type!

After_8
  • 1
    Why in reverse? Seems to me that if you're reading it forwards, deser'ing lines as you go, then subsequent lines are related in some way to previous lines, so if you're using eg a Dictionary to track which Products you've seen, then deser'ing forwards means you've seen some entity before, will have indexed it and can add the current line to the entity graph you're building. Your question about "how should I insert this into a database" also seems very broad; in what way does this structure not map to your database structure currently? What does your DB look like? What problem are you solving? – Caius Jard Dec 27 '20 at 14:41
  • @CaiusJard the database is just a example of using the deserialized data. Imagine each Class (`Product`, `ProductVariant` etc.) is their own table in my SQL Database. So How do i know that the second line is a `ProductVariant` and not a `ProductImage` to be inserted correctly into my DB? – After_8 Dec 27 '20 at 14:45
  • Your second line doesnt contain any interesting info other than id and parentid, though it seems that the id claims image rather than variant.. presumably at some point you'll find something with more attributes that refers to this id as a parent? And that other thing will be more esily type'able - why cannot you assume from the id containing "ProductImage" that it's a product image? – Caius Jard Dec 27 '20 at 14:50
  • I stripped the JSON-Lines example a bit. All lines will contain more attributes then just a `id`. So the second line may include a `url` attribute. I cannot use the `Id`to determine the type because Shopify does not gurantee that the id-string contains `ProductImage`. The `Id`can be any random string without any more information about the object. – After_8 Dec 27 '20 at 14:56
  • So what, actually, is your question? It seems to be "I have some json that might look a bit like this, or this, or this, and I want someone to give me some logic so I can know to deser it to that, or that, or that depending..." - except you've chopped the JSON down so it's not representative any more, and I'm not even clear on whether this is a shopify specific thing (question would be best answered by someone with shopify specific knowledge) or if you're after a generic technique for analyzing some JSON X and deciding which of N different C# objects to deser it to – Caius Jard Dec 27 '20 at 17:08
  • I think this is a general problem. Other applications, like MongoDB, solve this polymorphism problem by adding type information to the schema. I don't have any type information, and I assumed Shopify has a reason for doing it like that, but I could not figure out how to process this data without any type information. I thought there would be some common strategies or best practices for handling such data. I generalised the schema and code to focus more on the problem than on my specific issue. – After_8 Dec 27 '20 at 18:54
  • If you boil it down, in the way you say Mongo solves it, then you're really just saying "they add more data to the data". To me it looks like a fingerprinting thing; a ProductImage will have some various number of attributes, some core, some not. A variant will have a different set of core attributes. It should be possible to identify if a line is an image or variant by examining the presence of various attributes (before deser, or after, if you want to deser to an object that represents the superset of all attribs you care about). – Caius Jard Dec 27 '20 at 19:02
  • @CaiusJard thanks for your inspiration. I finally found a solution using GraphQL field aliases! – After_8 Jan 09 '21 at 20:21
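
The fingerprinting idea from the comments can be sketched roughly as follows. This is only a minimal illustration under assumptions: the rules rely on the attribute names visible in the sample file above (`__parentId`, `available`, `selectedOptions`), the `LineClassifier` class and its string results are hypothetical, and a real bulk query may return different fields that need different rules.

```csharp
using System;
using System.Text.Json;

static class LineClassifier
{
    // Hypothetical fingerprint rules based only on the sample file above.
    public static string Classify(string jsonLine)
    {
        using JsonDocument doc = JsonDocument.Parse(jsonLine);
        JsonElement root = doc.RootElement;

        // Only top-level objects lack a __parentId (assumption from the sample).
        if (!root.TryGetProperty("__parentId", out _)) return "Product";

        // Inventory levels are the only lines with an "available" count.
        if (root.TryGetProperty("available", out _)) return "InventoryLevel";

        // Variants are the only lines with "selectedOptions".
        if (root.TryGetProperty("selectedOptions", out _)) return "ProductVariant";

        // In the sample, the remaining child lines with an id are images.
        return root.TryGetProperty("id", out _) ? "ProductImage" : "Unknown";
    }
}
```

Such rules are brittle: they break as soon as two types share the same set of attributes, which is why explicit type markers tend to be safer.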

1 Answer

I ended up adding type information to my GraphQL query myself, by defining a unique field name for each type that may appear on its own line in the resulting JSON Lines file.

For that I used GraphQL field aliases:

someQuery {
   uniqueFieldAlias : fieldName
}

When I read the file, I search each line for the unique field name. Then I deserialize the line into the corresponding class.

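// Requires: using System.Collections.Generic; using System.IO; using System.Text.Json;
// `res` is presumably the HTTP response of the bulk result download.
var products = new List<Product>();
int productIndex = -1; // index of the most recently read product

using (var file = new StreamReader(await res.Content.ReadAsStreamAsync()))
{
    string line;

    while ((line = await file.ReadLineAsync()) != null)
    {
        // A line carrying the product alias is a Product.
        if (line.Contains("\"uniqueFieldAlias\""))
        {
            var product = JsonSerializer.Deserialize<Product>(line);
            products.Add(product);
            productIndex = products.Count - 1;
            continue;
        }
        // A line carrying the other alias is attached to the current product
        // (assumes child lines follow their parent line, as in the sample file).
        if (line.Contains("\"otherUniqueAlias\""))
        {
            var somethingElse = JsonSerializer.Deserialize<SomeClass>(line);
            products[productIndex].Something.Add(somethingElse);
            continue;
        }
    }
}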

The idea was inspired by @Caius Jard's comments.
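
As a variation on the approach above, the alias check can be done on the parsed JSON instead of the raw string, and children can be attached through `__parentId` rather than by position. The following is a minimal sketch, not a definitive implementation: the aliases `productTitle` and `variantTitle` are hypothetical (assumed to alias the `title` field in the query), the classes are trimmed-down stand-ins for the ones in the question, and it assumes that a parent line appears before its children, as in the sample file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

class Product
{
    public string Id { get; set; }
    public string Title { get; set; }
    public List<ProductVariant> Variants { get; } = new List<ProductVariant>();
}

class ProductVariant
{
    public string Id { get; set; }
    public string Title { get; set; }
}

static class BulkResultReader
{
    // "productTitle" and "variantTitle" are hypothetical aliases used as type markers.
    public static List<Product> Read(TextReader reader)
    {
        var products = new List<Product>();
        var productsById = new Dictionary<string, Product>();
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            using JsonDocument doc = JsonDocument.Parse(line);
            JsonElement root = doc.RootElement;

            if (root.TryGetProperty("productTitle", out JsonElement productTitle))
            {
                var product = new Product
                {
                    Id = root.GetProperty("id").GetString(),
                    Title = productTitle.GetString()
                };
                products.Add(product);
                productsById[product.Id] = product;
            }
            else if (root.TryGetProperty("variantTitle", out JsonElement variantTitle))
            {
                var variant = new ProductVariant
                {
                    Id = root.GetProperty("id").GetString(),
                    Title = variantTitle.GetString()
                };
                // Attach to the parent; assumes the parent line was read earlier.
                string parentId = root.GetProperty("__parentId").GetString();
                if (productsById.TryGetValue(parentId, out Product parent))
                {
                    parent.Variants.Add(variant);
                }
            }
        }

        return products;
    }
}
```

It could be called with any `TextReader`, for example `BulkResultReader.Read(new StreamReader(stream))`. Checking property names on the parsed document avoids false matches when an alias name happens to occur inside a string value.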

After_8