0

Im trying to learn F# by rewriting some scripts from python, where I query a graphql endpoint and load the json into a Pandas DataFrame for cleaning and analysis:

json like this:

apiResponse = {"data":
                    {"Shipments":{"ErrorMessage":null,
                    "Success":true,
                    "ValidationResult":null,
                    "TotalCount":494,
                    "Data":[
                       {"Building":"B7",
                        "Comment":"Test",
                        "CompletedDate":"2021-04-12T10:13:13.436Z",
                        "ItemId":"dd4520bb-aa0a-...",
                        "NoOfUnit":5,
                        "Organization":{
                              "OrganizationId":"cac43a32-1f08-...",
                              "OrganizationName":"XXX"},
                       "Adress":"Road 5"
                      },
                      {"Building":"B7",
                      ....}]}}}

Python:

data = request.json()
#only the "Data" path
json_list = data['data']['Shipments']['Data']
df = json_normalize(json_list)

The same approach with using Fsharp.Data, while "file.json" is only the "Data[]" part ( I tried to create a fiddle, but I could not get it to run. here:


type ApiTypes = JsonProvider<"file.json"> //where file.json has only "Data" path of apiResponse
let jsonResponse = JsonValue.Parse(apiResponse)
let data = jsonResponse?data
let Shipments = data?Shipments
let Data = Shipments?Data

let input = 
 ApiTypes.Parse(Data.ToString())

let df = Frame.ofRecords input

But this is not working. So my questions:

  1. Is this the right way to work with this json?
  2. Is there a better way to create a DataFrame with json?

Any help apreciated. Thanks

nanuuq
  • 169
  • 9

1 Answers1

0

I don't have any experience with Deedle, but I think ofRecords requires an actual static type to work correctly, so it may not be compatible with JsonProvider. (I couldn't get it to work, at least.)

Instead, I would define the type by hand and then deserialize to it, like this:

open Newtonsoft.Json.Linq

type Datum =
    {
        Building : string
        Comment : string
        CompletedDate : DateTime
        ItemId : Guid
        NoOfUnit : int
        Organization :
            {|
                OrganizationId : Guid
                OrganizationName : string
            |}
        Adress : string
    }

let jobj = JObject.Parse(apiResponse)
let frame =
    jobj.["data"].["Shipments"].["Data"].Children()
        |> Seq.map (fun jtok -> jtok.ToObject<Datum>())
        |> Frame.ofRecords
Brian Berns
  • 15,499
  • 2
  • 30
  • 40
  • Okay, this would have been my next approach, defining types on my own. I will give it a try. Why did you call the type "Datum"? I ask, because the JsonProvider did the same when I loaded the complete Json. Is the naming of `data` and `Data` in the structure a problem here? – nanuuq May 17 '21 at 06:39
  • “Datum” is just singular for “data”. I deliberately used the same name that `JsonProvider` did. The name isn’t a problem. – Brian Berns May 17 '21 at 06:40
  • 1
    And in German, "Datum" means "date", that was confusing me..... – nanuuq May 17 '21 at 14:04
  • Thanks, that worked. Just for completion, I had to deal with some `null` values for the `DateTime` types with ` Nullable`. Now I have only to find out, how to deal with empty arrays in the json. – nanuuq May 17 '21 at 18:44
  • Actually, the problem is not an empty array, it is a list like `"ShipmentContent":[{"Item1": "xxx", "Item2": "yyy"}]` that gives me the error. – nanuuq May 17 '21 at 18:58
  • I would suggest creating a separate SO question for this problem, where you can put the details. – Brian Berns May 17 '21 at 20:07
  • I will do, but first I have to read the docs a bit more... – nanuuq May 18 '21 at 18:16