I am trying to parse JSON objects, which are generally of the form

{
  "objects": [a bunch of records that can assume a few different forms],
  "parameters": [same deal],
  "values": {
              "k1": "v1",
              "k2": "v2",
              ...
            }
}

using Haskell's Aeson library. Part of this task is simple: the parameters and values fields need no custom parsing whatsoever (a generically derived instance of FromJSON seems to be enough), and most of the records in the array associated with objects need no special parsing either. However, some aspects of parsing the records in the objects array have documented solutions when considered separately, but together present problems that I haven't figured out how to address.

Now, the possible variants of record inside the objects and parameters arrays are finite in number and often share keys; for example, all of them have a "name" key and an "id" key. But many of them also have a "type" key, and type is a reserved keyword in Haskell, so it cannot be used as a record field name and the record cannot be parsed generically as-is. This is the first problem.

The second problem is that one of the possible variants of record inside objects can have a key -- "depends" let's say -- whose value may assume different types. It can either be a single record

{
  "objects": [
    {
      "depends": {
        "reference": "r1"
      },
      ...
    }
  ],
  ...
}

or a list of records

{
  "objects": [
    {
      "depends": [
        {"reference": "r1"},
        {"reference": "r2"},
        etc.
      ],
      ...
    }
  ],
  ...
}

and it happens that this is the one field that I would like to manipulate in a custom fashion after converting to a Haskell object (eventually I want to represent the collection of such "depends" references as a Data.Graph graph).

My initial attempt was to create one huge record type that subsumes all of the possible keys in the elements of the objects and parameters arrays. Something like this:

{-# LANGUAGE DeriveAnyClass    #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards   #-}

import Data.Aeson
import GHC.Generics

data Ref = Ref
  { ref :: String
  } deriving (Show, Generic, FromJSON, ToJSON)

data Reference 
  = Reference Ref
  | References [Ref]
  deriving (Show, Generic, FromJSON, ToJSON)

type MString = Maybe String -- I'm writing this a lot using this approach

data PObject = PObject 
  -- Each of the object/parameter records have these keys
  { _name    :: String
  , _id      :: String

  -- Other keys that might appear in a given object/parameter record
  , _type    :: MString
  , _role    :: MString
  , _depends :: Maybe Reference

  -- A bunch more
  } deriving Show

instance FromJSON PObject where
  parseJSON = withObject "PObject" $ \o -> do
    _name    <- o .: "name"
    _id      <- o .: "id"
    _type    <- o .:? "type"
    _role    <- o .:? "role"
    _depends <- o .:? "depends"
    -- etc.
    return PObject{..}

And then finally, the whole JSON object would be represented like

data MyJSONObject = MyJSONObject
  { objects    :: Maybe [PObject]
  , parameters :: Maybe [PObject]
  , values     :: Maybe Object
  } deriving (Show, Generic, FromJSON)

This works until it tries to parse a "depends" field, reporting that

"Error in $.objects[2].depends: key \"tag\" not present"

There are no "tag" keys, so I'm not sure what this means. I suspect it has to do with the generic instances of FromJSON for Ref and Reference.

My questions:

  1. What does this error indicate? So far in my learning of Haskell, the errors have always been very helpful. This one is not. Do I need to do something special for the "depends" key in my parseJSON function?
  2. All of this boilerplate is really because of two keys -- "type" and "depends". Is there a more elegant way to deal with these keys?
  3. Relatedly, this is part of my first real Haskell project, so I have a more general design question. Experienced Haskellers and Aeson users, how would you lay out your types and instances for this type of JSON? I tried listing out each possible variant of objects/parameters record as its own separate type, and only writing custom FromJSON instances for those that have a "depends" or "type" key, but this produced a lot more boilerplate code and in any case doesn't solve any of the other issues I have. General pointers on "best practices", idiomatic usage, etc. would be extremely useful and appreciated.
user4601931
  • Try to boil it down to a single question. Your third question can be answered on [CodeReview.SE] as soon as your code works. – Zeta Oct 22 '17 at 17:58

2 Answers

There are no "tag" keys, so I'm not sure what this means. I suspect it has to do with the generic instances of FromJSON for Ref and Reference.

That's spot on. By default, aeson uses defaultTaggedObject as the sumEncoding for sum types. Reference is a sum type, so aeson introduces a "tag" key to distinguish its constructors. You can see that with a short example:

ghci> data Example = A () | B deriving (Generic,ToJSON)
ghci> encode B
"{\"tag\":\"B\",\"contents\":[]}"

When you use _depends <- o .:? "depends", the generically derived Reference parser looks for that "tag" key inside the value of "depends", does not find it, and fails. You have to write some parsing code there yourself.
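As a minimal sketch of what such hand-written parsing code might look like (not necessarily the only or best way), the instance below dispatches on the shape of the JSON value instead of looking for a tag: an object becomes Reference, an array becomes References. It assumes each record carries a "reference" key, as in the question's JSON samples; the Ref instance is also written by hand here so that the Haskell field name does not have to match the JSON key. The main function is only there for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import Data.Aeson.Types (typeMismatch)

-- One {"reference": "..."} record. The JSON key is named explicitly in the
-- parser, so the field may keep the name `ref` from the question.
newtype Ref = Ref { ref :: String }
  deriving (Show, Eq)

instance FromJSON Ref where
  parseJSON = withObject "Ref" $ \o -> Ref <$> o .: "reference"

data Reference
  = Reference Ref
  | References [Ref]
  deriving (Show, Eq)

-- Choose the constructor from the shape of the value: no "tag" key needed.
instance FromJSON Reference where
  parseJSON v@(Object _) = Reference  <$> parseJSON v
  parseJSON v@(Array _)  = References <$> parseJSON v
  parseJSON v            = typeMismatch "Reference" v

-- Illustration only: decode both shapes of "depends".
main :: IO ()
main = do
  print (decode "{\"reference\":\"r1\"}" :: Maybe Reference)
  print (decode "[{\"reference\":\"r1\"},{\"reference\":\"r2\"}]" :: Maybe Reference)
```

With an instance like this in scope, the line _depends <- o .:? "depends" in the question's PObject parser can stay as it is. Another option that may be worth trying is keeping the generic instance and setting sumEncoding = UntaggedValue in the Options passed to genericParseJSON, which makes aeson try each constructor in turn rather than expect a tag.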

Zeta

All of this boilerplate is really because of two keys -- "type" and "depends". Is there a more elegant way to deal with these keys?

You could keep the underscores in the field names and set fieldLabelModifier in the Options record that you pass to genericParseJSON, so that they are stripped for parsing purposes.
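A rough sketch of that idea, using a cut-down version of the question's PObject (only four of its fields, chosen here for illustration): drop 1 removes the leading underscore, so _type is read from the "type" key without a hand-written parser. Fields of type Maybe that are missing from the JSON should come out as Nothing under the generic decoder.

```haskell
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import GHC.Generics (Generic)

-- Reduced version of the question's record; the real one has more fields.
data PObject = PObject
  { _name :: String
  , _id   :: String
  , _type :: Maybe String
  , _role :: Maybe String
  } deriving (Show, Eq, Generic)

-- Generic parser whose JSON keys are the field names minus the underscore.
instance FromJSON PObject where
  parseJSON = genericParseJSON defaultOptions { fieldLabelModifier = drop 1 }

-- Illustration only: "role" is absent from the input.
main :: IO ()
main = print (decode "{\"name\":\"n\",\"id\":\"i\",\"type\":\"t\"}" :: Maybe PObject)
```

This removes the need to list every key by hand, but it does not by itself fix the "depends" field: that still needs a Reference parser that does not rely on a tag.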

danidiaz