2

I am trying to build a lattice-type of FCA-type of data structure in Haskell where I could check if two entities have a join or not. Actually, I'm not even sure that the lattice is the right structure as it might be a bit "too much".

Here is the context. In a COBOL program analyzer, I have an ontology of data set names, files, records and fields. A data set name can have multiple file names depending on the program, a file can have multiple records and a record can have multiple fields. I'd like this hierarchy to be reflected in the Haskell data structure. But I'd like also to be able to have a relation inherited for file1 and file2 such that I can check if file1 and file2 belong to the same data set name. Actually, this relation could almost be that of "==". But it could simply be that they do have a join in dsn0, for instance.

I have other ontologies in that context that would benefit from a lattice or FCA data structure. For example, I have programs that belongs to job steps and job steps that belong to jobs. If I could easily figure out if two programs belong to the same job, that would be great. Here also, it seems like a "join" operator. Getting the extension (code) of a certain entity would be useful too.

I'm still a bit new to Haskell. I tried to look at the Lattice library but I'm not sure where to go from there, concretely. Any idea of how to get started? A small example of a lattice in Haskell would be very helpful. Thank you very much for your help (and patience).

UPDATE: The Lattice might not be the best formalism for this as mentioned in the comments. I realize I might just have to use a regular class type of data structure along those lines:

data DSN = DSN {
    programFiles :: [ProgramFile]
    name :: String
    ddn :: DDN
}
data ProgramFile = ProgramFile {
    records :: [Record]
    name :: String
}
data Record = Record {
    fields :: [Field]
    name :: String
}
data Field = Field {
    name :: String
    order :: Int
}

I guess my initial intention behind using a tree/lattice/FCA type of structure is to take full advantage of the functor potential in Haskell which should lead to interesting lattice operations including geting all the extension of a concept, checking that two concepts belong to the same higher-level concept, checking equality '==' of two files through their DSN,...

Maybe a non-binary tree structure would be better? Is it easy to do that in Haskell?

  • 1
    To the extent I understand what you're trying to do ... It sounds like there's different sense of 'belong to'; and one entity might belong to other entities in different ways (or might belong to the same entity in multiple ways?) Then I'm not sure a lattice is the right structure -- all the 'belong to's are in the same sense. I would flatten out all the links into a relational database. A within-programming-language data structure doesn't seem the right tool for the job. – AntC Apr 18 '21 at 23:23
  • 1
    A data set can have multiple files. Can a file belong to multiple data sets? If not, do you want that constraint baked into the types, or are you willing to maintain a non-compiler-checked invariant yourself? – Daniel Wagner Apr 19 '21 at 16:20
  • Thank you @DanielWagner for these really good clarifying questions. The Lattice might not be the best formalism for this, indeed. I realize I might just have to use a regular class type of data structure along those lines: DSN = DSN { programFiles :: [ProgramFile] name :: String ddn :: DDN } data ProgramFile = ProgramFile { records :: [Record] name :: String } ... I guess my initial intention behind using a tree/lattice/FCA type of structure is to take full advantage of the functor potential in Haskell which should lead to interesting lattice operations. – Hugolin Bergier Apr 19 '21 at 16:45
  • Thank you @AntC for your comment. I replied above as both comments kind of headed the same way. – Hugolin Bergier Apr 19 '21 at 16:46
  • @HugolinBergier I don't think your edit or comment actually answered my clarifying questions. Could you take another stab at them? I have several specific solutions in mind, but which one is most appropriate depends on the answers to those questions. – Daniel Wagner Apr 19 '21 at 17:45
  • @DanielWagner, no a program file cannot belong to multiple data sets. The same way, a program cannot belong to multiple job steps. In general, across the whole ontology we have a 1-to-many relationship, not many-to-many. So it's more of a non-binary tree structure than a lattice. Yes, I want that constraint baked into the types. Thank you! – Hugolin Bergier Apr 19 '21 at 18:21

1 Answers1

3

I recommend making an abstract data type to represent one-to-many relationships. It might look something like this:

module OneToMany (OMRel, empty, insert, delete, source, targets) where

import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

data OMRel a b = OMRel
    { oneToMany :: Map a (Set b)
    , manyToOne :: Map b a
    } deriving (Eq, Ord, Read, Show)

empty :: OMRel a b
empty = OMRel M.empty M.empty

insert :: (Ord a, Ord b) => a -> b -> OMRel a b -> OMRel a b
insert a b (OMRel otm mto) = OMRel
    { oneToMany = M.insertWith S.union a (S.singleton b) $
        case M.lookup b mto of
            Just oldA -> M.adjust (S.delete b) oldA otm
            Nothing -> otm
    , manyToOne = M.insert b a mto
    }

delete :: (Ord a, Ord b) => a -> b -> OMRel a b -> OMRel a b
delete a b (OMRel otm mto) = OMRel (M.adjust (S.delete b) a otm) (M.delete b mto)

source :: Ord b => b -> OMRel a b -> Maybe a
source b = M.lookup b . manyToOne

targets :: Ord a => a -> OMRel a b -> Set b
targets a = M.findWithDefault S.empty a . oneToMany

(Of course you can flesh out the API with more efficient bulk operations like merging, bulk insert, sequential composition, etc. But this is sort of the minimal construct/consume API that gets you where you need to go.)

Then you need a couple data types to represent your various ontological entries:

newtype Dataset = Dataset { dataset :: String }
newtype Record = Record { record :: String }
newtype Field = Field { order :: Int }

From there you can use values with types like OMRel Dataset FilePath to represent the fact that datasets "contain" files. For querying containment equality, you can write this once and for all via the OMRel API above:

sameSource :: (Eq a, Ord b) => OMRel a b -> b -> b -> Bool
sameSource rel b b' = source b rel == source b' rel

(You may need an extra clause if two missing targets should be considered inequal.) Then, e.g., this can be specialized to

sameSource :: OMRel Dataset FilePath -> FilePath -> FilePath -> Bool

and friends.

You will not be able to make a Functor (or Bifunctor) instance for OMRel because of the Ord constraints. It's not clear to me that fmap/bimap make a TON of sense for this particular data structure, though. For example, if we have associations x :: a <-> y :: b and x' :: a <-> y' :: b, and f b = f b', then should fmap f associate f b with a or a'? If they do turn out to have a sensible interpretation and be useful, you could either pursue the constrained-monads approach or simply offer a function named bimap from the OneToMany module with the appropriate type, but without making it an instance method.

Daniel Wagner
  • 145,880
  • 9
  • 220
  • 380
  • Thank you @DanielWagner, this is perfect! I think this is exactly what I need to get started. I'm just a bit surprised that there is no standard library or built-in functions for these types of relationships. But what you provided is even better because I can customize it to my needs and fully understand it. – Hugolin Bergier Apr 19 '21 at 19:05
  • I can see now how the functor would not make that much sense for such data structure. Maybe that's not what I need. An example I had in mind is if we have a hierarchy of program ontologies (using OneToMany) on one hand and a hierarchy of source code reference (a partial order) on the other hand and we want to establish a functor between the two. But I need to move forward on this first to give it more thought and clarify things more. Again, great thanks! – Hugolin Bergier Apr 19 '21 at 19:05
  • @HugolinBergier Hackage has a data structure for [many-to-many relations](http://hackage.haskell.org/package/relation), and a data structure for [one-to-one relations](http://hackage.haskell.org/package/bimap), but it seems there is a hole indeed for many-to-one. The many-to-many structure could be used as an alternative implementation technique for `OMRel`; perhaps it would involve a little less boilerplate, though it would probably also be slightly less efficient. Perhaps you will be the person to upload a package that fills this hole! – Daniel Wagner Apr 19 '21 at 19:10
  • ha! Once I get proficient in Haskell like you, maybe yes... Little question: isn't the type of insert missing an argument (or a value)? Maybe we should add '-> OMRel a b' at the end of the line? – Hugolin Bergier Apr 19 '21 at 20:55
  • @HugolinBergier I wouldn't be surprised if there were half a dozen small mistakes -- I typed out the whole module more or less off-the-cuff and without actually checking with the compiler that it agreed with me, let alone testing its functionality. This seems like a good place to put that legalese about NOT SUITABLE FOR ANY USE or whatever, if I knew what the right phrasing was. The real goal was to communicate an idea to a human, so I didn't bother checking if I had communicated that idea to the machine. – Daniel Wagner Apr 19 '21 at 21:50