2

I'm attempting to implement a simple typed lisp in the TypeScript type system, drawing inspiration from JSON Logic, but with types. Here is an example of what that looks like:

/**
 * Given a context object C, this type will allow you to construct
 * expressions that can reference that context and perform operations
 * on the data therein.
 *
 * Not shown below are the definitions for 
 *   JSONData: A simple typed representation of JSON non-object) in a 
 *   JSONPath: A union type of a terminal paths (i.e. 
 *             given JSON object.
 *   JSONPathValueType: The value type for a given path in a JSONObject
 */

type Expr<A extends JSONData, C extends JSONData> = A extends number
  ?
      | number
      | Ref<number, C>
      | ['+', ...Expr<number, C>[]]
      | ['first', Expr<number[], C>]
  : A extends JSONObject
  ? JSONObject | Ref<JSONObject, C>
  : A extends JSONArray<infer T>
  ?
      | T[]
      | Ref<T[], C>
      | ['map', Expr<JSONArray, C>, Expr<T, C>]
  : never;


type Ref<T, C extends JSONData> = {
  [P in JSONPath<C>]: JSONPathValueType<P, C> extends T ? `$.${P}` : never;
}[JSONPath<C>];

Given the above definition, the following are all valid typed expressions:

type Context = { foo: 1; bar: { baz: 2; bin: [5]; buz: [{ fiz: 3 }] } };
type SampleNumberExpr = Expr<number, Context>;

const two: SampleNumberExpr = ['+', 1, 1];
const three: SampleNumberExpr = ['+', '$.foo', '$.bar.baz'];
const four: SampleNumberExpr = ['first', [4, 5, 6]];
const five: SampleNumberExpr = ['+', 1, ['first', [4, 5, 6]]];
const six: SampleNumberExpr = ['+', '$.foo', ['first', '$.bar.bin']];

However I start to get into trouble when I try to implement and use the map functionality. Ideally I would be able to use the same Ref $.* notation in the context of an expression to access each item within the provided list:

const doesNotWork: SampleNumberExpr = 
  ['first', ['map', '$.bar.buz', ['+', '$.item.fiz', 1]]]

Yet every attempt I have made to implement this functionality is met with either type errors or the Type instantiation is excessively deep and possibly infinite. compiler error.

Specifically, trying to create a new context for the map functionality to be able to Ref, where the $.item key matches the type of each element of the inputted Array Expr is where I'm stuck. Obviously this is all beyond where the TS type system is really intended to go, but I have this intuition that it should be possible – I would love any help in finding out if that is, in fact, the case.

EDIT For those who want to play around with this, here is a full playground example

Tobias S.
  • 21,159
  • 4
  • 27
  • 45
wvandaal
  • 4,265
  • 2
  • 16
  • 27
  • Do you think you could try to pare this down to a [mre] that displays your issue but has fewer moving parts? I don't know if people are likely to dig all the way through this as it's currently written. – jcalz Nov 19 '22 at 20:59
  • Without something more tractable it looks like you are passing `C` down as-is in places where it should be transformed into something that drills down and keeps track of where you are in the tree. [See playground link](//tsplay.dev/wOA0rm). See how I refactored to give the chunks of your things names, so that the quickinfo actually gives you relatively modest unions of named types instead of a dramatic explosion of names. I reached the place where you've got `Expr` and expecting `"$.item.fiz"` to match, and of course it doesn't since `Context` knows nothing about `$.item`. – jcalz Nov 19 '22 at 21:25
  • But as for how to *fix* it, well, I'm not inclined to wade farther into this. It's just too complex. If you pare it down to a toy example with the relevant issue, with an attempt to do the drilling down that's missing here, maybe I could take another look? Or maybe someone else will come along. Anyway if you do [edit] and want to ping me about it, mention @jcalz in a comment. Good luck! – jcalz Nov 19 '22 at 21:28
  • 1
    I feel like it's a lot easier if instead of a type it was a function i.e. `lisp(...)`. – kelsny Nov 28 '22 at 15:03
  • Is there a specific reason why you're using tuples containing the operation at the first position instead of records? – WolverinDEV Nov 28 '22 at 18:16
  • @WolverinDEV mainly just that they're more similar to Lisp S-Expressions. I've actually also tried Record types, but they behave in much the same way. – wvandaal Nov 28 '22 at 18:23
  • Okey I just asked since I'm playing around with record types as they're more readable for me. – WolverinDEV Nov 28 '22 at 18:33
  • Credit to @caTS for the inspiration to solve this problem using functions – wvandaal Nov 28 '22 at 20:55
  • I'm still waiting for someone to hopefully do this without functions. I also feel like this is possible but is just out of arm's reach. Bounty will go to your self-answer if no one else answers. – kelsny Dec 01 '22 at 20:32

1 Answers1

0

Answering this question and providing more context in the hopes that this helps people who have also been struggling with this type of issue. I will try to also address the critiques raised by @jcalz in the comments section of the original question.

First, some context

I'm currently working on a rules engine for automating the execution of tasks in an event-driven AWS cloud backend. The goal is for a user to be able to specify an abstract definition of what they would like to be done in response to a certain event taking place, storing that definition in a database, and then executing a concrete action when an instance of the event(s) in question take place.

EXAMPLE: A user registers on the website and we send them a series of emails at intervals defined by the user (e.g. 3, 5, and 7 days after – stopping the automation if and when they respond).

Given these constraints, I needed to design a rules DSL for defining these automations in the abstract. Furthermore, given the fact that the automation definition is created by the user in a UI and stored in a database, it must be encodable using pure JSON. Finally, to protect against programmer error when constructing these expressions programmatically, I would like this DSL to be strictly typed for arbitrarily complex expression structures.

A Naive Approach

Having had some experience working in Clojure/Scheme in the past, and given my experiences designing a number of DSLs / rules engines over the years, I generally have found that the S-expressions used in Lisps are the most flexible and easy way to implement such a language.

The end result might have the following appearance:

type Event = {
  timestampMillis: number;
  data: JSONObject;
};

type Context = Readonly<{
  priorEvents: Event[];
  currentEvent: Event
}>

type Expr<ReturnType extends JSONData, Context extends JSONObject> = ???

const DAY_MILLIS = 86400000;

const isThreeDaysAfterRegistration: Expr<boolean, Context> = 
  { ">=": 
    { "ref": "$.currentEvent.timestampMillis" }, 
    { "+": [
      { "*": [ 3, DAY_MILLIS ] },
      { "first", { "map": {
        "$collection": { "ref": "$.priorEvents" }, 
        "$fn": { "ref": "$.item.timestampMillis" }
    ]}
  } as const;

For simple constructions, this approach works well-enough. Basic boolean algebra, arithmetic, and string manipulation are all straightforward and easy enough to implement:

type Expr<ReturnType extends JSONData, Context extends JSONObject> = 
  ReturnType extends boolean ?
    | { "or": readonly Expr<boolean, Context>[] }
    | { "and": readonly Expr<boolean, Context>[] }
    // ...
  : ReturnType extends number ?
    | { "+": readonly Expr<number, Context>[] }
    | { "-": readonly [Expr<number, Context>, ...Expr<number, Context>[]] }
    // ...
  : never;

However, we run into issues when we start dealing with higher-kinded types and recursion – namely with JSONData[] and { readonly [S in string]: JSONData }, i.e. JSON arrays and nested objects.

Big Trouble in (Little) Generics

Consider the following simple formulation: I would like to access and return the timestampMillis value from the first Event in the priorEvents array.

We can structure a small set of expression structures to attempt to encode this:

// Reference a value from the context
type Ref<T, C extends JSONData> = {
  [P in JSONPath<C>]: JSONPathValueType<P, C> extends T ? `$.${P}` : never;
}[JSONPath<C>];

type Expr<ReturnType extends JSONData, Context extends JSONObject> = 
  ReturnType extends number ?
    { "get": any extends Expr<infer B, Context> ? B extends JSONObject ? {
        "obj": Expr<B, Context>;
        "path": Ref<ReturnType, B>;
      } : never : never;
    }
    // ...
  : ReturnType extends JSONData[] ? 
    { "ref": Ref<ReturnType, Context> }
  : ReturnType extends JSONObject ? 
    { "first": Expr<ReturnType[], Context> }
  : never;

// Compilation Error: 
// Type 'string' is not assignable to type 'never'.
//
// The expected type comes from property 'path' which is declared here 
// on type '{ obj: { first: { ref: "$.priorEvents"; }; }; path: never; }'
const firstTimestamp: Expr<number, Context> = {
  get: {
    obj: { first: { ref: '$.priorEvents' } },
    path: '$.timestampMillis',
  },
};

It turns out that the best the Typescript compiler can do with respect to the path variable is to tell you that it is a string. Since the type of path is a function of the inferred generic type B, and the type of B can only be narrowed down to the concrete type JSONObject, we cannot deduce anything about the concrete path strings which would return the subset of nested values of type number (in this case). Moreover, the use of infer above is entirely unnecessary since its use is merely a failed attempt to conjure a Rank-2 generic in a solely Rank-1 type system.

A Rant About Ranks

So that last paragraph was unhelpfully dense and technical, so let's take a break and discuss what is actually attempting to be accomplished and then work into a discussion about Rank-N Polymorphism and how we can use an understanding of its mechanics to hopefully get us most of the way to where we want to go.

Looking at our failing expression from before, the culprit behind our compilation error is the serialization of the get expression. At a high level, this expression represents a trivially simple operation: namely, that of accessing a value on an object given its path. As an added bonus, this get function also allows us to limit ourselves to

  1. The finite set of strings which represent valid paths in a given object.
  2. The subset of the set of valid paths which contain a value matching the given ReturnType.

The second constraint is of particular importance, since it allows us to actually type the get expression in a concrete way; e.g. if I want to say that firstTimestamp is an expression of type number

const firstTimestamp: Expr<number, Context> = {
  get: {
    obj: { first: { ref: '$.priorEvents' } },
    path: '$.timestampMillis',
  },
};

I want to be sure that I constrain the values of path to only those paths within obj which return a value of type number.

Unfortunately, the two constraints listed above are exactly where we get into trouble. At their core, they are making the statement that some unknown and arbitrary type B, a subtype of JSONObject, can be derived from the value of obj at the site of instantiation for a concrete Expr and further used to derive the set of valid path strings. This dependent formulation of generic typing is called Rank-2 Polymorphism (see also this helpful explainer from the Swift community).

I will skip the highly technical explanations here, as there are others who have done it far more justice than I could ever hope to. Suffice it to say that the TypeScript type system does not fully support beyond Rank-1 polymorphism within type definitions. To illustrate what this means, consider the following practical example:

You wish to write a function which takes two generic types and produces two arrays containing those types. More importantly, you want to allow the caller to specify the function that produces the arrays for the two inputted generic types. A first attempt in TypeScript might look something like this:

function generateArrays<A, B, C>(
  arrayBuilder: (a: A) => A[],
  first: B,
  second: C,
): [B[], C[]] {
  return [arrayBuilder(first), arrayBuilder(second)];
}

However running this through the TS compiler will yield a warning that type A in your arrayBuilder function does not match types B or C. Thankfully, TypeScript (unlike many other languages), provides us with a mechanism to get around this: namely a polymorphic function type:

function generateArrays<B, C>(
  arrayBuilder: <A>(a: A) => A[], // Conjuring a rank-2 generic type 
  first: B,
  second: C,
): [B[], C[]] {
  return [arrayBuilder(first), arrayBuilder(second)];
}

"What wonderful news!" you say, "now all I need to do is use this to help me define something equivalent for the get expression type." Sadly this sorcery is not available to us in with the strict realm of types. The attempt to use infer B in the previous formulation of is insufficient to derive any useful information about the shape of the inputted data at the instantiation site for a concrete Expr instance.

Conclusion: A Dream of S-Expressions

Thankfully all is not lost! We can use what we have just learned about the special functions to define and use generics to define a set of functions (@caTS smartly intuited this in their comment) which will help us build this data DSL while still maintaining all of the type safety we want to keep ourselves honest while building complex rule sets. With allowances made for some unsafe type-casting, this solution should accomplish all of our stated goals with a clean and readable syntax using function composition (squint and it'll look like S-expressions):

/*
 * Working implementation of prior example
 */
const { get, ref, first } = new Expression<Context>();

const firstTimestamp: Expr<number, Context> = get(
  'timestampMillis',
  first(ref<Event[]>('$.priorEvents'))
);


/*
 * Encoding the DSL types and factory functions
 */
type Expr<
  Type extends Readonly<JSONData>,
  Context extends Readonly<JSONObject> = Readonly<Record<string, never>>
> = Type | Ref<Type, Context>;

// Utility namespace for all Expr factory functions
class Expression<C extends Readonly<JSONObject>> {
  get<
    O extends JSONObject,
    K extends JSONPath<O>,
    V extends JSONPathValueType<K, O>
  >(key: K, obj: Expr<O, C>): Expr<V, C> {
    return {
      '(get)': {
        '(obj)': obj,
        '(key)': key,
      },
    } as const as unknown as Expr<V, C>;
  }

  first<T extends JSONData>(list: Expr<T[], C>): Expr<T, C> {
    return {
      '(first)': list,
    } as const as unknown as Expr<T, C>;
  }

  ref<T extends JSONData>(ref: Ref<T, C>): Expr<T, C> {
    return ref;
  }
}


wvandaal
  • 4,265
  • 2
  • 16
  • 27
  • Implementing your own Lisp dialect should be considerably easier and more reliable than holding the TypeScript compiler at gunpoint, but still, a great thought experiment nonetheless. – kelsny Nov 28 '22 at 21:00
  • @caTS You'd be surprised at the utility of a small language like this. To be clear, it's only Lisp-esque, there is no support for typed meta-programming within the language itself, which would be the real prerequisite for this to be considered a typed Lisp in the traditional sense. That said, for the use case I outlined in my answer, the expressions and contexts have the potential to be quite complex, especially as our system evolves over time. The benefits of being able to have teams lean on the compiler to ensure that the expression generation they're doing is safe is well worth the effort. – wvandaal Nov 28 '22 at 23:02