Double Dispatch and inheritance

Question

I have a number of dumb object classes that I would like to serialize as Strings for the purpose of out-of-process storage. This is a pretty typical place to use double-dispatch / the visitor pattern.

public interface Serializable {
  <T> T serialize(Serializer<T> serializer);
}

public interface Serializer<T> {
  T serialize(Serializable s);
  T serialize(FileSystemIdentifier fsid);
  T serialize(ExtFileSystemIdentifier extFsid);
  T serialize(NtfsFileSystemIdentifier ntfsFsid);
}

public class JsonSerializer implements Serializer<String> {
  public String serialize(Serializable s) {...}
  public String serialize(FileSystemIdentifier fsid) {...}
  public String serialize(ExtFileSystemIdentifier extFsid) {...}
  public String serialize(NtfsFileSystemIdentifier ntfsFsid) {...}
}

public abstract class FileSystemIdentifier implements Serializable {}
public class ExtFileSystemIdentifier extends FileSystemIdentifier {...}
public class NtfsFileSystemIdentifier extends FileSystemIdentifier {...}

With this model, the classes that hold data don't need to know about the possible ways to serialize that data. JSON is one option, but another serializer might "serialize" the data classes into SQL insert statements, for example.

If we take a look at the implementation of one of the data classes, it looks pretty much the same as all the others: the class calls the serialize() method on the Serializer passed to it, providing itself as the argument.

public class ExtFileSystemIdentifier extends FileSystemIdentifier {
  public <T> T serialize(Serializer<T> serializer) {
    return serializer.serialize(this);
  }
}
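The {...} bodies above are elided. Here is a minimal runnable sketch of the same model, assuming placeholder JSON strings (the output text is hypothetical, since the data classes have no fields in the question):

```java
// Minimal runnable sketch of the visitor setup from the question.
// The JSON text each overload returns is a hypothetical placeholder.
interface Serializable {
    <T> T serialize(Serializer<T> serializer);
}

interface Serializer<T> {
    T serialize(Serializable s);
    T serialize(FileSystemIdentifier fsid);
    T serialize(ExtFileSystemIdentifier extFsid);
    T serialize(NtfsFileSystemIdentifier ntfsFsid);
}

abstract class FileSystemIdentifier implements Serializable {}

class ExtFileSystemIdentifier extends FileSystemIdentifier {
    // The static type of `this` is ExtFileSystemIdentifier here, so the
    // compiler binds this call to serialize(ExtFileSystemIdentifier).
    public <T> T serialize(Serializer<T> serializer) {
        return serializer.serialize(this);
    }
}

class NtfsFileSystemIdentifier extends FileSystemIdentifier {
    // Same text as above, but bound to serialize(NtfsFileSystemIdentifier).
    public <T> T serialize(Serializer<T> serializer) {
        return serializer.serialize(this);
    }
}

class JsonSerializer implements Serializer<String> {
    // Entry point for callers that only hold a Serializable: hand control
    // back to the data object so it can pick the precise overload.
    public String serialize(Serializable s) { return s.serialize(this); }
    public String serialize(FileSystemIdentifier fsid) { return "{\"type\":\"fs\"}"; }
    public String serialize(ExtFileSystemIdentifier extFsid) { return "{\"type\":\"ext\"}"; }
    public String serialize(NtfsFileSystemIdentifier ntfsFsid) { return "{\"type\":\"ntfs\"}"; }
}
```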

I understand why this common code cannot be pulled into a parent class. Although the code is textually identical in every class, inside that method the compiler knows unambiguously that the static type of this is ExtFileSystemIdentifier, so it can select the most type-specific serialize() overload at compile time and emit bytecode that calls it. In a parent class, this would only be known as the parent type.
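To see why the shared method can't move up, here is a small sketch that puts it in FileSystemIdentifier instead (NameSerializer and its return strings are hypothetical). Even for an ExtFileSystemIdentifier instance, the call binds to serialize(FileSystemIdentifier):

```java
interface Serializable {
    <T> T serialize(Serializer<T> serializer);
}

interface Serializer<T> {
    T serialize(Serializable s);
    T serialize(FileSystemIdentifier fsid);
    T serialize(ExtFileSystemIdentifier extFsid);
}

// Hypothetical refactoring: the "duplicated" method moved into the parent class.
abstract class FileSystemIdentifier implements Serializable {
    public <T> T serialize(Serializer<T> serializer) {
        // Here the static type of `this` is FileSystemIdentifier, so every
        // subclass ends up calling serialize(FileSystemIdentifier).
        return serializer.serialize(this);
    }
}

class ExtFileSystemIdentifier extends FileSystemIdentifier {}

// Hypothetical serializer that reports which overload was invoked.
class NameSerializer implements Serializer<String> {
    public String serialize(Serializable s) { return s.serialize(this); }
    public String serialize(FileSystemIdentifier fsid) { return "FileSystemIdentifier overload"; }
    public String serialize(ExtFileSystemIdentifier extFsid) { return "ExtFileSystemIdentifier overload"; }
}
```

Calling new ExtFileSystemIdentifier().serialize(new NameSerializer()) returns "FileSystemIdentifier overload", because the overload was chosen from the static type of this in the parent class.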

I believe I understand most of what is happening with the V-table lookup as well. The compiler only knows the serializer parameter by its abstract type, Serializer, so at runtime the JVM must look into the V-table of the serializer object to find the serialize() implementation of its concrete class, in this case JsonSerializer.serialize().
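The two halves of the dispatch can be seen side by side in this sketch, which adds a hypothetical SqlSerializer next to JsonSerializer (both return placeholder strings, and the table name is made up). The data class's single call site is bound at compile time to serialize(ExtFileSystemIdentifier), while the implementation that actually runs depends on the serializer's runtime class:

```java
interface Serializable {
    <T> T serialize(Serializer<T> serializer);
}

interface Serializer<T> {
    T serialize(Serializable s);
    T serialize(ExtFileSystemIdentifier extFsid);
}

class ExtFileSystemIdentifier implements Serializable {
    public <T> T serialize(Serializer<T> serializer) {
        // Compile time: overload fixed to serialize(ExtFileSystemIdentifier).
        // Run time: the implementation is looked up on serializer's class.
        return serializer.serialize(this);
    }
}

class JsonSerializer implements Serializer<String> {
    public String serialize(Serializable s) { return s.serialize(this); }
    public String serialize(ExtFileSystemIdentifier e) { return "{\"type\":\"ext\"}"; }
}

// Hypothetical second serializer producing SQL, as suggested in the question.
class SqlSerializer implements Serializer<String> {
    public String serialize(Serializable s) { return s.serialize(this); }
    public String serialize(ExtFileSystemIdentifier e) {
        return "INSERT INTO fs_ids (type) VALUES ('ext');";
    }
}
```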

The typical usage is to take a data object, known only to be a Serializable, and serialize it by handing it to a serializer object, known only to be a Serializer. The concrete types of the objects are not known at compile time.

List<Serializable> list = //....
Serializer<String> serializer = //....

List<String> serialized = list.stream().map(serializer::serialize).collect(Collectors.toList());

This call works similarly to the other invocation, but in reverse:

public class JsonSerializer implements Serializer<String> {
  public String serialize(Serializable s) {
    return s.serialize(this);
  }
  // ...
}
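A runnable sketch of this reverse entry point, assuming placeholder output strings and a hypothetical Demo.serializeAll helper. Each element is only known as a Serializable, so the method reference resolves to serialize(Serializable), and the data object then picks the specific overload:

```java
import java.util.List;
import java.util.stream.Collectors;

interface Serializable { <T> T serialize(Serializer<T> serializer); }

interface Serializer<T> {
    T serialize(Serializable s);
    T serialize(ExtFileSystemIdentifier extFsid);
    T serialize(NtfsFileSystemIdentifier ntfsFsid);
}

class ExtFileSystemIdentifier implements Serializable {
    public <T> T serialize(Serializer<T> serializer) { return serializer.serialize(this); }
}

class NtfsFileSystemIdentifier implements Serializable {
    public <T> T serialize(Serializer<T> serializer) { return serializer.serialize(this); }
}

class JsonSerializer implements Serializer<String> {
    // First dispatch: the virtual call on s selects the data class's method,
    // which then binds to one of the type-specific overloads below.
    public String serialize(Serializable s) { return s.serialize(this); }
    public String serialize(ExtFileSystemIdentifier e) { return "ext"; }
    public String serialize(NtfsFileSystemIdentifier n) { return "ntfs"; }
}

class Demo {
    static List<String> serializeAll(List<Serializable> list, Serializer<String> serializer) {
        // The element type is Serializable, so serializer::serialize
        // resolves to serialize(Serializable).
        return list.stream().map(serializer::serialize).collect(Collectors.toList());
    }
}
```

With a list holding an ExtFileSystemIdentifier and an NtfsFileSystemIdentifier, Demo.serializeAll returns ["ext", "ntfs"].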

The V-table lookup is now done on the instance of Serializable and it will find, for example, ExtFileSystemIdentifier.serialize. It can statically determine that the closest matching overload is for Serializer<T> (it just so happens to also be the only overload).

This is all well and good. It achieves the main goal of keeping the input and output data classes oblivious to the serialization class. And it also achieves the secondary goal of giving the user of the serialization classes a consistent API regardless of what sort of serialization is being done.

Imagine now that a second set of dumb data classes exists in a different project, and a new serializer needs to be written for these objects. The existing Serializable interface can be reused in the new project. The Serializer interface, however, contains references to the data classes from the first project.

In an attempt to generalize this, the Serializer interface could be split into three interfaces:

public interface Serializer<T> {
  T serialize(Serializable s);
}

public interface ProjectASerializer<T> extends Serializer<T> {
  T serialize(FileSystemIdentifier fsid);
  T serialize(ExtFileSystemIdentifier fsid);
  // ... other data classes from Project A
}

public interface ProjectBSerializer<T> extends Serializer<T> {
  T serialize(ComputingDevice device);
  T serialize(PortableComputingDevice portable);
  // ... other data classes from Project B
}

In this way, the Serializer and Serializable interfaces could be packaged and reused. However, this breaks the double dispatch and results in an infinite loop. This is the part of the V-table lookup I'm uncertain about.

When stepping through the code in a debugger, the problem shows up in the data class's serialize method:

public class ExtFileSystemIdentifier implements Serializable {
  public <T> T serialize(Serializer<T> serializer) {
    return serializer.serialize(this);
  }
}

What I think is happening is that, at compile time, the compiler selects the serialize overload from the options available on the Serializer interface (since it knows the parameter only as a Serializer<T>), and the only option there is serialize(Serializable). So by the time the runtime does the V-table lookup, the method being looked for is already the wrong one: the runtime selects JsonSerializer.serialize(Serializable), which hands control straight back to the data object, leading to the infinite loop.
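A minimal reproduction of the loop, trimmed to one data class (JsonSerializer's return string and the LoopDemo helper are hypothetical). Because Serializer<T> declares only serialize(Serializable), the call in the data class can only bind to that overload, and the two methods call each other until the stack overflows:

```java
interface Serializable { <T> T serialize(Serializer<T> serializer); }

// Base interface with no concrete types, as in the split hierarchy.
interface Serializer<T> { T serialize(Serializable s); }

interface ProjectASerializer<T> extends Serializer<T> {
    T serialize(ExtFileSystemIdentifier fsid);
}

class ExtFileSystemIdentifier implements Serializable {
    public <T> T serialize(Serializer<T> serializer) {
        // Only serialize(Serializable) is visible on Serializer<T>, so the
        // compiler binds to it; serialize(ExtFileSystemIdentifier) is never chosen.
        return serializer.serialize(this);
    }
}

class JsonSerializer implements ProjectASerializer<String> {
    // s is only a Serializable, so this calls back into the method above.
    public String serialize(Serializable s) { return s.serialize(this); }
    public String serialize(ExtFileSystemIdentifier fsid) { return "ext"; }
}

class LoopDemo {
    // Returns true if the call recursed until the stack overflowed.
    static boolean overflows() {
        try {
            new ExtFileSystemIdentifier().serialize(new JsonSerializer());
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }
}
```

LoopDemo.overflows() returns true: the JVM does not re-select the overload at runtime, it only finds the implementation of the serialize(Serializable) signature that was chosen at compile time.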

A possible solution to this problem is to provide a more type-specific serialize method in the data class.

public interface ProjectASerializable extends Serializable {
  <T> T serialize(ProjectASerializer<T> serializer);
}

public class ExtFileSystemIdentifier implements ProjectASerializable {
  public <T> T serialize(Serializer<T> serializer) {
    return serializer.serialize(this);
  }
  public <T> T serialize(ProjectASerializer<T> serializer) {
    return serializer.serialize(this);
  }
}

Program control flow will bounce around until the most type-specific Serializer overload is reached. At that point, the ProjectASerializer<T> interface provides a more specific serialize method for the Project A data class, which avoids the infinite loop.
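A sketch of this workaround, trimmed to one data class (the returned string is a placeholder). It assumes the caller holds the object as a ProjectASerializable and the serializer as a class implementing ProjectASerializer, so that overload resolution picks the ProjectASerializer<T> method:

```java
interface Serializable { <T> T serialize(Serializer<T> serializer); }
interface Serializer<T> { T serialize(Serializable s); }

interface ProjectASerializer<T> extends Serializer<T> {
    T serialize(ExtFileSystemIdentifier fsid);
}

interface ProjectASerializable extends Serializable {
    <T> T serialize(ProjectASerializer<T> serializer);
}

class ExtFileSystemIdentifier implements ProjectASerializable {
    public <T> T serialize(Serializer<T> serializer) {
        // Still binds to serialize(Serializable): Serializer<T> has nothing more specific.
        return serializer.serialize(this);
    }
    public <T> T serialize(ProjectASerializer<T> serializer) {
        // ProjectASerializer<T> declares the specific overload, so this binds
        // to serialize(ExtFileSystemIdentifier).
        return serializer.serialize(this);
    }
}

class JsonSerializer implements ProjectASerializer<String> {
    public String serialize(Serializable s) { return s.serialize(this); }
    public String serialize(ExtFileSystemIdentifier fsid) { return "ext"; }
}
```

Here ext.serialize(json), with ext typed as ProjectASerializable and json as JsonSerializer, returns "ext". A call that starts from a plain Serializable and Serializer<T> still binds to serialize(Serializable) inside the data class.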

This makes double dispatch slightly less attractive: there is now more boilerplate in the data classes. It was bad enough that obviously duplicated code can't be factored out into a parent class, because doing so would circumvent the double-dispatch trickery. Now there is more of it, and it grows with the depth of the Serializer inheritance hierarchy.

Double-dispatch is static typing trickery. Is there some more static typing trickery that will help me avoid the duplicated code?

Answer 1 (score 0), answered Feb 10 '16 at 12:52

As you noticed, the serialize method of

public interface Serializer<T> {
  T serialize(Serializable s);
}

does not make sense. The visitor pattern exists to do case analysis, but this method makes no progress (you already know the argument is a Serializable), hence the inevitable infinite recursion.

What would make sense is a base Serializer interface that has at least one concrete type to visit, and that concrete type shared between the two projects. If there is no shared concrete type, then there is no hope of a Serializer hierarchy being useful.
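A sketch of what this suggests, assuming a hypothetical SharedIdentifier as the concrete type both projects share and a hypothetical ProjectBJson serializer (field names and output strings are made up). Every method on the base Serializer names a concrete type, so each dispatch step makes progress and no serialize(Serializable) method is needed:

```java
interface Serializable { <T> T serialize(Serializer<T> serializer); }

// Base visitor: its only method names a concrete, shared type.
interface Serializer<T> { T serialize(SharedIdentifier id); }

// Hypothetical concrete type used by both projects.
final class SharedIdentifier implements Serializable {
    final String value;
    SharedIdentifier(String value) { this.value = value; }
    public <T> T serialize(Serializer<T> serializer) {
        return serializer.serialize(this); // binds to serialize(SharedIdentifier)
    }
}

// Project B extends the base visitor with its own types.
interface ProjectBSerializer<T> extends Serializer<T> {
    T serialize(ComputingDevice device);
}

final class ComputingDevice {
    final String name; // hypothetical field
    ComputingDevice(String name) { this.name = name; }
    <T> T serialize(ProjectBSerializer<T> serializer) {
        return serializer.serialize(this); // binds to serialize(ComputingDevice)
    }
}

class ProjectBJson implements ProjectBSerializer<String> {
    public String serialize(SharedIdentifier id) { return "{\"id\":\"" + id.value + "\"}"; }
    public String serialize(ComputingDevice d) { return "{\"device\":\"" + d.name + "\"}"; }
}
```

A SharedIdentifier serializes the same way whether it receives a ProjectBJson directly or through a plain Serializer<String> reference, while Project B's own types are visited through ProjectBSerializer.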

Now, if you are looking to reduce boilerplate when implementing the visitor pattern, I suggest using a code generator (via annotation processing), e.g. adt4j or derive4j.
