I have a number of dumb object classes that I would like to serialize as Strings for the purpose of out-of-process storage. This is a pretty typical place to use double-dispatch / the visitor pattern.
public interface Serializeable {
<T> T serialize(Serializer<T> serializer);
}
public interface Serializer<T> {
T serialize(Serializeable s);
T serialize(FileSystemIdentifier fsid);
T serialize(ExtFileSystemIdentifier extFsid);
T serialize(NtfsFileSystemIdentifier ntfsFsid);
}
public class JsonSerializer implements Serializer<String> {
public String serialize(Serializeable s) {...}
public String serialize(FileSystemIdentifier fsid) {...}
public String serialize(ExtFileSystemIdentifier extFsid) {...}
public String serialize(NtfsFileSystemIdentifier ntfsFsid) {...}
}
public abstract class FileSystemIdentifier implements Serializeable {}
public class ExtFileSystemIdentifier extends FileSystemIdentifier {...}
public class NtfsFileSystemIdentifier extends FileSystemIdentifier {...}
With this model, the classes that hold data don't need to know about the possible ways to serialize that data. JSON is one option, but another serializer might "serialize" the data classes into SQL insert statements, for example.
If we take a look at the implementation of one of the data classes, it looks pretty much the same as all the others. The class calls the serialize() method on the Serializer passed to it, providing itself as the argument.
public class ExtFileSystemIdentifier extends FileSystemIdentifier {
public <T> T serialize(Serializer<T> serializer) {
return serializer.serialize(this);
}
}
I understand why this common code cannot be pulled into a parent class. Although the code is shared, the compiler knows unambiguously, when it is in that method, that the type of this is ExtFileSystemIdentifier, and it can (at compile time) write out the bytecode to call the most type-specific overload of serialize(). In a parent class, this would have the parent's type and the less specific overload would be chosen instead.
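A minimal sketch of that compile-time behaviour, using trimmed-down stand-ins for the classes above. The serializeShared method, the LabelSerializer class, and the OverloadDemo class are hypothetical names introduced only for this illustration.

```java
// Trimmed-down stand-in for the Serializer interface in the question.
interface Serializer<T> {
    T serialize(FileSystemIdentifier fsid);
    T serialize(ExtFileSystemIdentifier extFsid);
}

abstract class FileSystemIdentifier {
    // Hypothetical shared implementation: here `this` has the static type
    // FileSystemIdentifier, so the compiler binds to serialize(FileSystemIdentifier).
    <T> T serializeShared(Serializer<T> serializer) {
        return serializer.serialize(this);
    }

    abstract <T> T serialize(Serializer<T> serializer);
}

class ExtFileSystemIdentifier extends FileSystemIdentifier {
    @Override
    <T> T serialize(Serializer<T> serializer) {
        // Here `this` has the static type ExtFileSystemIdentifier,
        // so the compiler binds to serialize(ExtFileSystemIdentifier).
        return serializer.serialize(this);
    }
}

// Hypothetical serializer that only reports which overload was selected.
class LabelSerializer implements Serializer<String> {
    public String serialize(FileSystemIdentifier fsid) { return "FileSystemIdentifier"; }
    public String serialize(ExtFileSystemIdentifier extFsid) { return "ExtFileSystemIdentifier"; }
}

class OverloadDemo {
    static String viaShared() {
        return new ExtFileSystemIdentifier().serializeShared(new LabelSerializer());
    }

    static String viaOverride() {
        return new ExtFileSystemIdentifier().serialize(new LabelSerializer());
    }

    public static void main(String[] args) {
        System.out.println(viaShared());   // prints "FileSystemIdentifier"
        System.out.println(viaOverride()); // prints "ExtFileSystemIdentifier"
    }
}
```

The two method bodies are textually identical, but they compile to calls on different overloads because the static type of this differs.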
I believe I understand most of what is happening when it comes to the V-table lookup as well. The compiler only knows the serializer parameter as being of the abstract type Serializer. It must, at runtime, look into the V-table of the serializer object to discover the location of the serialize() method for the specific subclass, in this case JsonSerializer.serialize().
The typical usage is to take a data object, known to be Serializable, and serialize it by giving it to a serializer object, known to be a Serializer. The specific types of the objects are not known at compile time.
List<Serializeable> list = //....
Serializer<String> serializer = //....
list.stream().map(serializer::serialize);
This invocation works similarly to the other one, but in reverse.
public class JsonSerializer implements Serializer<String> {
public String serialize(Serializeable s) {
return s.serialize(this);
}
// ...
}
The V-table lookup is now done on the instance of Serializable, and it will find, for example, ExtFileSystemIdentifier.serialize. The compiler can statically determine that the closest matching overload is the one taking Serializer<T> (it just so happens to also be the only overload).
This is all well and good. It achieves the main goal of keeping the input and output data classes oblivious to the serialization class. And it also achieves the secondary goal of giving the user of the serialization classes a consistent API regardless of what sort of serialization is being done.
Imagine now that a second set of dumb data classes exists in a different project. A new serializer needs to be written for these objects. The existing Serializable interface can be used in this new project. The Serializer interface, however, contains references to the data classes from the other project.
In an attempt to generalize this, the Serializer interface could be split into three:
public interface Serializer<T> {
T serialize(Serializable s);
}
public interface ProjectASerializer<T> extends Serializer<T> {
T serialize(FileSystemIdentifier fsid);
T serialize(ExtFileSystemIdentifier fsid);
// ... other data classes from Project A
}
public interface ProjectBSerializer<T> extends Serializer<T> {
T serialize(ComputingDevice device);
T serialize(PortableComputingDevice portable);
// ... other data classes from Project B
}
In this way, the Serializer and Serializable interfaces could be packaged and reused. However, this breaks the double-dispatch and results in an infinite loop in the code. This is the part I'm uncertain about in the V-table lookup.
When stepping through the code in a debugger, the issue arises in the data class's serialize method.
public class ExtFileSystemIdentifier implements Serializable {
public <T> T serialize(Serializer<T> serializer) {
return serializer.serialize(this);
}
}
What I think is happening is that at compile time, the compiler is attempting to select the correct overload for the serialize method from the available options in the Serializer interface (since the compiler knows it only as a Serializer<T>), and the only option there is serialize(Serializable). This means that by the time we get to the runtime V-table lookup, the method being looked for is already the wrong one, and the runtime will select JsonSerializer.serialize(Serializable), leading to the infinite loop.
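The loop can be reproduced with a small sketch like the one below. It is an illustration under assumptions, not the actual project code: the genericCalls counter, the guard that throws after a few re-entries (so the demo stops instead of overflowing the stack), the placeholder JSON string, and the LoopDemo class are all hypothetical.

```java
interface Serializable {
    <T> T serialize(Serializer<T> serializer);
}

interface Serializer<T> {
    T serialize(Serializable s);
}

interface ProjectASerializer<T> extends Serializer<T> {
    T serialize(ExtFileSystemIdentifier fsid);
}

class ExtFileSystemIdentifier implements Serializable {
    @Override
    public <T> T serialize(Serializer<T> serializer) {
        // The static type is Serializer<T>, which declares only
        // serialize(Serializable), so that is the method bound here.
        return serializer.serialize(this);
    }
}

class JsonSerializer implements ProjectASerializer<String> {
    int genericCalls = 0; // hypothetical counter, used only to cut the loop short

    @Override
    public String serialize(Serializable s) {
        if (++genericCalls > 3) {
            throw new IllegalStateException("re-entered serialize(Serializable)");
        }
        return s.serialize(this);
    }

    @Override
    public String serialize(ExtFileSystemIdentifier fsid) {
        return "{\"type\":\"ext\"}"; // placeholder output, never reached in this demo
    }
}

class LoopDemo {
    // Returns true if the generic overload keeps calling back into itself.
    static boolean loops() {
        Serializable data = new ExtFileSystemIdentifier();
        try {
            new JsonSerializer().serialize(data);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(loops()); // prints "true"
    }
}
```

The more specific serialize(ExtFileSystemIdentifier) exists on the runtime object, but it is never a candidate because the call site was compiled against Serializer<T>.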
A possible solution to this problem is to provide a more type-specific serialize method in the data class.
public interface ProjectASerializable extends Serializable {
<T> T serialize(ProjectASerializer<T> serializer);
}
public class ExtFileSystemIdentifier implements ProjectASerializable {
public <T> T serialize(Serializer<T> serializer) {
return serializer.serialize(this);
}
public <T> T serialize(ProjectASerializer<T> serializer) {
return serializer.serialize(this);
}
}
Program control flow will bounce around until the most type-specific Serializer overload is reached. At that point, the ProjectASerializer<T> interface will have a more specific serialize method for the data class from Project A, avoiding the infinite loop.
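As a rough sketch of how the extra overload changes the binding, assuming the caller holds Project A typed references; the FixDemo class and the placeholder JSON string are hypothetical.

```java
interface Serializable {
    <T> T serialize(Serializer<T> serializer);
}

interface Serializer<T> {
    T serialize(Serializable s);
}

interface ProjectASerializable extends Serializable {
    <T> T serialize(ProjectASerializer<T> serializer);
}

interface ProjectASerializer<T> extends Serializer<T> {
    T serialize(ExtFileSystemIdentifier fsid);
}

class ExtFileSystemIdentifier implements ProjectASerializable {
    @Override
    public <T> T serialize(Serializer<T> serializer) {
        // Still binds to serialize(Serializable): that is all Serializer<T> declares.
        return serializer.serialize(this);
    }

    @Override
    public <T> T serialize(ProjectASerializer<T> serializer) {
        // ProjectASerializer<T> also declares serialize(ExtFileSystemIdentifier),
        // which is the most specific match for `this`.
        return serializer.serialize(this);
    }
}

class JsonSerializer implements ProjectASerializer<String> {
    @Override
    public String serialize(Serializable s) {
        return s.serialize(this);
    }

    @Override
    public String serialize(ExtFileSystemIdentifier fsid) {
        return "{\"type\":\"ext\"}"; // placeholder output
    }
}

class FixDemo {
    static String run() {
        // Both references are declared with Project A types (an assumption of this sketch).
        ProjectASerializable data = new ExtFileSystemIdentifier();
        ProjectASerializer<String> serializer = new JsonSerializer();
        return data.serialize(serializer);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints {"type":"ext"}
    }
}
```

In this sketch the call starts from Project A typed references. Through a plain Serializable reference the compiler can only see the serialize(Serializer<T>) overload, so that entry point is worth checking separately.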
This makes the double-dispatch slightly less attractive. There is now more boilerplate code in the data classes. It was bad enough that obviously duplicate code can't be factored out to a parent class because doing so would circumvent the double-dispatch trickery. Now there is more of it, and it compounds with the depth of the Serializer inheritance hierarchy.
Double-dispatch is static typing trickery. Is there some more static typing trickery that will help me avoid the duplicated code?