5

I disassembled the Map class using javap. The class definition still shows the generic type parameters K and V. I expected type erasure to remove them. Why doesn't that happen?

./javap -verbose java.util.Map

Classfile jar:file:/opt/jdk1.8.0_101/jre/lib/rt.jar!/java/util/Map.class
    Last modified 22 Jun, 2016; size 4127 bytes
    MD5 checksum 238f89b3e2ff9bebe07aa22b0a3493a9
    Compiled from "Map.java"

public interface java.util.Map<K extends java.lang.Object, V extends java.lang.Object>
    minor version: 0
    major version: 52
    flags: ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT

Constant pool:
  • 1
    You are looking at debug information. – user207421 Sep 27 '17 at 10:35
  • 1
    The interface definition still has generic parameters K and V. Shouldn't they have been erased by the concept of type erasure ? – Suraj Chakraborty Sep 27 '17 at 10:35
  • 1
    Yes and no. There is still some metadata included about type parameters and generic supertypes. There has to be, otherwise you could not consume generic types unless you had the source code to them, as the compiler would have no way of knowing they were generic. But that metadata is ‘extra’ information in the class file. `javap` uses this metadata to show you the generic signatures instead of the raw (erased) signatures. – Mike Strobel Sep 27 '17 at 10:38
  • Obviously it *is* working, otherwise you would be unable to compile and execute your code. The types that get erased are in *your* code, such as `Map extends Number>`, which gets erased to `Map`. What's in `java/util/Map.class` is irrelevant. – user207421 Sep 27 '17 at 11:15
  • @user207421 why would that be *irrelevant*? `Map` itself is generic so looking at `class` file makes perfect sense for me. and looking into that class (or any other generic class) with `javap` will show that for types generic information is retained – Eugene Aug 31 '18 at 19:33
  • @MikeStrobel so you are saying that `void go(List list)` is truly erased to `void go(List list)` and `javap` looking into some "meta-data" in the class file can tell that the method is actually `void go(java.util.List)`? I find it a bit different honestly. I think that type information is preserved for types, but erased at all call-sites. – Eugene Aug 31 '18 at 19:37
  • @Eugene The method descriptor should be `go(Ljava/util/List;)V`, fully erased. There may or may not be a `Signature` attribute in the class file for that method, which would contain the full generic signature. `javac` will emit one, but it’s not mandatory, and could be stripped out by a post processor. Same goes for the local variable type table, which could be used to deduce the parameter type (look for the type of local `1` at offset `0`). I think Java 8 started recording some additional metadata about arguments, but I forget the details. Like the others, it’s optional. – Mike Strobel Aug 31 '18 at 20:29
  • @MikeStrobel thank you! I thought I was going crazy with this, now it all makes sense, faith in SO restored, I guess :) – Eugene Sep 03 '18 at 10:13
  • `K extends Object` looks like the type information is *practically* erased. I don't know the bytecode-level details though. – Tamas Rev Sep 04 '18 at 13:34

2 Answers

11

If generic signature information were completely erased, it would not be possible to consume generic types or methods unless you also had the source code. Think about it: in order to use generics effectively, the compiler must know that a type or method is generic, and it must know the number, position, and bounds of the generic parameters.

To that end, javac emits what's called a Signature attribute on types and methods which are themselves generic, or whose signatures contain type variables or instantiations of other generic types.

For a generic type like Map<K, V>, the class definition will be emitted with a Signature attribute describing:

  1. All generic parameters (type variables) declared by the type, and their bounds;
  2. The full generic signature of the type's base class;
  3. The full generic signature of the interfaces implemented by the type.

For the Map interface, the Signature value looks like this:

<K:Ljava/lang/Object;V:Ljava/lang/Object;>Ljava/lang/Object;

You can see this attribute in javap -v at the very end of the output, on the line following the closing }. To see what a more complete generic signature looks like, take a look at the HashMap class, which has a generic base class and implements multiple interfaces:

<K:Ljava/lang/Object;V:Ljava/lang/Object;>Ljava/util/AbstractMap<TK;TV;>;Ljava/util/Map<TK;TV;>;Ljava/lang/Cloneable;Ljava/io/Serializable;

From this signature, the compiler knows the following about type HashMap:

  1. There are two generic parameters, K and V, both of which extend java.lang.Object.
  2. The base class is java.util.AbstractMap<K, V>. To clarify, K and V here refer to the parameters defined by HashMap (not AbstractMap).
  3. The class implements java.util.Map<K, V>, java.lang.Cloneable, and java.io.Serializable.
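The same information can be inspected at runtime, because the reflection API's generic accessors are backed by the Signature attribute. The following is only a minimal sketch: the class name ClassSignatureDemo and its helper methods are made up for illustration, while getTypeParameters, getGenericSuperclass, and getGenericInterfaces are standard java.lang.Class methods.

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical demo class; not part of the JDK or of the original answer.
public class ClassSignatureDemo {

    // Names of the type variables declared by a class (K and V for HashMap).
    static List<String> typeParameterNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (TypeVariable<?> variable : type.getTypeParameters()) {
            names.add(variable.getName());
        }
        return names;
    }

    // Generic (non-erased) names of the interfaces a class implements.
    static List<String> genericInterfaceNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Type iface : type.getGenericInterfaces()) {
            names.add(iface.getTypeName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Generic view, recovered from the Signature attribute.
        System.out.println(typeParameterNames(HashMap.class));
        System.out.println(HashMap.class.getGenericSuperclass().getTypeName());
        System.out.println(genericInterfaceNames(HashMap.class));

        // Erased view: the raw superclass, with no type arguments.
        System.out.println(HashMap.class.getSuperclass().getName());
    }
}
```

If the Signature attribute were stripped from HashMap.class, I would expect the generic accessors to fall back to the raw types, but I have not tested that against a stripped class file.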

Methods may also have Signature attributes, but in the case of methods, the signature describes:

  1. All generic parameters (type variables) declared by the method, and their bounds;
  2. The full generic signature of the method's parameter types;
  3. The full generic signature of the method's return type.

However, a method's Signature is considered extra metadata; you will never see one referenced directly in bytecode. Instead, you will see references to the method descriptor, which is similar to a signature that has had generic erasure applied recursively. Unlike Signature attributes, method descriptors are mandatory. javap -v is kind enough to show you both. For example, given the HashMap method public V put(K, V):

  1. The method descriptor is (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;.
  2. The generic Signature is (TK;TV;)TV;.
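Reflection exposes the same two views of a method, which makes the difference easy to see without reading javap output. This is a rough sketch: MethodSignatureDemo and findPut are hypothetical names, while getReturnType, getGenericReturnType, toString, and toGenericString are standard java.lang.reflect.Method methods.

```java
import java.lang.reflect.Method;
import java.util.HashMap;

// Hypothetical demo class; not part of the JDK or of the original answer.
public class MethodSignatureDemo {

    // Looks up HashMap.put by its erased parameter types, which is how the
    // descriptor (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; names them.
    static Method findPut() {
        try {
            return HashMap.class.getMethod("put", Object.class, Object.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Method put = findPut();

        // Erased view, corresponding to the method descriptor.
        System.out.println(put.getReturnType());
        System.out.println(put);

        // Generic view, recovered from the method's Signature attribute.
        System.out.println(put.getGenericReturnType());
        System.out.println(put.toGenericString());
    }
}
```

Note that the lookup itself has to use Object.class for both parameters: there is no way to ask for put(K, V), because the method is identified by its erased descriptor.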

The Signature tells the compiler and your IDE the full generic signature of the method, enabling enforcement of type safety. The descriptor is how the method is actually referenced in the bytecode at a call site. For example, given the expression map.put(0, "zero") where map is a Map<Integer, String>, the instruction sequence would be something like:

aload            (some variable holding a Map)
iconst_0
invokestatic     java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
ldc              "zero"
invokeinterface  java/util/Map.put:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;

Note how there is no generic information retained. Limited type safety is enforced at runtime by the insertion of checkcast instructions, which perform runtime casts. For example, a call to map.get(0) on a Map<Integer, String> would include an instruction sequence similar to:

aload            (some variable holding a Map)
iconst_0
invokestatic     java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
invokeinterface  java/util/Map.get:(Ljava/lang/Object;)Ljava/lang/Object;
checkcast        java/lang/String

Thus, even though the Map type is fully erased at the call site, the emitted bytecode ensures that any value retrieved from a Map<Integer, String> is actually a String, and not some other Object.
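To watch that checkcast do its job, you can deliberately put a wrong value into the map through a raw reference and then read it back as a String. This is a minimal sketch with made-up names (CheckcastDemo, pollutedMap, readFailsWithClassCast); it relies only on standard raw-type behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of the JDK or of the original answer.
public class CheckcastDemo {

    // Stores an Integer value in a Map<Integer, String> via a raw reference.
    // This compiles (with an unchecked warning) and succeeds at runtime,
    // because the erased put accepts any Object.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Map<Integer, String> pollutedMap() {
        Map<Integer, String> map = new HashMap<>();
        Map raw = map;
        raw.put(0, 42);
        return map;
    }

    // Returns true if reading the value as a String fails at the call site.
    static boolean readFailsWithClassCast(Map<Integer, String> map) {
        try {
            // The compiler-inserted checkcast to java/lang/String runs here.
            String value = map.get(0);
            System.out.println("Read: " + value);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> map = pollutedMap();
        System.out.println("Cast failed: " + readFailsWithClassCast(map));
    }
}
```

The failure happens at the read, not at the put, which is why this kind of bug can surface far away from the code that caused it.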

It's important to stress that, like most metadata in a classfile, Signature attributes are completely optional. And while javac will emit them when necessary, it is possible for them to be stripped out by post processors like bytecode optimizers and obfuscators. This would, of course, make it impossible to consume generics in the manner intended. If, for example, you were to strip out the Signature attributes in java/util/Map.class, you could only consume Map as a non-generic class equivalent to Map<Object, Object>, and you would have to handle type checking yourself.
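Using a raw Map gives a rough feel for what "handling type checking yourself" means. Raw types are only an approximation of a Map compiled without Signature attributes, so treat this as an analogy rather than a reproduction; RawMapDemo and lookup are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of the JDK or of the original answer.
public class RawMapDemo {

    // With no generic information, get returns Object and the caller
    // must check and cast the value manually.
    @SuppressWarnings("rawtypes")
    static String lookup(Map rawMap, Object key) {
        Object value = rawMap.get(key);
        if (value != null && !(value instanceof String)) {
            throw new ClassCastException("Expected a String but found " + value.getClass().getName());
        }
        return (String) value;
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    public static void main(String[] args) {
        Map map = new HashMap();
        map.put(0, "zero");
        System.out.println(lookup(map, 0));
    }
}
```

This is essentially how collection code was written before generics: the casts that the compiler now inserts for you had to be written by hand.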

Mike Strobel
  • excellent, you made a developer very happy today. was searching for an answer like this for waaaay too long, thank you so much! – Eugene Sep 25 '18 at 12:19
  • @Eugene I've found many of your answers and comments on this site helpful, so I'm glad I could return the favor! :D – Mike Strobel Sep 25 '18 at 12:43
  • a follow up question if you don't mind. In my understanding if the compiler would not generate that `Signature` it would not be possible for the callers to know if a certain method was generic at all or insert `checkcast`s into them, right? Was this how the compiler would do things before, I wonder? – Eugene Sep 27 '18 at 09:03
  • @Eugene When referencing *compiled* classes, you are correct: without the `Signature` attributes, the compiler would not know a type or method is generic. Obviously, if you have the source files, and they are included in the compilation, the compiler could derive the information that way. I'm not sure what you mean by, "how the compiler would do things **before**". If you meant "before generics", then it would be up to the coder to insert casts (e.g., when retrieving an item out of a collection). – Mike Strobel Sep 27 '18 at 12:54
  • by *before* I really meant was there ever a compiler that would not generate the `Signature`, sorry – Eugene Sep 27 '18 at 12:55
  • @Eugene I believe `Signature` attributes were added in Java 1.5, specifically for generics. `javac` started emitting them at that time, but I don't know about other compilers. I would think a failure to emit them would make them non-compliant, but I'd have to check the ever-so-detailed specification. – Mike Strobel Sep 27 '18 at 14:16
  • you have no idea what chaos(in a very good way!) you have caused in my office at work - this has been debated the whole day. I feel like serving you a beer now :) again, thank you so much – Eugene Sep 27 '18 at 14:17
0

There is extra information inside the bytecode that is used to decode the generic information.

Ankit Agarwal