Saturday, June 15, 2013

Subtyping in Java generics

Generic types introduce a new spectrum of type safety to Java programs. At the same time, generic types can be quite expressive, especially when using wildcards. In this article, I want to explain how subtyping works with Java generics.

General thoughts on generic type subtyping


Different parameterizations of the same generic class or interface do not form a subtype hierarchy that mirrors the subtype hierarchy of their type arguments. This means, for example, that List<Number> is not a supertype of List<Integer>, even though Number is a supertype of Integer. The following prominent example gives a good intuition for why this kind of subtyping is prohibited:

// assuming that such subtyping were possible
ArrayList<Number> list = new ArrayList<Integer>();
// the next line would put a Double into a list of Integers,
// so reading that element as an Integer would later fail with a
// ClassCastException because Double is not a subtype of Integer
list.add(Double.valueOf(0.1));
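To see what the compiler protects against, the forbidden assignment can be forced through a raw type. The following is only a minimal sketch; the class InvarianceDemo and its methods are hypothetical names introduced for illustration. Because generic type arguments are erased at runtime, the add call itself goes through, and the failure only surfaces when the element is read back as an Integer.

```java
import java.util.ArrayList;
import java.util.List;

class InvarianceDemo {
    // Hypothetical helper: forces the forbidden assignment through a raw type.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<Number> unsafeView(List<Integer> ints) {
        return (List<Number>) (List) ints;
    }

    // Returns true if reading the smuggled Double as an Integer fails.
    static boolean failsOnRead() {
        List<Integer> ints = new ArrayList<>();
        List<Number> numbers = unsafeView(ints);
        numbers.add(Double.valueOf(0.1)); // accepted at runtime due to erasure
        try {
            Integer first = ints.get(0); // compiler-inserted cast to Integer
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnRead());
    }
}
```

The unchecked warning that has to be suppressed here is the compiler's way of saying that it can no longer vouch for the list's contents.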

Before discussing this in further detail, let us first think a little about types in general: types introduce redundancy to your program. When you declare a variable of type Number, you make sure that this variable only references objects that know how to handle every method defined by Number, such as Number.doubleValue. You can therefore safely call doubleValue on any object that the variable currently references, and you no longer need to keep track of the actual type of that object. (This holds as long as the reference is not null. The null reference is one of the few exceptions to Java's strict type safety: the null "object" does not know how to handle any method call.) If, however, you tried to assign an object of type String to this Number-typed variable, the Java compiler would recognize that this object does not understand the methods required by Number. It would report an error because it could otherwise not guarantee that a future call to, for example, doubleValue would be understood.

However, if Java lacked types, a program would not change its behavior just because of that. As long as we never made an erroneous method call, a Java program without types would be equivalent. Viewed in this light, types merely prevent us developers from doing something stupid while taking away a little of our freedom. Additionally, types are a nice form of implicit documentation for your program. (Other programming languages such as Smalltalk do not have static types. Besides being annoying most of the time, this can also have its benefits.)

With this, let's return to generics. By defining generic types, you allow users of your generic class or interface to add type safety to their code because they can restrict themselves to using your class or interface in a certain way. When you, for example, declare a List to contain only Numbers by writing List<Number>, you instruct the Java compiler to report an error whenever you try to add, say, a String-typed object to this list. Before Java generics, you simply had to trust that the list only contained Numbers. This could be especially painful when you handed references to your collections to methods defined in third-party code or received collections from such code. With generics, you can ensure at compile time that all elements in your List are of a certain supertype.
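The difference between trusting a list and having the compiler check it can be sketched as follows. The class RawVersusGeneric and its method names are made up for this illustration; the raw list stands in for pre-generics code.

```java
import java.util.ArrayList;
import java.util.List;

class RawVersusGeneric {
    // Pre-generics style: the raw list accepts anything, errors show up late.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean rawListFailsAtRuntime() {
        List raw = new ArrayList();
        raw.add(Integer.valueOf(1));
        raw.add("two"); // compiles, nothing stops us
        try {
            for (Object o : raw) {
                Number n = (Number) o; // explicit cast fails for "two"
                n.doubleValue();
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Generic style: the element type is checked when the code is compiled.
    static double genericListIsChecked() {
        List<Number> numbers = new ArrayList<>();
        numbers.add(Integer.valueOf(1));
        numbers.add(Double.valueOf(2.5));
        // numbers.add("two"); // rejected by the compiler
        double total = 0.0;
        for (Number n : numbers) {
            total += n.doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(rawListFailsAtRuntime());
        System.out.println(genericListIsChecked());
    }
}
```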

At the same time, by using generics you lose some type safety within your generic class or interface. When you, for example, implement a generic list

class MyList<T> extends ArrayList<T> { }

you do not know the type of T within MyList and you have to expect that the type could be as general as Object. This is why you can constrain your type parameter with an upper bound:

class MyList<T extends Number> extends ArrayList<T> {
  double sum() {
    double sum = 0.0;
    for (Number val : this) {
      sum += val.doubleValue();
    }
    return sum;
  }
}

This allows you to assume that any object in MyList is an instance of Number or of one of its subtypes. That way, you gain some type safety within your generic class.
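A short usage sketch may help here. MyList is repeated as a nested class so that the example compiles on its own; the wrapper class BoundedSumDemo and the method sumOfIntegers are hypothetical names.

```java
import java.util.ArrayList;

class BoundedSumDemo {
    // Same shape as MyList in the text, nested so the sketch is self-contained.
    static class MyList<T extends Number> extends ArrayList<T> {
        double sum() {
            double sum = 0.0;
            for (Number val : this) {
                sum += val.doubleValue();
            }
            return sum;
        }
    }

    static double sumOfIntegers() {
        MyList<Integer> ints = new MyList<>();
        ints.add(1);
        ints.add(2);
        ints.add(3);
        return ints.sum();
    }

    // MyList<String> strings; // rejected: String is not within the bound Number

    public static void main(String[] args) {
        System.out.println(sumOfIntegers());
    }
}
```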

Wildcards


Wildcards are the Java equivalent of saying "whatever type". Consequently, you are not allowed to use wildcards when instantiating a type, i.e. when defining what concrete type some instance of a generic class should represent. A type instantiation occurs, for example, when instantiating an object as new ArrayList<Number>, where you, among other things, implicitly call the type constructor of ArrayList which is contained in its class definition

class ArrayList<T> implements List<T> { ... }

with ArrayList<T> being a trivial type constructor with a single argument. Thus, you are allowed to use a wildcard neither within ArrayList's type constructor definition (ArrayList<T>) nor in the call of this constructor (new ArrayList<Number>). When you are only referring to a type without instantiating a new object, however, you can use wildcards, for example in local variables. Therefore, the following definition is allowed:

ArrayList<?> list;

By defining this variable, you are creating a placeholder for an ArrayList of any generic type. Because so little is known about the generic type, however, you cannot add objects to the list through this variable. You made such a general assumption about the generic type represented by the variable list that it would not be safe to add an object of, for example, type String, because the list behind the variable could require objects of some entirely different type. In general, this required type is unknown, and there exists no object which is a subtype of every type and could be added safely. (The exception is the null reference, which bypasses type checking. However, you should never add null to collections.) At the same time, all objects you get out of the list will be of type Object because this is the only safe assumption about a common supertype of the elements of all possible lists represented by this variable. For this reason, you can form more elaborate wildcards using the extends and super keywords:
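As a minimal sketch of what an unbounded wildcard permits, the following hypothetical method accepts a list of any element type and only ever treats the elements as Object. The names UnboundedWildcardDemo and describe are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

class UnboundedWildcardDemo {
    // Works for a list of any element type; elements are only seen as Object.
    static String describe(List<?> list) {
        StringBuilder sb = new StringBuilder();
        for (Object element : list) {
            sb.append(element).append(';');
        }
        // list.add("x"); // rejected by the compiler: the element type is unknown
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(Arrays.asList(1, 2)));
        System.out.println(describe(Arrays.asList("a")));
    }
}
```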

ArrayList<? extends Number> list1 = new ArrayList<Integer>();
ArrayList<? super Number> list2 = new ArrayList<Object>();

When a wildcard defines an upper bound via extends, as for list1, the compiler guarantees that any object you get out of this list is some subtype of Number, such as Integer. Similarly, when a wildcard defines a lower bound via super, as for list2, you can expect the list's element type to be Number or a supertype of Number, such as Object. Thus, you can safely add instances of any subtype of Number to this list.
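The two kinds of bounded wildcards can be contrasted in a small sketch. The class BoundedWildcardDemo and the methods readFirst and writeOne are hypothetical; the commented-out lines mark the operations that the compiler rejects.

```java
import java.util.ArrayList;
import java.util.List;

class BoundedWildcardDemo {
    // ? extends Number: reading as Number is safe, adding is not.
    static double readFirst(List<? extends Number> source) {
        Number first = source.get(0);
        // source.add(Integer.valueOf(1)); // rejected: actual element type unknown
        return first.doubleValue();
    }

    // ? super Number: adding any Number is safe, reading only yields Object.
    static void writeOne(List<? super Number> target) {
        target.add(Integer.valueOf(1));
        target.add(Double.valueOf(2.0));
        Object o = target.get(0);
        // Number n = target.get(0); // rejected: the list may hold plain Objects
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        ints.add(42);
        System.out.println(readFirst(ints));

        List<Object> objects = new ArrayList<>();
        writeOne(objects);
        System.out.println(objects.size());
    }
}
```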

Finally, you should note that you can actually use wildcards within type constructors if the type arguments are themselves generic. The following use of a type constructor is, for example, perfectly legal:

ArrayList<?> list = new ArrayList<List<?>>();

In this example, the requirement that the ArrayList must not be constructed with a wildcard type is fulfilled because the wildcard is applied to the type argument List and not to the constructed ArrayList itself.

As for subtyping of generic classes, we can summarize that a generic type is a subtype of another generic type if its raw type is a subtype of the other raw type and if each of its type arguments is either identical to the corresponding type argument or covered by that argument's wildcard. Because of this, we can define

List<? extends Number> list = new ArrayList<Integer>();

because the raw type ArrayList is a subtype of List and because the type argument Integer is covered by the wildcard ? extends Number.
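The rule summarized above can be tried out with a few assignments. This is only an illustrative sketch; SubtypingRuleDemo and demo are hypothetical names, and the commented-out line shows an assignment that the compiler rejects.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class SubtypingRuleDemo {
    static int demo() {
        ArrayList<Integer> ints = new ArrayList<>();
        ints.add(7);

        List<Integer> a = ints;             // raw type widens, argument is identical
        List<? extends Number> b = ints;    // Integer is covered by ? extends Number
        Collection<? extends Number> c = b; // raw type widens, same wildcard
        List<?> d = b;                      // ? covers every type argument
        // List<Number> e = ints;           // rejected: Integer and Number differ

        // All four variables refer to the same one-element list.
        return a.size() + b.size() + c.size() + d.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```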

Finally, be aware that a wildcard type List<?> is a shortcut for List<? extends Object> since this is a commonly used type definition. If, however, the generic type constructor enforces a different upper bound, as for example in

class GenericClass<T extends Number> { }

a variable of type GenericClass<?> would instead be a shortcut for GenericClass<? extends Number>.
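A brief sketch of this implicit bound follows. GenericClass is given a value field and a getter purely for demonstration, and the wrapper ImplicitBoundDemo and the method read are hypothetical names.

```java
class ImplicitBoundDemo {
    // GenericClass from the text, extended with a field for demonstration.
    static class GenericClass<T extends Number> {
        private final T value;

        GenericClass(T value) {
            this.value = value;
        }

        T get() {
            return value;
        }
    }

    static double read(GenericClass<?> box) {
        // Compiles because the declared bound Number carries over to the wildcard.
        Number n = box.get();
        return n.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(read(new GenericClass<Integer>(5)));
    }
}
```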

The get-and-put principle


This observation leads us to the get-and-put principle. This principle is best explained by another famous example:

class CopyClass {
  <T> void copy(List<T> from, List<T> to) {
    for(T item : from) to.add(item);
  }
}

This method definition is not very flexible. If you had a List<Integer>, you could not copy its contents to a List<Number> or even a List<Object>. Therefore, the get-and-put principle states that you should always use upper-bounded wildcards (? extends) when you only read objects from a generic instance (via return values) and always use lower-bounded wildcards (? super) when you only provide arguments to a generic instance's methods. Therefore, a better implementation of CopyClass would look like this:

class CopyClass {
  <T> void copy(List<? extends T> from, List<? super T> to) {
    for(T item : from) to.add(item);
  }
}

This signature works because you are only reading from one list and only writing to the other. Unfortunately, this is something that is easily forgotten, and you can even find classes in the Java core API that do not apply the get-and-put principle. (Note that the above method also describes a generic type constructor.)
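To illustrate the gain in flexibility, here is a small usage sketch of the wildcard version. The copy method is declared static only for convenience, and the class CopyDemo and the method copyIntegersToObjects are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CopyDemo {
    // Same signature as the wildcard version of copy, made static for brevity.
    static <T> void copy(List<? extends T> from, List<? super T> to) {
        for (T item : from) {
            to.add(item);
        }
    }

    static List<Object> copyIntegersToObjects() {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Object> objects = new ArrayList<>();
        objects.add("start");
        // With copy(List<T>, List<T>) this call would not compile.
        copy(ints, objects);
        return objects;
    }

    public static void main(String[] args) {
        System.out.println(copyIntegersToObjects());
    }
}
```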

Note that the types List<? extends T> and List<? super T> are both less specific than the requirement List<T>. Also note that this kind of subtyping is already implicit for non-generic types. If you define a method with a parameter of type Number, you can automatically receive instances of any subtype, such as Integer. It is always type safe to read this Integer object even when you expect the supertype Number. And since it is impossible to write back to this reference, i.e. you cannot replace the caller's Integer object with, for example, an instance of Double, the Java language does not require you to waive your writing rights by declaring a method signature like void someMethod(<? extends Number> number). Similarly, when you promised to return an Integer from a method but the caller only requires a Number-typed result, you can still return (write) any subtype from your method. And because you cannot read a value back from a hypothetical return variable, you do not have to waive these hypothetical reading rights with a wildcard when declaring a return type in your method signature.
