tr ouwens

by the way: things I want to say

Generics in EqualsVerifier, part 2: Instantiating generic classes

In Part 1 of this two-part series, we have explored how to determine the full, generic type of a class’s fields. Now it’s time to do something with this information: instantiate an object of any given type.

Introduction

We want to dynamically create objects of the given, generic type. In other words, if we have a TypeTag(ArrayList, TypeTag(String)), we want to construct an object that is an instanceof ArrayList. Also, we want it to contain an element of type String. EqualsVerifier uses these instances to do ‘black box’ testing: it assigns certain values to the fields of an object, calls equals on the object, and checks if the result was expected.

To instantiate an object, EqualsVerifier uses a library called Objenesis. This lets us instantiate an object without calling the constructor, leaving the fields uninitialized. This is necessary, because determining the required parameters for the constructor, and cobbling together values for them, can be hard. Also, a constructor might have side-effects which are undesirable in the context of testing equals. The purpose of a constructor is to make sure that an object is in a ‘consistent state’ after creation, but we’re not usually interested in that either (unless the class actually enforces that consistent state; more about that later): we’re going to alter the values of the fields anyway.

To alter the fields, EqualsVerifier simply uses reflection. This allows us to change the value of a field to whatever value we desire, as long as it has a compatible type. It doesn’t matter if the field is private; reflection can easily bypass that.

This process works well for simple types, but with generics added to the mix, some interesting challenges present themselves. Let’s first take a look at instantiating non-generic classes, though.

Non-generic classes

Let’s say we want to instantiate the following class:

public class Point {
    private final int x;
    private final int y;

    // Constructor, getters, equals, hashCode left out for brevity.
}

It’s easy: with Objenesis, we get an instance of Point with x and y set to their default values (0). Using reflection, we can give x and y a value.
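To make the reflection step concrete, here is a minimal sketch. Objenesis isn’t part of the JDK, so the sketch creates the instance with a plain constructor call as a stand-in for it. The names ReflectedPoint and FieldSetter are made up for this example; they are not EqualsVerifier’s actual classes. It assumes an ordinary class (not a record), where reflection is allowed to overwrite final instance fields.

```java
import java.lang.reflect.Field;

// Stand-in for the Point class above, with a constructor and getters filled in.
final class ReflectedPoint {
    private final int x;
    private final int y;

    ReflectedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() { return x; }
    int getY() { return y; }
}

// Hypothetical helper, for illustration only.
final class FieldSetter {
    // Overwrites an instance field through reflection, even if it is private and final.
    static void set(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // bypasses 'private' and allows writing 'final' instance fields
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: Objenesis would give us an instance in this same all-zero state.
        ReflectedPoint point = new ReflectedPoint(0, 0);
        set(point, "x", 3);
        set(point, "y", 4);
        System.out.println(point.getX() + "," + point.getY());
    }
}
```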

Sometimes, this doesn’t work, for example when there is a recursion. The simplest example of that is the Node:

public class Node {
    private final Node n;

    public Node(Node n) {
        this.n = n;
    }
}

This would result in an infinite loop of trying to instantiate Node. To avoid this, EqualsVerifier detects these infinite loops and aborts when it encounters them. To get around this, EqualsVerifier keeps a collection of so-called ‘prefab values’, where the user can manually add pre-fabricated instances of these classes. Before trying to instantiate a type, EqualsVerifier always checks the prefab values to see if there’s a value it can use.

After successfully instantiating a type, it adds the newly created instance to the collection, so it doesn’t have to instantiate the same type twice. EqualsVerifier also keeps a list of standard Java API classes that are known to have recursions (or are hard to construct for other reasons), which saves the users a lot of manual prefab value adding.
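The interplay between the cache and the recursion check can be sketched roughly as follows. This is only an illustration under my own assumptions: PrefabCache, giveValue and LinkedNode are hypothetical names, and the instantiator function stands in for the Objenesis-plus-reflection step.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a prefab value cache; not EqualsVerifier's real implementation.
final class PrefabCache {
    private final Map<Class<?>, Object> prefabs = new HashMap<>();
    private final Deque<Class<?>> inProgress = new ArrayDeque<>();

    // Lets the user add a pre-fabricated instance by hand.
    <T> void put(Class<T> type, T value) {
        prefabs.put(type, value);
    }

    // Checks the cache first; otherwise instantiates, while watching for recursion.
    <T> T giveValue(Class<T> type, Function<PrefabCache, T> instantiator) {
        Object cached = prefabs.get(type);
        if (cached != null) {
            return type.cast(cached);
        }
        if (inProgress.contains(type)) {
            throw new IllegalStateException("Recursive structure: " + type.getSimpleName());
        }
        inProgress.push(type);
        try {
            T created = instantiator.apply(this);
            prefabs.put(type, created); // remember it, so the type is never instantiated twice
            return created;
        } finally {
            inProgress.pop();
        }
    }
}

// A recursive class like Node above.
final class LinkedNode {
    final LinkedNode next;

    LinkedNode(LinkedNode next) {
        this.next = next;
    }
}

final class PrefabCacheDemo {
    // Fills the 'next' field through the cache, like a field-by-field instantiator would.
    static LinkedNode instantiate(PrefabCache cache) {
        return new LinkedNode(cache.giveValue(LinkedNode.class, PrefabCacheDemo::instantiate));
    }

    static boolean detectsRecursion() {
        try {
            new PrefabCache().giveValue(LinkedNode.class, PrefabCacheDemo::instantiate);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    static boolean usesPrefab() {
        PrefabCache cache = new PrefabCache();
        LinkedNode prefab = new LinkedNode(null);
        cache.put(LinkedNode.class, prefab);
        return cache.giveValue(LinkedNode.class, PrefabCacheDemo::instantiate) == prefab;
    }
}
```

Without a prefab value, the second request for LinkedNode arrives while the first is still in progress, so the sketch aborts instead of looping forever.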

But there are other reasons why using Objenesis and reflection to create instances might not work. For instance, when a class is abstract we will get AbstractMethodErrors. Also, some classes might have strict class invariants. ArrayList, for example, might throw ConcurrentModificationExceptions when elements aren’t added in the proper way. Reflection obviously bypasses this ‘proper way’.

Since Java’s Collections API contains many recursive types, abstract methods and class invariants, EqualsVerifier keeps a list of prefab values for all of Java’s collections, and also for most of Google Guava’s collections.

Generic classes

Keeping track of generics makes maintaining prefab values a lot harder. In the past, EqualsVerifier could simply instantiate an ArrayList and add a bunch of java.lang.Object instances. Because of type erasure, at runtime every ArrayList is essentially an ArrayList<Object> anyway.

Most of the time that would work. It would only fail when the list is ‘unpacked’, like in the SparseArray example in Part 1.

Now that we do keep track of generics, it doesn’t work anymore. An ArrayList<String> is no longer the same thing as an ArrayList<Integer>. That means a single instance of ArrayList is no longer sufficient. We need an instance of all the ArrayLists, which is clearly impossible.

Factories

So we have to add an extra layer on top of the collection of prefab values: we need a collection of factories for prefab values. Whenever an ArrayList<Integer> is encountered, EqualsVerifier now determines its TypeTag and gives it to the factory. The factory instantiates a raw ArrayList and finds a value for Integer which it adds to the ArrayList.

This is actually a recursive process. If EqualsVerifier encounters an ArrayList<HashSet<Integer>>, it will instantiate the ArrayList and then recursively hand over a TypeTag for HashSet<Integer> to the factory, which will instantiate a HashSet, recursively find an instance for Integer (which is easy), and add that Integer to the HashSet. Then it adds the new instance of HashSet<Integer> to the ArrayList, and returns that.
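A stripped-down sketch of that recursion could look like this. Tag is a simplified, hypothetical stand-in for TypeTag, and RecursiveFactories only knows two collections and two simple types; the real factories are more elaborate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for TypeTag: a raw class plus its generic arguments.
final class Tag {
    final Class<?> type;
    final List<Tag> generics;

    Tag(Class<?> type, Tag... generics) {
        this.type = type;
        this.generics = Arrays.asList(generics);
    }
}

final class RecursiveFactories {
    // Known ways to create an empty collection of a given raw type.
    private static final Map<Class<?>, Supplier<Collection<Object>>> COLLECTIONS = new HashMap<>();
    // Ready-made values for simple, non-generic types.
    private static final Map<Class<?>, Object> SIMPLE = new HashMap<>();

    static {
        COLLECTIONS.put(ArrayList.class, ArrayList::new);
        COLLECTIONS.put(HashSet.class, HashSet::new);
        SIMPLE.put(Integer.class, 42);
        SIMPLE.put(String.class, "one");
    }

    static Object create(Tag tag) {
        Supplier<Collection<Object>> emptyCollection = COLLECTIONS.get(tag.type);
        if (emptyCollection == null) {
            Object simple = SIMPLE.get(tag.type);
            if (simple == null) {
                throw new IllegalArgumentException("No factory for " + tag.type);
            }
            return simple;
        }
        Collection<Object> result = emptyCollection.get();
        result.add(create(tag.generics.get(0))); // recurse into the element type
        return result;
    }

    public static void main(String[] args) {
        Tag tag = new Tag(ArrayList.class, new Tag(HashSet.class, new Tag(Integer.class)));
        System.out.println(create(tag)); // prints [[42]]
    }
}
```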

How do we instantiate these collections? For ArrayList, this is easy: we can call its parameterless constructor. This is also true for many collections, but not for all of them. EnumSet, for instance, can only be instantiated with a factory method which requires very specific arguments. Therefore, we need a separate factory for EnumSet which knows about the way it can be instantiated.

Also, there’s the problem of adding values to the collection. For lists and sets it’s easy: we just call the add method. For maps it’s also easy: we call the put method (although now we need two values instead of just the one). But it means that here too, we now need two different factories: one for lists and sets, and one for maps.

In other words, we need a lot of different factories for different kinds of classes.

Finally, every value created by the factory is added to the original collection of prefab values. Therefore, it could happen that it contains instances for ArrayList<Integer> and ArrayList<HashSet<String>>, but not ArrayList<String>, for example. It depends on the specific generic types that were encountered.

Other considerations

In a previous post (‘The things we do for compatibility’), I have explained how EqualsVerifier uses reflection to create prefab values for classes that it wasn’t compiled with, and that may or may not be available at runtime. Of course there have to be separate factories for these types as well. I will not discuss these in this article, but you’re welcome to take a look at the code.

Putting it all together

We have a bunch of different factories for a bunch of different purposes, and we need to manage which TypeTags can be handled by which factory. Managing this isn’t too hard (we just keep a Map<Class<?>, Factory>), but as we’ve seen, the number of factories quickly adds up.
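Such a registry might be sketched like this. PrefabFactory and FactoryRegistry are names I made up for the example; the sketch assumes each factory receives one ready-made value per generic parameter, which glosses over the recursion described earlier.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical factory interface: gets one ready-made value per generic parameter.
interface PrefabFactory {
    Object create(List<Object> genericValues);
}

final class FactoryRegistry {
    private final Map<Class<?>, PrefabFactory> factories = new HashMap<>();

    FactoryRegistry() {
        // Lists and sets share a factory: both are filled through add.
        factories.put(ArrayList.class, collection(ArrayList::new));
        factories.put(HashSet.class, collection(HashSet::new));
        // Maps need their own factory: put takes two values instead of one.
        factories.put(HashMap.class, values -> {
            Map<Object, Object> map = new HashMap<>();
            map.put(values.get(0), values.get(1));
            return map;
        });
    }

    private static PrefabFactory collection(Supplier<Collection<Object>> empty) {
        return values -> {
            Collection<Object> result = empty.get();
            result.add(values.get(0));
            return result;
        };
    }

    // Returns null when no factory is registered, so the caller can try another strategy.
    Object create(Class<?> rawType, List<Object> genericValues) {
        PrefabFactory factory = factories.get(rawType);
        return factory == null ? null : factory.create(genericValues);
    }
}
```

Returning null for an unknown type is just one way to signal that no factory applies; it keeps the lookup itself trivial.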

Still, it frequently happens that EqualsVerifier encounters a TypeTag for which it has no factory. In this case, it will attempt the old strategy of instantiating the class with Objenesis and then filling its fields, either by using instances in its cache, or by recursively trying to instantiate those.

When this doesn’t work, the only alternative is to ask the user to supply instances using EqualsVerifier’s withPrefabValues method. These instances go directly into the cache. However, this doesn’t scale to generic types that are defined by the user; the user can currently only supply one instance per type. That means if a class has a field of type MyType<String> as well as one of type MyType<Integer>, and both the String and the Integer are ‘unpacked’ inside equals or hashCode, EqualsVerifier can’t deal with that.

This is a known limitation which fortunately doesn’t come up a lot in practice. However, it will still be addressed in a future version of EqualsVerifier.

Summary

In Part 1, we have explored how we can determine the full generic type of a field in a class. In this part, we have seen how we can leverage this information to build instances of objects that respect these generics.

Because of this, users of EqualsVerifier have more freedom to implement their equals and hashCode methods, since it is now possible to directly refer to the generic components of fields, which was not possible before. For instance:

public class SparseArrayContainer {
    private final SparseArray<String> sparseArray;
    
    // ...
    
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof SparseArrayContainer)) {
            return false;
        }
        SparseArrayContainer other = (SparseArrayContainer)obj;
        for (int i = 0; i < sparseArray.size(); i++) {
            String a = sparseArray.get(i);
            String b = other.sparseArray.get(i);
            if (!a.equals(b)) {
                return false;
            }
        }
        return true;
    }
}

When tested by older versions of EqualsVerifier, this class would throw a ClassCastException on the line String a = sparseArray.get(i);, because sparseArray would contain instances of java.lang.Object instead of String. As of version 2.0, EqualsVerifier recognises that sparseArray is a SparseArray of String, and fills it with Strings instead of Objects, thus preventing the ClassCastException.

All this leads up to the final conclusion of this series:

Conclusion

Java generics are kinda hard.

Generics in EqualsVerifier, part 1: Overcoming type erasure

This is part 1 of a two-part series. This part deals with overcoming type erasure. In Part 2, we will see what EqualsVerifier can do with this generic type information.

Introduction

Since version 2.0, EqualsVerifier is aware of generics. For instance, it can determine, at runtime, whether a class’s field is a List<String> or a List<Integer>. However, type erasure says this is impossible! If we call getClass() on a list instance, we’ll get a List.class object, which is unaware of the generic parameter. How does EqualsVerifier do this?

Some background

EqualsVerifier works by creating instances of objects, filling their fields with carefully chosen values, and repeatedly calling equals on them to see if something unexpected comes up. However, type erasure causes problems. Say we have the following class:

class ListContainer {
    private final List<String> list;
}

EqualsVerifier can see that this class has a field list of type List. Versions below 2.0 don’t know anything more than that: they can’t determine that it’s really a List<String>, because the generics get erased by the JVM. To work around this, EqualsVerifier simply instantiates a raw List, puts in a few values of type Object, and hopes nobody notices.

In most cases this works perfectly fine, because in most cases, an equals method will just call list.equals(other.list). List’s equals method then simply calls equals on each of its values and it doesn’t matter what the type of these values is. After all, every Java class has an equals method that you can call.

In a small number of cases, this doesn’t work. For example, Android has a type called SparseArray. It’s a generic type that contains a sequence of values, like an array or a list. However, unlike arrays and lists, it doesn’t implement its own equals method. Calling equals on a SparseArray is like calling == on it: it doesn’t matter if two SparseArrays contain exactly the same elements; if they’re not the same instance, they’re not equal. This means that a class with a SparseArray field has to ‘unwrap’ it in equals:

public class SparseArrayContainer {
    private final SparseArray<String> sparseArray;
    
    // ...
    
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof SparseArrayContainer)) {
            return false;
        }
        SparseArrayContainer other = (SparseArrayContainer)obj;
        for (int i = 0; i < sparseArray.size(); i++) {
            String a = sparseArray.get(i);
            String b = other.sparseArray.get(i);
            if (!a.equals(b)) {
                return false;
            }
        }
        return true;
    }
}

Now, we get a ClassCastException in the line that assigns an element of sparseArray to a: a expects a String, but it gets an Object. Oops.
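The same effect can be reproduced with nothing but the JDK. The sketch below uses a plain ArrayList instead of Android’s SparseArray, and ErasureDemo is a made-up name; it only illustrates why delegating to equals is harmless while unpacking an element is not.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {
    // Mimics the old behaviour: a raw list filled with plain Objects, passed off as a List<String>.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> pollutedList() {
        List raw = new ArrayList();
        raw.add(new Object());
        return raw; // unchecked conversion: nothing is verified at runtime
    }

    // Calling equals on the list never looks at the static element type, so this is fine.
    static boolean delegatingIsFine() {
        List<String> list = pollutedList();
        return list.equals(list);
    }

    // Unpacking an element makes the compiler insert a cast to String, which fails.
    static boolean unpackingThrows() {
        List<String> list = pollutedList();
        try {
            String first = list.get(0);
            return first == null && false; // never reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(delegatingIsFine());
        System.out.println(unpackingThrows());
    }
}
```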

Because the generic type is erased at runtime, there’s no way around this issue. Or is there?

There is!

While it’s true for objects at runtime that their generic type gets erased, there is something we can use. In EqualsVerifier, we’re always inspecting a class and its fields, and it turns out that Java does retain the fully generic type of all fields in a class. You can use reflection to access this information. So, for our SparseArrayContainer, we can do this:

Field f = SparseArrayContainer.class.getDeclaredField("sparseArray");
Type type = f.getGenericType();
if (type instanceof ParameterizedType) {
    ParameterizedType pt = (ParameterizedType)type;
    Type[] genericTypes = pt.getActualTypeArguments();
} 

The type variable has type java.lang.reflect.Type, which is an interface with several implementations. The most important implementation, and also the easiest, is good old java.lang.Class. However, our sparseArray field is parameterized, so we’ll get an instance of ParameterizedType.

On a ParameterizedType (don’t forget to cast first; java.lang.reflect.Type doesn’t declare a lot of useful methods by itself), we can call a getActualTypeArguments() method, which gives us an array of Type. The elements of this array are instances of java.lang.Class. (As said before, java.lang.Class implements the java.lang.reflect.Type interface). In other words, genericTypes[0].equals(String.class). Yay! We have the generic type we want. The rest is just an exercise of filling in the blanks.

It turns out, there’s a lot of blanks to fill in.

TypeTag

First, we need to be able to pass around our new generic type information, where EqualsVerifier used to just pass around a java.lang.Class. java.lang.reflect.Type has all the information we need, but it’s very unwieldy. Time then to build our own container: TypeTag. It looks something like this:

public final class TypeTag {
    private final Class<?> type;
    private final List<TypeTag> genericTypes;

    public TypeTag(Field field) { ... }
    
    // Getters, equals, hashCode left out for brevity.
}

In order to construct an instance, we’ll need a java.lang.reflect.Field, because as we said before, we need that to access the generic types. (There are other ways to get them; for example, if we can make a subclass of a type and instantiate that, we can use that instance to find the generic types. However, sometimes types are final and in those cases, making a subclass is impossible. Using a field is much more reliable.)
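For the curious, the subclass approach from the parenthetical can be sketched with an anonymous subclass. SubclassTrick is a made-up name, and this is only an illustration of the alternative, not something EqualsVerifier is claimed to do.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;

final class SubclassTrick {
    static Type elementTypeViaSubclass() {
        // The anonymous subclass keeps a record of its generic superclass, ArrayList<String>.
        Object list = new ArrayList<String>() { };
        ParameterizedType superType = (ParameterizedType) list.getClass().getGenericSuperclass();
        return superType.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        System.out.println(elementTypeViaSubclass()); // prints class java.lang.String
    }
}
```

This only works because ArrayList can be subclassed; for a final class there is nothing to put between the braces.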

It does mean that we can’t get the generic types at the top of the chain, though. Say we have the following class:

public final class Entity<I extends Comparable<I>> {
    private final I id;
    
    // Constructor, getters, equals, hashCode left out for brevity.
}

There is no way to figure out that Entity has a generic type parameter I extends Comparable<I> if we only have an instance of Entity. Fortunately, this problem is not as big as it seems, because in all cases, we only need to know the type parameter when it’s used. And when the type parameter is used, it’s used in a field. And we can get at the type parameters of fields!

However, we’re not done yet. Apart from java.lang.Class and java.lang.reflect.ParameterizedType, our java.lang.reflect.Type interface has a bunch more implementations, and we need to consider each one.

TypeVariable

Consider once again the Entity class above. When we have a java.lang.reflect.Field instance of its id field, we will get an instance of java.lang.reflect.TypeVariable, which is another implementation of the java.lang.reflect.Type interface. This gives us the name (in this case a String "I") and its bounds (Comparable<I>, wrapped in some other instance of java.lang.reflect.Type). The bounds are important, because we can’t assign a java.lang.Object to id, because that’s not Comparable. Note that type variables can be bound to concrete classes (T extends Object), but also to other type variables (T extends U), which complicates matters. To make things even worse, bounds are often even recursive: T extends Comparable<T>!
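Here is a small sketch of what reflection reports for such a field. BoundedEntity mirrors the Entity class above under a different name, and TypeVariableDemo is made up for the example.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.TypeVariable;

// Mirrors the Entity class above.
final class BoundedEntity<I extends Comparable<I>> {
    private final I id;

    BoundedEntity(I id) {
        this.id = id;
    }
}

final class TypeVariableDemo {
    // The generic type of the 'id' field is a TypeVariable, not a Class.
    static TypeVariable<?> idType() {
        try {
            return (TypeVariable<?>) BoundedEntity.class.getDeclaredField("id").getGenericType();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // The bound Comparable<I> is itself parameterized.
    static ParameterizedType firstBound() {
        return (ParameterizedType) idType().getBounds()[0];
    }

    // The bound's type argument is the very type variable we started with.
    static boolean boundIsRecursive() {
        return firstBound().getActualTypeArguments()[0].equals(idType());
    }

    public static void main(String[] args) {
        System.out.println(idType().getName());          // prints I
        System.out.println(firstBound().getRawType());   // prints interface java.lang.Comparable
        System.out.println(boundIsRecursive());          // prints true
    }
}
```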

And it gets more complicated. Consider the following (not even very contrived) example:

public interface Period { }
public final class Per<T extends Comparable<T>, P extends Period> {
    private final P period;
    private final T value;
    
    // Constructor, getters, equals, hashCode left out for brevity.
}

When we evaluate the types of period and value, we’ll have to match them up with Per’s generic parameters. Note that I switched the order around, to emphasize that we can’t simply say that the first field we encounter in the class will match with the first generic parameter, and the second field with the second generic parameter. No, we’ll have to match the field’s type’s name with the generic parameter’s name.

Fortunately, there is one thing we can get from a raw java.lang.Class object: the java.lang.reflect.TypeVariables it was declared with. Per.class.getTypeParameters() returns an array of TypeVariable.

So, if we want to determine the precise type of Per’s value field, we need value’s corresponding java.lang.reflect.Field, and Per’s TypeVariables. We’ll call the Per class value’s enclosing class. We’ll put Per’s TypeVariables in a hash map, keyed on the names of the type variables, so we can more easily match them with the types of the fields.
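The name-based matching might be sketched like this. Span and Measured mirror Period and Per under different names, and TypeParameterLookup is a hypothetical helper; for brevity, the map stores each type variable’s position rather than the TypeVariable itself.

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.LinkedHashMap;
import java.util.Map;

interface Span { } // mirrors Period

// Mirrors Per: the fields are declared in a different order than the type parameters.
final class Measured<T extends Comparable<T>, P extends Span> {
    private final P period;
    private final T value;

    Measured(P period, T value) {
        this.period = period;
        this.value = value;
    }
}

final class TypeParameterLookup {
    // Maps the name of each type parameter of the enclosing class to its position.
    static Map<String, Integer> parameterPositions(Class<?> enclosing) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        TypeVariable<?>[] parameters = enclosing.getTypeParameters();
        for (int i = 0; i < parameters.length; i++) {
            positions.put(parameters[i].getName(), i);
        }
        return positions;
    }

    // Finds which type parameter a field's type refers to, or -1 if it isn't a type variable.
    static int positionOfFieldType(Class<?> enclosing, String fieldName) {
        try {
            Type type = enclosing.getDeclaredField(fieldName).getGenericType();
            if (!(type instanceof TypeVariable)) {
                return -1;
            }
            Integer position = parameterPositions(enclosing).get(((TypeVariable<?>) type).getName());
            return position == null ? -1 : position;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(positionOfFieldType(Measured.class, "period")); // prints 1
        System.out.println(positionOfFieldType(Measured.class, "value"));  // prints 0
    }
}
```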

Other implementations of the Type interface

The next java.lang.reflect.Type implementation we have to consider is the wildcard, which can also have bounds. Fortunately, the wildcard only occurs on fields, not on classes, so it’s a bit easier:

private final List<? extends Point> points;

We can safely substitute a boundless wildcard with java.lang.Object, and a bounded wildcard with the bound itself. In the case of the List<? extends Point> above, we can simply pretend it’s a List<Point>. Note that wildcards, as opposed to TypeVariables, can have upper (? extends SomeType) and lower (? super SomeType) bounds. Fortunately, we can treat them the same in this case.
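In terms of the reflection API, that substitution boils down to looking at getLowerBounds() and getUpperBounds(). A minimal sketch, with WildcardHolder and WildcardResolver as made-up names:

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;
import java.util.List;

final class WildcardHolder {
    List<?> unbounded;
    List<? extends Number> upper;
    List<? super Integer> lower;
}

final class WildcardResolver {
    // Uses the lower bound if there is one; otherwise the upper bound,
    // which is java.lang.Object for a boundless wildcard.
    static Type substitute(WildcardType wildcard) {
        Type[] lowerBounds = wildcard.getLowerBounds();
        return lowerBounds.length > 0 ? lowerBounds[0] : wildcard.getUpperBounds()[0];
    }

    static Type elementTypeOf(String fieldName) {
        try {
            Field field = WildcardHolder.class.getDeclaredField(fieldName);
            ParameterizedType listType = (ParameterizedType) field.getGenericType();
            return substitute((WildcardType) listType.getActualTypeArguments()[0]);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementTypeOf("unbounded")); // prints class java.lang.Object
        System.out.println(elementTypeOf("upper"));     // prints class java.lang.Number
        System.out.println(elementTypeOf("lower"));     // prints class java.lang.Integer
    }
}
```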

Finally, there’s the GenericArrayType. Fortunately, that one is pretty straightforward compared to the rest. We’ll not look at it in detail.

What do we have now?

All this effort allows us to determine the complete type of any field, and take from that the information that is relevant to EqualsVerifier. For example, take this class:

public class Container<T extends Comparable<T>> {
    private final List<String> a;
    private final Map<String, List<Integer>> b;
    private final List<T> c;
    private final List<?> d;
    private final List<? super Class> e;
    private final T[] f;
}

This gives us the following TypeTags for its fields:

TypeTag(List, TypeTag(String))
TypeTag(Map, TypeTag(String), TypeTag(List, TypeTag(Integer)))
TypeTag(List, TypeTag(Comparable))
TypeTag(List, TypeTag(Object))
TypeTag(List, TypeTag(Class))
TypeTag(Comparable[])

TypeTag has a factory method that takes a java.lang.reflect.Field and another TypeTag that represents the enclosing type, which we need to resolve TypeVariables like T. Then it determines what kind of java.lang.reflect.Type the field is, and recursively resolves it. You can look at the implementation here.
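To give an impression of that recursive resolution, here is a rough sketch that prints a field’s type in the TypeTag(...) notation used above. It is not the actual implementation: TagPrinter and SampleContainer are hypothetical names, and instead of resolving type variables against the enclosing type, the sketch simply falls back on the raw class of their first bound.

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.lang.reflect.WildcardType;
import java.util.List;
import java.util.Map;

class SampleContainer<T extends Comparable<T>> {
    List<String> a;
    Map<String, List<Integer>> b;
    List<T> c;
    List<?> d;
    T[] f;
}

final class TagPrinter {
    static String describe(Type type) {
        if (type instanceof Class) {
            return "TypeTag(" + ((Class<?>) type).getSimpleName() + ")";
        }
        if (type instanceof ParameterizedType) {
            ParameterizedType parameterized = (ParameterizedType) type;
            StringBuilder result = new StringBuilder("TypeTag(" + rawClass(parameterized).getSimpleName());
            for (Type argument : parameterized.getActualTypeArguments()) {
                result.append(", ").append(describe(argument)); // recurse into each generic argument
            }
            return result.append(")").toString();
        }
        if (type instanceof TypeVariable) {
            // Simplification: only the raw first bound, which also keeps
            // recursive bounds like T extends Comparable<T> from looping forever.
            return "TypeTag(" + rawClass(type).getSimpleName() + ")";
        }
        if (type instanceof WildcardType) {
            WildcardType wildcard = (WildcardType) type;
            Type[] lowerBounds = wildcard.getLowerBounds();
            return describe(lowerBounds.length > 0 ? lowerBounds[0] : wildcard.getUpperBounds()[0]);
        }
        if (type instanceof GenericArrayType) {
            Type component = ((GenericArrayType) type).getGenericComponentType();
            return "TypeTag(" + rawClass(component).getSimpleName() + "[])";
        }
        throw new IllegalArgumentException("Unknown kind of Type: " + type);
    }

    private static Class<?> rawClass(Type type) {
        if (type instanceof Class) {
            return (Class<?>) type;
        }
        if (type instanceof ParameterizedType) {
            return (Class<?>) ((ParameterizedType) type).getRawType();
        }
        if (type instanceof TypeVariable) {
            return rawClass(((TypeVariable<?>) type).getBounds()[0]);
        }
        throw new IllegalArgumentException("No raw class for: " + type);
    }

    static String describeField(Class<?> owner, String fieldName) {
        try {
            return describe(owner.getDeclaredField(fieldName).getGenericType());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Calling TagPrinter.describeField(SampleContainer.class, "b") yields TypeTag(Map, TypeTag(String), TypeTag(List, TypeTag(Integer))), matching the notation in the list above.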

A caveat

You might have noticed that I haven’t discussed multiple bounds, like T extends Interface1 & Interface2, which are also allowed in Java generics. In all honesty, I only thought of them while researching this article. EqualsVerifier 2.0 has been out for several months by now, so I suppose these multiple bounds aren’t used often enough for people to have run into this. Nevertheless, it’s obviously quite high on my list of future improvements.

Summary

We have seen how we can determine the full, generic type of a class’s field. In Part 2, we will see how we can put this information to good use.

Get to grips with your build, with Gradle

Last month, the Dutch Java Magazine published an article written by Hanno Embregts and myself. In it, we describe how we used Gradle in our project at the Dutch Railways.

It’s a big, monolithic project with roughly 25,000 lines of Ant scripts, which we wanted to break up into separate components that can be built and released separately. Doing this with Ant would add more complexity to an already much too complex system of scripts. Maven wasn’t a good fit either, because being built with Ant since its inception, our project didn’t follow Maven’s conventions, and shoe-horning it into these conventions would be impractical. Gradle provided the right mix of structure and flexibility.

We started our migration on a small subcomponent with no dependencies on other parts of our codebase, to work our way up. That way we wouldn’t have to deal with deploying the moving target of our project, which was actively in development during the migration, into Nexus so our Gradle scripts could consume it. We tried it; it didn’t work. It’s much easier to go the other way round, to consume artifacts from Nexus that were already migrated to Gradle and put under a strict regimen of Semantic Versioning.

Still, our existing Ant scripts needed to deal with our new Gradle-built artifacts. We decided they would treat these like any other 3rd party dependency, which worked fine for the most part. However, we found we had to be careful that the content of artifacts was identical to the ones built by the original Ant scripts, especially in the case of MANIFEST.MF files and various kinds of XML files embedded in the JARs. We had tools to test this, and we even went so far as to change Gradle’s unit test output layout to match that of Ant, so we could compare those easily using a diff viewer.

In the end, we were very happy with the result, even if it took more effort than we expected initially. If you want to know more about the reasoning behind some of the choices we made, or how we solved some of the other issues we ran into, you can read the full article in the physical magazine, or you can read it online. Note: it’s in Dutch.

Foobal, Moneyball

Ever since Moneyball came out, winning at sports using statistics has become more and more mainstream. As I’ve blogged before, I have also been getting in on some of that action: I’ve been competing in my family’s soccer match betting game, where we guess the results of NAC Breda, our favourite team. But instead of predicting the outcomes myself, I’m using Foobal, a little Scala program that I’ve written for this very purpose.

And, like the Oakland A’s, FC Midtjylland and Brentford FC, this has not been without success: I won this year’s pool! And by quite some margin, too. Here’s the trophy:

[Image: Trophy]

On the last few lines, it says J10 anO15 which really should have been 2015 JanO. I guess the thick black line got in the way :). I’ll keep it until next year, when I pass it on to the next winner.

Anyway, Foobal probably did so well because NAC has been playing very, err… consistently this season. Sadly, due to this consistency, NAC were also relegated from the Honor Division to the First Division. I must say, it’s a shame to win the pool under such sad circumstances. To Foobal’s moral credit, though, it did predict a win for the final and deciding match, which was lost during extra time.

The relegation also means that next year, NAC will have to face many teams that Foobal does not have data on. This will be quite a challenge, but it gives me a good opportunity to tweak the algorithm a bit and see if I can come up with something even more successful. I’m only getting started!

Hacking Java enums

The other day, I was debugging some enum related code in EqualsVerifier. I had this enum:

enum CallMe { YES, NO, MAYBE }

And two variables, original and clone, containing a value of said enum. Here’s what the bug looked like in Eclipse’s debugger:

[Image: Call Me Maybe]

So, what do we see here? We see two variables of type EnumHack$CallMe (the enum was an inner class, so that makes sense). Both enums have the same name and ordinal, so they are equal. They also have different ids (33 and 34, respectively), so they’re not the same object.

Wait, what!?

That’s right: two different instances of the same enum constant. In other words, original.equals(clone) returns false, even though both variables are set to CallMe.MAYBE. How is that even possible? I thought it wasn’t. In the words of Joshua Bloch:

This approach [of using a one-element enum to implement a singleton, JO] provides an ironclad guarantee against multiple instantiation, even in the face of sophisticated serialization or reflection attacks.

The Java Language Specification, section 8.9 says this:

An enum type has no instances other than those defined by its enum constants. It is a compile-time error to attempt to explicitly instantiate an enum type (§15.9.1).

The final clone method in Enum ensures that enum constants can never be cloned, and the special treatment by the serialization mechanism ensures that duplicate instances are never created as a result of deserialization. Reflective instantiation of enum types is prohibited. Together, these four things ensure that no instances of an enum type exist beyond those defined by the enum constants.

Clearly, despite the fact that an enum constant can never be cloned, I had a clone on my hands.

So what happened? I traced the problem back to this:

@Test
public void hackAnEnum() throws Exception {
    CallMe original = CallMe.MAYBE;
    CallMe clone = ObjectAccessor.of(original).copy();
    
    assertEquals(original.name(), clone.name());
    assertEquals(original.ordinal(), clone.ordinal());
    assertFalse(original.equals(clone));
    assertFalse(original == clone);
}

I added the asserts so you can see for yourself what’s going on: if you copy/paste this to your IDE and put EqualsVerifier on the classpath, you’ll be able to run it. This test passes, which means that both original.equals(clone) and original == clone are, indeed, false.

So what does this ObjectAccessor do? It’s part of EqualsVerifier’s reflection library, and as you probably guessed, it makes a copy of the given object. If I factor out all EqualsVerifier code, I end up with this:

CallMe clone = new ObjenesisStd().newInstance(CallMe.class);
for (Field f : Enum.class.getDeclaredFields()) {
    f.setAccessible(true);
    f.set(clone, f.get(original));
}

Apart from the reference to ObjenesisStd, this is all standard Java reflection code. So, what is this Objenesis thing, then? From their website:

Objenesis is a small Java library that serves one purpose: To instantiate a new object of a particular class.

In other words, it can instantiate any object, without calling its constructor. So how does Objenesis work, exactly? Well, that depends. It uses the Strategy pattern to choose from several different ways of instantiating objects, depending on what kind of JVM you’re running, and probably some other factors, too. In my case, it expanded to this:

Constructor<Object> objectConstructor =
    Object.class.getConstructor((Class[]) null);
Class<?> reflectionFactoryClass =
    Class.forName("sun.reflect.ReflectionFactory");
Method method = reflectionFactoryClass.getDeclaredMethod("getReflectionFactory");
Object reflectionFactory = method.invoke(null);
Method newConstructorForSerializationMethod =
    reflectionFactoryClass.getDeclaredMethod("newConstructorForSerialization", Class.class, Constructor.class);
Constructor<CallMe> ctr = (Constructor<CallMe>)
    newConstructorForSerializationMethod.invoke(reflectionFactory, CallMe.class, objectConstructor);
CallMe clone = ctr.newInstance((Object[]) null);

That’s some incredibly hairy scary code. Let’s pretend we never saw this. The only thing we need to remember is that all this can be done without resorting to actually changing the bytecode at runtime; it’s all reflection.

While Objenesis may be a relatively unknown library, it is actually pretty widely used. The most famous libraries that use it are Mockito (and basically all mocking frameworks), and Spring Framework. But there are more. Many more. EqualsVerifier uses it to instantiate values for the fields of the class it’s testing.

OK, now that we know this, can we go one further? It turns out we can:

[Image: Call Me Sometime]

We can add our own enum constants. Here’s how:

CallMe sometime = new ObjenesisStd().newInstance(CallMe.class);
Field ordinal = Enum.class.getDeclaredField("ordinal");
ordinal.setAccessible(true);
ordinal.set(sometime, 4);
Field name = Enum.class.getDeclaredField("name");
name.setAccessible(true);
name.set(sometime, "SOMETIME");

It’s actually pretty simple. I guess the JVM’s guarantees aren’t ironclad enough for this particular sophisticated reflection attack. And to think I actually found it by accident! I fixed the bug in EqualsVerifier before it ever got released into production, so all’s well that ends well, I guess.

P.S. Oh and don’t try this at home kids! …or at least, not in production.