
Java Tips: Initializing Collection

Especially in unit tests, it is a common case that we have to initialize an array or a collection.

Well, for an array it’s OK… a simple piece of code that we all know solves the problem:

String[] s = new String [] {"1", "2"};

But how about a Collection? The normal way to initialize a collection is something like this (which is pretty ugly):

List<String> s = new ArrayList<String>();
s.add("1");
s.add("2");

I could hardly find an elegant solution until I saw this post. There are at least three better solutions for this case.

First solution:

List<String> s = new ArrayList<String>() {{ add("1"); add("2"); }};

This is double-brace initialization: an anonymous subclass of ArrayList with an instance initializer block. Unfortunately, it doesn’t pass the Java Code Conventions (that is, if you format the code, it becomes uglier than the original):

List<String> s = new ArrayList<String>() {
   {
      add("1");
      add("2");
   }
};

Second solution:

List<String> s = Arrays.asList(new String[]{"1", "2"});

This solution is the best if you use Java 1.4 or earlier. But if you use Java 5, the third one is more elegant:

List<String> s = Arrays.asList("1", "2");

Great!

EDIT: this solution creates a fixed-size but still modifiable list (you can set() existing elements, but not add or remove them), so you may want to wrap it in an ArrayList (or another Collection class) to make it fully writable.
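A minimal sketch of the difference (the ArrayList wrap is my addition, not from the original post):

List<String> fixed = Arrays.asList("1", "2");
fixed.set(0, "0");   // OK: replacing an existing element works
// fixed.add("3");   // throws UnsupportedOperationException: the list is fixed-size

// wrapping in an ArrayList gives a fully writable, independent copy
List<String> writable = new ArrayList<String>(Arrays.asList("1", "2"));
writable.add("3");   // OK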

Computing Map on Google Collections


Google always makes interesting projects. My toy nowadays is Google Collections. I don’t think I need to reintroduce it, as it has been nicely covered in several blog posts.

Of course, two videos from GTUG are also nice.

Now I want to discuss one piece of functionality from Google Collections that is not really covered by those previous articles. It is called a Computing Map. No… no… you won’t find a class with such a name in the Javadoc.

It is basically a map where the keys are parameters of a calculation and the values are results of that calculation. You have probably faced a scenario where you need to do a lot of computations using a complex algorithm. Now say that many of those computations are done with the same parameters. Instead of doing the same operation over and over again, wouldn’t it be better to just cache the result and reuse it later?

That is basically the idea, and you can even implement it without Google Collections. You could code something like this:

private final Map<Parameter, Result> cache = new HashMap<Parameter, Result>(100000);

public Result getResult(Parameter p) {
    if (!cache.containsKey(p)) {
        prepareCache(p); // the complex calculation; stores its result in the cache
    }

    return cache.get(p);
}

Easy, yes?

But wait… there is a problem with this code. First, what if two calculations are requested at almost the same time? You won’t get a wrong result, but the code will still do the calculation twice, because of the race between the containsKey check and the put. There is no easy solution for this; double-checked locking is simply a broken idiom.
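For illustration (my sketch, not from the original post): the obvious thread-safe variant synchronizes the whole method, which is correct but makes every lookup compete for a single lock, even for unrelated keys:

public synchronized Result getResult(Parameter p) {
    if (!cache.containsKey(p)) {
        prepareCache(p); // still the complex calculation
    }
    return cache.get(p);
}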

And more problems may arise. Over time, so many different parameters may be used that your Map grows without limit. This is a standard problem for any cache implementation; using soft references or a third-party cache library may solve it.

In the end, our solution is not so simple anymore.

Here Google Collections can help us. MapMaker is a very powerful factory class that lets you combine almost any features of a Map you can think of. Need a Map with soft keys and weak values? Need a map with strong keys and soft values? MapMaker allows you to do that… the easy way.
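For example, a map with weak keys and soft values is one fluent chain (a sketch against the Google Collections API; Key and Value are placeholder types of mine):

ConcurrentMap<Key, Value> map = new MapMaker()
        .weakKeys()
        .softValues()
        .makeMap();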

And it provides us with a Computing Map. A computing map is created with MapMaker by calling the method ‘makeComputingMap’ and supplying a function that transforms a Parameter into a Result.

Our earlier example then becomes something like this:

private final Map<Parameter, Result> cache;

public Cache() {
    cache = new MapMaker().makeComputingMap(new Function<Parameter, Result>() {

        @Override
        public Result apply(Parameter from) {
            return prepareCache(from); // the complex calculation, now returning the Result
        }
    });
}

public Result getResult(Parameter p) {
    return cache.get(p);
}

That is basically all. The documentation of the method reads:

Builds a map that supports atomic, on-demand computation of values.
Map#get either returns an already-computed value for the given key,
atomically computes it using the supplied function, or, if another thread
is currently computing the value for this key, simply waits for that thread
to finish and returns its computed value. Note that the function may be
executed concurrently by multiple threads, but only for distinct keys.

If an entry’s value has not finished computing yet, query methods
besides get return immediately as if an entry doesn’t exist. In
other words, an entry isn’t externally visible until the value’s
computation completes.

Map#get on the returned map will never return null. It
may throw:

  • NullPointerException if the key is null or the computing
    function returns null

  • ComputationException if an exception was thrown by the
    computing function. If that exception is already of type
    ComputationException, it is propagated directly; otherwise it is
    wrapped.

Note: Callers of get must ensure that the key
argument is of type K. The get method accepts
Object, so the key type is not checked at compile time. Passing an object
of a type other than K can result in that object being unsafely
passed to the computing function as type K, and unsafely stored in
the map.

If put is called before a computation completes, other
threads waiting on the computation will wake up and return the stored
value. When the computation completes, its new result will overwrite the
value that was put in the map manually.

This method does not alter the state of this MapMaker instance,
so it can be invoked again to create multiple independent maps.

So you get synchronization for free. And best of all, the synchronization doesn’t lock the whole Map; only threads that access the same key wait for each other.
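A quick illustration of that guarantee (my hypothetical example, assuming com.google.common.base.Function): two threads asking for the same key trigger only one computation; the second thread simply blocks until the value is ready.

final ConcurrentMap<String, String> cm = new MapMaker()
        .makeComputingMap(new Function<String, String>() {
            public String apply(String from) {
                System.out.println("computing " + from); // printed only once per key
                return from.toUpperCase();
            }
        });

Runnable r = new Runnable() {
    public void run() {
        System.out.println(cm.get("key")); // both threads print "KEY"
    }
};
new Thread(r).start();
new Thread(r).start();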

But there is still a problem with that code… it still uses strong references for both keys and values. That’s the default if you don’t specify anything in the MapMaker. Your map will still grow without limit and you will eventually get an OutOfMemoryError.

Well, it’s easy… you can just add a call to softValues() at creation.

private final Map<Parameter, Result> cache;

public Cache() {
    cache = new MapMaker().softValues().makeComputingMap(new Function<Parameter, Result>() {

        @Override
        public Result apply(Parameter from) {
            return prepareCache(from);
        }
    });
}

public Result getResult(Parameter p) {
    return cache.get(p);
}

Now you have a proper implementation of a computing map. The keys and values will be held as long as you have enough memory, but once the JVM needs more memory, the GC will remove entries from the Map. Your application will then have to redo the complex calculation for those parameters, but I think that is the best trade-off we can get. Of course, you can also increase the JVM memory anytime you want.

Note that you don’t want to use softKeys(). Look at the Javadoc of softKeys():

Note: the map will use identity ({@code ==}) comparison
to determine equality of soft keys, which may not behave as you expect.
For example, storing a key in the map and then attempting a lookup
using a different but {@link Object#equals(Object) equals}-equivalent
key will always fail.

Hmm… that means your key will be considered equal only if it is the very same object. If you recreate the Parameter with the same values, even if you override equals and hashCode correctly, you will not hit the pre-computed value. On the other hand, using just softValues() is enough, because once a value is GC-ed, its key will be removed as well. See this bug entry for more information: http://code.google.com/p/google-collections/issues/detail?id=250 or this discussion in the groups: http://groups.google.com/group/google-collections-users/browse_frm/thread/8e4bd19f5cfa9adb/24e9d9de34fadb6f?lnk=gst&q=soft+reference+identity#24e9d9de34fadb6f.
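A tiny illustration of the identity-comparison pitfall (my hypothetical example):

Map<String, Integer> m = new MapMaker().softKeys().makeMap();
String key = new String("a");
m.put(key, 1);
System.out.println(m.get(key));             // 1: the very same object
System.out.println(m.get(new String("a"))); // null: equals-equivalent, but a different object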

And if you still think you have a use case for equality-based soft references, I have a patch to MapMaker you can use. It’s not nice and pretty hacky, but it works as far as I can tell. I personally don’t use it anymore, but maybe I will in the future (if I find a strong use case for it, which I doubt).

MapMaker.java

Java Tips: Optimizing your Map loop

Quite often, a program needs to go through all elements of a Map. Unfortunately, like a Set, a Map has no index in its data structure, so you can’t just get the key or the value at a certain position.

The most common practice for iterating over all elements of a Map is to get the key set and then, based on each key, retrieve the value. The template we use for such a case is something like this:

for (String k : m.keySet()) {
    Integer v = m.get(k);
    // do something with the key and value
}

This works, of course, but on quick inspection it should not be that efficient, because we have to look up the value by key on every step. It would be much better to get the key and value as a pair from the beginning. This is unfortunately less obvious and less used. The template would be something like this (with the generic types adjusted as appropriate):

for (Entry<String, Integer> e : m.entrySet()) {
    String k = e.getKey();
    Integer v = e.getValue();
    // do something with the key and value
}
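As an aside, when only the values matter, a values() loop skips the keys entirely:

for (Integer v : m.values()) {
    // do something with the value only
}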

Now let’s run some tests. I use the template below, varying the backing object, the number of iterations, and the method used to get the keys and values from the map. (Of course, the test is not that realistic, because we only use the value and ignore the key, in which case we could use values(), as shown above, for better performance.)

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapLoopTest {
    public static void main(String[] args) {
        Map<String, Integer> m = new TreeMap<String, Integer>();
        for (int i = 0; i < 500000; i++) {
            m.put(i + "", i);
        }

        List<Integer> l = new ArrayList<Integer>();

        long st = System.currentTimeMillis();

        // first template: iterate the key set, look up each value by key
        for (String string : m.keySet()) {
            l.add(m.get(string));
        }

        System.out.println(System.currentTimeMillis() - st);
    }
}
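For comparison (my addition), the second template drops into the same harness by replacing the timed loop:

// second template: iterate the entries, no per-key lookup
for (Map.Entry<String, Integer> e : m.entrySet()) {
    l.add(e.getValue());
}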

From my tests, I observe that the performance of the first template depends on the backing object. With a HashMap, the performance is more or less the same as the second template. With a Hashtable, the first template is about 1.5 times slower than the second, and with a TreeMap the first template is more than 5 times slower. This confirms my original hypothesis, and we as developers should train our instinct to use the second template whenever we encounter this problem.

UPDATE: I redid the test using System.nanoTime() and it simply confirms my original observation.