14

Often there is the need to transform results for a query like:

select category, count(*)
from table
group by category

to a map in which keys are categories and values are count of records belonging to the same category.

Many persistence frameworks return the results of such a query as List<Object[]>, where object arrays contain two elements (category and the count for each returned result set row).

I am trying to find the most readable way to convert this list to the corresponding map.

Of course, traditional approach would involve creating the map and putting the entries manually:

Map<String, Integer> map = new HashMap<>();
list.stream().forEach(e -> map.put((String) e[0], (Integer) e[1]));

The first one-liner that came to my mind was to utilize the out of the box available Collectors.toMap collector:

Map<String, Integer> map = list.stream().collect(toMap(e -> (String) e[0], e -> (Integer) e[1]));

However, I find this e -> (T) e[i] syntax a bit less readable than traditional approach. To overcome this, I could create a util method which I can reuse in all such situations:

public static <K, V> Collector<Object[], ?, Map<K, V>> toMap() {
  return Collectors.toMap(e -> (K) e[0], e -> (V) e[1]);
}

Then I've got a perfect one-liner:

Map<String, Integer> map = list.stream().collect(Utils.toMap());

There is even no need to cast key and value because of type inference. However, this is a bit more difficult to grasp for other readers of the code (Collector<Object[], ?, Map<K, V>> in the util method signature, etc).

I am wondering, is there anything else in the java 8 toolbox that could help this to be achieved in a more readable/elegant way?

Dragan Bozanovic
  • 23,102
  • 5
  • 43
  • 110
  • 3
    You already have a working code that is a single line. I'm not sure what more "tools" you need. What kind of answers are you interested in? – Tunaki Feb 28 '16 at 22:43
  • 2
    What you're doing seems fine except I would pass a `Class` and `Class` to `toMap` so that the casts can be checked. – Radiodef Feb 28 '16 at 22:49
  • 2
    @Tunaki True. But I think that it would be beneficial for me and for the others to see any examples of how this can be further improved, so that it can be applied in this and similar use cases. – Dragan Bozanovic Feb 28 '16 at 22:56
  • 1
    Why not use a persistence framework that allows to map the data to java objects? Like **JPA** - you can map your results to a simple *persistence entity*, containing a String name and an Integer count. – slartidan Feb 28 '16 at 23:05
  • Your utility method performs unchecked operations and therefore is *not* an improvement over you first and second approaches which reflect what’s going on and won’t create heap pollution. – Holger Feb 29 '16 at 11:29
  • @Holger What's the difference between casts via type inference and explicit ones? If the underlying `Object[]` arrays contain wrong type, I'll get `ClassCastException` both ways. I agree with Radiodef that `Class` objects would need to be passed, but I would say only in situations in which inference would not work, like [chained methods](http://stackoverflow.com/questions/24794924/generic-type-inference-not-working-with-method-chaining) and similar. Do I miss something? – Dragan Bozanovic Feb 29 '16 at 11:40
  • 2
    No, with your utility method you *don’t* get class cast exceptions if the types don’t match—you’ll get a corrupted `Map` instead (that’s called “heap pollution”). The unavoidable exceptions might occur much later at a completely unrelated place where the connection to this cause is hard to trace. (Keep in mind, the problem won’t get detected when you transfer elements using `Collection.addAll`, etc, the pollution might spread) That’s why you should get compiler warnings about “unchecked operations” in your utility method. – Holger Feb 29 '16 at 11:45
  • Using `Class` parameters und `Class.cast` will replace the unchecked operation with a safe idiom and solve the compiler warnings, but `.collect(Utils.toMap(String.class,Integer.class));` is not necessarily better than `.collect(toMap(e -> (String)e[0], e -> (Integer)e[1]));` – Holger Feb 29 '16 at 11:49
  • 1
    @Holger Indeed, you are right about heap pollution here. Thanks for the explanation. – Dragan Bozanovic Feb 29 '16 at 11:56

2 Answers2

14

I think your current 'one-liner' is fine as is. But if you don't particularly like the magic indices built into the command then you could encapsulate in an enum:

enum Column {
    CATEGORY(0), 
    COUNT(1);

    private final int index;

    Column(int index) {
        this.index = index;
    }

    public int getIntValue(Object[] row) {
        return (int)row[index]);
    }

    public String getStringValue(Object[] row) {
        return (String)row[index];
    }
}

Then you're extraction code gets a bit clearer:

list.stream().collect(Collectors.toMap(CATEGORY::getStringValue, COUNT::getIntValue));

Ideally you'd add a type field to the column and check the correct method is called.

While outside the scope of your question, ideally you would create a class representing the rows which encapsulates the query. Something like the following (skipped the getters for clarity):

class CategoryCount {
    private static final String QUERY = "
        select category, count(*) 
        from table 
        group by category";

    private final String category;
    private final int count;

    public static Stream<CategoryCount> getAllCategoryCounts() {
        list<Object[]> results = runQuery(QUERY);
        return Arrays.stream(results).map(CategoryCount::new);
    }

    private CategoryCount(Object[] row) {
        category = (String)row[0];
        count = (int)row[1];
    }
}

That puts the dependency between the query and the decoding of the rows into the same class and hides all the unnecessary details from the user.

Then creating your map becomes:

Map<String,Integer> categoryCountMap = CategoryCount.getAllCategoryCounts()
    .collect(Collectors.toMap(CategoryCount::getCategory, CategoryCount::getCount));
Dragan Bozanovic
  • 23,102
  • 5
  • 43
  • 110
sprinter
  • 27,148
  • 6
  • 47
  • 78
  • Good approach. I felt that method references could be exploited somehow instead of `e -> (T) e[i]` syntax. – Dragan Bozanovic Feb 28 '16 at 23:11
  • 1
    So you’re replacing the “magic indices” with even more magic, relying on `enum` declaration order and hiding the necessary type casts even deeper in reflective operations. The code is still relying on unwritten conventions about the array contents, but only *looks* like there was more to it. By the way, `Array.getInt` doesn’t perform unboxing conversions, so it doesn’t even work here. – Holger Feb 29 '16 at 11:40
  • 1
    @Holger I like the reasoning in this answer, it doesn't have to be exactly like this. It could be `return (Integer) row[ordinal()]`, to make it work or something completely different but based on this concept. I find construct `toMap(KEY::getStringValue, COUNT::getIntValue)` more readable than `e -> (String) e[0], e -> (Integer) e[1])`. – Dragan Bozanovic Feb 29 '16 at 11:46
  • @Holger That's a pretty minor implementation detail rather than a comment on the general design of using an enum to encapsulate details of columns. I've added an index field and changed to standard casts though personally I prefer the design of having the enum in column order and using ordinal. When I use this pattern in practice I also use a standard data type enum to record the type of each column and ensure the correct cast is requested. – sprinter Feb 29 '16 at 21:42
  • 1
    Well, if you also use that `enum` to create the array, it’s indeed much cleaner. However, that implies that you have control over the creation and consumption and thus, makes it less understandable to use an array at all instead of a dedicated object holding the values of the right type in variables of the right name… – Holger Mar 01 '16 at 09:51
  • @Holger yes that's a good point. Neater would be a class representing each row which embeds knowledge of the query and how to interpret it. In fact for the sack of clarity I'll add that to the answer. – sprinter Mar 02 '16 at 00:40
2

Instead of hiding the class cast, I would make couple of functions to help with readability:

Map<String, Integer> map = results.stream()
        .collect(toMap(
                columnToObject(0, String.class),
                columnToObject(1, Integer.class)
        ));

Full example:

package com.bluecatcode.learning.so;

import com.google.common.collect.ImmutableList;

import java.util.List;
import java.util.Map;
import java.util.function.Function;

import static java.lang.String.format;
import static java.util.stream.Collectors.toMap;

public class Q35689206 {

    public static void main(String[] args) {
        List<Object[]> results = ImmutableList.of(
                new Object[]{"test", 1}
        );

        Map<String, Integer> map = results.stream()
                .collect(toMap(
                        columnToObject(0, String.class),
                        columnToObject(1, Integer.class)
                ));

        System.out.println("map = " + map);
    }

    private static <T> Function<Object[], T> columnToObject(int index, Class<T> type) {
        return e -> asInstanceOf(type, e[index]);
    }

    private static <T> T asInstanceOf(Class<T> type, Object object) throws ClassCastException {
        if (type.isAssignableFrom(type)) {
            return type.cast(object);
        }
        throw new ClassCastException(format("Cannot cast object of type '%s' to '%s'",
                object.getClass().getCanonicalName(), type.getCanonicalName()));
    }
}
Paweł Prażak
  • 3,091
  • 1
  • 27
  • 42