5

I apologize for opening another question about this general issue, but none of the questions I've found on SO seem to relate closely to my issue.

I've got an existing, working dataflow pipeline that accepts objects of KV<Long, Iterable<TableRow>> and outputs TableRow objects. This code is in our production environment, running without issue. I am now trying to implement a unit test with direct runner to test this pipeline, however, but the unit test fails when it hits the line

LinkedHashMap<String, Object> evt = (LinkedHashMap<String, Object>) row.get(Schema.EVT);

in the pipeline, throwing the error message:

java.lang.ClassCastException: com.google.gson.internal.LinkedTreeMap cannot be cast to java.util.LinkedHashMap

A simplified version of the existing dataflow code looks like this:

public static class Process extends DoFn<KV<Long, Iterable<TableRow>>, TableRow> {

    /* private variables */

    /* constructor */

    /* private functions */

    @ProcessElement
    public void processElement(ProcessContext c) throws InterruptedException, ParseException {       

      EventProcessor eventProc = new EventProcessor();
      Processor.WorkItem workItem = new Processor.WorkItem();
      Iterator<TableRow> it = c.element().getValue().iterator();

      // process all TableRows having the same id
      while (it.hasNext()) {
        TableRow item = it.next();

        if (item.containsKey(Schema.EVT))
          eventProc.process(item, workItem);
        else
          /* process by different Proc class */
      }

      /* do additional logic */

      /* c.output() is somewhere far below */

    }
}

public class EventProcessor extends Processor {

    // Extract data from an event into the WorkItem
    @SuppressWarnings("unchecked")
    @Override
    public void process(TableRow row, WorkItem item) {    

        try {
          LinkedHashMap<String, Object> evt = (LinkedHashMap<String, Object>) row.get(Schema.EVT);
          LinkedHashMap<String, Object> profile = (LinkedHashMap<String, Object>) row.get(Schema.PROFILE);

          /* if no exception, process further business logic */

          /* business logic */

        } catch (ParseException e) {
            System.err.println("Bad row");
        }
    }
}      

The relevant portion of the unit test, which prepares the main input to the Process() DoFn, looks like this:

Map<Long, List<TableRow>> groups = new HashMap<Long, List<TableRow>>();
List<KV<Long, Iterable<TableRow>>> collections = new ArrayList<KV<Long,Iterable<TableRow>>>();    
Gson gson = new Gson();    

// populate the map with events grouped by id
for(int i = 0; i < EVENTS.length; i++) {
  TableRow row = gson.fromJson(EVENTS[i], TableRow.class);
  Long id = EVENT_IDS[i];

  if(groups.containsKey(id))
    groups.get(id).add(row);
  else
    groups.put(id, new ArrayList<TableRow>(Arrays.asList(row)));        
}

// prepare main input for pipeline
for(Long key : groups.keySet())
  collections.add(KV.of(key, groups.get(key)));

The line which is causing the issue is gson.fromJson(EVENTS[i], TableRow.class);, which appears to be encoding the internal representation of the TableRow as the wrong type of LinkedTreeMap.

The encoded type of the TableRow appears to be com.google.gson.internal.LinkedTreeMap instead of the expected java.util.LinkedHashMap. Is there a way I can cast the TableRow being created in my unit test to the correct type of java.util.LinkedHashMap, so that the unit test succeeds without making any changes to the existing dataflow code that already works in production?

tk421
  • 5,775
  • 6
  • 23
  • 34
Max
  • 808
  • 11
  • 25
  • Please let me know if more code is needed to add context. – Max Dec 29 '17 at 19:42
  • Why do you need to cast to `LinkedHashMap` and not just to `Map`? – Roman Puchkovskiy Dec 29 '17 at 19:45
  • @RomanPuchkovskiy To be honest, I'm not sure. The cast to `LinkedHashMap` was existing code from sometime before. Would changing the cast to `Map` have any consequences? – Max Dec 29 '17 at 19:51
  • Exactly. Code against the interface and replace `LinkedHashMap` with `Map` instead. Then you don't need to care which map implementation the API happens to return. – Mick Mnemonic Dec 29 '17 at 19:51
  • @Max If `LinkedTreeMap ` is a child of `Map` it shouldn't be a problem, unless you use something specific from that class. but I think your best bet is to just try and see if it works – Andrei Sfat Dec 29 '17 at 19:52
  • https://github.com/google/gson/blob/master/gson/src/main/java/com/google/gson/internal/LinkedTreeMap.java It is an `AbstractMap`, so it is a `Map`. If the following code does not use anything specific to `LinkedHashMap` and `HashMap`, it should be ok. But of course, I would test first. – Roman Puchkovskiy Dec 29 '17 at 19:54
  • @RomanPuchkovskiy That appears to have worked! The unit test at least is not failing out anymore, but I'll have to run the main code through dataflow again. Thank you for the help, Roman! – Max Dec 29 '17 at 20:42
  • I would change the cast to new like new LinkedHashMap(row.get(Schema.EVT)); This way the code will keep working regardless of the return type from row.get(...) and you could keep the type of LinkedHashMap in your code which signals to the reader that the type is Map but also a one which keeps the order. Perhaps it is unfortunate that there is not an interface for the semantics 'hey, this is a Map, but at the same time a one which keeps the data in insertion order'. – Michal Dec 29 '17 at 21:26
  • @Max Great, I'm glad it helped :) I've reposted the solution as an answer, so you could accept it ;) – Roman Puchkovskiy Dec 30 '17 at 10:49

3 Answers3

6

Reposting the solution as an answer.

It is not recommended to cast to concrete classes if you do not use their specific features. In this case, it is better to cast to Map instead of LinkedHashMap. Gson's LinkedTreeMap is a Map too, so no problem should arise.

Roman Puchkovskiy
  • 11,415
  • 5
  • 36
  • 72
1

I would consider (not only that particular) cast a code smell. Every time a cast is coded, a risk is taken that a ClassCastException happens.

As the others already said, the Map interface could be used like Map<String, Object> evt = row.get(Schema.EVT);.

Alternatively, a new LinkedHashMap could be constructed by new LinkedHashMap<String, Object>(row.get(Schema.EVT));.

The second approach has the advantage of keeping the LinkedHashMap type, which might or might not be important, that depends on your scenario.

Michal
  • 2,353
  • 1
  • 15
  • 18
0

It's because LinkedHashMap is not superior to LinkedTreeMap thus they might not have the same methods. The Java compiler so thinks that casting it that way might result in evt having different methods than row.get(Schema.EVT), resulting bad stuff. However, you can cast LinkedTreeMap into AbstractMap, Map or Object as they're all superior to it. So (as many comments point out) to fix it, just use

Map<String, Object> evt = row.get(Schema.EVT);

and you should be fine.

Marko Zajc
  • 307
  • 3
  • 14