I am new to Apache Beam. I am using Beam with Dataflow as the runner on GCP, and I get the following error while executing my pipeline:

coder of type class org.apache.beam.sdk.coders.ListCoder has a #structuralValue method which does not return true when the encoding of the elements is equal. Element [Person [businessDay=01042020, departmentId=101, endTime=2020-04-01T09:06:02.000Z, companyId=242, startTime=2020-04-01T09:00:33.000Z], Person [businessDay=01042020, departmentId=101, endTime=2020-04-01T09:07:47.000Z, companyId=242, startTime=2020-04-01T09:06:03.000Z], Person [businessDay=01042020, departmentId=101, endTime=2020-04-01T09:48:25.000Z, companyId=242, startTime=2020-04-01T09:07:48.000Z]]

The PCollections have the types PCollection<KV<String, List<Person>>> and PCollection<KV<String, Iterable<List<Person>>>>.

I have implemented Person as a Serializable POJO class and overridden its equals and hashCode methods. I suspect I also need to write a custom ListCoder for Person and register it in the pipeline, but I am not sure how to resolve this issue. Please help.

1 Answer

Here is a working example. If you clone the repo, run ./gradlew run from the playground root directory to verify the effect. To run it on Dataflow instead, use ./gradlew run --args='--runner=DataflowRunner --project=$YOUR_PROJECT_ID --tempLocation=gs://xxx/staging --stagingLocation=gs://xxx/staging'.

The Person class should look like this if you build it from scratch:

import java.io.Serializable;
import java.util.Objects;

class Person implements Serializable {
  public Person(
      String businessDay,
      String departmentId,
      String companyId
  ) {
    this.businessDay = businessDay;
    this.departmentId = departmentId;
    this.companyId = companyId;
  }

  public String companyId() {
    return companyId;
  }

  public String businessDay() {
    return businessDay;
  }

  public String departmentId() {
    return departmentId;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (other == null) {
      return false;
    }
    if (getClass() != other.getClass()) {
      return false;
    }
    Person otherPerson = (Person) other;
    return this.businessDay.equals(otherPerson.businessDay)
        && this.departmentId.equals(otherPerson.departmentId)
        && this.companyId.equals(otherPerson.companyId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(this.businessDay, this.departmentId, this.companyId);
  }

  private final String businessDay;
  private final String departmentId;
  private final String companyId;
}
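
The error in the question says that elements with equal encodings did not compare as equal. The following is a minimal sketch of that kind of check, not Beam's actual implementation: it assumes plain Java serialization as a stand-in for the coder, and the names RoundTripCheck, NoEquals, WithEquals, roundTrip, and survivesRoundTrip are hypothetical. A copy produced by a serialization round trip is a distinct object, so a list of elements without a value-based equals no longer equals the original list.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Objects;

class RoundTripCheck {

  /** Hypothetical element type that inherits Object's identity-based equals. */
  static class NoEquals implements Serializable {
    final String companyId;

    NoEquals(String companyId) {
      this.companyId = companyId;
    }
  }

  /** Same field, but with value-based equals and hashCode. */
  static class WithEquals implements Serializable {
    final String companyId;

    WithEquals(String companyId) {
      this.companyId = companyId;
    }

    @Override
    public boolean equals(Object other) {
      if (this == other) {
        return true;
      }
      if (other == null || getClass() != other.getClass()) {
        return false;
      }
      return Objects.equals(companyId, ((WithEquals) other).companyId);
    }

    @Override
    public int hashCode() {
      return Objects.hash(companyId);
    }
  }

  /** Serializes and deserializes a value, which yields a distinct copy. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }

  /** True when the value still equals itself after encoding and decoding. */
  static boolean survivesRoundTrip(Serializable value) {
    return value.equals(roundTrip(value));
  }

  public static void main(String[] args) {
    ArrayList<NoEquals> broken = new ArrayList<>();
    broken.add(new NoEquals("242"));
    ArrayList<WithEquals> fixed = new ArrayList<>();
    fixed.add(new WithEquals("242"));

    System.out.println("without equals: " + survivesRoundTrip(broken)); // false
    System.out.println("with equals:    " + survivesRoundTrip(fixed));  // true
  }
}
```

Running the same kind of round trip against your own Person class in a unit test is a cheap way to catch a faulty equals before the pipeline reaches the runner.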

I recommend:

  • Using AutoValue instead of writing the POJO from scratch. Here are some examples. You can view the whole project here. The advantage is that you don't have to implement equals and hashCode by hand every time you create a new value type.

  • If the key of a KV is an iterable such as a List, wrapping it in an object and serializing it explicitly and deterministically (example), because Java serialization is not guaranteed to be deterministic.
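
As a rough illustration of the second point, here is a stdlib-only sketch of an explicit, deterministic encoding for a list-valued key. The class ListKeyEncoding and its encode/decode methods are hypothetical names; in a real pipeline this logic would presumably live inside a custom Beam coder, which is not shown here. The bytes depend only on the elements and their order, whereas Java serialization also writes the concrete class, so an ArrayList and a LinkedList with the same contents serialize differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class ListKeyEncoding {

  /** Writes the element count, then each element as a length-prefixed UTF string. */
  static byte[] encode(List<String> key) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(key.size());
      for (String part : key) {
        out.writeUTF(part);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reads back exactly what encode wrote. */
  static List<String> decode(byte[] encoded) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
      int size = in.readInt();
      List<String> key = new ArrayList<>(size);
      for (int i = 0; i < size; i++) {
        key.add(in.readUTF());
      }
      return key;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    List<String> asArrayList = new ArrayList<>(Arrays.asList("242", "101"));
    List<String> asLinkedList = new LinkedList<>(Arrays.asList("242", "101"));

    // Equal lists produce identical bytes, whatever their concrete class.
    System.out.println(Arrays.equals(encode(asArrayList), encode(asLinkedList)));
    System.out.println(decode(encode(asArrayList)).equals(asArrayList));
  }
}
```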

ningk
  • Thanks for your response. In the PCollection the key is a String only; the value is a List, as I mentioned in the question description. – akash kumar Jun 14 '20 at 10:17
  • Thanks for your response. In the PCollection the key is a String only; the value is a List, as I mentioned in the question description. It looks like Dataflow, while executing, finds that all the objects in the List are the same type (Person) and warns that they should compare as equal, but the Person objects' attribute values are different. I do not know how to resolve this warning. – akash kumar Jun 14 '20 at 10:25
  • My guess is that your equals method might not be implemented correctly. Here is a working [example](https://github.com/KevinGG/diary/blob/master/playground/src/main/java/com/google/dataflow/eou/diary/playground/Playground.java). When `equals` is not implemented, I do see that warning. But once `equals` is in-place, the warning is gone. – ningk Jun 15 '20 at 23:06
  • Thanks, there was one small mistake in the equals method, in how it compared integers. I have accepted the answer. – akash kumar Jun 17 '20 at 08:52
  • I am getting another warning related to the equals method: "Can't verify serialized elements of type BoundedSource have well defined equals method." I have overridden equals like this: `if (obj instanceof Person){ Person otherPerson = (Person) other; return Objects.equals(this.businessDay,otherPerson.businessDay) && Objects.equals(this.departmentId,otherPerson.departmentId) && Objects.equals(this.companyId,otherPerson.companyId) }`. What is the problem with this equals method? I cannot use AutoValue; I would have to change many places, and other team members are also using this class. – akash kumar Jun 17 '20 at 12:10
  • When using `AutoValue`, if the members are primitive types, I think the `equals` method is automatically implemented appropriately, so you don't have to define any `equals` method. Just delete your `equals` method, the error should be gone. – ningk Jun 17 '20 at 18:08
  • I cannot use AutoValue; if I did, I would have to change many places. I tried it once, but it gave an error like "AutoValue_Person not found". I am not sure why Apache Beam does not allow the equals method. – akash kumar Jun 18 '20 at 06:40
  • What are the other options if I do not want to use AutoValue? – akash kumar Jun 18 '20 at 08:27
  • If you do not use AutoValue, you have to make sure that `equals` and `hashCode` are all implemented correctly and in a deterministic way. – ningk Jun 19 '20 at 18:07
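
The thread does not show the integer comparison that was fixed, so the following is only a guess at one common form of that mistake: comparing boxed Integer fields with ==, which compares object references rather than values. The class BoxedCompare and its methods are hypothetical names used for illustration.

```java
import java.util.Objects;

class BoxedCompare {

  /** Buggy for boxed values: == compares references, not numeric values. */
  static boolean sameByReference(Integer a, Integer b) {
    return a == b;
  }

  /** Null-safe value comparison, suitable for use inside equals(). */
  static boolean sameByValue(Integer a, Integer b) {
    return Objects.equals(a, b);
  }

  public static void main(String[] args) {
    // Values outside the small-integer cache are typically boxed as distinct objects.
    Integer left = Integer.valueOf(100_000);
    Integer right = Integer.valueOf(100_000);

    System.out.println("==            : " + sameByReference(left, right));
    System.out.println("Objects.equals: " + sameByValue(left, right));
  }
}
```

With the reference comparison, two Person objects decoded from identical bytes can compare as unequal even though every field holds the same number, which is consistent with the warning described in the question.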