
I am running a Dataflow job (Apache Beam SDK 2.1.0 for Java, Google Cloud Dataflow runner) and I need to read from Google Cloud Datastore "distinctly" on one particular property (like the good old DISTINCT keyword in SQL). Here is my code snippet:

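import com.google.datastore.v1.PropertyReference;
import com.google.datastore.v1.Query;
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;

// Query the "student-records" kind, returning one entity per distinct studentId
Query.Builder q = Query.newBuilder();
q.addKindBuilder().setName("student-records");
q.addDistinctOn(PropertyReference.newBuilder().setName("studentId").build());
pipeline.apply(DatastoreIO.v1().read().withProjectId("project-id").withQuery(q.build()));
pipeline.run();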

When the pipeline runs, the read() fails due to the following error:

java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: com.google.datastore.v1.client.DatastoreException: Inequality filter on key must also be a group by property when group by properties are set., code=INVALID_ARGUMENT
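One possible reading of this error, in line with jkff's comment that the problem may lie in query splitting: DatastoreIO's read() normally splits a query into several sub-queries by key range, which (assuming the splitter works the way I understand it) means adding `__key__ >=` and `__key__ <` inequality filters to each piece. Combined with distinctOn, that would violate the rule the error message states. The stdlib-only sketch below models just that rule as worded; `KeyFilterRuleSketch`, `QueryModel`, `Filter`, `satisfiesKeyRule` and `withKeyRange` are hypothetical names, not the Datastore or Beam API, and the real server check is evidently stricter (adding `__key__` reportedly did not help).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a Datastore query that keeps only what the
// error message talks about. These types are NOT part of the Datastore API.
public class KeyFilterRuleSketch {

    // A filter on one property; inequality = true for <, <=, >, >=.
    public record Filter(String property, boolean inequality) {}

    // distinctOn holds the "group by" properties set via addDistinctOn.
    public record QueryModel(Set<String> distinctOn, List<Filter> filters) {}

    // The rule as worded in the error: when group-by (distinct-on) properties
    // are set, an inequality filter on __key__ requires __key__ to be one of them.
    public static boolean satisfiesKeyRule(QueryModel q) {
        if (q.distinctOn().isEmpty()) {
            return true;
        }
        for (Filter f : q.filters()) {
            boolean keyInequality = f.property().equals("__key__") && f.inequality();
            if (keyInequality && !q.distinctOn().contains("__key__")) {
                return false;
            }
        }
        return true;
    }

    // Assumed shape of one split produced by key-range splitting:
    // the original query plus "__key__ >= lower AND __key__ < upper".
    public static QueryModel withKeyRange(QueryModel q) {
        List<Filter> filters = new ArrayList<>(q.filters());
        filters.add(new Filter("__key__", true)); // __key__ >= lower bound
        filters.add(new Filter("__key__", true)); // __key__ < upper bound
        return new QueryModel(q.distinctOn(), filters);
    }

    public static void main(String[] args) {
        QueryModel original = new QueryModel(Set.of("studentId"), List.of());
        System.out.println("original query ok: " + satisfiesKeyRule(original));
        System.out.println("split query ok:    " + satisfiesKeyRule(withKeyRange(original)));
    }
}
```

Running main prints `true` for the original query and `false` for the key-range split. If this model is right, one untested thing to try is keeping the query in one piece, e.g. `DatastoreIO.v1().read().withProjectId("project-id").withQuery(q.build()).withNumQuerySplits(1)`; that a single split leaves the query unmodified is an assumption about the splitter, not something verified here.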

Could someone please tell me where I am going wrong?

Venky
  • This seems like a Datastore error rather than a Dataflow error. Have you looked at a similar question https://stackoverflow.com/questions/32609684/error-using-filter-and-projection-in-objectify-google-datastore/32610795 ? – jkff Sep 25 '17 at 05:52
  • I tried adding "__key__" to addDistinctOn, but that didn't seem to help. I thought the error message suggested that. – Venky Sep 25 '17 at 07:54
  • Hmm I wonder if this is a bug in Datastore's query splitting. Are you able to execute the same query using plain Datastore API without Beam? – jkff Sep 25 '17 at 16:29
  • Can you try adding studentId to a GROUP BY clause? Hope this helps. I am using BigQuery and faced the same problem, so we use a GROUP BY clause instead, and it works as expected in BigQuery. – Jack Sep 26 '17 at 07:55
  • I am not very familiar yet with the Datastore v1 API outside the context of Dataflow. Will try out the suggestions mentioned... – Venky Sep 30 '17 at 22:08
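On Venky's attempt to add `__key__` to addDistinctOn: even if the server accepted it, distinct-on over a set of properties that includes the entity key cannot remove any duplicates, because every key is unique. The stdlib-only sketch below illustrates DISTINCT ON semantics in memory; it is not Datastore code, and `DistinctOnSketch`, `distinctOn`, `entity` and the sample records are hypothetical. It keeps the first entity per distinct combination of the listed properties, which is roughly what the GROUP BY suggestion amounts to; the properties passed to addDistinctOn appear to be what the error message calls "group by properties".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory illustration of DISTINCT ON semantics; not Datastore code.
public class DistinctOnSketch {

    // Keeps the first entity (in input order) for each distinct combination
    // of values of the given properties.
    public static List<Map<String, Object>> distinctOn(
            List<Map<String, Object>> entities, List<String> properties) {
        Set<List<Object>> seen = new HashSet<>();
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> entity : entities) {
            List<Object> groupKey = new ArrayList<>();
            for (String property : properties) {
                groupKey.add(entity.get(property));
            }
            if (seen.add(groupKey)) { // add() returns false for a combination already seen
                result.add(entity);
            }
        }
        return result;
    }

    // Hypothetical "student-records" entity; __key__ is unique per entity.
    static Map<String, Object> entity(String key, String studentId) {
        return Map.of("__key__", key, "studentId", studentId);
    }

    public static List<Map<String, Object>> sampleRecords() {
        return List.of(entity("rec-1", "s1"), entity("rec-2", "s1"), entity("rec-3", "s2"));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> records = sampleRecords();
        System.out.println(distinctOn(records, List.of("studentId")).size());
        System.out.println(distinctOn(records, List.of("studentId", "__key__")).size());
    }
}
```

Grouping on `studentId` alone collapses the two `s1` records into one (main prints 2); adding `__key__` leaves all three (main prints 3). The same keep-one-per-key idea could in principle be applied inside the pipeline after a plain read, at the cost of reading every entity.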

0 Answers