I want to build an aggregate query on my data. I have a Patent class whose objects reference Paragraph objects (paragraphs holding vectorized text). For each category (a property of Patent), I want to count the patents that have a paragraph near a given vector.

In pseudo-SQL:

select count(distinct Patent)
from myweaviate
where Paragraph.nearVector(vector, certainty=0.9)
group by category

I tried something like the following (which would be wrong even if it worked, because it counts paragraphs rather than patents):

result = (client.query.aggregate("Paragraph") \
    .with_group_by_filter(["inPatent{... on Patent{publicationID}"]) \
    .with_fields('meta { count }') \
    .with_fields('groupedBy {value}') \
    .with_near_vector({'vector': vector, 'certainty': 0.8}) \
    .do())

and getting:

{'data': {'Aggregate': {'Paragraph': None}}, 'errors': [{'locations': [{'column': 12, 'line': 1}], 'message': "could not extract groupBy path: Expected a valid property name in 'path' field for the filter, but got 'inPatent{... on Patent{publicationID}'", 'path': ['Aggregate', 'Paragraph']}]}

I couldn't find anything in the docs or elsewhere online on how to do this, i.e. aggregate on a reference property and, on top of that, do a count distinct (although here each Patent is distinct anyway). Can anyone help?

Adriaan
TechAlon

1 Answer

Unfortunately, it is not possible to group by cross-references. The error in your case means that you did not construct a valid path: the path needs to be a list in which each item is a valid property or class name, i.e. path: ["inPatent", "Patent", "publicationID"]. It goes property -> class name -> property -> class name -> ... until it reaches your desired field. However, Weaviate currently does not support Aggregate.groupBy across cross-references, so if you run your query again with the corrected path you should get something like this:

"message": "shard 9wKKa18SJOiM: identify groups: grouping by cross-refs not supported"

Note that it is possible to use the cross-reference property itself as your groupBy path. Since you want to aggregate on the Patent ID, and the UUID (and beacon) of each Patent object is unique and maps one-to-one to its publicationID, grouping by the reference is equivalent. It should look like this:

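result = (
    client.query.aggregate("Paragraph")
    # group by the cross-reference property itself: one group per referenced Patent
    .with_group_by_filter(["inPatent"])
    .with_fields('meta { count }')       # number of matching paragraphs in each group
    .with_fields('groupedBy {value}')    # the value the group was formed on
    .with_near_vector({'vector': vector, 'certainty': 0.8})
    .do()
)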
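Since each group in that result stands for one Patent, the per-category patent count from the question could then be computed client-side. The following is only a sketch under assumptions: the response shape mirrors the `data` / `Aggregate` / `Paragraph` nesting seen in the error output, the exact format of `groupedBy.value` for a reference is assumed to be a beacon-like string, the sample beacons are made up, and `category_of` is a hypothetical lookup from Patent reference to category that you would have to build yourself (for example from a separate query on Patent).

```python
from collections import Counter
from typing import Dict


def count_patents_per_category(response: dict, category_of: Dict[str, str]) -> Counter:
    """Count distinct patents per category from a grouped Aggregate response.

    Assumes one group per referenced Patent; `category_of` is a hypothetical
    mapping from the grouped reference value to that patent's category.
    """
    groups = response["data"]["Aggregate"]["Paragraph"] or []
    # A set removes duplicates, in case the same patent appears in more than one group.
    patents = {group["groupedBy"]["value"] for group in groups}
    return Counter(category_of[patent] for patent in patents)


# Made-up sample data, for illustration only.
sample_response = {
    "data": {"Aggregate": {"Paragraph": [
        {"groupedBy": {"value": "weaviate://localhost/Patent/aaa"}, "meta": {"count": 3}},
        {"groupedBy": {"value": "weaviate://localhost/Patent/bbb"}, "meta": {"count": 1}},
        {"groupedBy": {"value": "weaviate://localhost/Patent/ccc"}, "meta": {"count": 2}},
    ]}}
}
category_of = {
    "weaviate://localhost/Patent/aaa": "chemistry",
    "weaviate://localhost/Patent/bbb": "chemistry",
    "weaviate://localhost/Patent/ccc": "software",
}

result_per_category = count_patents_per_category(sample_response, category_of)
print(result_per_category)
```

The paragraph counts in `meta.count` are ignored on purpose, because the goal is to count patents, not paragraphs.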