I am trying to index a dataset of products. Each product has a category, and each product has a set of category attributes associated with it. I would like to be able to query on those attributes such that if a product has that attribute and matches the value, it returns a hit.
For example, one category, desktop, might have attributes like this:
{
"Brand":"Nice Computers",
"Bus Speed":"8GT/s",
"Direct Media Interface (DMI) Revision":"4",
"Embedded Options":"No"
}
Whereas another category, embedded, might have attributes like this:
{
"Brand":"Nice Computers",
"Bus Speed":"8GT/s",
"Core Family":"Fast Stuff"
}
I tried modeling the data as JSON, with the attributes and values as key-value pairs in a JSON object. Then I tried indexing it using the complex data type described here. The indexer runs fine, but I run into issues when I query the data. For one, executing a query with queryType=full returns an error, because the full search syntax interprets / as starting or ending a regular expression, and searching for productAttributes/category: desktop isn't a valid regular expression. I do need full querying capabilities for other reasons, though. The other issue is that, even without the full query type, Azure doesn't seem to recognize that my JSON data contains subfields. A similar query, productAttributes/brand: nice, returns no proper hits, only hits on other fields that contain the text "nice".
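For concreteness, the document shape described above could be sketched in Python as follows. This is only an illustration: productAttributes is the field name used in the queries above, while id, category, and the helper build_document are assumed names, not part of any real schema.

```python
import json


def build_document(doc_id, category, attributes):
    """Wrap a product's free-form attributes in a single object-valued field."""
    return {
        "id": doc_id,                           # hypothetical key field
        "category": category,                   # hypothetical field name
        "productAttributes": dict(attributes),  # arbitrary key-value pairs
    }


desktop = build_document("1", "desktop", {
    "Brand": "Nice Computers",
    "Bus Speed": "8GT/s",
    "Direct Media Interface (DMI) Revision": "4",
    "Embedded Options": "No",
})

embedded = build_document("2", "embedded", {
    "Brand": "Nice Computers",
    "Bus Speed": "8GT/s",
    "Core Family": "Fast Stuff",
})

# Attribute names differ per category; only some are shared.
shared = sorted(
    set(desktop["productAttributes"]) & set(embedded["productAttributes"])
)

print(json.dumps(desktop, indent=2))
print(shared)
```

The point of the sketch is that the set of keys under productAttributes varies from one product to the next, which is what makes a fixed field list awkward.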
Looking more at the documentation for complex data types, it seems I might need to declare the data structure when the index is defined, listing the subfields as done in this question. I could do that, except that it brings me back to my original issue: I don't necessarily know (or want to specify) all the potential product attributes that could be in the complex field.
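The explicit declaration mentioned above might look roughly like the following sketch, which builds a complex field definition as a Python dict in the style of the REST index schema. The helper complex_field and the attribute list are hypothetical, and the simplified names such as busSpeed are an assumption, since raw keys containing spaces or parentheses may not be accepted as field names.

```python
import json


def complex_field(name, attribute_names):
    """Build a complex field definition with one string subfield per attribute."""
    return {
        "name": name,
        "type": "Edm.ComplexType",
        "fields": [
            {"name": attr, "type": "Edm.String", "searchable": True}
            for attr in attribute_names
        ],
    }


# Hypothetical list: every attribute has to be known before the index exists.
known_attributes = ["brand", "busSpeed", "coreFamily"]

field = complex_field("productAttributes", known_attributes)
print(json.dumps(field, indent=2))
```

Any attribute missing from known_attributes would have no subfield to land in, which is exactly the limitation described above.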
I also tried looking at making a custom tokenizer to break apart the fields in other ways that would make it queryable, but had limited luck with that as well. Any ideas as to how I can make arbitrary fields indexable?