
I'm trying to use the Appengine Search API to let users query over multiple datasets which each have their own schema. More specifically:

Users have access to multiple datasets. Each dataset has many rows and a set of columns, and each column has a name and a type. Column names may collide across datasets, and colliding columns may have different types.

I would like users to be able to search across all of their datasets with one query to the Search API. If I create a document for each row in each dataset, I suspect the union of all fields across datasets (where a field = a column) will exceed 1000 distinct fields.

How can I get around this? Or will I have to build multiple indexes (one per dataset) and issue multiple requests? Can those requests happen in parallel? What are the costs and downsides of that approach?

aloo

1 Answer


A good starting point is the Python documentation, since the underlying framework is the same. That said, YMMV here: the Java side is still experimental.

Not every document in an index has to have the same set of fields. All you need is a way to get from a document_id back to your object, so <tableId>:<objectId> works fine here. However, according to the docs,

There is currently a limit of 1000 named fields for a given index schema.
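The composite-id scheme mentioned above can be sketched as a pair of plain helpers (pure Python, no App Engine dependency; the function names are mine, for illustration):

```python
def make_doc_id(table_id, object_id):
    """Compose a document_id that records which dataset a row came from."""
    # Assumes ':' never appears in table_id; pick another separator if it can.
    return "%s:%s" % (table_id, object_id)

def parse_doc_id(doc_id):
    """Recover (table_id, object_id) from a search result's document_id."""
    table_id, object_id = doc_id.split(":", 1)
    return table_id, object_id
```

Splitting on the first ':' only means the object_id itself may safely contain colons.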

So that may be an issue for you. If you don't care about matching on specific columns, you can translate each object into a document with a single string field and do plain text search over that, so everything lives in one field regardless of how many distinct columns exist across datasets. You can then recover <tableId>:<objectId> from the document_id and fetch the full data from your datastore.
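A minimal sketch of that flattening step (plain Python; the function name is my own, not part of the API):

```python
def row_to_search_text(row):
    """Flatten one row (a dict of column name -> value) into a single
    searchable string, so the index needs only one named field no matter
    how many distinct columns the union of all datasets would have."""
    parts = []
    for name in sorted(row):  # sort for a deterministic ordering
        parts.append("%s %s" % (name, row[name]))
    return " ".join(parts)
```

With the Python Search API you would then, if I recall the API correctly, wrap this in something like search.Document(doc_id=..., fields=[search.TextField(name='content', value=text)]) and add it to a single index.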

Jason Tholstrup