
We have 100 million rows in Table storage, and each row has about 4 items of metadata. We would like to search by the metadata, but looking at the pricing, it is going to be very expensive.

The Basic option supports 1M documents. Is that the same as 1 million rows with only 4 items of metadata each? Or would we simply use up the storage, which is 2 GB? The storage would presumably be optimised, so 2 GB of metadata may not equal 2 GB of storage.

[Image: Azure Search pricing tile for the Basic tier]

And then the bigger sizes:

[Image: Azure Search pricing tiles for the larger tiers]

We are also looking at DocumentDB, and at the standard Table storage pattern for allowing search on other metadata.

Steve Drake
  • What types of searches do you need to support? If your searches should be able to (easily) handle variations of words (e.g. vehicle, vehicles, vehicular) and also take word occurrence in documents into account when ranking results, then you likely need Azure Search (or another search product). If you don't have these requirements, then DocumentDB might well support you with some well-crafted queries. The load should not be a problem for DocDB. – yoape Jan 28 '17 at 13:10
  • Variations not needed yet but that's a very good point. – Steve Drake Jan 28 '17 at 13:33
  • @yoape - I don't understand your comment. The OP wasn't asking about changing database store. There's no discussion of database load here, and DocumentDB does not have built-in full-text search. – David Makogon Jan 28 '17 at 13:41
  • I have tried to focus this question on Azure Search, but DocumentDB may do what we need, although we feel that Azure Search is more strategic. But yes, my question is more about pricing. On Monday I may just create a Basic instance, index 5 million rows and see what happens. – Steve Drake Jan 28 '17 at 14:23
  • @DavidMakogon see the last sentence in the OP's question regarding DocDB, if they are considering DocDB for other purposes it might be worth considering it here if it fits the requirements. My point is, if you don't have actual search requirements then other products may solve the problem just as well at a better price performance. But since I am not offering an answer to the actual question, just a suggestion to the OP's situation, I left it as a comment, not an answer... – yoape Jan 28 '17 at 14:26

2 Answers


Basic has a hard limit of 1M documents, and you cannot add partitions to raise it, so if you want to index all 100M entries you have to go for one of the Standard tiers: S1, S2 or S3. Each individual entry (row in your database) counts as a document. The maximum size of a document you index is 16 MB, but it may be lower depending on how you update the index (https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity#document-size-limits).

The number of documents you need to store affects which tier you need, but so do the maximum storage size and the throughput you want. You can do a quick estimate of how much storage you need: e.g. if your 4 metadata points are all strings and each string averages 30 characters of UTF-8 (30 bytes if the text is ASCII), then you need a total of 100M x 4 x 30 B = 12 billion bytes (about 11 GiB), so storage size is unlikely to be a reason to select more partitions (both S1 and S2 can fit that within a single partition).
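The same back-of-the-envelope estimate as a minimal sketch. It assumes, as in the example above, 4 string fields per row averaging 30 ASCII characters, and it ignores field names and the search index's own storage structures, so the real figure can differ in either direction:

```python
# Back-of-the-envelope raw size for indexing table rows as search documents.
# Assumed (as in the answer's example): 4 string fields per row, each about
# 30 characters. Index structures and field names are ignored.
ROWS = 100_000_000
FIELDS_PER_ROW = 4
SAMPLE_VALUE = "x" * 30  # a typical 30-character ASCII value


def utf8_size(value: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


def raw_size_bytes(rows: int, fields: int, bytes_per_field: int) -> int:
    """Raw payload size: rows x fields x average bytes per field."""
    return rows * fields * bytes_per_field


total = raw_size_bytes(ROWS, FIELDS_PER_ROW, utf8_size(SAMPLE_VALUE))
print(f"{total / 10**9:.1f} GB")   # prints "12.0 GB"
print(f"{total / 2**30:.1f} GiB")  # prints "11.2 GiB"
```

The ~11 GB figure corresponds to the binary (GiB) value. Non-ASCII text takes more than one byte per character (`utf8_size("café")` is 5, not 4), so the estimate grows if the metadata is not mostly ASCII.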

To fit 100M documents you could go with 7 S1 partitions (15M x 7 = 105M) at £1,304.21/mo, or 2 S2 partitions (100M x 2) at £1,490.52/mo. The S2s will likely give you better throughput, and they will give you more indexes to work with, even if you don't need them at the moment since you only have 4 metadata points.
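The partition count above is a ceiling division of the document count by the per-partition document limit. A minimal sketch, using the 15M-documents-per-S1-partition figure from this answer (an assumption to re-check, since service limits change over time):

```python
# Per-partition document limit for S1 as quoted in the answer; treat it as
# an assumption, since service limits change over time.
S1_DOCS_PER_PARTITION = 15_000_000


def partitions_needed(total_docs: int, docs_per_partition: int) -> int:
    """Smallest partition count whose combined document limit covers total_docs."""
    if docs_per_partition <= 0:
        raise ValueError("docs_per_partition must be positive")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (total_docs + docs_per_partition - 1) // docs_per_partition


n = partitions_needed(100_000_000, S1_DOCS_PER_PARTITION)
print(n)                                # 7
print(n * S1_DOCS_PER_PARTITION)        # 105000000 (the 105M above)
print((n - 1) * S1_DOCS_PER_PARTITION)  # 90000000, so 6 partitions fall short
```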

As noted before, the full capability of a search engine might be a lot more than what you need right now, but if it is a strategic decision to start working with it then at least you know why you are paying for it.

yoape

If your search scenarios don't require full-text search (e.g. type-ahead, suggestions, stemming and inflected forms of words in over 50 languages, facets, custom scoring), but instead need just numerical, datetime or geospatial comparisons and simple string matching, then DocumentDB would be a good choice. DocumentDB is also easy to integrate efficiently with Azure Search if and when you need full-text search capabilities.

In terms of Azure Search pricing, look at the S1 and S2 tiers. For example, you can store 100 million docs with 2 S2 partitions. The storage and document limits shown on the pricing tiles are per partition. Each service can have up to 12 partitions, except Basic, which has only 1.
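Because the limits on the tiles are per partition and the partition count is capped, whether a tier can hold a given number of documents at all comes down to one comparison. A minimal sketch using only the figures stated in this thread (Basic: 1M documents and a single partition; S1: 15M documents per partition and up to 12 partitions); the `TierLimits` type is hypothetical, and other tiers would need their limits filled in from the current pricing page:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierLimits:
    """Hypothetical container for one pricing tier's limits."""
    name: str
    docs_per_partition: int
    max_partitions: int


# Figures as stated in this thread; check the current pricing page.
BASIC = TierLimits("Basic", docs_per_partition=1_000_000, max_partitions=1)
S1 = TierLimits("S1", docs_per_partition=15_000_000, max_partitions=12)


def min_partitions(tier: TierLimits, total_docs: int) -> Optional[int]:
    """Fewest partitions that hold total_docs on this tier, or None if the
    tier cannot hold them even at its maximum partition count."""
    # Ceiling division; a service always has at least one partition.
    needed = max(1, -(-total_docs // tier.docs_per_partition))
    return needed if needed <= tier.max_partitions else None


for tier in (BASIC, S1):
    print(tier.name, min_partitions(tier, 100_000_000))
# prints:
# Basic None
# S1 7
```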

Eugene Shvets
  • Does a row count as a document? I can see huge value when each row has tens of metadata items, but for a high row count with little metadata it's expensive. Also, I presume a moderately sized Word document with over 1,000 words counts the same as a table row with 4 words. Thanks – Steve Drake Jan 28 '17 at 18:52
  • One table row counts as one document, irrespective of the number of properties. Of course, smaller documents consume less storage than larger documents. – Eugene Shvets Jan 28 '17 at 18:56
  • Requirements are simple now, but this could change. Right now we have a high row count; our next batch of data will have fewer rows but may have more metadata. Search features add value to the overall solution, and it seems that Azure Search offers a lot, saving us from writing a lot of code. – Steve Drake Jan 28 '17 at 18:57
  • Glad we can help. You can contact me eugenesh at the usual Microsoft domain if you have other questions about Azure Search. – Eugene Shvets Jan 28 '17 at 19:13
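To make the "one row = one document" point concrete: each table row maps to exactly one search document however many properties it carries, so the document count is simply the row count. A minimal sketch, assuming rows arrive as plain dicts carrying Table storage's `PartitionKey` and `RowKey`; the `id` field name and the key-encoding choice are hypothetical. Azure Search document keys are limited to letters, digits, underscore, dash and equals sign, so the sketch URL-safe Base64-encodes the combined table key:

```python
import base64
from typing import Any, Dict, Iterable, Iterator


def to_search_document(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one table row to one search document (hypothetical field names).

    The key joins PartitionKey and RowKey with a tab (assumed safe because
    Table storage keys may not contain control characters) and URL-safe
    Base64 encodes it, so it only uses letters, digits, '-', '_' and '='.
    """
    table_key = f"{row['PartitionKey']}\t{row['RowKey']}"
    doc = {"id": base64.urlsafe_b64encode(table_key.encode("utf-8")).decode("ascii")}
    # Copy every remaining property: 4 or 40 of them, it is still one document.
    for name, value in row.items():
        if name not in ("PartitionKey", "RowKey"):
            doc[name] = value
    return doc


def to_search_documents(rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield exactly one document per row."""
    for row in rows:
        yield to_search_document(row)


rows = [
    {"PartitionKey": "p1", "RowKey": "r1", "a": 1, "b": 2, "c": 3, "d": 4},
    {"PartitionKey": "p1", "RowKey": "r2", "a": 5},
]
docs = list(to_search_documents(rows))
print(len(docs))  # 2: one document per row, regardless of property count
```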