
For example, say there are 5 object stores. I am thinking of inserting documents into them, but not in sequential order. Initially it might be sequential, but if I could insert using some ranking method, it would be easier to know which object store to search to find a document. The goal is to reduce the number of object store searches, which can only be achieved if the insertion uses some intelligent algorithm.

One method I found useful is using the current year MOD N (where N is the number of object stores) to determine which store a document goes into. Are there better approaches to this?
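For illustration, a rough sketch of the year MOD N idea in plain Java (no FileNet API calls; the class and method names are just placeholders):

```java
import java.time.Year;

// Routes documents to one of N object stores based on the current year.
public class YearModRouter {
    private final int storeCount;

    public YearModRouter(int storeCount) {
        this.storeCount = storeCount;
    }

    // Index of the store a document inserted now should go into.
    public int storeIndexForCurrentYear() {
        return storeIndexForYear(Year.now().getValue());
    }

    // The same rule applied at search time: knowing the year tells us which store to query.
    public int storeIndexForYear(int year) {
        return year % storeCount;
    }

    public static void main(String[] args) {
        YearModRouter router = new YearModRouter(5);
        System.out.println(router.storeIndexForYear(2012)); // 2012 % 5 = 2
    }
}
```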

jaunty_s
  • Could you please clarify which technologies you're talking about? What's an "object store"? What kind of document? Why did you tag the question with [tag:enterprise-architect]? – John Saunders Mar 31 '12 at 18:17
  • Sorry for not being clear enough. This is a FileNet object store. The documents could be of any type. – jaunty_s Apr 03 '12 at 04:00
  • Great. I'm going to add the [tag:filenet] tag. I'm also going to remove the [tag:enterprise-architect] tag, which seems to be irrelevant. – John Saunders Apr 03 '12 at 04:57

3 Answers


This is an old thread, but the thinking is seriously faulty. An object_id is a unique DB key within a given database/schema. You're proposing to create an external front end to a COTS application and then do searches across multiple databases?

First, you shouldn't be storing more than 4K in DB blobs, so even if you had separate physical databases, the biggest latency would come from storage I/O. To distribute I/O across multiple storage subsystems, add multiple storage areas to the storage policy so they round-robin. You can use a filter to direct what goes where, as paulsm4 was asking/implying.

If retrieval performance is really a concern, the place to address it is in system sizing and design. Using Consistency Checker as a benchmark, a VM whose host had multipath fiber SAN networking ran around 80,000 docs/minute. In comparison, a VM using NFS for storage could barely achieve 80 docs/minute. That's 1/1000th the performance. If you're spending 7 figures on software licenses and hire the cheapest resource out there to design/build/admin your system, you are wasting your money.

SteveW

Your criterion for "what goes in a FileNet object store?" is basically "which documents logically belong together?".

paulsm4
  • In this case, similar documents go into all the object stores. Because of the increased number of documents, I decided to split the one object store into multiple stores. Now the problem is how to decide which is the current object store, and how to efficiently change the applications that were directly interfacing with the one object store (which is now an array of object stores). – jaunty_s Apr 05 '12 at 19:03

If you want fast access, there are a couple of criteria:

  1. The hash function has to be reproducible from the data that is queried. This means a lot depends on the queries you expect.

  2. You usually want to distribute your objects as evenly across the stores as possible. If you want to go parallel, you want the documents for a given query to come from different stores, so they will not block each other. Hence your hash function should spread similar documents across different stores as much as possible (see the sketch at the end of this answer). If you expect documents related to the same query to be from the same year, do not use the year directly.

This assumes you want fast queries that can be parallelized. If you instead have a system in which you first have to open a potentially expensive connection to each store, then most documents related to the same query should go into the same store, and you should not take the advice above.
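A minimal sketch of what such a routing function could look like, in plain Java, assuming documents carry a key (for example a customer number) that is also known at query time; the key and the choice of CRC32 are only illustrative, not FileNet-specific:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Routes documents to one of N object stores by hashing a query key.
public class HashRouter {
    private final int storeCount;

    public HashRouter(int storeCount) {
        this.storeCount = storeCount;
    }

    // Reproducible: the same key always maps to the same store (criterion 1).
    // Spreading: CRC32 scatters similar keys, so documents from the same year
    // do not all end up in one store (criterion 2).
    public int storeFor(String queryKey) {
        CRC32 crc = new CRC32();
        crc.update(queryKey.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % storeCount);
    }

    public static void main(String[] args) {
        HashRouter router = new HashRouter(5);
        // Insert and search use the same function, so only one store has to be queried.
        System.out.println(router.storeFor("customer-4711"));
        System.out.println(router.storeFor("customer-4712"));
    }
}
```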

LiKao
  • Thanks. I'm a little new to the concept of FileNet and object stores, but your comment is very helpful. – jaunty_s Apr 03 '12 at 04:14