We have a table of vocabulary items that we use to search text documents. The java program that uses this table currently reads it from a database, stores it in memory and then searches documents for individual items in the table. The table is brought into memory for performance reasons. This has worked for many years but the table has grown quite large over time and now we are starting to see Java Heap Space errors.
There is a brute force approach to solving this problem which is to upgrade to a larger server, install more memory, and then allocate more memory to the Java heap. But I'm wondering if there are better solutions. I don't think an embedded database will work for our purposes because the tables are constantly being updated and the application is hosted on multiple sites suggesting a maintenance nightmare. But, I'm uncertain about what other techniques are out there that might help in this situation.
Some more details, there are currently over a million vocabulary items (think of these items as short text strings, not individual words). The documents are read from a directory by our application, and then each document is analyzed to determine if any of the vocabulary is present in the document. If it is, we note which items are present and store them in a database. The vocabulary itself is stored and maintained in a MS SQL relational database that we have been growing for years. Since, all vocabulary items must be analyzed for each document, repeatedly reading from the database is inefficient. And the number of documents that need to be analyzed each day can at some of our installations be quite large (on the order of 100K documents a day). The documents are typically 2 to 3 pages long although we occasionally see documents as large a 100 pages.