I am new to Elasticsearch and don't really know how to think about the disk space and memory usage involved in setting up a river (a MySQL river in my case).
What is the overhead involved in a river, especially regarding disk space and memory usage? This has been asked but not answered.
In other words, assume I have a table with 3 columns: primary_key (integer), url (varchar) and document_text (text). Also, assume I am currently doing full-text search 100% in MySQL (stupid, I know, but just for argument's sake). Each of the 3 columns has an index on it, with the "document_text" index being a full-text index. This is a very large table and I want to minimize duplicate data.
How should I think about what is going on with a MySQL river? With a river, would I simply remove the full-text index from the "document_text" column and move that over to Elasticsearch (along with the primary_key from MySQL)? Elasticsearch would not need to index the "url", since we aren't searching on that, correct? And if the data for document_text stays in MySQL while only the index lives in Elasticsearch, is there effectively zero increase in the disk space used?
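To make this concrete, here is a minimal sketch of the kind of mapping I imagine on the Elasticsearch side, written as a Python dict so it can be printed as JSON. The type name "document" and the helper build_mapping are hypothetical names I made up for illustration, and the "string" field type assumes an older Elasticsearch release of the kind rivers run on. My assumption (please correct me if it is wrong) is that disabling _source is what would keep the original text from being stored a second time, and that leaving "url" out of the mapping with dynamic mapping turned off means it is not indexed.

```python
import json


def build_mapping(store_source=False):
    """Build a hypothetical mapping that indexes only document_text.

    The type name "document" is made up for illustration. The MySQL
    primary_key is assumed to be used as the document _id, so it does
    not appear as a field here.
    """
    return {
        "document": {
            # Assumption: with _source disabled, the original JSON document
            # is not kept alongside the inverted index.
            "_source": {"enabled": store_source},
            # Fields that are not listed under "properties" (such as url)
            # are not added to the mapping or indexed.
            "dynamic": False,
            "properties": {
                # "string" assumes an older Elasticsearch version;
                # newer versions call the analyzed type "text".
                "document_text": {"type": "string"}
            },
        }
    }


print(json.dumps(build_mapping(), indent=2))
```

If something like this is the right idea, I would expect search hits to give me back only the ids, which I would then use to fetch the url and document_text rows from MySQL.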
EDIT:
I guess my main question is: will I be storing the underlying data twice, or does Elasticsearch just store the index?