I am using MongoDB as a temporary log store. The collection receives ~400,000 new rows an hour. Each row contains a UNIX timestamp and a JSON string.
Periodically I would like to copy the contents of the collection to a file on S3, creating a file for each hour containing ~400,000 rows (eg. today_10_11.log contains all the rows received between 10am and 11am). I need to do this copy while the collection is receiving inserts.
My question: what is the performance impact of having an index on the timestamp column on the 400,000 hourly inserts verses the additional time it will take to query an hours worth of rows.
The application in question is using written in Ruby running on Heroku and using the MongoHQ plugin.