My company will be coming into possession of a dataset of approximately 200-300 million records. The source material is CSV, roughly 150 GB uncompressed. We would need to perform an initial load of the data and then update approximately 1% of the records (2-3 million rows) on a daily basis. We would also love to keep the history of each record.
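To make the daily-update part concrete, here is a rough sketch of the kind of job I have in mind against our current MySQL setup (the `records`/`records_history` tables, column names, connection details, and the changed-rows CSV layout are all placeholders): each changed row's current version gets copied into a history table before the main row is upserted. The initial 150 GB load itself would presumably be done with LOAD DATA INFILE rather than row-by-row inserts.

```python
# Sketch only: apply a daily CSV of changed records to MySQL in batches,
# snapshotting the existing version of each record into a history table first.
# Table names, columns, and credentials below are hypothetical.
import csv
import mysql.connector

BATCH_SIZE = 10_000  # keep transactions small so 2-3M daily changes stay manageable

conn = mysql.connector.connect(
    host="localhost", user="etl", password="secret", database="warehouse"
)
cur = conn.cursor()

def apply_batch(rows):
    ids = [r["record_id"] for r in rows]
    placeholders = ", ".join(["%s"] * len(ids))
    # 1. Copy the rows we are about to change into the history table.
    cur.execute(
        "INSERT INTO records_history (record_id, payload, archived_at) "
        "SELECT record_id, payload, NOW() FROM records "
        "WHERE record_id IN (" + placeholders + ")",
        ids,
    )
    # 2. Upsert the new versions of those rows.
    cur.executemany(
        "INSERT INTO records (record_id, payload) VALUES (%s, %s) "
        "ON DUPLICATE KEY UPDATE payload = VALUES(payload)",
        [(r["record_id"], r["payload"]) for r in rows],
    )
    conn.commit()

with open("daily_changes.csv", newline="") as f:
    batch = []
    for row in csv.DictReader(f):
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            apply_batch(batch)
            batch = []
    if batch:
        apply_batch(batch)
```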
We currently use MySQL, and it appears that some people do run MySQL or PostgreSQL at this scale, but I haven't found much hard information about their experiences.
We could definitely get away without normalizing the data, and I can envision distributing the information over a number of servers. Would MongoDB or some other non-traditional datastore be worth considering?
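Along those lines, here is a minimal sketch of how the denormalized route might look in MongoDB (database, collection, and field names are made up): one document per record, keyed by the record ID, with each new version also pushed onto an embedded history array.

```python
# Sketch only: upsert changed records into MongoDB while keeping every
# version of each record on the document itself. Names are hypothetical.
from datetime import datetime, timezone

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")
records = client["warehouse"]["records"]

def apply_changes(changes):
    """changes: iterable of dicts, each with a 'record_id' plus the new field values."""
    now = datetime.now(timezone.utc)
    ops = []
    for change in changes:
        rid = change.pop("record_id")
        ops.append(UpdateOne(
            {"_id": rid},
            {
                # Overwrite the current version of the record...
                "$set": {"current": change},
                # ...and append this version to an embedded history array,
                # so the record carries its own change history.
                "$push": {"history": {"changed_at": now, "doc": change}},
            },
            upsert=True,
        ))
    if ops:
        records.bulk_write(ops, ordered=False)
```

My assumption is that for records that change very often the embedded history array could eventually run into MongoDB's 16 MB document limit, in which case old versions would go into a separate history collection instead, and sharding on the record ID would spread the data across servers.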
Does anyone have any thoughts on the feasibility of this sort of endeavor? I appreciate any help you might be able to give.