Scenario: Suppose you have 90 TB of text in 200 tables. This is structured, related data, comparable to DBpedia, only larger. Any truly relational, distributed, and performant database would do the job. Don't expect as many updates as a social network: about 500 read queries/s and 20 updates/s. The main requirement besides those is running large analyses on the database at high speed, since the data will be constantly reworked and improved with machine learning tools such as Apache Mahout.
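To give a rough sense of scale, the figures in the scenario can be turned into a back-of-envelope estimate. This is only a sketch: the node count and per-node scan throughput below are assumed values chosen for illustration, not measurements, and real analysis jobs would rarely achieve perfectly parallel sequential reads.

```python
# Back-of-envelope sizing from the figures in the scenario.
TB = 10**12  # decimal terabyte, in bytes

total_bytes = 90 * TB
tables = 200
reads_per_s = 500
updates_per_s = 20

avg_table_bytes = total_bytes / tables          # average table size
read_write_ratio = reads_per_s / updates_per_s  # reads per update


def full_scan_hours(nodes, scan_mb_per_s_per_node):
    """Hours to read all data once, assuming perfectly parallel sequential scans."""
    bytes_per_s = nodes * scan_mb_per_s_per_node * 10**6
    return total_bytes / bytes_per_s / 3600


# Hypothetical cluster: 50 nodes at 100 MB/s each (assumed values).
print(avg_table_bytes / 10**9, "GB per table on average")
print(read_write_ratio, "reads per update")
print(round(full_scan_hours(50, 100), 1), "hours per full scan")
```

Under these assumptions the query load is modest, while a single full pass over the data already takes hours, which is why the analysis side dominates the choice.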
Now the first question is which database technologies to start with (or to wait for until they are released) to maintain all that data with a relatively low number of web visitors but a high demand for fast analysis and machine learning. The second is which other databases are worth keeping track of for special purposes that may occur, and which should be dropped from the list or grouped into pairs of which only the better one should be used.
Cloudera/Brisk (Cassandra, Hive)
MySQL (Cluster), MariaDB
Berkeley DB
Drizzle, NimbusDB
SciDB (http://www.theregister.co.uk/2010/09/13/michael_stonebraker_interview/)
MongoDB
DataDraw
Neo4j