1

I am starting an opinion mining project (Data Mining -> Web Mining -> Opinion Mining) to determine the semantic orientation of the words contained in web pages. We will use a crawler to fetch the pages that contain opinions. My question is: what type of database (object-oriented, relational, hierarchical, etc.) is best suited to this type of project? I know this is a specific question, so I'm not expecting everybody to respond, but an answer from someone who has already done this would help.

Regards!

MRFerocius

2 Answers

0

If you need something large-scale and responsive, you would probably need to go for Google's BigTable or something of that nature. At the prototype level, I am sure you can use a traditional relational database, but at a certain point you'd hit a performance wall. See Brewer's CAP Theorem.

Eugene Yokota
  • Yes, if you are looking at such huge systems and that much data to analyze, then you are certainly trying to do something that relational (row-based) databases are not good at. In fact, Facebook also has a column-oriented database called Cassandra - http://incubator.apache.org/cassandra/ (which, unlike Google's BigTable, is open source) for use in this kind of scenario. – Aayush Puri Nov 22 '09 at 19:12
  • I doubt such a system will have the hard requirements to warrant a noSQL approach. – Vinko Vrsalovic Nov 22 '09 at 21:16
0

In my experience, a relational database can serve your purpose pretty well in this kind of scenario. You need to be extra careful about how you store the web content: decide whether you want to use a database for it at all, or whether something as simple as the file system will do. BLOBs especially require extra care, and they increase your maintenance work.

Also, given the nature of the project, you will certainly be using a lot of existing components, many of which already support a relational DB as a data store or are easy to extend to use one.
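
To make the relational-plus-file-system idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Everything in it is an assumption for illustration only: the table names (`page`, `word_orientation`), the helper functions, and the choice of SQLite and of a SHA-256 hash as the file name are hypothetical, and a real project would probably use a server database and its own schema. The point is only that the raw page goes to disk while the database keeps a path and the per-word scores.

```python
import hashlib
import sqlite3
from pathlib import Path

# Hypothetical schema: page metadata and word scores live in the database,
# while the raw crawled content stays on the file system (no BLOBs).
SCHEMA = """
CREATE TABLE IF NOT EXISTS page (
    id           INTEGER PRIMARY KEY,
    url          TEXT NOT NULL UNIQUE,
    content_path TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS word_orientation (
    page_id     INTEGER NOT NULL REFERENCES page(id),
    word        TEXT NOT NULL,
    orientation REAL NOT NULL,
    PRIMARY KEY (page_id, word)
);
"""


def open_store(db_path):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def save_page(conn, content_dir, url, html):
    """Write the crawled page to disk and record only its path in the database."""
    content_dir = Path(content_dir)
    content_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after a hash of the URL so the name is always file-system safe.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = content_dir / name
    path.write_text(html, encoding="utf-8")
    cur = conn.execute(
        "INSERT INTO page (url, content_path) VALUES (?, ?)", (url, str(path))
    )
    conn.commit()
    return cur.lastrowid


def save_orientations(conn, page_id, scores):
    """Store a {word: score} mapping for one page."""
    conn.executemany(
        "INSERT OR REPLACE INTO word_orientation (page_id, word, orientation) "
        "VALUES (?, ?, ?)",
        [(page_id, word, score) for word, score in scores.items()],
    )
    conn.commit()


def average_orientation(conn, word):
    """Average score of a word across all pages, or None if it was never seen."""
    row = conn.execute(
        "SELECT AVG(orientation) FROM word_orientation WHERE word = ?", (word,)
    ).fetchone()
    return row[0]
```

With this layout the crawler would call `save_page` once per fetched URL, and the analysis step would call `save_orientations` with whatever scores it computes. Queries across pages stay plain SQL, and the bulky page content never passes through the database.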

Aayush Puri