0

I like RDF triple stores a lot, and would like to use them for everything. Why would using RDF triple stores for everyday programming (entities/tables), such as Contacts, Customers, Company, etc., be a bad idea?

Are there shortfalls in the technology that I have not come across? I was concerned about retrieving data, but I think this is covered by SPARQL.

Bill the Lizard
WeNeedAnswers
  • 4
    Looks like my edit was partially reverted. How are the F# and C# tags related to this question? – Juliet Mar 23 '10 at 17:14
  • Juliet: presumably the author uses C# and F# in everyday programming (but not Python?). – Gabe Mar 23 '10 at 17:20
  • @Juliet, they are not, but I am not getting many bites with the rdf and semantic tags, they appear to be ....dare I say it.....too boring :) – WeNeedAnswers Mar 23 '10 at 17:21
  • 5
    Asking some actual and sensible question is also a good way to increase the visibility of your post (and it helps getting answers too). – Tomas Petricek Mar 23 '10 at 17:35
  • 1
    In case someone wants to know what an rdf triple store is: http://en.wikipedia.org/wiki/Triplestore – Development 4.0 Mar 23 '10 at 18:42
  • @Tomas, it is a good question. RDF triple stores are the future in my humble opinion. They are being used today with some vast data. I was curious to see if anyone had experience in using them in the business domain. My personal idea was to use it as a complete data store for the usual tripe that we all store such as customers, addresses and the such oh and the obligatory double entry book keeping system for accounts. – WeNeedAnswers Mar 23 '10 at 21:40
  • I think that it's a great fix for data that has no definitive structure, such as addresses and personal data. In a triple store, you can store different amounts of data for each type and it will expand and grow as required, without adding extra fields to tables, which incurs changes in code. My question is: why should I not do this? Is there a reason why this type of idea could cause conflicts or problems in the future? – WeNeedAnswers Mar 23 '10 at 22:07
  • 2
    @WeNeedAnswers--adding random tags to your question to get more "bites"--maybe you want to rethink your whole approach to StackOverflow. I agree with @Tomas Petricek. – Onorio Catenacci Mar 24 '10 at 12:29
  • @Onorio, I refer to bites as active communications. The more people involved the better the end resultant answer. Some people who use RDF triple stores may not look at this post because they see it as an implementation issue. By adding the C# and F# tags, I am likely to get hits of people who may be unaware of such a question, but are doing RDF triple stores. – WeNeedAnswers Mar 24 '10 at 12:35
  • 5
    Don't waste people's time by tagging your question incorrectly. We use tags to categorize questions for a reason. – Bill the Lizard Mar 25 '10 at 02:01
  • 8
    @WeNeedAnswers - Bill's point is valid - the tags were confusing, and you stated that they were added to get more hits, which is entirely the wrong approach. And re "politely" - have you *read* what you wrote (comment above this)? – Marc Gravell Mar 25 '10 at 08:24
  • 1
    @WeNeedAnswers, see those little diamonds next to @Bill and @Marc's names? Tread lightly, for the force is with them. – Benjol Mar 26 '10 at 12:20
  • 2
    @Benjol, To live in fear is a terrible thing, cower no more for the times of evil men are gone and democracy will prevail. I don't fear the stars, I fear evil men. These people are doing their job and are citizens of StackOverFlow, I respect them when respect is earned not through badges of state. They thus far have done me none ill fate and so what qualms have I? – WeNeedAnswers Mar 26 '10 at 15:42

5 Answers

2

Query times tend to be much slower than for conventional DBs, even with simple queries. Also, many RDF stores don't support standard DB features like transactions, crash recovery, ...
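
One way to see where the extra query cost can come from is to put the same data in a conventional table and in a generic (subject, predicate, object) table and compare the queries. The following is only a minimal sketch using SQLite; the table names, predicates and sample rows are made up for illustration and do not reflect how any particular RDF store lays out its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conventional layout: one row per contact, one column per attribute.
conn.execute(
    "CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, email TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "alice@example.org", "London"),
        (2, "Bob", "bob@example.org", "Paris"),
    ],
)

# Generic triple layout (hypothetical): one row per statement.
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
contacts = conn.execute("SELECT id, name, email, city FROM contact").fetchall()
for cid, name, email, city in contacts:
    subject = f"contact:{cid}"
    conn.executemany(
        "INSERT INTO triple VALUES (?, ?, ?)",
        [(subject, "name", name), (subject, "email", email), (subject, "city", city)],
    )

# Relational query: a single table is enough.
relational_rows = conn.execute(
    "SELECT name, email FROM contact WHERE city = ? ORDER BY name", ("London",)
).fetchall()

# Triple query: one self-join for every additional attribute that is read.
triple_rows = conn.execute(
    """
    SELECT n.o, e.o
    FROM triple AS c
    JOIN triple AS n ON n.s = c.s AND n.p = 'name'
    JOIN triple AS e ON e.s = c.s AND e.p = 'email'
    WHERE c.p = 'city' AND c.o = ?
    ORDER BY n.o
    """,
    ("London",),
).fetchall()

print(relational_rows)
print(triple_rows)
```

Both queries return the same rows, but the triple version needs a join per attribute, which is the kind of work a well-designed relational schema avoids.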

Carsten
  • Does that also go for the ones built on top of an RDBMS? – WeNeedAnswers Mar 25 '10 at 02:37
  • 1
    Regarding query speed in my experience: Yes. DBs are very good at exploiting well designed DB schemas for queries. Unless a RDF store does a very clever, continuing analysis of the triples and maps them to a clever schema on the DB layer it will never come close. Jena SDB e.g. does some clever caching of strings, but basically puts everything in a few simple tables. I'd expect them to be better at crash recovery and I think I remember some of them supporting transactions. – Carsten Mar 25 '10 at 23:01
  • Thanks for your help. Would you use one then for "everyday joe" programming, or only on real cases where they would come into their own? – WeNeedAnswers Mar 26 '10 at 12:41
  • 1
    Unless you have queries which are much easier in SPARQL than SQL I would not bother. If your main motivation is that you are not sure about what schema to use, have you considered a No-SQL DB? I personally have no experience with them, but they seem to be fashionable at the moment ... – Carsten Mar 28 '10 at 23:17
2

One of the shortcomings we have come across in using RDF triple stores for general programming is that most engines don't support aggregation in queries (min, max, group by).
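
When the engine cannot aggregate, the grouping has to happen in application code after the triples are fetched. A rough sketch of that workaround, using plain Python tuples in place of a real store (the `customer` and `total` predicates and the sample orders are assumptions for illustration):

```python
from collections import defaultdict

# Hypothetical triples as (subject, predicate, object).
triples = [
    ("order:1", "customer", "customer:acme"),
    ("order:1", "total", 120.0),
    ("order:2", "customer", "customer:acme"),
    ("order:2", "total", 80.0),
    ("order:3", "customer", "customer:globex"),
    ("order:3", "total", 45.5),
]


def objects(predicate):
    """Map each subject to its object for the given predicate."""
    return {s: o for s, p, o in triples if p == predicate}


def totals_by_customer():
    """Client-side equivalent of GROUP BY customer with MIN, MAX and COUNT."""
    customer_of = objects("customer")
    total_of = objects("total")
    grouped = defaultdict(list)
    for order, customer in customer_of.items():
        if order in total_of:
            grouped[customer].append(total_of[order])
    return {
        customer: {"min": min(values), "max": max(values), "count": len(values)}
        for customer, values in grouped.items()
    }


summary = totals_by_customer()
print(summary)
```

This works for small result sets, but it means pulling every matching triple to the client, which is exactly what a database-side GROUP BY avoids.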

A checklist we use to decide between RDBMS and RDF is the following:

RDBMS if

  • static schema
  • very large amount of data
  • no RDF export needed
  • Lucene support needed (easy via Hibernate Search for example)
  • strong data consistency requirements (money involved etc)

RDF if

  • not fixed or dynamic schema
  • small to large amount of data
  • RDF export needed
  • loose data consistency requirements

Refactoring RDBMS schemas for ongoing projects can be quite an overhead if you don't have the correct tools.

Lucene support is provided by some RDF engines as well, but is not as well documented and supported as in the case of Hibernate Search.

Scalability of RDF engines is also improving steadily as ideas from the NoSQL side are incorporated into RDF engines, but if you go with the standard engines of Jena and Sesame, this division is still quite valid.

Timo Westkämper
  • 1
    A reasonable checklist, but I'd add two amendments: (i) many RDF stores support Lucene indexes as well, and (ii) scalability of RDF stores is improving steadily, so I'd characterise the division as between "large" and "very large", not between "small" and "lots". Finally, though it's still in development, the next release of SPARQL will include aggregate functions. This in turn will drive the provision of aggregate functions in the RDF stores (noting that some already do). See http://www.w3.org/TR/sparql11-query/#aggregateFunctions. – Ian Dickinson May 19 '10 at 13:27
2

One issue that has not been mentioned yet is that updating triplestores to reflect changing data is often more work than for an RDBMS or OODBMS, because there is no notion of an 'object' or 'row' - only triples and resources. Deleting a domain object therefore requires care, or you will end up with a lot of garbage left in the triplestore. The absence of cascading deletes is a closely related issue.
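
To illustrate the garbage problem, here is a small sketch that models a store as a Python set of (subject, predicate, object) tuples. The resource names and both delete helpers are hypothetical, and whether inbound references should be removed at all is a domain decision; this only shows one possible hand-rolled cascade.

```python
def naive_delete(triples, resource):
    """Remove only the statements whose subject is the resource."""
    return {t for t in triples if t[0] != resource}


def careful_delete(triples, resource):
    """Remove the resource, inbound references to it, and any resource it
    pointed to that nothing else references afterwards."""
    subjects = {s for s, _, _ in triples}
    # Objects of the resource that are themselves described in the store.
    children = {o for s, _, o in triples if s == resource and o in subjects}
    remaining = {t for t in triples if t[0] != resource and t[2] != resource}
    for child in children:
        still_referenced = any(o == child for _, _, o in remaining)
        if not still_referenced:
            remaining = careful_delete(remaining, child)
    return remaining


store = {
    ("customer:1", "name", "Alice"),
    ("customer:1", "address", "address:1"),
    ("address:1", "city", "London"),
    ("address:1", "street", "1 High Street"),
    ("order:7", "placedBy", "customer:1"),
    ("order:7", "total", "120.00"),
}

after_naive = naive_delete(store, "customer:1")
after_careful = careful_delete(store, "customer:1")

print(sorted(after_naive))
print(sorted(after_careful))
```

The naive version leaves the orphaned address and a dangling `placedBy` reference behind; the careful version has to reimplement by hand what a cascading delete would do in an RDBMS.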

On the plus side, RDF can be helpful even for everyday applications because you can flexibly add new subclasses, relationships or sub-relationships between entities without necessarily breaking any code, and easily add annotations, comments etc to resources.
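
A small sketch of that flexibility, again using a plain Python set of tuples as a stand-in for a store (the predicates `comment` and `referredBy` are made-up examples): existing code that only reads the predicates it knows about keeps working after new kinds of statements are added.

```python
store = {
    ("customer:1", "type", "Customer"),
    ("customer:1", "name", "Alice"),
}


def customer_names(triples):
    """Existing code: reads only the predicates it already knows about."""
    customers = {s for s, p, o in triples if p == "type" and o == "Customer"}
    return sorted(o for s, p, o in triples if p == "name" and s in customers)


before = customer_names(store)

# Later additions: an annotation and a new relationship, with no migration step.
store |= {
    ("customer:1", "comment", "Prefers email contact"),
    ("customer:1", "referredBy", "customer:2"),
}

after = customer_names(store)
print(before, after)
```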

DNA
1

Further to Peteris's answer, there are some key differences between how you model data for a triple store and how you model it with other techniques such as OOP, relational databases or XML (rows, classes, properties, etc.).

Whether they are appropriate depends very much on what you want to do, and on whether you can find one with the right performance characteristics for your application.

People have a tendency to characterise triple-stores as being schema-less databases but realistically unless you are using some form of schema/ontology then they aren't particularly useful. If you want to use SPARQL to get stuff out then there needs to be some schema patterns in the store that you can write queries against.
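
A toy illustration of why agreed patterns matter: the `match` helper below is a hypothetical stand-in for a basic triple pattern query (it is not SPARQL), and the three predicate names are invented. A query written against one predicate silently misses data that was stored under a different one.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


store = [
    ("person:1", "email", "alice@example.org"),
    # Same meaning, but stored under different predicates.
    ("person:2", "mbox", "bob@example.org"),
    ("person:3", "emailAddress", "carol@example.org"),
]

found = [o for _, _, o in match(store, p="email")]
print(found)
```

Only one of the three addresses comes back, which is why a shared vocabulary or ontology is needed before queries become dependable.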

Personally, I would still use relational databases for a lot of things, and I still do. While I'm using RDF and triple stores for an increasing amount of stuff, that doesn't mean I'm ready to throw out what works well.

As a final point, even if you go with a relational database for the time being, there are technologies like DB2RDF which can convert relational databases to RDF, so you can stick with a DB for now and then export your database to RDF in the future as desired.
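
To give a feel for what such an export involves, here is a naive hand-rolled mapping from rows to triples using SQLite. It is only a sketch and does not reflect how DB2RDF or any other tool actually works; the `rows_to_triples` helper, the `table/key` subject scheme and the `table#column` predicate scheme are all assumptions made for illustration.

```python
import sqlite3


def rows_to_triples(conn, table, key_column):
    """Yield (subject, predicate, object) for every non-NULL cell in the table.

    The table name is interpolated into the SQL, so it must come from
    trusted code, never from user input.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{table}/{record[key_column]}"
        for column, value in record.items():
            if column != key_column and value is not None:
                yield (subject, f"{table}#{column}", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Alice", "London"), (2, "Bob", None)],
)

triples = list(rows_to_triples(conn, "customer", "id"))
for triple in triples:
    print(triple)
```

Note that NULL cells simply produce no triple, which is one place where the two models differ.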

RobV
  • "Throwing out what works" - But isn't that what people are doing anyway, I mean, the increase of the ORM, Domains taking the place of ERD's and Entities. Why not look at the problem of the Impedance mismatch and grab it by the horns and choose option 3. I would always do a schema/domain Model for anything of a certain size. A lot of the RDF stuff though has been done already in such works as the Dublin Core and other well defined Schema's. You could come up with your own though, nothing stopping you. I probably would for a Domain solution. – WeNeedAnswers Mar 23 '10 at 22:53
  • I wouldn't characterise ORM as a replacement for relational data models; it seems to be primarily used just to provide a high-level abstraction layer for developers, which appears to be a more general trend in modern development – RobV Mar 24 '10 at 12:23
  • The ORM though gets embedded into the development language, then all your great work on Entities gets lost in the code. The ORM is certainly not modern, been around a long long time. The ORM is used today as a panacea to the Database Object problem not the higher idea of abstraction. I would offer up The triplets for an alternative approach, as you can still use the ORM on top of a triplet db. – WeNeedAnswers Mar 24 '10 at 16:28
  • point taken, linq2rdf being an example of an ORM on top of a triplestore – RobV Mar 24 '10 at 20:33
0

There are no one-size-fits-all tools. Triple stores are appropriate and usable today for some kinds of tasks and not for others.

A similar question was asked on semanticoverflow.com and the common answer was the same: "use whatever is appropriate".

Pēteris Caune
  • 3
    how many overflows are there? I think we need some buckets :) – WeNeedAnswers Mar 23 '10 at 21:36
  • No one size fits all: I tend to agree with that, although the attitude in the RDBMS world is that there is only one way to store data and to retrieve it efficiently. RDBMSs, though, tend to be inflexible due to the accidental constraints placed upon them from the coding perspective. RDF would never be constrained in the same manner, if used correctly, although speed might be a problem on retrievals. – WeNeedAnswers Mar 24 '10 at 11:51